Mass content adding

From Meta, a Wikimedia project coordination wiki

Mass content adding is a project whose main goals are (1) semi-automatic content adding to small Wikipedias (though not limited to small Wikipedias) and (2) creating a centralized data source for semi-automatic or automatic changes of content on all Wikipedias. It can also be used for other Wikimedia projects and other wikis (or anything else that can adopt the project's methods).

Examples for this project may include using templates from the English Wikipedia, such as Infobox Country or templates for movies, minerals, species, etc., for primitive translation of template content into templates and sentences in other languages.
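A minimal sketch of such primitive translation, assuming a single non-nested template call with one parameter per line and hypothetical per-language sentence patterns. The parameter names common_name and capital are assumptions, not taken from a current infobox definition. The code reads the template parameters and fills them into a localized sentence; piped links and nested templates are not handled.

```python
import re

# Hypothetical localized sentence patterns, keyed by language code.
# Placeholders in braces name infobox parameters (assumed names).
SENTENCE_PATTERNS = {
    "en": "{common_name} is a country whose capital is {capital}.",
    "it": "{common_name} è uno stato la cui capitale è {capital}.",
}


def parse_template_params(wikitext: str) -> dict:
    """Extract 'name = value' pairs from the first non-nested template call."""
    match = re.search(r"\{\{(.*?)\}\}", wikitext, re.DOTALL)
    if not match:
        return {}
    params = {}
    # Skip the first chunk, which is the template name.
    # Note: piped links like [[A|B]] would break this naive split.
    for part in match.group(1).split("|")[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip()
    return params


def render_sentence(wikitext: str, lang: str) -> str:
    """Fill the localized pattern for `lang` with the template's values."""
    params = parse_template_params(wikitext)
    return SENTENCE_PATTERNS[lang].format(**params)
```

Only the sentence frame is translated; the values themselves (names of places) stay as they appear in the source template, which is what makes this translation "primitive".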

Goals

  1. Adding content related to geographic data: places, rivers, mountains, etc.
  2. Primitive translation of articles on the English Wikipedia (from English to other languages).
  3. Keeping data up to date.
  4. ... (add your goal, too) ...
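For goal 3, a bot typically rewrites a single template parameter when a source publishes newer figures. A minimal sketch, assuming each parameter is written on its own line as `| name = value`; the function name update_param and the sample values are made up for illustration.

```python
import re


def update_param(wikitext: str, name: str, new_value: str) -> str:
    """Replace the value of the first '| name = ...' line; other text is untouched.

    Assumes one parameter per line (a hypothetical but common layout).
    If the parameter is absent, the text is returned unchanged.
    """
    pattern = re.compile(
        r"^([ \t]*\|[ \t]*" + re.escape(name) + r"[ \t]*=)[^\n]*$",
        re.MULTILINE,
    )
    # A function replacement keeps backslashes in new_value literal.
    return pattern.sub(lambda m: m.group(1) + " " + new_value, wikitext, count=1)
```

Keeping the old line prefix (group 1) preserves the article's existing spacing style, which keeps bot diffs small and easy to review.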

Sources

  1. National Geospatial-Intelligence Agency (NGA) of the US Department of Defense
  2. Data available from various institutions that collect it (for example, national statistics institutes of various countries, but other organizations, too).
  3. English Wikipedia, as well as other big Wikipedias.
  4. The Common Locale Data Repository (CLDR), an active project hosted at Unicode.org, which has XML files with localized language and country names for many languages.
  5. ... (add your source, too) ...
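As an illustration of using CLDR data, the sketch below reads localized territory names from a CLDR locale file. It assumes the LDML layout `ldml/localeDisplayNames/territories/territory`, with the ISO code in the `type` attribute; the embedded sample is hand-trimmed to that shape and is not a copy of a real release.

```python
import xml.etree.ElementTree as ET

# Hand-trimmed sample in the shape of a CLDR locale file (German).
SAMPLE = """<ldml>
  <localeDisplayNames>
    <territories>
      <territory type="DE">Deutschland</territory>
      <territory type="FR">Frankreich</territory>
    </territories>
  </localeDisplayNames>
</ldml>"""


def territory_names(ldml_xml: str) -> dict:
    """Map territory codes to localized names from an LDML document."""
    root = ET.fromstring(ldml_xml)
    names = {}
    for territory in root.findall("./localeDisplayNames/territories/territory"):
        # CLDR can list variants marked with alt="..."; keep only the default.
        if "alt" not in territory.attrib:
            names[territory.get("type")] = territory.text
    return names
```

The resulting dictionary can then feed a bot that creates or checks interwiki-consistent country names in each language.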

Methods

  1. The main tool should be a wiki (possibly a separate Wikimedia wiki in the future). Data, localizations, etc. should be kept on the wiki (in this case Meta-Wiki) and should be usable by different software platforms.
  2. Programs:
    1. pywikipediabot
  3. Localization: Sentences, templates etc. should be localized.
  4. Over time, we should develop standard methods for all of these issues.
  5. There are many very useful projects on the English Wikipedia (such as w:Wikipedia:Naming conventions (Cyrillic)) whose results can be used in this project.
  6. Translation of general articles: this could be done with OmegaT, provided it is given a feature that retrieves the text from one Wikipedia; the text is then translated, building up a translation memory, and the translated text is saved under its destination name on the other Wikipedia. This helps with repetitive text: templates, for example, will be translated consistently. Translation memories can be exchanged, so others will reuse the same template names (we already have cases of duplicate names), and the translation memory, together with the glossary, helps find terminology quickly. Contributors' time is used more efficiently, and people can do more in the same timeframe. OmegaT is already used to translate content from Italian to Neapolitan and, as far as I know, also to Sicilian.
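The translation memory idea in item 6 can be sketched as an exact-match lookup table that contributors can merge. This is a deliberate simplification and not OmegaT's implementation: OmegaT also offers fuzzy matches and stores its memories in TMX files. The class and method names below are hypothetical.

```python
class TranslationMemory:
    """Minimal exact-match translation memory (hypothetical, simplified)."""

    def __init__(self):
        self.segments = {}

    @staticmethod
    def _normalize(segment: str) -> str:
        # Collapse whitespace runs so trivially different copies still match.
        return " ".join(segment.split())

    def add(self, source: str, target: str) -> None:
        self.segments[self._normalize(source)] = target

    def lookup(self, source: str):
        """Return the stored translation, or None if the segment is new."""
        return self.segments.get(self._normalize(source))

    def merge(self, other: "TranslationMemory") -> None:
        """Import another contributor's memory without overwriting our entries."""
        for source, target in other.segments.items():
            self.segments.setdefault(source, target)


tm = TranslationMemory()
tm.add("See also", "Voci correlate")
other = TranslationMemory()
other.add("External links", "Collegamenti esterni")
tm.merge(other)
```

Merging with setdefault keeps each contributor's established choices, which is what lets exchanged memories converge on a single template or heading name instead of producing duplicates.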

Legal issues

  1. Content used should be in the public domain or released under a free license.

Localization

Main page: Mass content adding/Localization.
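If localized messages are kept on a wiki page, as the Methods section suggests, any bot platform only needs a small parser to use them. The sketch below assumes a hypothetical page format of `* key = value` lines; the real localization page may be organized differently. Fetching the text is left out; in current Pywikibot releases it could be read from `pywikibot.Page(site, title).text`.

```python
def parse_localization_page(wikitext: str) -> dict:
    """Parse hypothetical '* key = value' lines; other lines are ignored."""
    messages = {}
    for line in wikitext.splitlines():
        line = line.strip()
        if not line.startswith("*") or "=" not in line:
            continue  # headings, comments, free text
        key, value = line.lstrip("*").split("=", 1)
        messages[key.strip()] = value.strip()
    return messages
```

Because the format is plain wikitext, translators can edit it like any other page, and every program reads the same data.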

Software

Main article: Mass content adding/Software.

Subprojects

Geographic data

Main article: Mass content adding/Geographic data.

Subprojects

General subprojects
  1. Using NGA data
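A sketch of how NGA-style gazetteer records could become one-sentence stubs. The tab-separated column names (name, feature, lat, lon) are hypothetical; the feature codes are modeled on GNS designation codes (PPL for a populated place, STM for a stream, MT for a mountain) but should be checked against NGA's own documentation before use. Records with an unknown code are skipped so a human can handle them.

```python
import csv
import io

# Hypothetical code-to-phrase table (English); a localized table per language
# would come from the Localization pages.
FEATURE_PHRASES = {"PPL": "a populated place", "STM": "a stream", "MT": "a mountain"}


def stubs_from_tsv(tsv_text: str) -> list:
    """Turn tab-separated records with hypothetical columns into stub sentences."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    stubs = []
    for row in reader:
        phrase = FEATURE_PHRASES.get(row["feature"])
        if phrase is None:
            continue  # unknown feature type: leave it for a human editor
        lat, lon = float(row["lat"]), float(row["lon"])
        stubs.append(f"'''{row['name']}''' is {phrase} located at {lat:.4f}, {lon:.4f}.")
    return stubs
```

The sample input used for checking is illustrative, not real NGA data.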
Specific subprojects
  1. Places and municipalities in Macedonia
  2. Places and municipalities in Serbia and Montenegro

Using English Wikipedia

  1. Countries of the world

Using whatever Wikipedia

  1. Calendar translations - IT-NAP, EN-??

Participants

If you want to participate in this project, please sign here.

  1. --Millosh 16:00, 23 February 2006 (UTC) (General issues, Countries of the world, Places and municipalities in Macedonia, Places and municipalities in Serbia and Montenegro)
  2. --misos 16:03, 23 February 2006 (UTC) (Places and municipalities in Macedonia).
  3. --Babbage 17:54, 23 February 2006 (UTC) (Learn to use pywikipediabot to import CLDR data?)
  4. --Sabine 12:20, 24 February 2006 (UTC) Use of OmegaT for translation.
  5. - Slavik IVANOV 12:41, 9 March 2006 (UTC) providing a place for a good practice at os:, cv: and probably other smaller but growing Wikipedias.
  6. --Bonzo 23:31, 27 August 2006 (UTC)
  7. Representing the Tajik Wikipedia. - FrancisTyers 23:50, 30 August 2006 (UTC)

Licenses

Available bots

Please add your bot's name here if it can be made available for this project.

  1. User:Millbot (operated by Millosh)
  2. IronBot (operated by Slavik IVANOV on the Ossetic Wikipedia)
  3. sh:User:Wiki kombajn · en:User:Article Grinder · tg:User:Bishbot (operated by User:FrancisTyers)

See also