Wikimedia Blog/Drafts/Content Translation helped create 30000 new Wikipedia articles this year

From Meta, a Wikimedia project coordination wiki

Published 11 November 2015

Title ideas

  • Content Translation helped create 30,000 new Wikipedia articles this year
  • ...

Summary

A brief, one-paragraph summary of the post's content, about 20-80 words. On the blog, this will be shown in the chronological list of posts or in the featured post carousel on top, next to a "Read more" link.

  • Content Translation has helped create 30,000 new Wikipedia articles since the beginning of 2015. In our latest update, we cover this and other important milestones, along with improvements to machine translation and our upcoming plans.

Body

Weekly trends in article creation, deletion and in-progress translations with Content Translation on the English Wikipedia. Photo by Runa, freely licensed under CC0 1.0

The number of articles created with the Content Translation tool recently crossed the 30,000 mark.[1] More than 7,000 editors use the tool to translate Wikipedia articles into many languages.

Our recent observations show that, on average, more than 1,000 new articles are created with Content Translation each week. About 7% of these are deleted each week as part of the normal article review process. This is a considerably lower rate than for new articles created with the usual editing tools. The tool is designed to create a good article by reusing existing content from another language, so this is an encouraging outcome. It supports our assumption that the initial translated versions are generally of high enough quality to be kept.

Challenges in content syntax and improvement

Ever since the tool was first deployed as a beta feature in January 2015, the development team has actively monitored the articles being created. The team has examined how well these articles fit into their respective wikis, primarily in terms of internal structure and code: categories, links, templates, footnotes, general wiki syntax handling and so on. By its very nature, Content Translation transforms text between diverse Wikipedia projects. Differences between those projects in their use of templates, references and markup can therefore cause problems.

As the article creation and deletion statistics show, the new articles generally appear to fit in well. However, keeping the wiki syntax clean is a considerable challenge, and regular use of the tool keeps uncovering new issues. Over the year we have fixed numerous bugs in the handling of categories, templates and footnotes, but we know that many still remain. We are grateful to the editors who report bugs in this area and help us understand and fix them.

Improvements to machine translation

Many users have repeatedly asked for improvements to machine translation (MT) ever since Content Translation was made available. Until recently, Apertium was the only MT service available in Content Translation. Since 4 November 2015, however, users of the Russian Wikipedia have had access to the Yandex machine translation service. Content Translation is especially popular on the Russian Wikipedia. Editors there can now use Yandex when translating Wikipedia pages from English to Russian with Content Translation (see the announcement).

The translation service is accessible through a freely available API. The translated content it returns is freely licensed for use in Wikipedia articles, in accordance with Wikipedia policy. The interaction between Content Translation and the translation service happens on the server side, so no personal information from the user's device is sent to Yandex. Users can modify the translated content, just like any other content on wiki pages. Information about these modifications is also publicly available under a free license through the Content Translation API. Anyone can use it to develop and improve translation services, from university research groups and open source projects to commercial companies. More information about this translation service is available on MediaWiki.org and in the Content Translation FAQ. For more details about the interactions between Content Translation and the Yandex translation service, please see this image.

We have also made enhancements to the Apertium machine translation service. As a result of recent changes, Apertium now covers eight new language pairs, listed below (the complete list of supported pairs is also available):

  • Arabic - Maltese (both directions)
  • Breton - French
  • Catalan - Esperanto
  • French - Esperanto
  • Romanian - Spanish
  • Spanish - Esperanto
  • Spanish - Italian (both directions)
  • Swedish - Icelandic (both directions)

Upcoming plans and office hour

In our last update, we told our readers about article suggestions, a feature that gives users a list of articles that can be translated into a given language. Soon, users will also be able to create collections of articles for translathons or similar shared editing activities. Have you participated in or organized such an event that involved creating articles through translation? If so, we would like to hear from you (via this form) about how Content Translation's article lists could support this activity.

Please join us for the next online office hour on 25 November 2015 at 1300 GMT. We welcome your comments and feedback on the Content Translation project talk page and on Phabricator.


Runa Bhattacharjee, Amir Aharoni, Language Engineering, Wikimedia Foundation

Notes

Ideas for social media messages promoting the published post:

Twitter (@wikimedia/@wikipedia):

(Tweet text goes here - max 117 characters)
---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|------/

Facebook/Google+

  • ...

  1. The total number of articles is one of many statistics available on the Special:ContentTranslationStats page, which is present on every Wikipedia (for example, the English Wikipedia). The development team initially used this page to monitor the gradual adoption of Content Translation. Over the last few months, more data has been added to help analyse statistics such as the most-used source languages for translations, weekly translation numbers, article deletion rates and more. This information helps us observe trends across individual Wikipedias and make informed decisions about changes we may need to make to specific features.