Wikimedia Blog/Drafts/Content translation

Title ideas

  • Semi-automated content translation comes to Scandinavian Wikipedias
  • ...


A brief, one-paragraph summary of the post's content, about 20-80 words. On the blog, this will be shown in the chronological list of posts or in the featured post carousel on top, next to a "Read more" link.

  • ...


Content Translation, a tool that simplifies translation of Wikipedia articles between different-language wikis, is being improved for Scandinavian languages.

In use since January 2015, Content Translation automates many of the menial tasks of translating and, in some languages, uses open-source Apertium machine translation to create rough drafts. For the closely related major languages of Scandinavia (Danish, Swedish, Nynorsk, and Bokmål, the last two being the written variants of Norwegian), the resulting draft text usually requires only minimal effort before publication.

However, its use in these languages was limited because Apertium supported only two Scandinavian language combinations: Swedish to Danish, and Nynorsk to and from Bokmål.

To create machine translation support for missing Scandinavian language pairs, language hacker Kevin Brubeck Unhammer applied for and received one of the Wikimedia Foundation's Individual Engagement Grants in September 2015, with part of the cost being contributed by Apertium.

"I'm developing machine translation systems between Danish, Swedish, Bokmål and Nynorsk," Unhammer says. "These systems will also be available to people who want to translate text outside of Wikipedia, or who wish to build on our work and create new kinds of language technology later. Wikipedia's Content Translation is especially effective when users can work from MT suggestions, so this should lead to more articles for the Scandinavian instances of Wikipedia, and to Scandinavian Wikipedia articles being read by a lot more people."

What makes this important? Wikipedia exists in over two hundred languages, among them Bokmål, Nynorsk, and Northern Sami, a language spoken in Norway's far northern regions. Put another way, Norway's five million people have three Wikipedias to work on. Each project has different content, in part because translating articles from one language to another has been laborious and tedious. Unhammer wrote in his grant application that a "lack of machine-assisted Content Translation ... means fewer localized articles. So for example, a Norwegian user searching the web for a certain term might not even see that there is a great Swedish article on that subject, since search engines tend to prefer localized hits (and if not, the English version would likely win the search ranking due to number of inbound links). Wikipedia readers thus become accustomed to searching in English, and not seeing the knowledge that exists in their neighboring countries."

Soon, however, editors will be able to translate easily between these languages. Unhammer's machine translation tool will allow every contributor and editor on Wikipedia to help expand free knowledge around Scandinavia, regardless of their spoken language.

Astrid Carlsen, Wikimedia Norway (Norge)

Kevin Brubeck Unhammer

Kevin Brubeck Unhammer. Photo by WMNOastrid, freely licensed under CC BY-SA 4.0.

Content translation nynorsk-bokmål

Content translation. Photo by Jeblad, freely licensed under CC BY-SA 4.0. ...