Research:Measuring the Usefulness of Automated Translation of Science / Technology / Math sections starting with Swahili

From Meta, a Wikimedia project coordination wiki
Wikibabel Olya
Duration: 2017-10 – 2019-10
This page documents a completed research project.

The English Wikipedia contains 5.5 million articles and is extremely useful across a wide range of topics. The same is not true for other languages. For example, the Swahili Wikipedia (Swahili has 50-100 million speakers) contains about 38,000 pages, many of them much shorter than their English equivalents. This reflects content availability on the internet in Swahili in general, which is much sparser than in English.

Machine translation has made tremendous progress, driven by advances in neural networks. We ask whether translation quality is now good enough to be useful despite imperfect language, particularly when no other information source is available at all. To test this hypothesis, we would like to automatically translate (using Google Translate) large sections of Wikipedia, starting with the science, technology, and math sections of the vital articles, and track user interaction with those pages. To preserve the quality of Wikipedia, we will host the translated pages on a separate site, Wikibabel.


Our methods involve monitoring aggregated usage of Wikibabel pages and running occasional surveys on Wikibabel about the quality of the translation and the usefulness of each page, irrespective of its very imperfect language.


  • 03/2018 - Initial translation code complete
  • 05/2018 - Funding and translation resources secured; initial 3,500 pages from the science, technology, and math sections of the vital articles translated
  • 06/2018 - Build analytics tracking that works with and without JavaScript
  • 07/2018 - Drive traffic for analysis by releasing on Free Basics and making the pages accessible through Google
  • 09/2018 - If traffic is sufficient, build a survey tool
  • 02/2019 - Analyze traffic and survey data to determine the usefulness of the translated content
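One common way to meet the "with and without JavaScript" tracking requirement is a tracking pixel: each page embeds a tiny image whose URL names the page, so the server records the view whether or not scripts run (for example as a plain `<img>` or inside a `<noscript>` tag). The following is a hypothetical sketch, not the project's actual implementation; the `/pixel.gif` path, the `page` query parameter, the port, and the in-memory counter are all assumptions chosen for illustration. It uses only the Python standard library.

```python
import base64
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# A 1x1 transparent GIF, returned for every tracking request.
PIXEL = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)

# Hypothetical in-memory store: page title -> number of views.
view_counts = Counter()


def record_hit(request_path, counts):
    """Count a view if request_path is /pixel.gif?page=<title>.

    Returns the counted page title, or None if the request is not a
    well-formed tracking request (hypothetical URL scheme).
    """
    url = urlparse(request_path)
    if url.path != "/pixel.gif":
        return None
    pages = parse_qs(url.query).get("page")
    if not pages:
        return None
    counts[pages[0]] += 1
    return pages[0]


class PixelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        record_hit(self.path, view_counts)
        # Always answer with the pixel so the embedding page renders normally.
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(PIXEL)))
        # Discourage caching so repeat views still reach the server.
        self.send_header("Cache-Control", "no-store")
        self.end_headers()
        self.wfile.write(PIXEL)


def serve(port=8000):
    # Pages would embed e.g. <img src="http://localhost:8000/pixel.gif?page=Example_page" alt="">
    HTTPServer(("", port), PixelHandler).serve_forever()
```

Calling `serve()` starts the server. Because the counter lives in memory, it resets on restart; a real deployment would more likely persist the counts or derive them from server access logs, which also work without JavaScript.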

Policy, Ethics and Human Subjects Research

To avoid disrupting Wikipedians' work and introducing poor grammar to Wikipedia, this experiment will be hosted on a separate domain.