Research:Measuring the Usefulness of Automated Translation of Science / Technology / Math sections starting with Swahili
The English Wikipedia contains 5.5 million articles and is extremely useful across a wide range of topics. The same is not true of other languages. For example, the Swahili Wikipedia (Swahili has 50-100 million speakers) contains 38K pages, many of them much shorter than their English equivalents. This reflects the availability of Swahili content on the internet in general: it is far sparser than English content.
Machine translation has made tremendous progress, fueled by improvements in neural networks. We ask whether translation quality is now good enough to be useful despite imperfect language, particularly when no other information sources are available. To test this hypothesis, we would like to automatically translate (using Google Translate) large sections of Wikipedia, starting with the science / technology / math sections of the vital articles, and track user interaction with the translated pages. To preserve the quality of Wikipedia itself, we will host those pages on a different site: http://www.wikibabel.com
Methods involve monitoring aggregated usage of wikibabel pages, along with occasional surveys on wikibabel asking about the quality of the translation and the usefulness of the page despite its very imperfect language.
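The aggregation described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual tooling: the function name `summarize`, the log format (one page title per view), and the 1-5 rating scale for survey answers are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def summarize(page_views, survey_responses):
    """Aggregate per-page view counts and mean survey ratings.

    page_views: iterable of page titles, one entry per view
        (hypothetical log format).
    survey_responses: iterable of (page, quality, usefulness) tuples,
        with ratings on an assumed 1-5 scale.
    """
    # Count views per page; no per-user data is kept.
    views = defaultdict(int)
    for page in page_views:
        views[page] += 1

    # Group survey ratings by page.
    ratings = defaultdict(list)
    for page, quality, usefulness in survey_responses:
        ratings[page].append((quality, usefulness))

    summary = {}
    for page in set(views) | set(ratings):
        page_ratings = ratings.get(page, [])
        summary[page] = {
            "views": views.get(page, 0),
            "responses": len(page_ratings),
            # Means are None when a page has no survey responses.
            "mean_quality": mean(q for q, _ in page_ratings) if page_ratings else None,
            "mean_usefulness": mean(u for _, u in page_ratings) if page_ratings else None,
        }
    return summary
```

Keeping translation quality and usefulness as separate ratings matches the question the study asks: whether a page can be useful even when its language is rated poorly.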
- 03/2018 - Initial translation code complete
- 05/2018 - Secure funds / translation resources and translate the initial 3.5K pages in the science / technology / math sections of the vital articles
- 07/2018 - Drive traffic for analysis by releasing on Free Basics and making the pages accessible through Google
- 09/2018 - If traffic is sufficient, build a survey tool
- 02/2019 - Analyze traffic data and survey responses to determine the usefulness of the translated content
Policy, Ethics and Human Subjects Research
To avoid disrupting Wikipedians' work and potentially introducing bad grammar to the site, this experiment will be hosted on a separate domain.