Research:Can Machine Translation Improve Knowledge Equity? A Large-scale Study of Wikipedias across more than 300 language editions

From Meta, a Wikimedia project coordination wiki
Duration:  2022-August – ??

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.

With the objective of providing "free access to the sum of all human knowledge", Wikipedia has become part of the essential infrastructure of free knowledge. However, as the 2030 Wikimedia strategic direction points out, it is increasingly critical to address the knowledge gaps on Wikipedia so that it can better serve audiences, communities, and cultures that have traditionally been "left out by structures of power and privilege". The knowledge gap across languages is a notoriously difficult issue, as it is challenging to recruit volunteers in low-resource languages. In this project, we set out to study whether machine translation can contribute to improving knowledge equity.

In the proposed research, we investigate the impact of machine translation on knowledge dissemination across different language editions of Wikipedia. In January 2019, the Wikimedia Foundation integrated Google Translate into its in-house Content Translation tool. The roll-out of Google Translate as an additional machine translation system enables Wikipedia editors to transfer knowledge into more target languages and with higher-quality translations. Despite mixed sentiments among Wikipedia editors toward the role of machines in content production, our preliminary analysis shows a large and sharp increase in the number of articles created with machine translation shortly after the introduction of Google Translate. This partnership between Wikipedia and Google Translate presents a unique opportunity to gain a deeper understanding of whether and how a state-of-the-art neural machine translation service can mitigate the knowledge gap across different language editions of Wikipedia. Leveraging this natural experiment, we will use techniques from econometric modeling, causal inference, and natural language processing to investigate three closely related sets of questions about the impact of machine translation on Wikipedia. First, how does a better machine translation service enable knowledge transfer between different languages? Second, how does Google Translate change the collaboration and coordination patterns between human editors and machine intelligence? Third, since a large portion of each Wikipedia language edition consists of locally relevant and culture-specific content, does machine translation also help the exchange of local content?
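One common way to exploit a natural experiment like this rollout is a difference-in-differences comparison: contrast the change in article-creation activity before and after January 2019 in language editions that gained Google Translate against the change in editions that did not. The sketch below illustrates the basic 2×2 estimator; the function name and all counts are hypothetical, synthetic illustrations, not real Wikipedia data or the project's actual analysis pipeline.

```python
# Hedged sketch of a 2x2 difference-in-differences estimate of the effect of
# the January 2019 Google Translate rollout on monthly counts of
# machine-translation-assisted article creations. All numbers are synthetic.

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Classic 2x2 difference-in-differences on group means:
    (treated post - treated pre) - (control post - control pre)."""
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(treated_post) - mean(treated_pre)) - (
        mean(control_post) - mean(control_pre)
    )

# Hypothetical monthly creation counts for illustration only.
treated_pre  = [100, 110, 105, 95]   # editions gaining Google Translate, before Jan 2019
treated_post = [180, 190, 200, 185]  # same editions, after the rollout
control_pre  = [100, 102, 98, 100]   # editions unaffected by the rollout
control_post = [110, 108, 112, 115]

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(round(effect, 2))  # -> 75.0 with these synthetic numbers
```

In practice the analysis would use a regression form of this estimator with language-edition and time fixed effects, which allows controls for seasonality and edition-level trends; the toy means-based version above only conveys the identification idea.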

Mark Graham, Bernie Hogan, Ralph K. Straumann, and Ahmed Medhat. 2014. Uneven geographies of user-generated information: Patterns of increasing informational poverty. Annals of the Association of American Geographers 104, 4 (2014), 746–764.

Kai Zhu, Dylan Walker, and Lev Muchnik. 2020. Content Growth and Attention Contagion in Information Networks: Addressing Information Poverty on Wikipedia. Information Systems Research 31, 2 (2020), 491–509.

Leila Zia, Isaac Johnson, Bahodir Mansurov, Jonathan Morgan, Miriam Redi, Diego Saez-Trumper, and Dario Taraborelli. 2019. Knowledge Gaps – Wikimedia Research 2030. (2019).