Grants:Programs/Wikimedia Research Fund/Can Machine Translation Improve Knowledge Equity

Can Machine Translation Improve Knowledge Equity? A Large-scale Study of Wikipedias across more than 300 language editions
Start and end dates: June 2022 ~ May 2023
Budget (USD): 40,000-50,000
Applicant(s): Kai Zhu



Applicant's Wikimedia username. If one is not provided, then the applicant's name will be provided for community review.

Kai Zhu

Project title

Can Machine Translation Improve Knowledge Equity? A Large-scale Study of Wikipedias across more than 300 language editions

Entity Receiving Funds

Provide the name of the individual or organization that would receive the funds.

Kai Zhu

Research proposal


Description of the proposed project, including aims and approach. Be sure to clearly state the problem, why it is important, why previous approaches (if any) have been insufficient, and your methods to address it.

With the objective of providing "free access to the sum of all human knowledge", Wikipedia is now part of the essential infrastructure of free knowledge. However, as pointed out by the 2030 Wikimedia Strategic Direction, it is becoming increasingly critical to address the knowledge gap on Wikipedia so that it can better serve audiences, communities, and cultures that have traditionally been "left out by structures of power and privilege". The knowledge gap across languages is a notoriously difficult issue because it is challenging to recruit volunteers in low-resource languages. In this project, we set out to study whether machine translation can contribute to improving knowledge equity.

In the proposed research, we investigate the impact of machine translation on knowledge dissemination across different language editions of Wikipedia. In January 2019, the Wikimedia Foundation integrated Google Translate into its in-house Content Translation tool. The roll-out of Google Translate as an additional machine translation system enables Wikipedia editors to transfer knowledge to more target languages and with translations of higher quality. Despite mixed sentiments among Wikipedia editors toward the role of machines in content production, our preliminary analysis shows a large and sharp increase in the number of articles created with machine translation shortly after the introduction of Google Translate. This partnership between Wikipedia and Google Translate presents a great opportunity to gain a deeper understanding of whether and how a state-of-the-art neural machine translation service can mitigate the knowledge gap across different language editions of Wikipedia. Leveraging this unique natural experiment, we will use techniques from econometric modeling, causal inference, and natural language processing to investigate three sets of closely related questions on the impact of machine translation on Wikipedia. First, how does a better machine translation service enable knowledge transfer between different languages? Second, how does Google Translate change the collaboration and coordination patterns between human editors and machine intelligence? Third, a large portion of each Wikipedia language edition is locally relevant, culture-specific content. Does machine translation also help the exchange of local content?


Approximate amount requested in USD.


Budget Description

Briefly describe what you expect to spend money on (specific budgets and details are not necessary at this time).

Total amount: 46,000 USD
  • Salary or stipend: 30,000 USD. This will support a PhD student for one year of her study and provide two months of summer support for myself.
  • Benefits: 3,000 USD. Health insurance for a PhD student.
  • Equipment: 7,000 USD. Purchase of a high-performance research computer for the project's data analysis.
  • Open access publishing costs: 6,000 USD. The open access fees for two publications in the target journals.


Address the impact and relevance to the Wikimedia projects, including the degree to which the research will address the 2030 Wikimedia Strategic Direction and/or support the work of Wikimedia user groups, affiliates, and developer communities. If your work relates to knowledge gaps, please directly relate it to the knowledge gaps taxonomy.

Our proposed research directly addresses one of the key issues outlined by the 2030 Wikimedia Strategic Direction: the knowledge gap. We aim to provide insight into how the knowledge gap across language editions can be bridged by leveraging a recent advance in technology, neural machine translation. This is particularly relevant for low-resource languages, where it is notoriously difficult to attract and retain volunteers. In addition, our research will provide insight for Wikipedia developer communities to help them improve the design of the machine translation service and related user interfaces. This can facilitate more effective and appropriate use of machine translation in Wikipedia communities.


Plans for dissemination.

We plan to summarize the findings in research papers and submit them for publication in top academic journals (e.g. Management Science or Proceedings of the National Academy of Sciences). In addition to disseminating our findings through academic publications and presentations, we aim to have a practical impact on the health and development of Wikipedia. As described in the "Impact" section, we will also communicate our findings to Wikipedia developer communities.

Past Contributions

Prior contributions to related academic and/or research projects and/or the Wikimedia and free culture communities. If you do not have prior experience, please explain your planned contributions.

I am an early-career researcher, and Wikipedia has been a focus of my research. My prior study on addressing information poverty in the Wikipedia knowledge network was recognized by the Wikimedia Foundation and won the Research Award of the Year 2021. Since then, I have continued working on knowledge equity and related topics on Wikipedia. In addition, I have been a frequent contributor to and participant in Wikimedia communities and activities (Wiki Workshop, Wikimedia Research Showcase, etc.).

Reference: Zhu, Kai, Dylan Walker, and Lev Muchnik. "Content growth and attention contagion in information networks: Addressing information poverty on Wikipedia." Information Systems Research 31, no. 2 (2020): 491-509.

I agree to license the information I entered in this form, excluding the pronouns, countries of residence, and email addresses, under the terms of Creative Commons Attribution-ShareAlike 4.0. I understand that the decision to fund this Research Fund application, and the application itself along with all the information entered by me in this form, excluding the pronouns, countries of residence, and email addresses of the personnel, will be published on Wikimedia Foundation Funds pages on Meta-Wiki and will be made available to the public in perpetuity. To make the results of my research actionable and reusable by the Wikimedia volunteer communities, affiliates and Foundation, I agree that any output of my research will comply with the WMF Open Access Policy. I also confirm that I have read the privacy statement and agree to abide by the WMF Friendly Space Policy and Universal Code of Conduct.