
Grants:Programs/Wikimedia Research Fund/From Mongolia to Brazil: LiveLanguage fostering lexical diversity worldwide

From Meta, a Wikimedia project coordination wiki
status: not funded
From Mongolia to Brazil: LiveLanguage fostering lexical diversity worldwide
start and end dates: July 2023 - July 2024
budget (USD): 50,000 USD
fiscal year: 2022-23
applicant(s): Beatrice Bonami, Paula Helm, Gabor Bella, Fausto Giunchiglia, Adriano Clayton da Silva, Khuyagbaatar Batsuren and Gertraud Koch




Beatrice Bonami, Paula Helm, Gabor Bella, Fausto Giunchiglia, Adriano Clayton da Silva, Khuyagbaatar Batsuren and Gertraud Koch

Affiliation or grant type

UNESCO Media and Information Literacy Program; University of Amsterdam; University of Trento; Amazon Federal University; National University of Mongolia; University of Hamburg


Wikimedia username(s)

Project title

From Mongolia to Brazil: LiveLanguage fostering lexical diversity worldwide

Research proposal




Description of the proposed project, including aims and approach. Be sure to clearly state the problem, why it is important, why previous approaches (if any) have been insufficient, and your methods to address it.

With over 60 million articles in more than 300 languages, Wikipedia owes its vastness to its collaborative nature and the technology behind it. Most minority and endangered languages, however, lack wiki content. We propose combining fieldwork with the reuse of existing digital resources to extend Wiktionary (and, to a lesser extent, Wikipedia) coverage for such languages. This project is a partnership among European, Asian, and South American universities to create foundations on Wiktionary and Wikipedia in native languages and dialects. We selected eight endangered or under-resourced indigenous/native languages in Brazil (Tikúna, Hixkariána, Sateré-Mawé, and Mundurukú) and Mongolia (Buryat, Manchu, Dagur, and Kalmyk) that exemplify different challenges and opportunities. We will mobilize communities to co-create content and promote indigenous knowledge on Wiktionary. In addition, using the LiveLanguage data catalog, which provides lexicons for 2,000 languages (http://livelanguage.org), we will extend Wiktionary with new entries in more than 100 languages. Our approach thus covers both scientific and social impact and involves three components:

Desk Research: systematic review of datafication of minorities and diversity of language data, focus groups about the importance of language preservation through Wikimedia, and interviews with key bridge-building persons about ethical conditions of trust-building;

Pilot Projects: establish scalable lexicographic practices for collecting Wiktionary entries with native speakers. These workshops (with literacy upskilling where needed) will produce new entries, address the ethical challenges of datafying indigenous knowledge and making it openly available, and merge technical inputs from LiveLanguage, the Universal Knowledge Core (UKC), and Wiktionary;

Wiktionary Inputs: create 6,000 new lexical entries based on the pilot languages, and offer at least 10,000 new Wiktionary entries in at least 100 additional minority languages extracted from the LiveLanguage data catalog, including all necessary metadata (provenance links, copyright). We offer open-source automated tools for the "wikification" of lexical entries, relying on existing Wiktionary tools (Pywikibot).
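To illustrate what the "wikification" of a lexical entry could involve, the following is a minimal sketch, not the project's actual tooling. The `LexicalEntry` record, its field names, the `to_wikitext` helper, and the sample values are all hypothetical. The page layout (language heading, part-of-speech heading, a `{{head}}` line, a numbered gloss, and a references section carrying provenance and license) is an assumption modelled loosely on common Wiktionary entry structure. The upload step is shown only as comments, since it requires a configured Pywikibot installation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalEntry:
    """Hypothetical record for one lexicon entry (field names are assumptions)."""
    lemma: str
    language_name: str
    language_code: str
    part_of_speech: str
    gloss: str
    source_url: str      # provenance link
    license_label: str   # copyright/license metadata


def to_wikitext(entry: LexicalEntry) -> str:
    """Render an entry as wikitext, keeping provenance and license visible."""
    pos = entry.part_of_speech
    # Built by concatenation to avoid brace-escaping in format strings.
    head_line = "{{head|" + entry.language_code + "|" + pos + "}}"
    lines = [
        "==" + entry.language_name + "==",
        "",
        "===" + pos.capitalize() + "===",
        head_line,
        "",
        "# " + entry.gloss,
        "",
        "===References===",
        "* Source: " + entry.source_url + " (" + entry.license_label + ")",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only; not real lexicon data.
sample = LexicalEntry(
    lemma="example-lemma",
    language_name="Buryat",
    language_code="bua",
    part_of_speech="noun",
    gloss="example gloss",
    source_url="http://livelanguage.org",
    license_label="LICENSE-PLACEHOLDER",
)
wikitext = to_wikitext(sample)
print(wikitext)

# Upload sketch (assumes a configured Pywikibot setup; not executed here):
#   import pywikibot
#   site = pywikibot.Site("en", "wiktionary")
#   page = pywikibot.Page(site, sample.lemma)
#   page.text = wikitext
#   page.save(summary="Add entry with provenance metadata")
```

Keeping the rendering step separate from the upload step means generated entries can be reviewed by native speakers and community editors before anything is saved to Wiktionary.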

As a result, we will deliver a final report and an executive summary, to be launched at an event where we discuss the findings with our networks. In addition, we will produce communication materials based on the pilot projects to support dissemination. This project will have a duration of 11 months, starting from May 2023.






Approximate amount requested in USD.

50,000 USD

Budget Description

Briefly describe what you expect to spend money on (specific budgets and details are not necessary at this time).

For the pilot projects in Mongolia and Brazil, we will invest 22,000 USD in mobilizing communities. A further 20,000 USD will go to the research team, to employ researchers who will work throughout the project. The final report and event will require 4,000 USD, and another 4,000 USD will be dedicated to producing communication materials and products. The total amount requested is, therefore, 50,000 USD.



Address the impact and relevance to the Wikimedia projects, including the degree to which the research will address the 2030 Wikimedia Strategic Direction and/or support the work of Wikimedia user groups, affiliates, and developer communities. If your work relates to knowledge gaps, please directly relate it to the knowledge gaps taxonomy.

With its focus on language diversity and on the representation of minority languages, primarily on Wiktionary but also, in a second step, on Wikipedia, the proposed project will directly contribute to Wikimedia’s strategic direction of knowledge equity: it expands Wikimedia content to include knowledge and communities that have so far been left out because of their marginalized position in global knowledge infrastructures. By working closely with local communities and jointly developing indigenous languages on Wikimedia, the project will contribute to the goal of building “the foundations for creating and accessing trusted knowledge in many shapes and colors” more generally and “to grow knowledge that represents human diversity” more specifically.



Plans for dissemination.

We intend to produce a detailed report, with a comprehensive executive summary translated into the languages covered by the project. Furthermore, during the workshops, we will produce communication pieces such as short videos and photographs that, once edited, will form a short documentary to be released at the report launch event. The final event is planned in a hybrid format to discuss findings with related networks; its location is still to be decided.

Past Contributions


Prior contributions to related academic and/or research projects and/or the Wikimedia and free culture communities. If you do not have prior experience, please explain your planned contributions.

The project consortium comprises the University of Trento, the University of Amsterdam, the University of Hamburg, UNESCO, the Amazon Federal University, and the National University of Mongolia. This group of research institutions and universities is a solid cohort for the project, with the international outlook, maturity, and experience required to produce the listed deliverables. Its members have worked on cross-continental research grants such as LiveLanguage (http://livelanguage.org), WeNet (computer-mediated diversity data), POEM (Participatory Memory Practices), and the Universal Knowledge Core (UKC). Together, they possess the expertise to make this initiative a success case in language preservation worldwide.

I agree to license the information I entered in this form, excluding the pronouns, countries of residence, and email addresses, under the terms of Creative Commons Attribution-ShareAlike 4.0. I understand that the decision to fund this Research Fund application, and the application itself along with all the information entered by me in this form (excluding the pronouns, countries of residence, and email addresses of the personnel), will be published on Wikimedia Foundation Funds pages on Meta-Wiki and will be made available to the public in perpetuity. To make the results of your research actionable and reusable by the Wikimedia volunteer communities, affiliates and Foundation, I agree that any output of my research will comply with the WMF Open Access Policy. I also confirm that I have read the privacy statement and agree to abide by the WMF Friendly Space Policy and Universal Code of Conduct.