Wikimedia Language Diversity Hub/Research

From Meta, a Wikimedia project coordination wiki

Movement Strategy Implementation Grant 2022

In the second half of 2022, the Language Diversity Hub interviewed 13 different language communities to explore the barriers they experienced when editing Wikipedia in their language. The goal was to document the challenges and barriers that exist, and to explore how the Hub can help reduce them. Here we present findings from the research.

The full report is available on Wikimedia Commons.

  • The full report includes: Abstract, Introduction, Method, Findings, Proposed next steps, and Concluding Remarks.
  • The Findings and Proposed next steps are reproduced below.


Here you can watch our presentation of the report from Friday March 10.



We set out on this project with the ambition of getting clear pointers to where we could improve the platform technically to make things easier for the new language communities. The technical barriers we expected were not mentioned that much; instead, the need for more training was a recurring theme.

Language Technology
A limitation on the presence of many languages on the Internet in general, and consequently also on Wikipedia, is the availability of tools for writing in that particular language. Appropriate keyboards and online dictionaries are unavailable for many of the communities we spoke with. This was mentioned as a clear challenge in seven of the language communities. There are reasons to believe that tools for spell checking and grammar checking are underdeveloped in most of the languages, but the communities might not consider this a barrier to contributing to Wikipedia. Given the context of their language, they might not even expect to find such tools.

However, the lack of such tools is a barrier for using and producing digital, written content, and so this lack is a clear barrier for building a written encyclopedia.

Our major challenge is how to type the special keys in Dagbani. Most of our contributors are using mobile phones, while others are using laptops. With Android phones you can use the Ghanaian keyboard, but with other devices you have to install a keyboard for a fee. Please, we need a lot of help with regard to this challenge.
Contributor to Wikipedia in Dagbani

Finding the right translations for some of the technical terms has also been a challenge in several of the communities. The Paiwan community describes meetings where language users from different regions come together and discuss words and expressions. Many of the technical terms have no equivalent in these languages, and good processes for establishing new terminology on the language's own terms will be of value to the languages as a whole.

Translating some of the technical terms on the main page was very difficult; for example, terms like file, folder, edit, edit source and others are not easy to translate to Dagbani. It has consumed a lot of time and energy discussing these terms, and coming to agreement was not easy at all.
Contributor to Wikipedia in Dagbani

Inari Sámi can celebrate a successful revitalization, going from no children speaking the language 30 years ago, to having schools and language nests and a growing number of speakers today. They are fortunate enough to have linguists that are also native speakers. Within the community, they have solid knowledge to make good translations and create new terminology on the premises of the language. The Inari Sámi community uses digital tools developed by Giellatekno, center for language technology at the Arctic University of Tromsø.

Wish for more training
We entered this research hoping to get clear directions for technical problems to solve, but the clearest wish of almost all the communities is to receive more training. The request for more training comes from different perspectives. At one end, people wish to learn more about templates or other on-wiki skills; at the other, communities are in need of basic computer skills.

It might be that for the contributors it is unnatural to point out limitations to the technology. They might be more focused on increasing their own knowledge and skills to be able to use the platforms better. In those cases it is the responsibility of the more advanced users and the owners of the technology to understand the behavior and the needs of the users, and adapt the platform accordingly.

We can also talk about the challenges we have encountered, such as the lack of equipment and connectivity. It is difficult for volunteers not only to have the equipment, but also to learn how to operate the computer equipment. Many volunteers lack digital literacy before even thinking about wiki editing.
Contributor to Wikipedia in Wayuunaiki

On this note, it is also worth mentioning that young potential contributors to Wikipedia in general feel ill-equipped, or feel that their technical skills are too low, to edit Wikipedia. Improving universal design and reducing the technical skills needed for editing will therefore probably benefit the movement beyond the smaller and newer language versions.

What I want to say is I know something about technique, but I want to learn how to write an article of good quality. What kind of sources we can use, and then how to edit. When we don't have training, we feel that we are not good enough and that we don't have confidence in what we have done.
Contributor to Wikipedia in Mon

Many communities report a combination of wishes for more training, a lack of equipment, and challenging internet conditions. Some communities have addressed this by offering a space where people can gather, learn together, and access the internet and sometimes also computers. This can also help attract more contributors in general.

Mostly when we have in-person workshops or trainings in the office, routers are made available for contributors or participants to connect to their phones.
Contributor to Wikipedia in Dagbani

Oral culture
The oral culture of many indigenous cultures, and the challenges regarding oral sources on Wikipedia, have been discussed for a long time in some parts of the movement. It continues to be an important challenge to solve when working with underrepresented knowledge. The Paiwan community describes a fast and efficient pipeline for recording and writing down oral culture, and then uploading it to the wiki. The Wayuunaiki community specifically mentions meeting the criteria of Wikipedia as the most challenging part of dealing with an oral culture. New language versions of Wikipedia do have the power to establish their own requirements for their content.

Although many communities mentioned it as a challenge, no one pointed out oral culture as one of the top three barriers to contributing to Wikipedia in their language.

Our primary sources come from the elders of our community and the collective memory. There we have the challenge of aligning ourselves to the criteria of the traditional Wikipedia.
Contributor to Wikipedia in Wayuunaiki

The reading and writing skills of the old people in the tribe are almost zero, but they will listen and talk, and we can combine that with the community associations and groups. For example, we can record the stories of the elders, write them down and upload them. Updating this to the wiki is pretty fast, and the pipeline is good. I think this combination is pretty good.
Contributor to Wikipedia in Paiwan

The Incubator
We have interviewed language communities in the Wikimedia Incubator and those that have graduated to a full-fledged Wikipedia. Everyone who has contributed to both versions says that the full-fledged version is much easier to contribute to, and also that their motivation to contribute increased once they knew that people could find and start reading the articles.

In relation to the incubator, the technology becomes much more complex for people who are not digital natives. We are grateful for the indigenous language portal that we are using, that has been very friendly to these new editors.
Contributor to Wikipedia in Wayuunaiki

Access to equipment
Instead of finding challenges that can be solved by improving Wikipedia, we have again been reminded that many of the indigenous or small language communities have very different challenges:

  • lack of equipment
  • unstable or expensive Internet connections
  • insufficient technical skills
  • reliance on mobile devices

More than half of the communities we interviewed estimate that mobile devices are the main equipment for contributors.

Editing from a mobile device is rarely an active choice; it is used because it is the only available option.

Better mobile solutions would reduce the barriers for many contributors.

The mobile devices do not help members, as we mostly use Wikipedia in English as a reference to edit in the incubator, and one has to keep switching tabs. Also, distractions from WhatsApp and other social media platforms are a challenge.
Contributor to Wikipedia in Gurene

I can't write and upload by phone, the phone screen is too small for me. If it is easier to contribute from the phone, I hope more people will do it.
Contributor to Wikipedia in Mon


75% of the interviewed groups report that a lack of economic resources is a limiting factor for contributing to Wikipedia among their community.

Several reasons have been mentioned:

  • need to work on paid jobs
  • weak/no culture for volunteering
  • lack of funds to buy necessary equipment or data bundles

Communities that have previously accessed WMF grants describe how these have helped them grow the community by providing physical spaces with access to the internet and equipment to facilitate contributions. Several communities did not know of the different grants they can access through the WMF.

The fact that almost every group mentioned time as a limiting factor for their contributions is not a surprise: Wikipedia is endless work for contributors in larger communities too. However, with 3–4 contributors in total for many of these communities, the sheer workload on each one is enough to discourage any new contributor from joining.

Community members are not able to edit because of data bandwidth. This includes users of both mobile devices and laptops.
Contributor to Wikipedia in Gurene

I contributed for some time, and had to halt it partly due to a lack of money to purchase internet data. Contributing to Wikipedia requires internet connectivity data, and that has to do with money. I spend money on purchasing internet data in order to contribute to Wikipedia.
Contributor to Wikipedia in Dagbani

The economic aspect is very important. We are talking about an indigenous community on the Colombian-Venezuelan border, and we see people who are constantly migrating from rural areas to urban areas in order to prepare themselves academically, so when they leave rural areas the indigenous people have to look for a way to survive.
Contributor to Wikipedia in Wayuunaiki

The tradition of volunteering might also be less established in some of the communities, as described by the Angika community. For some, the skills learned when editing Wikipedia might be considered reward enough, but the region where Angika is spoken is also among the poorest regions of India. Those contributing to Wikipedia in Angika seem to have higher education and to be in a financially freer position than most Angika speakers.

To be very honest, when people do something they want something back. Those who are into open knowledge and are inherently passionate about spreading knowledge might not. But in general people will ask "if we contribute, what do we get in return?"
Contributor to Wikipedia in Angika

Education and knowledge

Common challenges mentioned by several groups:
  • The language is not taught in schools, or is only taught to a limited degree
  • There is no standardized written version of the language
  • Challenges with language knowledge in the community

Some of the small language contributors report that their language is not taught in schools, or that even if it is, its use is not encouraged and it is not considered a literate language. For some communities, the language is only used in the first few years of school.

However, many of the smaller languages report on collaborations with schools, and in some of the groups teachers or student teachers are driving forces for the use of Wikipedia in that language.

Tachelhit and the variations of Tamazight were only oral languages. There are written forms, but the written form came when the language became official in 2011 with the script Tifinagh. Most people don't know how to write in their own language because they have not learned it in school. Only young children, and only some of them, learn it in school. And that is also a challenge: you know how to spell and you can write it in Arabic, Tifinagh or Latin letters, but there is no standard, so people will argue about how to write.
Contributor to Wikipedia in Tamazight

That the languages are not standardized is another linguistic challenge well beyond the limits of what we as Wikimedians can solve, though some languages (e.g. Scots) have developed strategies for negotiating issues of dialect difference, spelling divergence, and the lack of an official language standardization guide.

We recorded over 4,000 Dagbani words and successfully uploaded them to Wikimedia Commons. We plan to use these words to create a Dagbani dictionary. This will be distributed to students for free. We intend to create a pool of references to augment the few sources of references that we have on the internet that relate to Dagbani by documenting our culture and traditions that are normally passed on orally.
Contributor to Wikipedia in Dagbani


With social barriers, we wanted to understand the social context in which the contributors operate. Some key findings:
  • Too few contributors – a challenge for all communities
  • Gender imbalance already at early stages of many editions

Some of the communities are very small because the language has few speakers, as with Inari Sámi. Many of the other languages, however, have many speakers but are still struggling to build a significant community of contributors.

Two of the communities report a majority of female contributors; the rest report that male contributors dominate. Most of them would like a better gender balance. However, with so few contributors in general, the gender issue is not a major preoccupation; the priority is getting more contributors overall.

In many of the language communities, the contributors have never even met; they only collaborate on Wikipedia. For some, there are long distances to travel to be able to meet. The challenge of motivating other people to contribute was mentioned in this phase of the interview.

Most of the contributors are men. Women are not so active in our community, even if they are educated. Some don't have time to contribute, others don't want to use their abilities to contribute.
Contributors to Wikipedia in Mon

The majority of contributors are actually women, not very common in the Wikipedia world.
Contributor to Wikipedia in Inari Sámi

I have participated in the discussion of vocabulary translation. We invited other teachers of different dialects to discuss with us. We can use the local language of each place. Everyone will explain the meaning of one word, and where it comes from. I think this activity is very meaningful.
Contributor to Wikipedia in Paiwan

Strengths of the small language communities

Research based on finding problems and struggles might easily overlook the strengths and opportunities in the populations we are trying to understand. To be able to properly support and encourage the small language communities, it is important to understand what drives them. This is a topic that should be explored further.

Motivation
In the English Wikipedia, and probably in other larger language versions, contributors are motivated by their interest in a certain topic or by the sense that they are helping to bring knowledge to the world. Contributors to smaller language communities, however, are motivated by things like:

  • Bringing knowledge about the language and culture to the next generation and to the rest of the world.
  • Conserving the knowledge about culture, history and traditions.
  • Revitalizing their language.
  • Doing something for their culture and their people.

Within all of this there is a sense of ownership, and the idea of feeling part of a project that leaves a footprint that institutionalizes a historical memory is important.
Contributor to Wikipedia in Tachelhit

Wikipedia has become a huge name. When you look for something, and Wikipedia appears in your own language, that validates you, right? And your language. That is what most people are looking forward to.
Contributor to Wikipedia in Angika

Proposed next steps

The barriers we have documented in this work are not new to this movement, and numerous efforts are being made to grow communities, to improve technology, and to increase the capacity and competence of contributors. One of the goals of this research was to gain some experience of how the Hub can work together and become a useful and functional structure within the Movement. The question we keep in mind at each step is: How can the Hub support this challenge, and are there other units within the Movement that can do this better than the Hub?

The language diversity community has to be involved in prioritizing the next steps of the Language Diversity Hub's work. We have therefore suggested next steps under different headings, based on the challenges we have documented and on conversations in the Steering Committee. It is a pressing issue for the Hub to create governance and build a democratic and inclusive structure. However, many of the barriers described here can be solved by parties other than the Hub. This can be considered an open invitation to the whole Wikimedia Movement to think about how you can include support for language diversity in your everyday work.

Growing and training the Communities

A part of the goal of this research was community building among the small language communities, to create bridges where they can reach out and support each other. However, almost all the language communities reported that they also need more community building inside their own communities.

They want:

  • more contributors
  • more people using Wikipedia in their language
  • more people who understand why the Wikimedia projects are important resources for their language and culture
  • more training

There are several initiatives in the movement for growing and maintaining strong and vibrant communities, as well as for providing training sessions, edit-a-thons and so on. It is usually best that those offering support are affiliates close to the communities, for practical, sustainability and economic reasons.

The Language Diversity Hub can support the marginalized languages by supporting the local affiliates working on community building in the area. If there are no local affiliates, the Language Diversity Hub can look for ways to support the language communities directly.

Suggested next step
We suggest doing a needs assessment among affiliates in regions with marginalized language communities, to understand their relationship with those communities and what kind of support they would need or benefit from the Language Diversity Hub in order to include more language communities in their local affiliate.

The Language Diversity Hub does not need to be directly involved in organizing events, or working actively in local communities. However, giving support for grant applications or planning of such events can be within the scope of the Hub.

Language technology – keyboards, spell checking and beyond

A recurring issue among all the smaller language communities is that tools for supporting their language online are lacking; they have insufficient keyboards for typing in their own language, or there might be few or no online dictionaries or spell checking software.

There are some on-wiki solutions for keyboards for some languages, and a few dedicated people in the movement have put a lot of time and effort into supporting those languages. However, some of the solutions only work on wiki platforms, and they require maintenance. Besides, these languages deserve to have keyboards available everywhere, on all devices and platforms.

The lack of digital tools is not only a barrier for Wikimedia activities. It is also a barrier for digital presence in general, and it is just one of the existing structures that keep marginalized languages in that status.

Giellatekno (Center for Language Technology) at the University of Tromsø, Norway, has been working for years to build infrastructure for creating technical tools for Sámi and other minority languages in Norway. They are interested in sharing this infrastructure, so that even more languages can use it. This would improve the digital opportunities for those language communities in general, and also make it easier to contribute to Wikipedia.

The collaboration has started with building contact between the Giellatekno team and some language communities. However, a systematic approach to making this a service offered jointly by the Wikimedia Language Diversity Hub and Giellatekno would be worth exploring.

Suggested next step
Following up on selected languages going through the Giellatekno path, documenting the process, and identifying and supporting other language communities that want these tools.

Network of institutions and linguists

Offering support to marginalized languages will at times require consultation and support from linguists and other language professionals. For example, when a language is to graduate from the Incubator, it needs to be confirmed by a linguist with expertise on that particular language. Or, when establishing new terminology for a language, linguists and experts on terminology can ensure better processes and end-results.

Many of the groups we spoke with had little digital exposure of their language, or the exposure was mostly in content (such as social media posts) but not in interface, which is needed to get a complete language experience. They also struggled with translating terminology.

Currently, there is an ongoing project to translate the whole of MediaWiki into Northern Sámi. The project is funded by external grants from two different organizations, the Sámi Parliament and the NUUG Foundation. Both Wikimedians and non-Wikimedians are involved in the translations.

Suggested next steps
There are already many linguists active in the Wikimedia movement, in different roles and capacities. The Hub wishes to build a network of linguists within and outside the movement, as well as institutions with expertise on indigenous and marginalized languages. This network will be a resource for the language communities and the affiliates that are supporting them.

The model of seeking external funding for translating MediaWiki and the Wikipedia application can also be replicated for other languages. The Language Diversity Hub can provide support in the grant application processes, and share lessons from the Northern Sámi project.

Improving the Incubator

For many contributors to small language versions, the Wikimedia Incubator is the first platform they get to know. All the contributors that have tried both the Incubator and the full-fledged version describe the Incubator as being more difficult to use than the normal Wikipedia. We are essentially welcoming the new language communities to a more difficult platform, without providing much organized and systematic support.

Suggested next steps
We are suggesting two approaches related to the Incubator:

  • Improving the Incubator technically
      1. Conducting a wishlist survey
      2. Applying for resources to do technical improvements
      3. Following up on Phabricator tasks regarding the Incubator that are not being prioritized
  • Improving the routines of the Incubator
      1. Creating routines for actively welcoming and following up with the creators of new editions on the Incubator
      2. Defining responsibilities between the Language Committee and the Language Diversity Hub

Today, the Language Committee is responsible for approving language versions. It also provides support for the new language versions. This can put its members in a double role where they are both the helpers and the judges. The members of the Language Committee are volunteers, while the members of the Language Team are employees working on language-related technology within the Wikimedia movement. Today some members of the Language Team and the Language Committee are also part of the Steering Committee. Improving the Incubator would be a good opportunity to explore further what role the Hub can and should play in relation to these other units.

Expanding on this work

This project has been a chance to get to know contributors, and for contributors to get to know how the movement works. Many smaller communities might experience problems for which solutions already exist; they just don’t know about them. Through the outreach we were able to provide immediate support, make connections that could be useful in the future, and build bridges.

This has also provided us with a baseline to later evaluate if the situation has improved for the 13 language communities in this group, and also to evaluate how the situation has changed on a global scale for the new and smaller language communities in the future.

Suggested next step
As already described under training and community building, support for the marginalized communities should be offered by an organization close to them. The responsibility for replicating this work could lie with the local affiliate, while the Language Diversity Hub can provide support before the interviews and in following up on the challenges that surface.

Many of the marginalized languages use other Wikimedia projects, such as Wikisource or Wikimedia Commons. Replicating the interviews with a focus on the use of these platforms can provide more insight into how Wikimedia projects beyond Wikipedia can be used in projects that aim to strengthen the digital presence of, or revitalize, marginalized languages.

Building hub governance

The first Steering Committee was formed by inviting people who had done important work for language communities in the past, and consisted of 10 people. Since then, the Steering Committee has been open for interested parties to join as observers. Ten more people have been included as observers, either because they wanted the Hub to be a part of their project, or because they wanted to support the Hub with their knowledge and skills. This means the Hub has been led by people who already belong to the international part of the movement. In the future, the Hub needs to better represent the marginalized communities.

So far, all the decisions have been made by consensus reached at the meetings or in the Telegram group. Until now this has worked well: there is a common sentiment that the Hub is needed, and that any work to promote the new and the smaller language communities is important. However, for the future development of the Hub, it will need a Steering Committee with stronger leadership that takes on hard discussions and has the mandate to set a direction and priorities for the work of the Hub. As we have seen in this work, there are many challenges the Hub can take part in solving, but not all at the same time.

The Steering Committee has been composed of people with a strong personal and professional interest in the field of smaller languages. Knowing that economic constraints are a barrier for some to engage in the movement, it will be important that resources are allocated to make sure people representing the new language communities have a real possibility to work with the Hub.

A challenge that has surfaced is how to organize a fair selection process for the new Steering Committee. Many of the groups that need the support of the Hub the most might not even have a User Group, and there might be few contributors from that language. Time, capacity and language skills might all be barriers for some to take part in the governance of the Hub. We wish to work on how to create a governance structure that ensures that the smaller communities can be involved and especially that their challenges are heard, but also that they have the opportunity to share their experiences and be a part of the global movement without draining their already sparse resources.

Suggested next step
Setting up a working group on how to create governance for the Hub:

  • Defining the role of the Hub in relation to the other language groups within the movement.
  • Exploring how to compose an inclusive and representative Steering Committee.