Wikipedia Diversity Observatory/Guidelines

From Meta, a Wikimedia project coordination wiki

This is a collection of 15 guidelines, facts, and challenges for improving cultural diversity in Wikipedia, based on analyses and research published in recent years by members of this project.

Respect your culture and context and consider them valuable content for Wikipedia.

Digital divide aside, one of the main problems that stops speakers from contributing to Wikipedia is a lack of self-respect or consideration towards their cultural context. There is no recognition of the value of traditions, places, and the most relevant historical and present figures. If nobody contributes this content, it will never be available on the Internet. Speakers of marginalized languages face cultural or psychological barriers that stop them from accessing Wikipedia, and these barriers are related to the value they give to their culture.

We need to treat the problem of self-respect if we want these languages to grow. The growth of their language editions is not only a language problem, because language and knowledge are inextricable. I am certain that a way to help speakers of small and endangered languages enter Wikipedia is to send them the clear message that their knowledge matters, and that we need it in Wikipedia to complete the best depiction of human cultural diversity. One cannot tackle the language problem without tackling the recognition of their speakers’ points of view and reasoning, and encouraging their representation.

Be aware that building a Wikipedia is creating both global and local content.

Every Wikipedia has local content - whether it is about specific places in a city, writers, or election results. Creating an encyclopedia does not mean having a collection of universal knowledge if that means a selection of articles that would be considered relevant from a Western point of view. Instead, it means having content that helps readers understand the world, starting from their local surroundings. For this reason, creating local content is essential and should be a common practice in any Wikipedia.

Know the state of representation of your cultural context in Wikipedia.

We created a method to identify and collect local content (we named it Cultural Context Content, CCC)[1][2] for each Wikipedia. It is everything related to the context of the language: all the articles about its places, people, concepts, and events. The results show that it is an important part of every Wikipedia. It is around 25% on average among the 40 biggest language editions. In some languages, like English, it is half of the edition; in others, like Dutch, it is 10%. This content is the center of activity of each Wikipedia - it is more developed (in terms of bytes, references, and images), it generates more discussions, and more editors contribute to it.

Unfortunately, for 145 Wikipedia language editions, it is below 10% of their content, which means that they do not contribute enough to representing the knowledge of their context. For instance, in a group of African languages, the extent of Cultural Context Content is very small - around 5%. 92 Wikipedia language editions do not have 100 properly geolocated articles in the territories where the language is spoken. This is a big problem for the cultural diversity we can obtain in Wikipedia. One particular way to find gaps in the representation of a language’s local content is to look for them in bigger Wikipedia language editions (the tool “Missing CCC” does that). All in all, every Wikipedian should have a good understanding of how well his or her context is represented in Wikipedia, whether at the local, regional or country level, on one topic or another.

Create local content and represent your cultural context for the readers.

Contributing to represent the cultural context (cultural self-respect) may initiate a virtuous cycle in Wikipedia use. It is important to start this cycle in certain languages that have not spontaneously started it. For most of the language editions, the creation of Cultural Context Content is spontaneous and related to the editors’ appreciation of their context, as well as the need for information demanded by readers.

When a language has a healthy status, the proportion of page views received by CCC articles is larger than the proportion of CCC articles in a Wikipedia language edition[3]. When this does not happen, it means that editors from a particular context do not value their context (e.g. the case of African Wikipedias). This has bad implications for education and community cohesion. We need to create a positive cycle that helps people internalize that their language and local knowledge matter, and that they can look for this knowledge in search engines and consult it in Wikipedia in their mother tongue.
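As a rough illustration of this indicator, the following Python sketch compares the share of page views going to CCC articles with the share of CCC articles in an edition. The `EditionStats` class, its field names and the numbers in the example are assumptions made for illustration; they are not taken from the project's datasets or tools.

```python
from dataclasses import dataclass


@dataclass
class EditionStats:
    """Hypothetical per-edition counts; the field names are illustrative."""
    total_articles: int
    ccc_articles: int
    total_pageviews: int
    ccc_pageviews: int


def ccc_article_share(stats: EditionStats) -> float:
    """Proportion of the edition's articles that belong to CCC."""
    return stats.ccc_articles / stats.total_articles


def ccc_pageview_share(stats: EditionStats) -> float:
    """Proportion of the edition's page views received by CCC articles."""
    return stats.ccc_pageviews / stats.total_pageviews


def readers_favor_ccc(stats: EditionStats) -> bool:
    """True when CCC receives a larger share of page views than its share of articles."""
    return ccc_pageview_share(stats) > ccc_article_share(stats)


# Made-up numbers: CCC is 25% of the articles but receives 40% of the page views.
example = EditionStats(
    total_articles=1000,
    ccc_articles=250,
    total_pageviews=10000,
    ccc_pageviews=4000,
)
print(readers_favor_ccc(example))
```

An edition where this check fails is one where local content attracts proportionally less attention than the rest of its articles, which is the situation described above as unhealthy.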

Engage your education community in the creation of local content in Wikipedia.

Another way to give some sense of normality and self-respect to the language is to communicate it and teach it at school and in any education facility. Using Wikipedia in education and, most importantly, teaching how to contribute to it with local content can give reputation to the language and to local knowledge. Not to mention the opportunity offered by the 21st-century skills that Wikipedia teaches, such as source verification and online writing. For this, it is important to help students recognize which topics are more relevant to the community and later introduce them to the language edition in their mother tongue. It is essential to include local content creation in education programs in order to foster cultural context recognition.

Ensure that local content is represented with articles about a wide diversity of topics.

As said before, local content encompasses a wide variety of topics, from geographical places to prominent political figures to traditions and folklore. The selection of Cultural Context Content (CCC) is, in fact, a ‘local encyclopedia’ inside every Wikipedia. Usually, geography takes a relevant portion of CCC, ranging from 10% to 33% (mean 22%), followed by people (mean 19.4%), culture (mean 14.7%), and society (mean 9.8%), among other topics such as agriculture, business, education, environment, events, health, law, politics, religion, science, sports, and technology[3][4]. Editors should have an idea of how well each topic is represented in their local content.

When editors have internalized the value of local content, they create articles on these topics every month. They know their value and continue creating them. The proportion of articles created about local content every month is very similar to the total proportion of local content in Wikipedia[2], and in these new articles created every month, the proportion of each topic is very similar to the final proportions for the entire group of articles that make up the CCC. We need to stimulate the creation of articles about every topic. Currently, more than half of the Wikipedias do not have even 100 articles about their most relevant men, women or places.

Create articles that summarize a general topic about your context (region, country or group of people).

One particular type of article that is very encyclopedic, in the sense that it summarizes knowledge, is the one that focuses on a particular topic and a particular context, for instance, “Catalan music” or “Catalan cuisine”. Even though there exist hundreds of musical organizations or compositions, an article dedicated to Catalan music gives a good understanding of what is most relevant and cannot be missed by a reader. The titles of these articles sometimes include the name of the inhabitants of the territory and sometimes the territory itself (“Music of Catalonia”), but their scope is the same: a very well-documented article, with more references and more developed than usual.

In our research, we found that articles including such keywords (“name of the territory” or “name of the people”) stand out in terms of engagement. They are created by more editors and, most importantly, they also receive many more page views than other articles considered local content[3]. These articles also have more interwiki links, which means that they are a good resource for other languages to get a good idea of a topic without the need to create every single article about it.

Set group goals and challenges to create local content and discuss what is more relevant.

One way to work better is to prioritize. In this case, we encourage communities to create lists of the articles that may be more relevant than others in local content. In some Wikipedias, the lists of subjects and most important articles are known as “Vital articles”, “articles that every Wikipedia should have”, and so on. The creation of lists with specific goals for the representation of local content can be very useful to stimulate a group of editors. The number 100 seems to be a very good challenge: easy to remember and achievable by one motivated editor or a group.

Organize in the community to have a group of editors to follow the news and contextualize them by creating articles.

We previously highlighted the importance of creating local content, as it is content that generally obtains more page views than other types of content. It is possible to state that part of the readers' informational needs is directed towards their context, to follow the news and all sorts of current events. The most-read articles correlate very well with the most searched topics in the news and are closely related to the context. Therefore, every community needs to have some editors dedicated to recognizing which topics in the news are most relevant and which articles are missing, in order to create them.

Engage newcomers in creating local content right after they have registered in your language edition.

Local content is created both as the result of spontaneous actions of individual editors and of organized activities in every community with a certain capacity (e.g. with Wiki Loves campaigns). However, there are two facts on editor engagement in local content that are less known. First, a good portion of the edits that created the CCC in every language comes from anonymous editors. Anonymous editors are drawn to this type of content more than to others[3]. Second, current administrators usually started with a higher percentage of edits in Cultural Context Content in their first days than other newcomers[3]. It seems, then, that encouraging newcomers to create local content is a good strategy to engage new editors in Wikipedia.

Export your local content to other language editions once there is a general representation in your language edition.

When a language community is healthy and there is a general engagement, the most active editors tend to become multilingual and contribute to other language editions (usually larger ones, like English, and those with some linguistic or geographical proximity). When editing in other languages, they also tend to contribute to articles that may be part of the local content of their own language edition. In other words, they either ‘export’ their knowledge or patrol it to ensure that certain points of view are reflected the way they consider appropriate. “Exported articles” tend to be the main article on the country, the president, historical figures, or some celebrities, depending on the culture’s projection in the world[3]. It is important to note that not all languages have the capacity to start exporting their local content to other languages.

Fill the Wikidata properties for the items of the most valuable articles from your cultural context content.

One way to prioritize efforts to spread local content across language editions is to fill in as many properties as possible in each article's Wikidata item (Qitem). Data shows that, second only to the number of editors contributing to an article, the number of properties in its Wikidata item indicates how widely it is translated: articles with more properties tend to be those most translated to other language editions. This is easy to understand, considering that many property names have been translated across languages and that, besides, properties contain the most important facts about a topic. So if you want to increase the chances of an article being translated to other language editions, work on it in Wikidata.
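To decide where to start, a community could rank its most valuable articles by how many properties their Wikidata items already have. The Python sketch below assumes the entities are already available as dictionaries in the shape of Wikidata's entity JSON, where statements are grouped under a `claims` key by property ID. The helper names and the item IDs in the example are hypothetical placeholders, and fetching the data is left out.

```python
def count_properties(entity: dict) -> int:
    """Count the distinct properties that have at least one statement."""
    claims = entity.get("claims", {})
    return sum(1 for statements in claims.values() if statements)


def rank_by_properties(entities: list[dict]) -> list[tuple[str, int]]:
    """Return (item ID, property count) pairs, fewest properties first."""
    counts = [(entity["id"], count_properties(entity)) for entity in entities]
    return sorted(counts, key=lambda pair: pair[1])


# Placeholder items; the IDs and statements are made up for illustration.
items = [
    {"id": "Q-placeholder-a", "claims": {"P31": [{}], "P17": [{}], "P625": [{}]}},
    {"id": "Q-placeholder-b", "claims": {"P31": [{}]}},
    {"id": "Q-placeholder-c", "claims": {}},
]

for item_id, n_properties in rank_by_properties(items):
    print(item_id, n_properties)
```

Items at the top of the ranking have the fewest properties, so they are the ones where adding statements is likely to make the most difference.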

Start translating articles into specific languages to increase the chances of spreading cultural diversity.

When Wikipedia editors become multilingual and edit multiple language editions, they usually edit in English Wikipedia, followed by another big language that is spoken in their geographical region. The best way to help an article spread across the rest of the language editions is to create it in English Wikipedia. However, if we think in strategic terms, it may also be interesting to create the article in other specific language editions, namely those which are also geographically relevant on the planet: Russian and Spanish, among others. Creating the article with a few sentences that include a few facts from Wikidata can be relatively easy with machine translation, and local editors can fix the possible mistakes. We are currently working on a tool to suggest which languages can have the most long-term impact in helping an article become multilingual, and another one to identify editors with an interest in local content from every other language.

Create events to represent and share your cultural context and exchange with other languages.

It is always better to create group strategies to represent and share the local content, because editors both raise awareness of the need for this and fix the problem. For this reason, there exist some initiatives that are aimed at representing their own local content. One of them is the GLOW project, which stimulates some communities in a contest to create articles. Others, like Intercultur or CEE Spring, propose the creation of articles about the areas of the Iberian peninsula and Central and Eastern Europe among the language editions of the languages spoken in these areas. The Asian Month invites everyone to create articles about Asia, and the Catalan Culture Challenge invites editors of Catalan and other language editions to create articles about the Catalan-speaking territories.

Engage in contests and events but create your own goals as communities.

We suggested the previous contests as excellent ways to improve the coverage of other languages’ cultures and contexts. However, following an agenda may not always be possible. For this reason, we really encourage the creation of long-term goals, like the creation of a minimum number of articles about every other language's context. If only 100 articles were created about every other language's cultural context, there would be 30,000 in every Wikipedia (100 for each of the 300 language editions) that would guarantee a minimum of cultural diversity. For this, we propose the Top CCC articles, which are different lists of articles created according to different topical criteria (e.g. men, women, places, etc.) and relevance features (e.g. number of editors, number of edits, etc.) for every language's local content.


  1. Miquel-Ribé, M., & Laniado, D. (2019). Wikipedia Cultural Diversity Dataset: A Complete Cartography for 300 Language Editions. Proceedings of the 13th International AAAI Conference on Web and Social Media. ICWSM. ACM.
  2. Miquel-Ribé, M., & Laniado, D. (2018). Wikipedia Culture Gap: Quantifying Content Imbalances Across 40 Language Editions. Frontiers in Physics, 5, 12. (CC BY) Open Access.
  3. Miquel-Ribé, M. (2017). Identity-based motivation in digital engagement: the influence of community and cultural identity on participation in Wikipedia (Doctoral dissertation, Universitat Pompeu Fabra).
  4. Miquel-Ribé, M., & Laniado, D. (2016). Cultural identities in Wikipedias. In Proceedings of the 7th 2016 International Conference on Social Media & Society (p. 24). ACM.