
Grants:Project/WCDO/Culture Gap Monthly Monitoring

From Meta, a Wikimedia project coordination wiki

WCDO / Culture Gap Monthly Monitoring
Summary: The Wikipedia Cultural Diversity Observatory (WCDO) proposes a set of solutions to regularly assist communities and individual editors to increase the cultural diversity in their language editions’ content.
Contact: marcmiquel(_AT_) gmail.com
Created on: 08:29, 27 November 2018 (UTC)

Project idea

What is the problem you're trying to solve?

Thanks to the Wikipedia Cultural Diversity Observatory (WCDO) we have a clear cartography of the different cultural and geographical contexts covered by each Wikipedia language edition, which is an essential starting point to work on the culture gap, or the problem of Wikipedia’s lack of content reflecting the existing world’s cultural diversity.

a) From a community point of view, the culture gap can be attributed to two issues: first, the lack of awareness of the depth of the gaps among the editors who could improve or create the content; and second, the lack of practical solutions for attaining reasonable and fulfilling goals. Unlike other biases such as the gender gap, it is harder to guess where the culture gaps are located, how relevant they are and how to bridge them.

The data provided so far by the WCDO allows us to better understand the depth of each content gap, and the Top CCC article lists offer a general solution that encompasses all 300 languages. These lists contain the most relevant articles in terms of different features and topics (geography, men, women, etc.) from each culture and country associated with each Wikipedia language edition. The idea behind top articles (or vital articles) is to cover a minimum of each other’s cultural context content. General tables such as Top CCC article lists coverage or Top CCC article lists spread give an overview of how a language has covered other languages’ Top CCC lists or spread its own across the other languages.

Such data-driven solutions may help editors to identify the most urgent articles to be created for inter-cultural enrichment from a given language, and can integrate and assist many community dynamics such as existing regional contests (CEE Spring, Intercultur), world challenges (Asian Month) and other events. Past events like Wikimania 2018 dedicated to “bridge knowledge gaps” and to raise awareness on the need to cover African content pushed in this same positive direction.

Nonetheless, the current visualizations and tables created in the first phase of the project fall short of answering whether each language community has taken up the pledge to improve diversity made in Cape Town and has swung into action to bridge the content culture gap. It is important to hold events and dedicate them to specific topics in order to raise awareness. However, we cannot be sure of the impact of the events’ messages or of how they translate into action. Also, an event can only reach a limited number of participants, far fewer than the entire communities and their active members.

To complement this, it would be necessary to have a clear and updated view of the gaps on a constant basis (e.g. in a language-based newsletter). Only this way can editors incorporate tasks and feed their motivations regularly: a monthly message answering whether a language edition is bridging the gap or not can shape their choices and have a more powerful long-term impact than contests and events.

Likewise, given that not all community members can attend events and establish cross-language collaborations, it would be necessary to have a tool for finding which other multilingual members of the global community are potential partners to collaborate with. In other words, a place to query for multilingual editors who have shown a previous interest in specific cultural context content in order to bridge the culture gap, either by importing one’s own context content or by extending theirs in one’s language edition. This tool or space would provide editors’ usernames and talk pages along with some of their features related to participation and cultural/geographical interests. Along with the newsletter or the monthly monitoring of the gap, this dashboard has the potential to strengthen the links in the global community and improve the organization of efforts to bridge the culture gap.

b) From a wider perspective, the culture gap also stems from the missing language editions, and from missing local content in those that struggle to have a presence on the Internet. There is a need to incorporate and sustain new projects from marginalized languages, which in turn can create and spread content related to their cultural context. This is also known as “decolonizing the Internet”.

The WCDO wants to provide data with strategic value in order to face these two problems that affect cultural diversity: to assist and engage editors to freely browse the culture gap and work on it, and to identify marginalized languages which have a potential to become language editions and enrich the Wikipedia cultural diversity.

Figure 1. CCC Datasets are a necessary map and a starting point to fight for cultural diversity in each Wikipedia (video explaining them).

What is your solution to this problem?

The previous stage of the project allowed us to lay the project foundations and to create the WCDO website for data visualization, the Cultural Context Content (CCC) Datasets, and to disseminate it across communities and academia (journal paper).

As a quick reminder, Cultural Context Content is the group of articles in a Wikipedia language edition that relate to the editors' geographical and cultural context (places, traditions, language, politics, agriculture, biographies, events, etc.). The CCC datasets are a cartography and are fundamental to showing the gaps and suggesting further solutions such as lists of articles (Figure 1).

In this grant, to address the problem of Wikipedia’s lack of cultural diversity, we propose two particular types of solutions:

  • We want to use the CCC datasets to monitor the gaps on a monthly basis (showing the creation of articles for specific kinds of content, to reveal whether and where editors are really bridging the gap), along with many other lists, solutions and improvements based on the feedback gathered in past Wikimedia events and from local communities. Likewise, we want to create a multilingual editors dashboard for finding potential collaborators. An editor must be able to query lists or visualizations showing editors from other language editions, or from their own, together with their cultural context interests.

Hence, in this phase the project mainly shifts from data analytics and machine learning to community engagement and data visualization.

  • We also want to provide strategic data to detect potential new Wikipedia language editions. Following our previous research and the language-territories mapping, we want to create an initial database of all the languages in a status of marginalization (similar to the language-territories mapping), in order to see their potential (in number of speakers and literacy) to become Wikipedia language editions, and to select content related to their cultural contexts that exists in other Wikipedia language editions (of other languages coexisting in these same territories) that should be created in their own native language in order to decolonize this knowledge.

This way, the solutions proposed are based on developing key resources for working on Wikipedia’s cultural diversity to the advantage of any language community or global initiative.

Project goals

Based on the previous diagnosis and envisioned solutions, we propose three distinct goals. They are the following:

Figure 2. Once the infrastructure for WCDO is set (in black), it is possible to provide several solutions based on the data and estimate part of the impact of the 2018 Wikimania theme "bridging the knowledge gap".

1. To monitor the content cultural diversity and gaps over time for each language edition (support editors).

1.1 Visualize the proportion and number of articles created on a monthly basis and the article segment they belong to (country, continent, or language cultural context content).

1.2 Create some new lists of valuable articles and tools to bridge very specific gaps.

1.3 Provide a tool to obtain the usernames of multilingual editors with whom to collaborate in bridging the gap.

2. To reach the entire Wikimedia movement with strategies for community engagement and stimulation (raise awareness).

2.1 Propose recommendations, challenges and guidelines for communities to work on the culture gap in events (short-term based strategy).

2.2 Create a newsletter for each Wikipedia language edition so they have a monthly reminder of the cultural context content articles created and the gaps (long-term based strategy).

3. To find new paths to increase diversity by studying marginalized languages (research languages).

3.1 Create a database with all the languages of the world and where they are spoken (similar to language-territories mapping) in order to a) detect content not represented by the native language b) find potential new Wikipedia projects.

These three goals will be detailed and expanded into specific actions in the project plan.

Project plan

The project is divided into three main activity lines or phases, which are 1) development, 2) dissemination and 3) research. The second phase depends on the first and will last for the entire project. The third is shorter and totally independent from the others. At the end of the grant period, some dissemination activities will continue. The entire duration of the project is 10 months (March – December 2019).

The project will comprise different activities which fall under the categories/roles of “research”, “development” and “activism/dissemination”. The first phase, “development”, will be planned and executed by the project manager/researcher, the second will require community leaders’ feedback and support, and the third will be executed following the advisor’s ideas (hours counted are only for the project manager/researcher).

After every task, some explanation and motivation are given.

Phase 1: Development (Visualizations and Tools) - Month 1-6 (March to August)

Most of the resources detailed in 1.1 and 1.2 were suggested by community members in past Wikimedia events. All the dashboards will be developed with the Plotly data visualization library (check the chart gallery).

1.1 Awareness (Visualize and remember the gaps)

Figure 3. Currently, a Culture gap view for all 300 languages is depicted as a rich table. In 1.1 we propose to create several language-personalized dashboards with visualizations showing the particularities of each language’s coverage of the others’ cultural context content.

a) Create Language gaps dashboards

  • Culture gap (coverage and spread) dashboards for each language edition.

These are two dedicated dashboards for each language edition, showing its coverage of the other language editions’ cultural context content (CCC) and the spread of its own cultural context content across the other language editions (Figure 3). It may be possible to group/sort languages by world continent or linguistic family, among others, in order to see the gaps better. This will improve the tables https://wcdo.wmflabs.org/ccc_spread and https://wcdo.wmflabs.org/ccc_coverage by using graphs such as heat maps and treemaps.

  • Geolocated articles grouped by countries and continents for each language edition.

This dashboard will show the gaps for geolocated articles in a similar way to the previous one, complemented by maps. It will also include extra measurements such as the number of articles per square meter (density), with the aim of showing how well each language covers the existing geolocated items.

  • Other groups of articles dashboards (Top CCC, Glam, Folk and Monuments) for each language edition.

Similar to the previous dashboards but centered on groups of articles from specific topics. There will be “calls to action” so editors are able to create new articles.
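
As an illustration of the figures behind the coverage and spread dashboards, the following is a minimal sketch, not the WCDO implementation. It assumes each language's CCC is available as a set of language-independent item identifiers (for instance Wikidata IDs) and that we know which items have an article in each edition. The function names and the toy data are hypothetical.

```python
def coverage(ccc_items, articles_in_target):
    """Share of one language's CCC items that have an article in the target edition."""
    if not ccc_items:
        return 0.0
    return len(ccc_items & articles_in_target) / len(ccc_items)


def coverage_matrix(ccc, articles):
    """matrix[a][b] = share of b's CCC that edition a covers (for a != b)."""
    return {
        a: {b: coverage(ccc[b], articles[a]) for b in ccc if b != a}
        for a in articles
    }


# Toy data: the item identifiers are made up for illustration.
ccc = {"ca": {"Q1", "Q2", "Q3", "Q4"}, "sw": {"Q5", "Q6"}}
articles = {"ca": {"Q1", "Q2", "Q3", "Q4", "Q5"}, "sw": {"Q1", "Q5", "Q6"}}

matrix = coverage_matrix(ccc, articles)
```

Read by rows, the matrix gives the coverage view of a language (how much of the others' CCC it contains); read by columns, it gives the spread view of a language's own CCC. A matrix of this shape could feed a heat map.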

b) Create Monitoring progress overview dashboard

  • Last month article creation dashboard for each language edition.

This dashboard will show the proportion and number of articles created during the previous month classified according to different groups of articles (geolocated articles: countries, continents and subcontinent regions; cultural context content: own CCC and other languages CCC). For instance, this is already available for the gender gap. This will be visualized as a filled area plot.

  • Temporal creation for article groups (geolocated groups and CCC) dashboard for each language edition.

The dashboard will include several filled area plots to show a temporal perspective (months/years). A possible variation of this dashboard will include equivalent visualizations created with the number of edits instead of created articles.

  • Last month pageviews dashboard for each language edition.

This dashboard will show the proportion of pageviews from the previous month dedicated to each group of articles (geolocated articles: countries, continents and subcontinent regions; cultural context content: own CCC and other languages’ CCC). It will also include a list of the most viewed articles and the group they belong to. This will be visualized as filled area plots, along with other types of visualizations.
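
The proportions that a filled area plot would display can be derived from a simple list of created articles. The sketch below is only an assumption about how such data might be shaped: each new article is a (month, group) pair, and the group labels are illustrative rather than the project's actual categories.

```python
from collections import Counter, defaultdict


def monthly_shares(created):
    """created: iterable of (month, group) pairs, one per new article.

    Returns {month: {group: share}} where the shares of each month sum to 1.
    """
    counts = defaultdict(Counter)
    for month, group in created:
        counts[month][group] += 1
    shares = {}
    for month, groups in sorted(counts.items()):
        total = sum(groups.values())
        shares[month] = {g: n / total for g, n in groups.items()}
    return shares


# Toy input; the group labels are illustrative only.
created = [
    ("2019-03", "own CCC"),
    ("2019-03", "other CCC"),
    ("2019-03", "own CCC"),
    ("2019-03", "no CCC"),
    ("2019-04", "other CCC"),
    ("2019-04", "no CCC"),
]
shares = monthly_shares(created)
```

Each month's dictionary corresponds to one vertical slice of a stacked area chart; the same function could be applied to edits or pageviews instead of created articles.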

1.2 Resources (Browse the gaps and bridge them)

These tools will improve the existing Top CCC article lists and add additional lists and features in order to show a wide variety of CCC gaps of valuable content (Figure 4).

  • Improving the usability of the Top CCC article lists.

Some of these issues/aspects have been suggested or perceived after seeing wikimedians use the lists:

  • Introduce a way to filter those articles which already exist in the target language (e.g. checkbox and a new parameter in the URL).
  • Introduce a print-button so it is easier for the editor to obtain the list and keep it.
  • Update the list with the articles being covered in real time instead of with the current delay of a month.
  • The number of inlinks/inlinks from CCC should be a link to the page “what links here” for the specific article.
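
The first suggestion above (a checkbox backed by a URL parameter) could work roughly as follows. This is a sketch under stated assumptions: the parameter name `exclude_existing`, the row fields and the example URL are all hypothetical and do not describe the existing WCDO pages.

```python
from urllib.parse import parse_qs, urlparse


def filter_list(rows, url):
    """Drop rows that already exist in the target language when the URL asks for it.

    rows: list of dicts with a 'title' and an 'in_target' boolean.
    The 'exclude_existing' query parameter is a hypothetical name.
    """
    params = parse_qs(urlparse(url).query)
    exclude = params.get("exclude_existing", ["0"])[0] == "1"
    if not exclude:
        return list(rows)
    return [row for row in rows if not row["in_target"]]


rows = [
    {"title": "Article A", "in_target": True},
    {"title": "Article B", "in_target": False},
]
# Hypothetical URL; only the query string matters here.
missing = filter_list(
    rows, "https://example.org/top_ccc_articles?target_lang=sw&exclude_existing=1"
)
```

Keeping the filter in the URL means a filtered list can be bookmarked or shared during an editing event.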

If you have any other suggestion, please send it to tools.wcdo@tools.wmflabs.org.

c) New lists of Top CCC articles

Figure 4. Top CCC article lists are different selections of articles from CCC (such as gender, geolocation, etc.) ranked according to a particular feature (number of pageviews, number of editors contributing to it, etc.). This is a useful way to find some relevant articles to bridge the gap.
  • List of CCC articles of topics such as “Earth”, “Folk”, “Monuments” and “Glam” for each language edition.

These tables will include relevant article lists for each Wikipedia language edition Cultural Context Content using Wikidata properties.

  • List of articles that should be in a language CCC but only exist in a bigger language (native language CCC gap) for each language edition.

These tables will include a list of articles that should exist in one language CCC (e.g. article about a Uganda topic in Wikipedia Luganda), but instead it only exists in a language edition from a language official in the same territory (e.g. English Wikipedia). This table with articles sorted by relevance (showing some specific features such as edits, pageviews, etc.) should encourage Luganda editors to write these articles themselves in their language. This was a suggestion by the Wikimedia Education group.

  • List of CCC articles with very few Interwiki Links but with a high editing activity/high number of contributors (CCC “Pearls”) for each language edition.

These tables will include a list of articles from a language CCC and its availability in a second language (like Top CCC article lists) based on a uniqueness criterion. According to one of our previous studies (Miquel, 2017), the number of editors is the feature that correlates best with the number of interwiki links (the more editors intervening in an article's creation, the more likely it is that the article exists in other language editions). Hence, the table will include articles that matter a lot to many editors from a language edition but have few interwiki links (a sort of “pearl”).

  • List of CCC articles with the most interwiki links for each language edition.

These tables will simply show the list of CCC articles with most interwiki links along with some relevance features.

  • List of CCC articles with few statements in Wikidata for each language edition.

These tables will show relevant articles from CCC (and the entire Wikipedia) with very few statements in Wikidata.

  • List of CCC articles with most edits during the past month for each language edition.

These tables will show relevant articles from CCC that received most edits during the previous month.

  • List of articles that are related to two languages CCC (overlapping of cultures).

These tables will show articles that either are part of two languages’ CCC or have a relationship to both (e.g. a writer who has lived in both Poland and the Czech Republic). This was a suggestion by the Wikimedia CEE group.
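
To make the CCC “pearls” criterion from the lists above concrete, here is a minimal sketch that selects articles with many contributors but few interwiki links. The thresholds, field names and sample rows are assumptions chosen for illustration, not values used by the project.

```python
def find_pearls(articles, min_editors=50, max_interwiki=2):
    """Articles with many contributors but few interwiki links, most contributors first.

    Both thresholds are arbitrary placeholders, not values from the project.
    """
    pearls = [
        a for a in articles
        if a["editors"] >= min_editors and a["interwiki"] <= max_interwiki
    ]
    return sorted(pearls, key=lambda a: a["editors"], reverse=True)


# Made-up rows for illustration.
articles = [
    {"title": "Local festival", "editors": 120, "interwiki": 1},
    {"title": "World capital", "editors": 300, "interwiki": 150},
    {"title": "Regional poet", "editors": 75, "interwiki": 0},
    {"title": "Village stub", "editors": 3, "interwiki": 0},
]
pearls = find_pearls(articles)
```

In practice the thresholds would probably need to scale with the size of each language edition, since 50 contributors means something very different in a small Wikipedia than in a large one.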

d) CCC articles search

  • Search results: groups of articles on a topic or category.

This will show the results from a search engine but limited to the content of a language CCC. Like the Top CCC article lists, the table will display the articles ordered by relevance and availability in a target language. In order to search for a topic in a foreign language (e.g. “Ancient Greece” in Greek Wikipedia), several strategies will be deployed, using the categories in common, Wikidata, the Search API and the Content Translation Tool.
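
One possible building block for this search is sketched below. It builds a request for the MediaWiki Action API full-text search (`action=query`, `list=search`) and then keeps only the hits that belong to a CCC. How the query would be translated into the foreign language (for instance through Wikidata labels) is not shown, and the helper names and sample hits are hypothetical.

```python
from urllib.parse import urlencode


def search_url(lang, query, limit=50):
    """Build a MediaWiki full-text search request for one language edition."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def keep_ccc(results, ccc_titles):
    """Keep only the search hits whose title is in the language's CCC."""
    return [hit for hit in results if hit["title"] in ccc_titles]


url = search_url("el", "Ancient Greece")
# Hits shaped like the 'search' entries of the API response; titles are placeholders.
hits = [{"title": "Title in CCC"}, {"title": "Title outside CCC"}]
ccc_hits = keep_ccc(hits, {"Title in CCC"})
```

The sketch does not perform the HTTP request itself; the filtered hits would still need to be ranked by relevance and checked for availability in the target language.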

e) Improving the CCC content

  • Content difference visualization for a specific Top CCC article list.

For a chosen list from the previous Top CCC article lists, a table would provide the articles already covered by a target language (e.g. Top 100 or Top 500) but sorted by the difference in number of bytes, references and images between the target language and the original language. This way, it is possible to see where there may be more room for improvement.

  • Image galleries for a specific Top CCC article list and overall CCC.

For a chosen list from the previous Top CCC article lists, a gallery of images from these articles would display those more prominent in terms of number of uses or those more spread across the language editions. The images would show whether they are used in a target language or not.
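
The content difference table described above amounts to sorting article pairs by a per-feature gap. The following sketch assumes each covered article comes with feature counts for its original and target versions; the field names and figures are invented for illustration.

```python
def content_gaps(pairs, feature="bytes"):
    """Sort articles by how much smaller the target version is than the original.

    pairs: list of dicts with 'title', 'original' and 'target', where the
    latter two map feature names ('bytes', 'references', 'images') to counts.
    """
    def gap(pair):
        return pair["original"][feature] - pair["target"][feature]

    return sorted(pairs, key=gap, reverse=True)


# Made-up figures for illustration.
pairs = [
    {"title": "A",
     "original": {"bytes": 40000, "references": 30, "images": 5},
     "target": {"bytes": 38000, "references": 28, "images": 5}},
    {"title": "B",
     "original": {"bytes": 90000, "references": 80, "images": 12},
     "target": {"bytes": 4000, "references": 2, "images": 1}},
]
by_bytes = content_gaps(pairs)
by_images = content_gaps(pairs, feature="images")
```

Articles at the top of the result are those where the target version lags furthest behind the original, which is where an editor's effort is likely to add the most.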

f) Multilingual editors dashboard

  • Dashboard showing editors according to several characteristics such as their multilingual participation and topical preferences (e.g. the cultures they write about).

Figure 5. These are the final functionalities we propose for the observatory. They all support the idea of helping editors "browse other cultures and places, and find the gaps they care about".

Editors will be able to query for editors to collaborate with, within the same Wikipedia or across languages (multilingually). It will be possible to query and retrieve a table with the most active editors in the CCC from each language, those most active in exporting a specific language CCC into other languages, those most active in importing other languages’ CCC, etc.

This will be useful in order to ask for a foreign language editor for an article translation or even to have an idea of the possibilities of collaboration with certain communities. Obtaining a list of editors in such a way could remind us of a "leaderboard", which is a game element used when gamifying services. This dashboard was a suggestion originated in a workshop on regional collaborations in CEE spring.
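
A query such as “editors most active exporting a specific language CCC into other languages” could be answered along these lines. This is a hedged sketch: it assumes edits are available as (username, edition, CCC language of the article) tuples, which is a hypothetical data shape, and the usernames are invented.

```python
from collections import Counter


def top_exporters(edits, ccc_lang, limit=10):
    """Rank editors by edits to `ccc_lang` CCC articles made in other editions.

    edits: iterable of (username, edition, article_ccc_lang) tuples.
    """
    counts = Counter(
        user for user, edition, article_ccc in edits
        if article_ccc == ccc_lang and edition != ccc_lang
    )
    return counts.most_common(limit)


# Invented usernames and edits.
edits = [
    ("UserA", "en", "ca"),
    ("UserA", "es", "ca"),
    ("UserB", "en", "ca"),
    ("UserB", "ca", "ca"),  # edit inside the Catalan edition: not an export
    ("UserC", "en", "sw"),
]
leaders = top_exporters(edits, "ca")
```

Swapping the condition to `edition == target and article_ccc != target` would give the complementary “importers” view for a given edition.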

Estimated time: 500h

Phase 2: Dissemination across communities, Academia and general reader - Month 1-10 (March to December)

2.1 Community engagement strategies

  • Short-term actions: disseminating the results (dashboards) and tools with regional, thematic and language-based contests (one-burst actions).

The previous tools and dashboards are useful for events such as Wikimedia CEE Spring, Asian Month, Wikiwomen, Celtic Knot, Wikimedia+Education, 100 Wikidays, among others.

Attending events and talking about the culture gap and the WCDO is essential to raise awareness of the problem and to engage participants into action within the frame of their current one-time activity.

  • Long-term action: sending a monthly newsletter.

Figure 6. Similarly to what Weeklypedia does, a digest of the culture gap and the dashboards in a) will be provided on a monthly basis.

In a similar way to Weeklypedia (Figure 6), the previous dashboards or links to them will be provided as a newsletter with different versions for each language edition including tailored results.

This newsletter will be generated for each language edition so editors can subscribe to the one they prefer.
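
Generating one tailored digest per language edition could look like the sketch below. The wording of the digest, the group names and the numbers are placeholders, and the delivery mechanism (mailing list, talk page message, etc.) is deliberately left out.

```python
def render_newsletter(lang, month, stats):
    """Return a plain-text monthly digest for one language edition.

    stats: {group_name: number_of_articles_created}; the wording is a placeholder.
    """
    total = sum(stats.values())
    lines = [
        f"Culture gap digest for {lang} Wikipedia, {month}",
        f"Articles created: {total}",
    ]
    for group, count in sorted(stats.items(), key=lambda kv: kv[1], reverse=True):
        share = 100 * count / total if total else 0
        lines.append(f"- {group}: {count} ({share:.1f}%)")
    return "\n".join(lines)


# Invented numbers for two editions.
monthly = {"ca": {"own CCC": 30, "other CCC": 10}, "sw": {"own CCC": 5}}
digests = {
    lang: render_newsletter(lang, "March 2019", stats)
    for lang, stats in monthly.items()
}
```

Because the digest is built from the same monthly statistics as the dashboards, each issue can link back to the corresponding dashboard for details.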

2.2 Community Recommendations

For most of the language editions, the creation of Cultural Context Content is spontaneous and related to the editors’ appreciation of their context, as well as the need for information demanded by readers (“news”). The proportion of pageviews received by CCC articles is larger than the proportion of CCC articles in a Wikipedia (Miquel, 2017). When this does not happen it means that editors from a particular context do not value their context (e.g. the case of African Wikipedias). This has bad implications for education and community cohesion.

Several actions are proposed in order to communicate the value of representing the own cultural context.

  • Create a list of guidelines that every community should follow in order to represent the most valuable parts of a CCC.

Mainly, the guidelines would suggest that communities create articles representing their cultural context. These would be summary articles (e.g. Music in Swahili language or Writers from Kenya), important biographies and geolocated articles. One hundred articles should be created for each category so that they appear in the Top CCC article lists, which contain the most relevant articles from each culture. These guidelines would be directed to Wikipedia communities of marginalized languages and to other organizations such as Whose Knowledge. They could also be used in education programs in order to foster cultural context recognition.

  • Present a list of guidelines in line with the activities undertaken by the Wikimedia+Education team.

There is an opportunity to work on content representation at an education level. Hence, it is necessary to transform these guidelines and knowledge into practical guidelines that can be used with the education partnerships. This may involve disseminating the project in different venues such as e-learning or African education conferences.

Contributing to represent the cultural context (cultural self-esteem) may initiate a virtuous cycle in Wikipedia use. Results from previous research (Miquel 2016) show that CCC are proportionally more edited and viewed than other types of content. Hence, it is important to start this cycle in certain languages which have not spontaneously started it.

2.3 Community feedback and continual improvement/data maintenance

  • Receive feedback from all communities on usability and content quality aspects and improve current interfaces and data processes.

The creation of CCC Datasets for 300 languages relies on some language-territory mapping created with the use of external databases. However, some communities may consider that some lines are not updated, and their contribution would improve the final result. Likewise, communities may demand extra lists or the improvement of some features (e.g. incorporating an Excel version of a list). It is necessary to consider this feedback in order to maintain and fix the current tools.

2.4 CCC and Academy engagement strategies

  • Publish a paper on the dataset and the WCDO goals in order to raise awareness of the lack of cultural diversity and the culture gaps.

WCDO is a joint space for researchers and editors. It is fundamental that the CCC Dataset is known in order to get the attention of academia to think of other practical uses of the dataset. Likewise, some researchers on cultural contextualization who have investigated controversies or linguistic points of view may contribute to the project.

Estimated time: 450h

Phase 3: Research: Wikipedia project gaps / Language strategy planning - Month 1-10 (March to December)

The second phase of the project is dedicated to “monitoring the gaps”. However, given that 2019 is the International Year of Indigenous Languages, Phase 3 adds, as a complementary line of action, an exploratory study to find potential Wikipedia language editions and CCC that may enrich the existing Wikipedia cultural diversity and “decolonize the Internet”.

The preliminary data to be obtained at this phase is of strategic value for the future of the Wikipedia project.

Thus, this Phase 3 named Research aims at answering the following research question:

Figure 7. This new database will link all languages to countries and to their status and use in order to detect possible new languages. It will be similar to the Language-Territories Mapping, a database used to obtain CCC that links each language to the territories where it is spoken as native or official.

RQ: What is the missing Cultural Context Content in the Wikipedia project, either because it does not exist in its associated language edition or because the language edition does not even exist?

In order to do this, we need to set different tasks.

  • Detect which of the 300 Wikipedia language editions are from a language with a marginalization status (its presence is threatened in environments such as education, business or any public spaces).
  • Create a group of indicators for assessing whether a language’s editing community and readers relate to their context through Wikipedia in a healthy way (reader and editor engagement metrics, density of CCC articles by speaker, density of geolocated articles by speaker, density of geolocated articles by m² at country, region and continent levels, density of pageviews in CCC by number of speakers, among others).
  • Create a database with all the languages of the world and where they are spoken (in a similar way to the language-territories mapping database) including factors such as the number of speakers, language status, indigenous/legal use, among others (Figure 7). By examining these parameters we can detect whether these languages may escape marginalization and create their own Wikipedia language edition as well as represent themselves in the content.
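
Some of the density indicators listed above can be expressed as simple ratios. The sketch below normalizes counts per 1,000 speakers, which is an assumption made here for readability rather than a choice stated in the proposal; the field names and example figures are likewise hypothetical.

```python
def per_thousand_speakers(count, speakers):
    """Density of articles (or pageviews) per 1,000 speakers of a language."""
    if speakers <= 0:
        raise ValueError("speakers must be positive")
    return 1000 * count / speakers


def health_indicators(stats):
    """stats: dict with 'speakers', 'ccc_articles', 'geolocated_articles', 'ccc_pageviews'."""
    speakers = stats["speakers"]
    return {
        "ccc_articles_per_1k_speakers": per_thousand_speakers(
            stats["ccc_articles"], speakers),
        "geolocated_per_1k_speakers": per_thousand_speakers(
            stats["geolocated_articles"], speakers),
        "ccc_pageviews_per_1k_speakers": per_thousand_speakers(
            stats["ccc_pageviews"], speakers),
    }


# Invented figures for a hypothetical language.
example = {
    "speakers": 2_000_000,
    "ccc_articles": 5_000,
    "geolocated_articles": 1_000,
    "ccc_pageviews": 400_000,
}
result = health_indicators(example)
```

Normalizing by number of speakers makes language editions of very different sizes comparable, which matters when looking for marginalized languages with unrealized potential.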

Estimated time: 350h


The budget of the project is dedicated to covering the different research/development/dissemination activities plus the trip to Wikimania 2019. This adds up to 1300 hours in 10 months.

Estimated total gross of 1300 hours = 26,000€

Estimated funding for Wikimania trip (Stockholm, Sweden - August 14-18) = 900€

Grand total: 26,900€ gross

Around 30% of this budget goes to taxes according to Spanish legislation.
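
The budget figures can be cross-checked with a few lines of arithmetic. The hourly rate below is derived from the stated totals and is not given explicitly in the proposal; the tax estimate simply applies the "around 30%" figure to the grand total.

```python
# Estimated hours per phase, as stated in the project plan.
phase_hours = {"development": 500, "dissemination": 450, "research": 350}
total_hours = sum(phase_hours.values())

gross_for_hours = 26_000  # euros, as stated in the budget
travel = 900              # euros, Wikimania trip
grand_total = gross_for_hours + travel

# Derived values (assumptions, not stated in the proposal).
implied_rate = gross_for_hours / total_hours  # euros per hour
approx_taxes = grand_total * 30 / 100         # "around 30%" of the grand total
```

The three phase estimates sum to the 1300 hours quoted in the budget, and the stated gross amount corresponds to an implied rate of 20€ per hour.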

Community engagement

The data and tools provided by the Wikipedia Cultural Diversity Observatory aim to provide strategic value for organizing and fighting for better cultural diversity in Wikipedia content.

  • Current stakeholders who use, or have expressed an interest in using, the data/tools provided by the WCDO are: Wikimedia CEE with the CEE Spring contest, the Education Program with indigenous languages, Amical Wikimedia with the Catalan Culture Challenge, and Wikimedia España with the Intercultur event, among others.

In fact, community engagement or dissemination is a crucial activity to disseminate the value of the conclusions derived from the data, and to transform them into positive action across the entire Wikimedia movement. These need to permeate and feed every area in order to fight for knowledge equity considering the 2030 Strategy direction.

There are different messages to be communicated depending on the precise actor within the movement. These are some we identify:


  • To the medium-big Wikipedia language edition communities, the most important idea is to engage them in covering a minimum of 100 articles from the cultural context of every other language edition (roughly 30,000 articles). As said, this needs to be communicated in events, both raising awareness and providing solutions to editors so they can bridge these gaps.
  • To smaller Wikipedia language editions communities, the main idea to communicate is to convince them of the value of creating a minimum of articles of different topics (this was detailed in Phase 2.2 community recommendations). Unfortunately, many language editions do not reach the 100 articles for topics like biographies or geography.

Wikimedia Foundation areas:

Figure 6. This is a dissemination plan for the second stage of the project Wikipedia Cultural Diversity Observatory.

  • To the groups in charge of community programs on education, GLAM and others, it will be useful to engage them in better understanding the value of Cultural Context Content in each language edition. This can help answer questions such as “What content should we promote in education?” or “Does this particular Wikipedia have enough GLAM-related content from another Wikipedia language context (e.g. does Italian Wikipedia contain enough articles about British/American museums)?”
  • To the community relations groups, the main idea to communicate is to stimulate content exchanges in particular gaps of interest (e.g. as a real case, dealing with Gender gap by creating Swedish Women in Persian Wikipedia).
  • To the advancement and global partnerships groups, the data regarding the potential new Wikipedias (marginalized languages) will be useful in order to consider new efforts. Likewise, institutions may be interested in the societal value of Wikipedia as a resource to represent the cultural context and supply the demand of information by readers of a language.
  • To the groups working on community engagement and editor retention, according to the project’s previous research (Miquel 2016), anonymous editors and administrators are both proportionally more engaged in creating content representing their cultural context than regular registered editors. Hence, campaigns including calls to action for newbies to contribute with pictures/articles about their surroundings may be effective to engage them in becoming editors. This is something that would require further investigations and prototype testing.

The exploration of future collaborations with these actors will be a constant and central concern of the project team in order to make the observatory as effective as possible. Figure 6 shows how the project integrates with the different areas of the Wikimedia movement.

In order to do this, we plan to attend regional and global events such as WikiArabia and Wikimania 2019 so it is possible to have conversations with both people from Wikipedia communities and Wikimedia Foundation staff members (see the diagram on how the project intersects with almost every area from the Wikimedia movement).

For the biggest possible impact, it would take time to talk to members of every area described. However, it is important to keep in mind that most of the time allocated to the project is aimed at data analytics and visualization, so part of this outreach will have to continue in the future.

Project impact

Target audience

Previous Outcomes (January-September 2018)
  • WCDO Site: We created the portal Wikipedia Cultural Diversity Observatory and set the external website for visualizations with Dash/Plotly.
  • Database: We created a Wikipedia Languages-Territories Mapping (csv in Github) database in order to study local content in every Wikipedia.
  • Datasets: We established a method to obtain local content datasets (named Cultural Context Content) for all 300 Wikipedia language editions and made them public as csv and as sqlite (ccc_current.db).
  • Article Lists: We provided solutions to bridge the content culture gap with ten lists of top priority articles (Top CCC articles).
  • Culture Gap (Statistics): We created several statistics updated on a monthly basis that explain the depth of the gap for each language edition.
  • Community Engagement: We presented the culture gap and the WCDO goals to communities in several events (Wikiindaba 2018, Wikimania 2018, Wikimedia CEE Meeting 2018).
  • Academic Publication: We published a paper (Open Access) in an indexed journal to disseminate the Wikipedia Culture Gap.

This project’s target audience is the entire Wikimedia movement. Academics and journalists may also be interested in cultural diversity in Wikipedia, given the relevance of the subject. Hence, we plan to rely mainly on the established communication channels used by the relevant Wikimedia groups and, in addition, to release some notes or give interviews to the media. Some sites that have published about Wikipedia and may be interested are Wired, The Next Web, Quartz, PopSci and HuffPost Science & Tech, among others.

Measure of success[edit]

The project’s success will be evaluated according to the fulfilment of the different goals and their related tasks. Raising awareness and providing tools are the path we have identified to improve cultural diversity.

Nonetheless, it is important to note that this is principally a data-oriented project rather than an activism-oriented one (less than a third of the time is allocated to tasks for reaching the communities). This means that the main focus is to create the resources that will be used to bridge the gaps, regardless of how the communities organize themselves (in the form of events, regional partnerships or spontaneous online activity).

This project fundamentally produces datasets, statistics and visualizations. We will employ metrics to measure their use.

Fit with strategy[edit]

All the messages mentioned in Community engagement are aligned with the Strategic Plan, since improving Wikipedia’s cultural diversity is in line with reaching knowledge equity. In fact, this term, introduced in the 2030 Strategic direction, refers to “the efforts on the knowledge and communities that have been left out by structures of power and privilege”. WCDO is necessary to evaluate the degree of success in reaching knowledge equity.

In other words, without an analytical project such as the WCDO it may be much harder to detect where the gaps are, their extent and their significance. Likewise, cultural diversity is a central issue that relates to many others, and WCDO tools may help address existing issues such as the content gender gap and stimulate synergies and joint efforts (see the included Top CCC article lists for Women for every language).

Like the prior materials delivered by the project, the new outputs (code, methodologies and datasets) will be extensively documented and published under open licenses. The project is open to any collaboration in both development and dissemination.

Get involved[edit]


  • marcmiquel as grantee/researcher. I'm Marc Miquel-Ribé. I'm from Igualada, a small city near Barcelona. I did unpaid PhD research on Wikipedia for multiple reasons, but mainly because I am very interested in better understanding the Editor Engagement and the Cultural Diversity of the project. I have been a member of Amical Wikimedia (Catalan Wikipedia) since 2011.
  • sdivad as research advisor. David Laniado, PhD, senior researcher in computational social science. Sdivad has extensive experience in the study of online collaboration; in particular, he has published over 15 academic papers on different aspects of social interactions in Wikipedia. He is co-creator of the Contropedia platform for the analysis and visualization of controversies in Wikipedia articles.
  • Diego as research advisor. Diego Saez-Trumper, PhD, researcher in Social Networks Analysis, Graph Theory and applied Machine Learning. Part of the WMF Research Team since August 2017.

Community notification[edit]

This section lists the communities and community discussion groups that have been (or are being) notified of this proposal.

WikiResearch Mailing list, Catalan Community Mailing list, Spanish Community Mailing list, English Wikipedia village pump, Wikimedia CEE Spring, Catalan Culture Challenge, Intercultur Wikimedia Spain, Bridges across Cultures and Wikiproject: Systemic Bias.

  • Volunteer Support as an admin on th.wikipedia, Thai language sister projects, and etc. B20180 (talk) 15:55, 4 December 2018 (UTC)


Do you think this project should be selected for a Project Grant? Please add your name and rationale for endorsing this project below! (Other constructive feedback is welcome on the discussion page).

  1. Support Marc has been always coming up with brilliant ideas. --May Hachem93 (talk) 15:51, 27 November 2018 (UTC)
  2. Support I fully support this project especially that it will help many communities and "smaller" wikipedias to strengthen and to find ways and suggestions to grow. Anass Sedrati (talk) 22:47, 27 November 2018 (UTC)
  3. Support Tuvalkin (talk) 04:04, 28 November 2018 (UTC)
  4. Support I may have looked through too hastily, but I hope the Observatory is taking into account the fact that speakers of languages with smaller WPs usually speak other languages with bigger WPs, so not every WP has to cover everything. Michaelgraaf (talk) 18:42, 28 November 2018 (UTC)
  5. Support My support to a good idea that is delivering on visualising the cultural gap across wikis; and helping communities worried about this issue have a starting point from where take action. MuRe (talk) 11:22, 30 November 2018 (UTC)
  6. Support I saw the previous work of Marc and I think is a good input in our Diversity Working Group. Looking forward to see the first results. Because everything starts by monitoring our gaps. Camelia (talk) 13:07, 30 November 2018 (UTC)
  7. Support Marc and I spoke at CEE in Lviv Ukraine, and I thought his work was very relevant to supporting the needs of indigenous language communities, especially providing educational resources that can be used in mother-language education. I'm happy to provide further context if necessary. NSaad (WMF) (talk) 18:43, 30 November 2018 (UTC)
  8. Support I enjoyed Marc's lecture about Wikipedia Cultural Diversity Observatory at Wikimedia CEE Meeting in Lvov in October 2018 very much and this is why I am supporting his application to continue working at related projects. --Hladnikm (talk) 19:54, 30 November 2018 (UTC)
  9. Support This is serious work. -- Magioladitis (talk) 08:03, 1 December 2018 (UTC)
  10. Support As someone who is trying to fill the gap in the LGBTQI+ culture related content in the arabic wikipedia, I know such project will help a lot with diagnosing the roots of the problem and addressing it in a more efficient way. Houssem Abida (talk) 22:35, 1 December 2018 (UTC)
  11. Support - A more structural approach and support for local volunteers worldwide is very useful for promoting the goals of Wikimedia, and to enrich the skills and possibilities of volunteers. Romaine (talk) 19:07, 2 December 2018 (UTC)
  12. Support This projects is the sort of stuff Wikimedia is intended for. (BTW: writing from the very room where Intercultur was born) B25es (talk) 17:00, 3 December 2018 (UTC)
  13. Support A brilliant continuation for a brilliant project. I hope this project will be one of our best examples of how to address, measure, and solve the "gaps problem". Sannita - not just another it.wiki sysop 00:49, 6 December 2018 (UTC)
  14. Support It will help to bridge Culture Gaps. --Perohanych (talk) 10:42, 6 December 2018 (UTC)
  15. Support I have been endorsed the CCC project in the previous phase. I am glad that it has established some basic outcome, and I am willing to support it to see further study related to the diversity of culture in a global range. --Liang(WMTW) (talk) 04:16, 7 December 2018 (UTC)
  16. Support A project of great utility and impact to reduce cultural gaps in all Wikipedia versions. It will be an important source of information in order to organize Intercultur, an initiative to translate articles on cultural topics between the different languages of the Iberian Peninsula. Rodelar (talk) 16:59, 7 December 2018 (UTC)
  17. Support I think it is a very nice proposal and it can become a useful tool in order to help reducing cultural gaps in Wikimedia projects, and hopefully beyond that... On the other hand, if set up and documented with a broad approach, I think developed tools and methodologies could also be translated to other challenges and interest areas. Toniher (talk) 17:14, 9 December 2018 (UTC)
  18. Support this project! A useful tool for present and future projects focusing on bridging the gap. Looking forward to the results! Good luck Marc! CEllen (talk) 21:46, 10 December 2018 (UTC)
  19. Support Important tool in fostering cross-wiki understanding and important for next year's CEE Spring competition. Philip Kopetzky (talk) 10:48, 14 December 2018 (UTC)
  20. Strong support Marc delivers! I saw a demo of the first version of WCDO. It correctly identifies articles that are important to every culture and suggests which articles about this culture are most needed in other languages. It works between any pair of languages, as every good Wikimedia tool should. This is already becoming an important tool for bridging cultures. Marc has worked very thoroughly, starting his work from mapping all the relevant languages and building the necessary links. This tool is brilliant and much-needed, and I hope to see it developed further. --Amir E. Aharoni (talk) 19:26, 16 December 2018 (UTC)
  21. Support Marc introduced me to his project more than a year ago. His systematic analysis is what really impresses me about his project. Also, taking the findings to build a tool is great for the rest of the community. Have been a supporter and still am. In my humble opinion, research should get all of our support. Reem Al-Kashif (talk) 17:47, 21 December 2018 (UTC)
  22. Support. The project is absolutely useful. It was an honour for me to see it in WikiIndaba Conference 2018. --Csisc (talk) 09:23, 22 December 2018 (UTC)
  23. Support I support funding this project. Marc has demonstrated a thorough and careful approach, dedication, and has been doing a good job of communicating about the project. This tool will be key in pursuing our stated strategic objective of "knowledge equity". Asaf (WMF) (talk) 10:01, 22 December 2018 (UTC)
  24. Support --Micru (talk) 10:35, 22 December 2018 (UTC)
  25. Support This must be even more visible! -Theklan (talk) 11:52, 27 December 2018 (UTC)
  26. Support I support this project Dumbassman (talk) 16:12, 14 January 2019 (UTC)
  27. Support As per Asaf above (full disclosure: I am working together with sdivad on another - non WMF-funded - project, however I saw a presentation of the project by Marc, so I am basing my support to this project on that) --CristianCantoro (talk) 09:25, 1 March 2019 (UTC)