
Grants:Project/WCDO/Culture Gap Monthly Monitoring/Final

From Meta, a Wikimedia project coordination wiki

Report under review
This Project Grant report has been submitted by the grantee, and is currently being reviewed by WMF staff. If you would like to add comments, responses, or questions about this grant report, you can create a discussion page for it.

Welcome to this project's final report! This report shares the outcomes, impact and learnings from the grantee's project. For a more detailed report, you can consult the midpoint report or the timeline.

Part 1: The Project


The solutions proposed in this project grant are based on developing key resources for working on Wikipedia’s content diversity, to the advantage of any language community or global initiative. These resources help raise awareness of Wikipedia’s current state of diversity by providing datasets, visualizations and statistics, and by pointing communities to solutions and tools to address it. This is what we achieved:


  • We expanded and automated the method that generates the database wikipedia_diversity.db so that it includes new types of data related to other content biases.
  • We created 10 new tools that provide points of action to improve the representation of content and its sharing across Wikipedia language editions.
  • We created 14 visualization pages showing the gaps related to culture, gender and geography, among others.
  • We extended the portal Wikipedia Cultural Diversity Observatory with new pages (e.g. guidelines and maturity levels) and expanded the external website (wcdo.wmflabs.org) with new tools and visualizations (e.g. treemaps, line charts and stacked bar charts).
  • We changed the name of the project (including the Meta page or portal) to Wikipedia Diversity Observatory, as it better reflects the scope, which is not limited to cultural diversity.

Data and Research:

  • We refined the method to obtain local content datasets (named Cultural Context Content) for all 300 Wikipedia language editions and made them public. It now also collects data related to many other content biases and gaps: the geography gap, gender gap, ethnic groups gap, sexual orientation gap and religious groups gap, among others.
  • We created a database of all the world's languages and their characteristics in order to study potential new Wikipedias. It can be found in the tables of diversity_categories.db.
  • We added new data on language geolocation (countries, subregions and geocoordinates) to Wikidata, using the WALS database, among other sources.
  • We provided practical means to bridge the content gaps in the form of article lists (Top CCC articles) for each of the previously mentioned types of gaps.

Community Engagement

These are the slides used for the talk on Wikipedia Cultural Diversity Observatory given in Marrakech, Morocco for the WikiArabia event.

These are the latest dissemination actions we carried out to raise awareness of the content diversity problem in Wikipedia. They include academic papers and conferences, and book chapters, among others.

Methods and activities

What did you do in the project?

The project was divided into three phases (Phase 1: Development (Visualizations and Tools); Phase 2: Dissemination across communities; Phase 3: Research: Wikipedia project gaps / Language strategy planning). These phases correspond to development, activism and research, respectively. As can be seen in the midterm and monthly reports (which are much more detailed at the task level), research and activism were carried out first, while development was the last part of the project.

These are some of the methods and activities carried out to deliver the results:

Phase 1: Development (Visualizations and Tools)

  • We improved the framework so that the different processes can be run on the data.
  • We included gender and many other types of biases in the dataset so that we could provide richer lists of articles and analyses (this was not initially foreseen in the project plan).

Phase 2: Dissemination across communities, academia and general readers

  • We gave 8 presentations at Wikimedia conferences, wrote 1 book chapter and 1 paper, and held several online video calls to explain the different tools and visualizations.
  • We automated the processes and published all the code publicly so that anybody can contribute (GitHub account wcdo); the project complies with the requirements of the 'right to fork' policy.

Phase 3: Research: Wikipedia project gaps / Language strategy planning

  • We published the dataset for the research community and presented it at ICWSM (Munich, June 11-13). Reference: Miquel-Ribé, M., & Laniado, D. (2019). Wikipedia Cultural Diversity Dataset: A Complete Cartography for 300 Language Editions. Proceedings of the 13th International AAAI Conference on Web and Social Media (ICWSM).
  • We published the scope of the Diversity Observatory in a paper. Reference: Miquel-Ribé, M. Diversity in a Language-Independent Wiki: Six Design Requirements and Goals to Embed a Diversity Mindset. In Proceedings of the 19th International Semantic Web Conference.
  • At every Wikimedia activity described, we collected feedback from communities to improve the interface.
  • We actively participated in the Wikimedia Strategy Process 2030 as part of the Diversity Working Group, contributing diversity-based recommendations, and later joined the Writers Group to produce the final strategy document.

This project lasted 12 months instead of 9 because the development part necessarily took more time in order to obtain the data; as stated in the plan, "at the end of the grant period, some dissemination activities continue". There were some technical difficulties due to infrastructure changes and bottlenecks (database replicas and dumps), which are explained in previous reports.

Phase 4: Extension

Phase 4 corresponds to a project extension defined around specific goals. The following points are some of the outcomes of this phase:

  • We changed the name of the project to Wikipedia Diversity Observatory in order to widen the scope and include more categories relevant to diversity.
  • We collected new data and expanded the wikipedia_diversity.db database with the categories LGBT, ethnic groups, indigenous peoples, religious groups and time.
  • We created new visualizations and tools to understand, and retrieve articles related to, the following gaps: ethnic groups, LGBTQ articles and time-related articles.
  • We created a new tool to show the diversity categories found in recent changes.

Outcomes and impact


What are the results of your project?

The main outcomes are the Wikipedia Diversity database for all 300 Wikipedia language editions and the portal Wikipedia Diversity Observatory, along with its external website built with Dash/Plotly.

What has changed from the project plan?

The project plan was quite tight but very clear in terms of the specific tasks to be carried out. I want to highlight two particular aspects in which the project execution differed from the plan.

On the one hand, we focused our efforts on finishing the code to extract all the diversity types and create the database. This part is now complete.

As a result, 1) the datasets and their results are reliable, 2) the method and abstraction to calculate several statistics to monitor the gaps are solid, and 3) the technology for the external website (Plotly) is adequate and makes it easier to create new dashboards.

On the other hand, we could not automate the monitoring results on a monthly basis. This is because the background infrastructure that serves the data did not perform with the required efficiency. At the same time, we are still optimizing the processes, as they take around a week to collect the data and compute all the statistics. This is an area with room for improvement.
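To illustrate the kind of abstraction meant here, the two culture gap statistics shown in the dashboards, coverage and spread, can be expressed as ratios over sets of articles. The following is a minimal sketch, not the WDO code: it assumes that coverage is the percentage of a language's CCC articles that also exist in a target edition, and that spread is the percentage of the target edition's articles that belong to that CCC, with articles matched by Wikidata QIDs. The function names and the toy data are hypothetical and do not reflect the schema of wikipedia_diversity.db.

```python
from typing import Set


def coverage(ccc_articles: Set[str], edition_articles: Set[str]) -> float:
    """Assumed definition: % of a CCC (as Wikidata QIDs) present in an edition."""
    if not ccc_articles:
        return 0.0
    shared = ccc_articles & edition_articles
    return 100.0 * len(shared) / len(ccc_articles)


def spread(ccc_articles: Set[str], edition_articles: Set[str]) -> float:
    """Assumed definition: % of an edition's articles that belong to the CCC."""
    if not edition_articles:
        return 0.0
    shared = ccc_articles & edition_articles
    return 100.0 * len(shared) / len(edition_articles)


# Hypothetical toy data: a 4-article CCC and a 10-article edition sharing 2 items.
ccc_x = {"Q1", "Q2", "Q3", "Q4"}
edition_y = {"Q1", "Q2"} | {f"Q{n}" for n in range(100, 108)}

print(coverage(ccc_x, edition_y))  # 50.0
print(spread(ccc_x, edition_y))    # 20.0
```

Keeping both ratios on the same intersection makes the difference between them explicit: coverage normalizes by the size of the CCC, spread by the size of the edition.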

Progress towards stated goals

Please use the below table to:

  1. List each of your original measures of success (your targets) from your project plan.
  2. List the actual outcome that was achieved.
  3. Explain how your outcome compares with the original target. Did you reach your targets? Why or why not?

The different tasks were organized around the three goals:

1. To monitor cultural content diversity and gaps over time for each language edition (support editors).

2. To reach the entire Wikimedia movement with strategies for community engagement and stimulation (raise awareness).

3. To find new paths to increase diversity by studying marginalized languages (research languages).

Here we list all the project tasks/goals as presented in the proposal, together with their evaluation (whether they have been done, with an explanation).

We also added the new goals/tasks from the extension.

Readers who want to review these goals can check the project summary and outcomes.

Planned measure of success (include numeric target, if applicable) | Actual result | Explanation
Short-term actions: disseminating the results (dashboards) and tools through regional, thematic and language-based contests (one-burst actions). | Done. | The use cases are explained on the website; the tool has been used in Intercultur and Wikimedia CEE Spring, among others.
Long-term action: sending a monthly newsletter. | Not done. | Everything is set up for monthly monitoring in terms of data design, process and interface, but not in terms of infrastructure reliability. It is not possible to set up a newsletter because the infrastructure is not reliable. These concerns are explained in the midpoint report.
Culture gap (coverage and spread) dashboards for each language edition. | Done. | Visualizations and tables available at the external site.
Geolocated articles grouped by countries and continents for each language edition. | Done. | Visualizations and tables available at the external site.
Dashboards for other groups of articles (Top CCC, GLAM, Folk and Monuments) for each language edition. | Done. | Visualizations and tables available at the external site.
Last month's article creation dashboard for each language edition. | Done. | Visualizations and tables available at the external site.
Creation-over-time dashboard for article groups (geolocated groups and CCC) for each language edition. | Done. | Visualizations and tables available at the external site.
Last month's pageviews dashboard for each language edition. | Done. | Visualizations and tables available at the external site.
Improving the usability of the Top CCC article lists. | Half done. | Not all the requirements could be met (e.g. allowing Excel downloads for all the tables).
List of CCC articles with very few interwiki links but with high editing activity/a high number of contributors (CCC “Pearls”) for each language edition. | Done. | The number of articles covered from other languages' CCC is a good indicator.
List of CCC articles with very few interwiki links but with high editing activity/a high number of contributors (CCC “Pearls”) for each language edition. | Done. | Visualizations and tables available at the external site.
List of CCC articles with the most interwiki links for each language edition. | Done. | Visualizations and tables available at the external site.
List of CCC articles with few statements in Wikidata for each language edition. | Done. | Visualizations and tables available at the external site.
List of CCC articles with the most edits during the past month for each language edition. | Done. | Visualizations and tables available at the external site.
List of articles related to the CCC of two languages (overlapping cultures). | Done. | Visualizations and tables available at the external site.
Search results: groups of articles on a topic or category. | Done. | Visualizations and tables available at the external site.
Content difference visualization for a specific Top CCC article list. | Done. | Visualizations and tables available at the external site.
Image galleries for specific Top CCC article lists and for overall CCC. | Done. | Visualizations and tables available at the external site.
Dashboard showing editors according to several characteristics, such as their multilingual participation and topical preferences (e.g. the cultures they write about). | Not done. | Not possible for the same reason: infrastructure reliability. The MediaWiki History dataset was launched too late. These concerns are explained in the midpoint report. The code is available but not optimized; it remains a future task.
LGBT Gap and Top LGBT Articles dashboards. | Done. | Visualizations and tables available at the external site.
Ethnic Groups and Indigenous Peoples dashboards. | Partially done. | Indigenous groups are not differentiated from other ethnic groups in the dashboard. Visualizations and tables available at the external site.
Recent changes and recently created articles related to diversity groups dashboard. | Done. | Visualizations and tables available at the external site.
Gender and LGBT article content bias dashboards. | Done. | The LGBT content bias (in-article) could not be measured, because LGBT topics are not as closely related to one another as, for example, cultural topics or more specific topics (geography, monuments, etc.). The gender bias (in-article) has been added to the Gender gap dashboard.
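The CCC “Pearls” lists in the table combine two criteria: few interwiki links, and high editing activity or many contributors. A minimal sketch of such a filter follows. The record fields, the thresholds (at most 3 interwiki links, at least 20 editors) and the ranking by number of editors, then edits, are illustrative assumptions, not the parameters used by WDO.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Article:
    title: str
    interwiki_links: int  # other language editions that have this article
    edits: int            # total edit count (hypothetical field)
    editors: int          # distinct contributors (hypothetical field)


def ccc_pearls(articles: List[Article],
               max_interwiki: int = 3,
               min_editors: int = 20) -> List[Article]:
    """Keep CCC articles with few interwiki links but many contributors,
    ranked by contributors and then by edits (thresholds are assumptions)."""
    pearls = [
        a for a in articles
        if a.interwiki_links <= max_interwiki and a.editors >= min_editors
    ]
    return sorted(pearls, key=lambda a: (a.editors, a.edits), reverse=True)


# Hypothetical toy data, not taken from the WDO datasets.
sample = [
    Article("A", interwiki_links=1, edits=300, editors=45),
    Article("B", interwiki_links=40, edits=900, editors=120),
    Article("C", interwiki_links=2, edits=80, editors=25),
    Article("D", interwiki_links=0, edits=10, editors=3),
]
print([a.title for a in ccc_pearls(sample)])  # ['A', 'C']
```

Article B is excluded because it is already widely shared across languages, and D because few people have worked on it; what remains are locally valued articles that other editions still lack.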

Think back to your overall project goals. Do you feel you achieved your goals? Why or why not?

Overall, the project goals were ambitious, and most of them have been completed successfully. Automation and the monthly e-mail remain a work in progress. We will continue supporting the project to provide regular results and monitoring, even if this has to be done manually.

We acknowledge that the project was planned down to the smallest detail. This was due to the previous research activities of the team members and their expertise in the area. However, this same level of detail became a burden for some unfinished subgoals, as more fundamental tasks, such as creating the dataset or the abstraction to analyze the gaps, required more time.

Projects require constant reassessment of priorities. We are convinced that it was more important to set a good foundation than to rush into visualizations. Working with all Wikimedia languages while ensuring the quality of the data has been a priority (we ran manual assessments with several users to verify the quality of the datasets; the results are published in the paper).

At the same time, the feedback received from the communities and their members has always been very appreciative and useful in terms of usability and further functionality. The feedback received while taking part in the Wikimedia Strategy 2030 process has been useful to expand the project to other types of diversity (e.g. sexual orientation, religious groups, etc.).

The connections we made also encouraged us to present the project at more venues than initially planned, such as WikiArabia 2019.

Global Metrics

We are trying to understand the overall outcomes of the work being funded across all grantees. In addition to the measures of success for your specific program (in above section), please use the table below to let us know how your project contributed to the "Global Metrics." We know that not all projects will have results for each type of metric, so feel free to put "0" as often as necessary.

For more information and a sample, see Global Metrics.

This project is fundamentally producing valuable statistics, visualizations and datasets. Metrics on their use can indicate the degree of success. However, there has not yet been enough dissemination, nor enough time since their setup, to evaluate success in these terms.

Some of the measures of success regarding users that were expected one year after the project start are:

  • 1,000 pageviews per month on our statistics website. Not measurable, as there is no measurement tracking system on Meta.
  • 100 data downloads per month of our dataset. Not evaluated yet.
  • 5 data re-use cases of our dataset. Not evaluated yet.
  • A website with self-updating graphs and data downloads. Done.

Some other classic measures that apply well to activism projects are not meaningful for this research/communication project:

Metric | Achieved outcome | Explanation
1. Number of active editors involved | 30 (testing the interface) + 10 (manual assessment of the datasets) | During conference events, I performed informal usability tests (with a think-aloud protocol) to gather feedback.
2. Number of new editors | N.A. (not applicable) |
3. Number of individuals involved | N.A. | Providing a number would not reflect the work in research and communication.
4. Number of new images/media added to Wikimedia articles/pages | N.A. | Too early to measure. Feedback from communities is positive, as they plan to use it for contests.
5. Number of articles added or improved on Wikimedia projects | N.A. | Too early to measure.
6. Absolute value of bytes added to or deleted from Wikimedia projects | N.A. | Too early to measure.

Learning question
Did your work increase the motivation of contributors, and how do you know?
This project has, among others, the following goal: "Every Wikipedia language community is aware and knows about the knowledge inequalities in the entire Wikipedia project".
At different venues, we presented the argument for helping every Wikipedia achieve a minimal coverage of every other language's cultural and geographical content, and the community comments were always positive and encouraging. We have no reliable means of knowing the impact of the current visualization prototypes for Top CCC articles or their related panels, but members of communities such as Wikimedia Deutschland, Wikimedia España and Wikimedia Ukraine, among others, described them as impressive and extremely useful.
We are confident that the better the interface and the communication of the WDO data and tools, the more editors will be motivated to bridge the gaps with content they find valuable.


The project's success is evaluated according to the fulfilment of the different goals and their related tasks. Raising awareness and providing tools are the path we identified to improve content diversity. Nonetheless, it is important to note that this is principally a data-oriented project rather than an activism one (less than a third of the time was allocated to tasks for reaching the communities). This means that the main focus is to create the resources that will be used to bridge the gaps, regardless of how the communities organize themselves (in the form of events, regional partnerships or spontaneous online activity). This project is fundamentally producing datasets, statistics and visualizations.

Indicators of impact

Do you see any indication that your project has had impact towards Wikimedia's strategic priorities? We've provided 3 options below for the strategic priorities that Project Grants are mostly likely to impact. Select one or more that you think are relevant and share any measures of success you have that point to this impact. You might also consider any other kinds of impact you had not anticipated when you planned this project.

How did you improve quality on one or more Wikimedia projects?

Covering a minimum for each content gap (culture gap, geography gap, gender gap, etc.) is fundamental to a complete, quality encyclopaedia. Proposing lists of relevant articles for particular gaps is especially useful for contests; therefore, its impact will become increasingly evident as WDO becomes better known across the different communities.

Some gaps are more difficult to measure. However, the database now allows measuring all the biases and gaps that are relevant to the Wikimedia Strategy 2030 group. It allows further analyses and answering valuable questions such as:

  • How well does each Wikipedia cover the world's existing cultural diversity (gaps)?
  • Are the articles created each month dedicated to filling these gaps?
  • What are the most relevant articles from each Wikipedia’s cultural context and from particular topics?
  • ...

Project resources

Please provide links to all public, online documents and other artifacts that you created during the course of this project. Examples include: meeting notes, participant lists, photos or graphics uploaded to Wikimedia Commons, template messages sent to participants, wiki pages, social media (Facebook groups, Twitter accounts), datasets, surveys, questionnaires, code repositories... If possible, include a brief summary with each link.


The best thing about trying something new is that you learn from it. We want to follow in your footsteps and learn along with you, and we want to know that you took enough risks in your project to have learned something really interesting! Think about what recommendations you have for others who may follow in your footsteps, and use the below sections to describe what worked and what didn’t.

We have learnt to think about the needs of 300 languages.

What worked well

What did you try that was successful and you'd recommend others do? To help spread successful strategies so that they can be of use to others in the movement, rather than writing lots of text here, we'd like you to share your finding in the form of a link to a learning pattern.

It is hard to single out what worked well. We did not strictly follow any learning pattern, but we would like to point out Learning patterns/Conducting a semi-structured interview, as some of its recommendations were valuable for presenting the project in person and gathering feedback.

One anecdote illustrates the importance of this project, specifically when trying to understand the impact of Wikimania 2018.

With dashboards like https://wdo.wmcloud.org/diversity_over_time/, we could see that the impact of Wikimania 2018 (Bridging the Knowledge Gaps), which explicitly addressed the gaps related to African content, was not significant. We can view this positively, because it confirms that we need tools like this one, and sustained action, to bridge the gaps.
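A dashboard of this kind essentially tracks, month by month, what share of newly created articles belongs to a given group (for instance, articles geolocated in Africa), so an event's effect would show up as a change in that share. The sketch below computes this monthly share. The input format (pairs of creation date and a boolean group flag), the function name and the toy data are hypothetical; they are not taken from the WDO code or from its results.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple


def monthly_share(creations: Iterable[Tuple[date, bool]]) -> Dict[str, float]:
    """For each YYYY-MM month, return the % of newly created articles
    flagged as belonging to the group (hypothetical input format)."""
    totals: Counter = Counter()
    in_group: Counter = Counter()
    for created, belongs in creations:
        month = created.strftime("%Y-%m")
        totals[month] += 1
        if belongs:
            in_group[month] += 1
    # Months are returned in chronological order.
    return {m: 100.0 * in_group[m] / totals[m] for m in sorted(totals)}


# Toy data only: one month before and one month after an event.
creations = [
    (date(2018, 6, 3), True), (date(2018, 6, 10), False),
    (date(2018, 6, 20), False), (date(2018, 6, 25), False),
    (date(2018, 8, 1), True), (date(2018, 8, 5), False),
    (date(2018, 8, 9), False), (date(2018, 8, 17), False),
    (date(2018, 8, 30), False),
]
print(monthly_share(creations))  # {'2018-06': 25.0, '2018-08': 20.0}
```

Comparing shares rather than raw counts keeps the indicator independent of how many articles an edition creates overall in a given month.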

What didn’t work

What did you try that you learned didn't work? What would you think about doing differently in the future? Please list these as short bullet points.

  • Not preparing for the worst-case scenario in terms of technical infrastructure was a mistake. We should have gone straight to the dumps and, in any case, pushed for the development of the MediaWiki History dataset, as it was key to finishing the project.
  • We did not communicate enough with endorsers, as we were more focused on research, methodological aspects and development. However, the project is now complete in terms of data and nearly complete in terms of visualizations. It needs more dissemination and usability improvements in order to extend to all the contests and uses by the affiliates.

Next steps and opportunities

Are there opportunities for future growth of this project, or new areas you have uncovered in the course of this grant that could be fruitful for more exploration (either by yourself, or others)? What ideas or suggestions do you have for future projects based on the work you’ve completed? Please list these as short bullet points.

As stated before, WDO has finished conceptualizing and identifying all the content gaps. The framework allows obtaining statistics for each of them, and the website can support many dashboards that analyze them over time and, at the same time, propose points of action to bridge them.

There are many next steps and opportunities that could be detailed. However, we'd like to point out four:

  • Improving the interface of the dashboards and testing them with more users to make sure they are easy to use.
  • Engaging affiliates in using them for their contests (e.g. Asian Month).
  • Creating new relevant article lists for every language edition for specific topics.
  • Giving conference presentations using the website dashboards instead of PowerPoint, entirely or for the most part. This can encourage the audience to explore the dashboards themselves.

This will probably turn into a new project, but there is no deadline yet, as we want to disseminate a bit more and receive feedback from key members of the Wikimedia movement.

There are always other categories relevant to diversity, but the ones we identified (gender, culture, geography, ethnic groups, religious groups, sexual orientation, LGBT+) are those that most people are motivated to work on. If there is a specific demand for a topic, we could try to expand the database and dashboards.

Meanwhile, dissemination and continual improvement of the tools (e.g. the UI) seem the most reasonable way to continue the project.

Part 2: The Grant


Actual spending

Please copy and paste the completed table from your project finances page. Check that you’ve listed the actual expenditures compared with what was originally planned. If there are differences between the planned and actual use of funds, please use the column provided to explain them.

All the funds listed in the finances tab have been spent for the specified purposes.

Remaining funds

Do you have any unspent funds from the grant?



Did you send documentation of all expenses paid with grant funds to grantsadmin(_AT_)wikimedia.org, according to the guidelines here?


Confirmation of project status

Did you comply with the requirements specified by WMF in the grant agreement?


Is your project completed?

Yes, it reached the goals proposed for this 9-12 month phase.

Grantee reflection

We’d love to hear any thoughts you have on what this project has meant to you, or how the experience of being a grantee has gone overall. Is there something that surprised you, or that you particularly enjoyed, or that you’ll do differently going forward as a result of the Project Grant experience? Please share it here!

Finishing a project is always a good feeling. The project has shown that the dashboards are useful and help the communities and different Wikimedia stakeholders. This is very rewarding, as we can see that research can become practical and help communities. It can be a trigger for change in terms of diversity coverage.