Grants:Project/Wikipedia Cultural Diversity Observatory (WCDO)/Final

From Meta, a Wikimedia project coordination wiki
Report under review
This Project Grant report has been submitted by the grantee, and is currently being reviewed by WMF staff. If you would like to add comments, responses, or questions about this grant report, you can create a discussion page for it.

Welcome to this project's final report! This report shares the outcomes, impact and learnings from the grantee's project.

For a more detailed report, you can consult the midpoint report or the project timeline.

Part 1: The Project


Our solution was "to create a site named Wikipedia Cultural Diversity Observatory (WCDO), where to provide automated rich statistics and visualizations on the current state of diversity, cultural context content datasets for each language edition as well as pointing out solutions in order to improve the exchange of content across Wikipedia language editions". This is what we achieved:



  • We created a Language-Territories Mapping database (a CSV file on GitHub) in order to study local content in every Wikipedia.
  • We established a method to obtain local content datasets (named Cultural Context Content) for all 300 Wikipedia language editions and made them public.
  • We proposed solutions to bridge the culture gap in the form of article lists (Top CCC articles).
  • We produced several statistics, updated monthly, that show the depth of the gap for each language edition.
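The Language-Territories Mapping relates each language to the territories where it is spoken, which is one input for identifying local content. The report does not spell out how the mapping is used, so the following Python sketch only illustrates one simple Wikidata-based heuristic of the kind the project plan mentions: an article is flagged as a CCC candidate if one of its Wikidata location QIDs belongs to a territory of the language. The column names `wikicode` and `territory_qid`, the placeholder QIDs and the function names are all hypothetical; the real CSV on GitHub and the project's actual method (category trees, Wikidata heuristics and machine learning) may differ.

```python
import csv
import io
from typing import Dict, Set

# Hypothetical layout: one row per (language, territory) pair, with the
# territory given as a Wikidata QID. Q1001, Q1002 and Q2001 are placeholders,
# not real territories, and the real file's columns may differ.
SAMPLE_CSV = """wikicode,territory_qid
ca,Q1001
ca,Q1002
eu,Q2001
"""


def load_territories(text: str) -> Dict[str, Set[str]]:
    """Map each language code to the set of territory QIDs associated with it."""
    mapping: Dict[str, Set[str]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        mapping.setdefault(row["wikicode"], set()).add(row["territory_qid"])
    return mapping


def is_local_candidate(article_locations: Set[str], lang: str,
                       territories: Dict[str, Set[str]]) -> bool:
    """Flag an article as a CCC candidate if any of its Wikidata location
    QIDs falls inside one of the language's territories (assumed heuristic)."""
    return bool(article_locations & territories.get(lang, set()))


if __name__ == "__main__":
    territories = load_territories(SAMPLE_CSV)
    print(sorted(territories["ca"]))  # ['Q1001', 'Q1002']
    print(is_local_candidate({"Q1002", "Q9999"}, "ca", territories))  # True
    print(is_local_candidate({"Q2001"}, "ca", territories))  # False
```

A geographic match like this is only a candidate signal; in practice it would be combined with other criteria rather than used alone.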
Slides: talk on the Wikipedia Cultural Diversity Observatory given at the Wikimedia CEE event in Lviv, Ukraine.


Methods and activities

What did you do in your project?

The project was divided into three phases (Selection of Cultural Context Content, Website development, and Dissemination and Community Engagement), which roughly correspond to research, development and activism. As can be seen in the midpoint and monthly reports (which are much more detailed at the task level), research and activism took most of the time.

Research activities:

  • We used machine learning to obtain a more accurate version of the CCC (not foreseen in the original project plan).
  • We added gender information (based on Wikidata) to the CCC dataset so that we could provide gender-based article lists and analyses (not foreseen in the original project plan).

Development activities:

  • Tested several technologies to choose a good visualization framework (we chose Plotly over Bokeh, after also trying D3).
  • Automated the processes and published all the code publicly so that anybody can contribute (in the wcdo GitHub account); the project complies with the requirements of the 'right to fork' policy.

Communication activities:

  • Presented results to describe the culture gap and raise awareness, and proposed solutions such as the article lists.
  • Gathered feedback from communities at every Wikimedia event mentioned, in order to improve the interface.
  • Looked for new angles to enrich the project and plan further steps (this may lead to a future grant project).

The project lasted 9 months instead of 6, because the research part necessarily took more time to set up the observatory's data and architecture. As stated in the plan, "at the end of the grant period, some dissemination activities continue". Nonetheless, other dissemination actions were carried out, such as taking part and presenting at two Wikimedia conferences.

Outcomes and impact


What are the results of your project?

The main outcomes are the Cultural Context Content dataset for all 300 Wikipedia language editions and the Wikipedia Cultural Diversity Observatory portal, along with its external website built with Dash/Plotly.

What has changed from the project plan?

The project plan was quite tight but very clear about the specific tasks to be carried out. I want to highlight two aspects in which the project's execution differed from the plan.

On the one hand, we focused our efforts on laying a good foundation for the WCDO instead of rushing towards the visualizations.

This means that 1) the CCC datasets are reliable, 2) the method and abstraction used to calculate the statistics that monitor the gaps are solid, and 3) the technology for the WCDO external website (Plotly) is adequate.

On the other hand, this meant that we could not develop all the visualizations that were proposed. While the project presents several tables, there are no maps or stacked bar charts to depict geolocated articles or topical coverage.

Progress towards stated goals

Please use the below table to:

  1. List each of your original measures of success (your targets) from your project plan.
  2. List the actual outcome that was achieved.
  3. Explain how your outcome compares with the original target. Did you reach your targets? Why or why not?

The different tasks were organized around three goals: design an automated method to obtain Cultural Context Content (CCC), create the WCDO site with data visualizations, and disseminate the observatory across communities.

Below are the different tasks/goals and their evaluation (whether they were completed, with an explanation).

Planned measure of success (include numeric target, if applicable) | Actual result | Explanation
Revise the criteria and improve the method in order to obtain the Cultural Context Content for each Wikipedia language edition (e.g. I propose using heuristics based on Wikidata, in addition to the category tree used in the prior method) and extend it to the existing 288 Wikipedia language editions. | Done. | The requirements for working with 288 languages made this phase last longer than expected.
Set up the Observatory website, and choose the template and portal design. | Partially done. | The best solution in terms of consistency with other projects, localization (in MediaWiki) and findability is to use Meta for the portal and an external website only for the data visualizations.
Create a table with the extent of CCC (and its subgroups of content) both in the observatory and on Meta (analogously to …), and write the weekly automation code. | Done. | Other tables are also available on both the portal and the external site.
Develop and implement an ongoing presentation of visualizations of the cultural context articles created in each Wikipedia. | Done, but only the foundation is in place. | There is still a long way to go to visualize the CCC articles and their gaps well with graphs other than tables.
A monthly updated table with all Wikipedias ranked by the extent of CCC in absolute number of articles and by percentage, along with various subgroups of content (geographical articles, biographies, among others). | Done. | It can be seen on the Meta portal or on the external website.
A monthly updated visual representation of the CCC topical subgroups of content and geographical articles. | Not done. | There was not enough time to create these visualizations; these data are only presented as tables.
A monthly updated visual representation of the culture gap which allows editors to rapidly see their language edition's coverage of the CCC of the other language editions. | Partially done. | Two tables show these results. Visualizations are a possible future improvement.
A monthly updated culture gap index to easily compare how the different Wikipedia language editions cover the cultural content of the other Wikipedia language editions. | Done. | The number of articles covered from other languages' CCC is a good indicator.
An ongoing presentation of different visualizations of the articles created in each language edition that can be either a) labelled as CCC and by main topic category (biography, places, history, etc.), or b) identified as articles that bridge the culture gap with any other language edition. | Not done. | Monthly monitoring of article creation is computationally costly and would require further work.
Develop an algorithm which generates the list of the top 100 articles from each Wikipedia's Cultural Context Content to be created in other Wikipedia language editions. | Done. | There are 10 lists of valuable cultural context articles for very different segments/topics (geolocated, gender-based, etc.).
Spread the project to the existing interlanguage events (e.g. Wikimedia CEE Spring, Intercultur Wikimedia España, Catalan Culture Challenge, among others). | Done. | The Top CCC articles prototype was presented to these specific groups, but more dissemination is required to reach similar groups/events.
Write a paper about the observatory to disseminate it in academia. | Done. |
Coordinate with the Wikimedia Research team in order to promote recommender improvements. | Not done. | This collaboration depended on the CCC datasets, which were refined until the very end.
Set a plan for community engagement and dissemination including all the possible groups, sites and media. | Partially done. | A press release and other actions aimed at the wider audience were not carried out due to lack of time.
Attend Wikimania 2018 and present the Observatory. | Done. | The Observatory was also presented at other Wikimedia events.
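The Top CCC lists (the "top 100 articles" goal above) can be thought of as a filter-and-rank step: take one edition's CCC, drop the articles the target edition already has, and sort the rest by some relevance score. The report does not specify the scoring features, so this Python sketch assumes articles are keyed by Wikidata QID and uses a hypothetical precomputed score (for instance, the number of interlanguage links); `top_missing_ccc` is an illustrative name, not the project's code.

```python
from typing import Dict, List, Set


def top_missing_ccc(
    source_ccc: Dict[str, int],
    target_articles: Set[str],
    n: int = 100,
) -> List[str]:
    """Return up to n QIDs from the source CCC that the target edition lacks.

    source_ccc maps a Wikidata QID to a hypothetical relevance score
    (assumed here to be something like the number of interlanguage links).
    target_articles is the set of QIDs that already have an article in the
    target language edition.
    """
    missing = [qid for qid in source_ccc if qid not in target_articles]
    # Highest score first; ties broken by QID string so the output is deterministic.
    missing.sort(key=lambda qid: (-source_ccc[qid], qid))
    return missing[:n]


if __name__ == "__main__":
    ca_ccc = {"Q1": 40, "Q2": 12, "Q3": 55, "Q4": 12}
    es_articles = {"Q3"}
    print(top_missing_ccc(ca_ccc, es_articles, n=2))  # ['Q1', 'Q2']
```

Segment-specific lists (geolocated, gender-based, etc.) would follow the same pattern after first filtering `source_ccc` to the relevant subset.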

Think back to your overall project goals. Do you feel you achieved your goals? Why or why not?

Overall, the project goals were ambitious, and they have been completed successfully.

We believe the project was planned down to the smallest detail, thanks to the team members' previous research and their expertise in the area. However, this same level of detail became a burden for some subgoals that were left undone, because more fundamental tasks, such as creating the CCC dataset or the abstraction to analyze the gaps, required more time.

Projects require constant reassessment of priorities. We are convinced that it was more important to lay a good foundation than to rush into visualizations. Working with all Wikimedia languages while ensuring the quality of the results has been a priority (we ran a manual assessment with several users to verify the quality of the datasets, and its results are published in the paper).

At the same time, the feedback received from the communities and their members, both from AffCom and online, has always been very positive and useful in terms of usability and further functionality. This encouraged us to present the project at more venues than initially planned, as the culture gap does not currently receive the same attention as the gender gap.

Global Metrics

We are trying to understand the overall outcomes of the work being funded across all grantees. In addition to the measures of success for your specific program (in above section), please use the table below to let us know how your project contributed to the "Global Metrics." We know that not all projects will have results for each type of metric, so feel free to put "0" as often as necessary.

For more information and a sample, see Global Metrics.

This project fundamentally produces statistics, visualizations and datasets, so metrics on their use could indicate its degree of success. However, at this moment there has been neither enough dissemination nor enough time since the setup to evaluate success in these terms.

Some of the usage-related measures of success expected one year after the project's start are:

  • 1,000 pageviews per month on our statistics website. Not measurable, as there is no measurement tracking system on Meta.
  • 100 data downloads per month of our dataset. Not evaluated yet.
  • 5 data re-use cases of our dataset. Not evaluated yet.
  • A website with self-updating graphs and data downloads. Done.

Some other classic measures that suit activism-oriented projects are not meaningful for this research/communication project.

Metric | Achieved outcome | Explanation
1. Number of active editors involved | 30 (testing the interface) + 10 (manual assessment of the datasets) | During conference events I performed informal usability tests (with a think-aloud protocol) to gather feedback.
2. Number of new editors | N.A. (not applicable) |
3. Number of individuals involved | N.A. | A number would not reflect the work done in research and communication.
4. Number of new images/media added to Wikimedia articles/pages | N.A. | Too early to measure. Feedback from communities is positive, as they will use it for contests.
5. Number of articles added or improved on Wikimedia projects | N.A. | Too early to measure.
6. Absolute value of bytes added to or deleted from Wikimedia projects | N.A. | Too early to measure.

Learning question
Did your work increase the motivation of contributors, and how do you know?
This project has, among others, the following goal: "Every Wikipedia language community is aware and knows about the knowledge inequalities in the entire Wikipedia project".
At different venues we presented the case for helping every Wikipedia reach a minimum coverage of the cultural and geographical content of every other language, and the community comments were always positive and encouraging. We have no reliable means of knowing the impact of the current visualization prototypes for Top CCC articles or their related panels, but members of communities such as Wikimedia Deutschland, Wikimedia España and Wikimedia Ukraine, among others, described them as impressive and extremely useful.
We are confident that the better the interface and the communication of the WCDO data and tools, the more they will motivate editors to bridge the gaps with content they find valuable.

Indicators of impact

Do you see any indication that your project has had impact towards Wikimedia's strategic priorities? We've provided 3 options below for the strategic priorities that Project Grants are most likely to impact. Select one or more that you think are relevant and share any measures of success you have that point to this impact. You might also consider any other kinds of impact you had not anticipated when you planned this project.

How did you improve quality on one or more Wikimedia projects?

Covering a minimum of each other's cultural context content is fundamental to having a complete, high-quality encyclopaedia.

Proposing lists of relevant articles for particular gaps is especially useful for contests; therefore, its impact will become more evident as WCDO becomes better known across the different communities.

The value of the CCC dataset lies in being an essential tool for bridging the culture gap. Unlike the gender gap, the culture gap is hard to measure. The CCC dataset allows further analyses and answers valuable questions such as:

  • How self-centered is each Wikipedia (the extent of CCC as a percentage and as a number of articles)?
  • Are the CCC articles responding to readers' demand for information?
  • How well does each Wikipedia cover the world's cultural diversity (gaps)?
  • Are the articles created each month filling these gaps?
  • What are the most relevant articles in each Wikipedia's cultural context and on particular topics?
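The first and third questions, together with the monthly extent ranking and the culture gap index described in the goals table, reduce to set arithmetic if each edition's articles and its CCC are represented as sets of Wikidata QIDs (an assumption made for this sketch). The Python functions below are hypothetical helpers, not the project's code: one computes the extent of CCC in absolute and percentage terms and ranks editions by it, and the other counts how many articles of another edition's CCC a target edition covers.

```python
from typing import Dict, List, Set, Tuple


def ccc_extent(ccc: Set[str], all_articles: Set[str]) -> Tuple[int, float]:
    """Absolute number of CCC articles and their share (%) of the edition."""
    if not all_articles:
        return 0, 0.0
    count = len(ccc & all_articles)
    return count, 100.0 * count / len(all_articles)


def ccc_coverage(target_articles: Set[str], source_ccc: Set[str]) -> Tuple[int, float]:
    """How many of another edition's CCC articles the target edition covers,
    as a count and as a percentage of that CCC (assumed gap indicator)."""
    if not source_ccc:
        return 0, 0.0
    covered = len(source_ccc & target_articles)
    return covered, 100.0 * covered / len(source_ccc)


def rank_by_extent(
    editions: Dict[str, Tuple[Set[str], Set[str]]],
) -> List[Tuple[str, int, float]]:
    """Rank editions by CCC share, given {lang: (ccc, all_articles)}."""
    rows = [(lang, *ccc_extent(ccc, arts)) for lang, (ccc, arts) in editions.items()]
    # Highest percentage first; ties broken by language code.
    rows.sort(key=lambda r: (-r[2], r[0]))
    return rows


if __name__ == "__main__":
    editions = {
        "ca": ({"Q1", "Q2"}, {"Q1", "Q2", "Q3", "Q4"}),
        "eu": ({"Q5"}, {"Q1", "Q5", "Q6", "Q7", "Q8"}),
    }
    print(rank_by_extent(editions))  # [('ca', 2, 50.0), ('eu', 1, 20.0)]
    # Share of the Catalan CCC that the Basque edition covers.
    print(ccc_coverage(editions["eu"][1], editions["ca"][0]))  # (1, 50.0)
```

Computing `ccc_coverage` for every pair of editions gives the kind of matrix that the two culture gap tables summarize.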

Project resources

Please provide links to all public, online documents and other artifacts that you created during the course of this project. Examples include: meeting notes, participant lists, photos or graphics uploaded to Wikimedia Commons, template messages sent to participants, wiki pages, social media (Facebook groups, Twitter accounts), datasets, surveys, questionnaires, code repositories... If possible, include a brief summary with each link.


The best thing about trying something new is that you learn from it. We want to follow in your footsteps and learn along with you, and we want to know that you took enough risks in your project to have learned something really interesting! Think about what recommendations you have for others who may follow in your footsteps, and use the below sections to describe what worked and what didn’t.

We have learnt to think about the needs of 300 languages.

What worked well

What did you try that was successful and you'd recommend others do? To help spread successful strategies so that they can be of use to others in the movement, rather than writing lots of text here, we'd like you to share your finding in the form of a link to a learning pattern.

It is actually quite hard to highlight what worked well. We did not strictly follow any learning pattern, but we would like to point out Learning patterns/Conducting a semi-structured interview, as some of its recommendations were valuable for presenting the project in person and gathering feedback.

What didn’t work

What did you try that you learned didn't work? What would you think about doing differently in the future? Please list these as short bullet points.

  • Setting goals for interactive visualizations when the data is the most fundamental part was a mistake. However, Jmorgan alerted us to the importance of documenting everything.
  • Not communicating enough with endorsers (as we were more focused on research, methodological aspects and development).

Next steps and opportunities

Are there opportunities for future growth of this project, or new areas you have uncovered in the course of this grant that could be fruitful for more exploration (either by yourself, or others)? What ideas or suggestions do you have for future projects based on the work you’ve completed? Please list these as short bullet points.

As stated before, WCDO has laid its foundation: the CCC datasets, the abstraction for statistics and the website for visualizations. There are many possible next steps and opportunities; we would like to point out three:

  • Creating new relevant article lists on specific topics for every language edition.
  • Monitoring the articles created each month in order to see whether communities are bridging the gap.
  • Exploring the cultural diversity of languages without a Wikipedia to identify potential new projects.

This will probably turn into a new project, but there is no deadline yet, as we want to disseminate the project further and receive feedback from key members of the Wikimedia movement.

Part 2: The Grant


Actual spending

Please copy and paste the completed table from your project finances page. Check that you’ve listed the actual expenditures compared with what was originally planned. If there are differences between the planned and actual use of funds, please use the column provided to explain them.

Remaining funds

Do you have any unspent funds from the grant?



Did you send documentation of all expenses paid with grant funds to grantsadmin(_AT_), according to the guidelines here?


Confirmation of project status

Did you comply with the requirements specified by WMF in the grant agreement?


Is your project completed?

Yes, it reached the goals proposed for this 6-9 month foundational phase.

Grantee reflection

We’d love to hear any thoughts you have on what this project has meant to you, or how the experience of being a grantee has gone overall. Is there something that surprised you, or that you particularly enjoyed, or that you’ll do differently going forward as a result of the Project Grant experience? Please share it here!

We are exhausted but happy to write these lines. We believe the project's potential has been confirmed by all the feedback obtained from communities and different Wikimedia stakeholders. It is very rewarding to realize that it is possible to transform research into something practical and help communities.