Wikipedia Diversity Observatory/ToDo

From Meta, a Wikimedia project coordination wiki

On this page, we outline lines of action that would help the Observatory keep growing as a project that studies diversity in Wikimedia and raises awareness of its current state. We hope to undertake them in the near future. If you want to collaborate, do not hesitate to contact us at tools.wcdo@tools.wmflabs.org.

Goals or lines of action

For the project to grow, it is essential:

  1. To study the gaps not only in content but also in community composition. In other words, data on community diversity should be collected more thoroughly.
  2. To collect data on the context of each Wikipedia language community (e.g. GDP, education levels, etc.) in order to assess its barriers and volunteering capacity, and therefore to estimate its possible impact in the near future.

In line with the past phases of the project, we can divide the different tasks/goals into a) data extraction and b) the creation of tools and dashboards.

Data extraction

  • Collect data on the diversity of editors based on the public information on their user pages. Explore the possibility of obtaining self-reported data from the communities / in collaboration with community engagement (WMF).
  • Identify a list of indicators for each country/subregion in order to determine the barriers for Wikimedia contributors, collect them from reliable sources, and create a database/introduce them into Wikidata.
  • Collect data on cultural and other types of diversity in Wikidata and Commons as a whole.
  • Collect data on editors per country in order to contextualize the results on the representation of cultural contexts in Wikipedia content.


Tools and dashboards

In-article gaps

It would be valuable to create a series of dashboards that give information on the gaps at the in-article level, i.e. the perspectives an article is missing to be more diverse.

  • Notability support dashboard. One dashboard with different indicators that give more context on the article topic and its notability. For example, it could show the cultures the article is about, the average number of references in its local Wikipedia, and the percentage of coverage of articles from that culture in Wikipedia. The purpose of giving more information (or meta-information) about the article is to make editors aware of the reasons why it might or might not fulfill the notability criteria they have set and, if they see fit, to be more flexible about them (or to signal it). This meta-information can help editors make more informed decisions.
  • Neutral Point of View support dashboard. One dashboard that shows the number of articles referenced in an article (as outgoing links) in one language edition along with all the articles referenced by the versions of this article in all the other language editions. It would suggest the articles most commonly linked across language editions that are missing from this one. It would also use the Wikidata properties and items (Q-items) in order to show articles that are not being used in the discourse.
  • Gender and LGTB biases. One dashboard that compares the percentage of outlinks to Gender and LGTB topics for any group of articles in order to reveal biases, in other words, to detect missing perspectives. It would allow comparing these percentages across languages to see which editions are more biased and miss more perspectives (i.e. are less diverse).
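
The outlink comparisons described for the Neutral Point of View and the Gender and LGTB dashboards could be prototyped roughly as follows. This is only a minimal sketch, not the Observatory's implementation: the function names, the `min_share` threshold, and the input format (sets of language-independent topic identifiers per language edition) are assumptions made for illustration, and the identifiers in the example are placeholders.

```python
from collections import Counter


def missing_common_outlinks(outlinks_by_lang, target_lang, min_share=0.5):
    """Suggest linked topics that many other editions use but the target lacks.

    outlinks_by_lang maps a language code to the set of language-independent
    topic identifiers linked from that edition's version of the article.
    min_share is an assumed cut-off: the fraction of the other editions that
    must link a topic before it is suggested.
    """
    target = outlinks_by_lang[target_lang]
    others = {lang: links for lang, links in outlinks_by_lang.items()
              if lang != target_lang}
    if not others:
        return []
    # Count in how many of the other editions each topic is linked.
    counts = Counter(topic for links in others.values() for topic in links)
    threshold = min_share * len(others)
    candidates = [(topic, n) for topic, n in counts.items()
                  if n >= threshold and topic not in target]
    # Most widely used topics first; ties broken by identifier for stability.
    return sorted(candidates, key=lambda item: (-item[1], item[0]))


def topic_outlink_share(outlinks, topic_set):
    """Percentage of a group's outlinks that point to topics in topic_set."""
    if not outlinks:
        return 0.0
    hits = sum(1 for link in outlinks if link in topic_set)
    return 100.0 * hits / len(outlinks)


# Placeholder identifiers, not real Wikidata items.
example = {
    "en": {"t1", "t2", "t3"},
    "ca": {"t1", "t2", "t4"},
    "es": {"t1", "t4"},
    "fr": {"t2", "t4", "t5"},
}
print(missing_common_outlinks(example, "en"))              # [('t4', 3)]
print(topic_outlink_share(["t1", "t2", "t4", "t4"], {"t4"}))  # 50.0
```

Working on language-independent identifiers rather than article titles is what makes the editions comparable; how those identifiers would be obtained is left open here.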


Wikipedia content gaps

It would be valuable to create a series of dashboards in order to give information on gaps (at Wikipedia level).

  • Recent changes dashboard. One dashboard that shows the list of articles with recent changes (edits) and the list of recently created articles, together with the diversity groups they belong to (cultures, countries, gender, etc.), mapped to different colors so they are easy to recognize.
  • City/Region comparison dashboard. One dashboard that allows you to search for one city or region and obtain a representation of its most important places, people, etc. and their relevance (in terms of number of edits, pageviews, etc.). This would allow comparing the relevance or representation of places within the same cultural context.
  • Wikidata dashboard. One dashboard that shows the extent of each kind of diversity in Wikidata along with relevance features (e.g. number of edits, number of pageviews, etc.), in a similar way to how it is done for the Wikipedia language editions.
  • Top Articles Lists dashboard. Additional Top Articles lists based on diversity groups not analyzed previously. These lists can relate to each specific language context/country and can also form a “global” list for each type of diversity.
  • Editor participation dashboard. One dashboard where you can visually compare the engagement of the different language editions in terms of participation. For a single view on a community, this could be represented as a funnel in which there would be a) all editors, b) active editors, c) multilingual editors, d) editors taking a flag role, among others.
  • Gender gap / time dashboard. One dashboard in which you can see the gender gap for a language edition (and for a group of language editions) according to specific periods of time (centuries). The gender gap may be inevitable in periods of history in which women did not hold public roles and, therefore, there are no sources explaining their relevance. Even so, we can see the evolution of the gender gap and the correction of this bias in the current century and the last decades.
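
As an illustration of the Gender gap / time dashboard, the per-century figures could be computed along these lines. This is a hedged sketch under simplifying assumptions: biographies are given as hypothetical `(birth_year, gender)` pairs, only CE years are handled, gender is encoded as a plain string, and the sample data is made up.

```python
from collections import defaultdict


def century_of(year):
    """Return the century number for a positive (CE) year, e.g. 1901 -> 20."""
    return (year - 1) // 100 + 1


def gender_gap_by_century(biographies):
    """Compute the percentage of biographies about women, grouped by century.

    biographies is an iterable of (birth_year, gender) pairs. Any gender
    value other than "female" only counts towards the century total.
    Returns {century: percentage of biographies about women}.
    """
    totals = defaultdict(int)
    women = defaultdict(int)
    for birth_year, gender in biographies:
        century = century_of(birth_year)
        totals[century] += 1
        if gender == "female":
            women[century] += 1
    return {c: 100.0 * women[c] / totals[c] for c in sorted(totals)}


# Made-up sample, for illustration only.
sample = [
    (1750, "male"), (1780, "male"), (1790, "female"), (1799, "male"),
    (1950, "female"), (1960, "male"),
]
print(gender_gap_by_century(sample))  # {18: 25.0, 20: 50.0}
```

Running the same computation per language edition would give one series per edition, which is the kind of comparison the dashboard aims at.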

Many more dashboards could show valuable sides of the content gaps and their relation to the context, and could help determine which actions would help these communities overcome their challenges and be more prolific.


Dashboard improvements

  • Every dashboard should allow downloading its data as an Excel file.
  • Every table should allow downloading its content as Wikitext.
  • Every table should allow pagination.
  • Every dashboard should allow returning results in CSV or JSON (API-like).
  • Every table should allow printing as PDF.
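
To make the export requirements more concrete, here is a minimal sketch of how one table could be serialized as CSV, JSON, and Wikitext using only the Python standard library. The function names, the column names, and the values are hypothetical, and the sketch says nothing about how the existing dashboards are built.

```python
import csv
import io
import json


def to_csv(columns, rows):
    """Serialize a table as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(columns, rows):
    """Serialize a table as a JSON array with one object per row."""
    records = [dict(zip(columns, row)) for row in rows]
    return json.dumps(records, ensure_ascii=False)


def to_wikitext(columns, rows):
    """Serialize a table as a MediaWiki wikitable."""
    lines = ['{| class="wikitable sortable"']
    lines.append("! " + " !! ".join(columns))
    for row in rows:
        lines.append("|-")
        lines.append("| " + " || ".join(str(cell) for cell in row))
    lines.append("|}")
    return "\n".join(lines)


# Hypothetical columns and made-up values, for illustration only.
columns = ["language", "articles"]
rows = [["ca", 10], ["es", 20]]
print(to_csv(columns, rows))
print(to_json(columns, rows))
print(to_wikitext(columns, rows))
```

Keeping the table as plain columns and rows, separate from its rendering, means each additional export format only needs one more small function.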