Wikipedia Diversity Observatory

A project to understand and increase diversity within Wikipedia content and communities

The Wikipedia Diversity Observatory (WDO) is a space to study diversity in Wikipedia's content and communities, identify and discuss needs and gaps, and propose and develop solutions to bridge them.

Mission and Vision

The Diversity Observatory's mission is to help align the movement toward greater diversity in the content of the different projects and in the movement overall. This includes diversity of culture, geography, gender, sexual orientation, ethnic group, and language, among others.

We envision the Wikimedia projects with more knowledge equity, and the Movement as a fair representation of humanity that takes into account all of its existing diversity.

Activities

This page is a shared space for editors, researchers and all sorts of contributors to study and address the content gaps. To that end, we provide strategic data and tools to organize and take action. We want to centralize our internal knowledge to strengthen the initiatives working on diversity.

The main activities of the project are researching content gaps, studying the Movement’s diversity, raising awareness of the gaps and barriers, creating tools that provide points of action, and disseminating these efforts and results through academic and general publications. Regarding content diversity, the project raises awareness of Wikipedia’s current state of diversity according to specific topics and categories, and provides datasets, visualizations and tools to improve it.

The Diversity Observatory is a place to discover the current state of diversity in the Movement, find out who is missing, and join efforts. As we move towards the 2030 Strategic horizon, we will evaluate whether we are making progress toward our goals.

Categories for diversity

Some categories are especially relevant for diversity because they tend to be underrepresented on Wikipedia. To achieve more diversity in content and advance knowledge equity, Wikipedia needs to represent all the different 1) places (geographical entities), 2) peoples (characteristics such as gender, sexual orientation, religious groups, ethnic groups, and indigenous groups), 3) cultural concepts for each group of people and place, and 4) languages (national, indigenous and marginalized) of the world.

The geography gap manifests itself mainly by a lack of articles about specific geographical entities (whether they are continents, countries, etc.) in most of the Wikipedia language editions.
The gender gap manifests itself mainly by a lack of articles (biographies) about women in most of the Wikipedia language editions when compared with articles about men.
An ethnic group or ethnicity is a category of people who identify with each other, usually on the basis of presumed similarities such as common language, ancestry, history, society, culture, nation or social treatment within their residing area.
The sexual orientation groups gap manifests itself mainly by a lack of articles (biographies and any other topic) about LGBTQ+ people and issues.
The religious groups gap manifests itself mainly by a lack of articles about people from every religion in every Wikipedia language edition. There is also a gap in knowledge about the topics related to each religion.
The culture gap manifests itself mainly by 1) a lack of representation of topics from a language's cultural context in its own language edition and 2) a lack of coverage in other language editions of the articles that represent that cultural context.

The language gap manifests itself in the absence of a Wikipedia language edition for many of the languages spoken in the world. Depending on the language status (e.g. minoritized) and the number of speakers (e.g. a minority or a majority of the population), among other factors, it can be more difficult to engage speakers as contributors. It is necessary to understand every linguistic situation. We created this page to provide statistics and analysis showing which languages could obtain their own Wikipedia more easily.

There are several tools to bridge these gaps (e.g. gender is addressed by the WIGI, Denelezh, and WDCM Biases dashboards). On this page you can read more about the definitions, community initiatives and tools to bridge the gaps.

List of dashboards with tools and visualizations

This is a list of the dashboards created to visualize the gaps, along with tools that provide points of action to work on them. They are not limited to cultural diversity but also cover other kinds of diversity, such as geography or gender. These are the ones hosted at wdo.wmcloud.org.

Presentation of the Diversity Observatory at the conference OpenSym ’20

Visualizations

Tools

Other diversity tools hosted on other platforms

We also want to provide a short overview of other tools and research papers created outside this project that are useful for understanding and detecting cultural differences between language editions, and possibly for bridging the gaps or working on other diversity problems such as the content gender gap.

Background goals

These are the three main outcome goals the Diversity Observatory is working on to increase the diversity within the Wikimedia projects:

Main outcome goals:

Every Wikipedia language edition ensures coverage of all the groups of people that are currently underrepresented (e.g. by gender, sexual orientation, religion, ethnic group, etc.).
Every Wikipedia language edition ensures a minimal representation of its own territories’ cultural and geographical context (from geography to biographies, traditions, language, and others) and a minimal coverage of the content from every other language's cultural context.
Every Wikipedian has information about marginalized languages without a Wikipedia, so they can help the speakers of those languages create one and start representing their cultural context.

To reach these goals, we detail more specific goals for the project's community engagement and its research and development activities.

Community engagement goals:

Every Wikipedia language community is aware of the knowledge inequalities in the entire Wikipedia project.
Every Wikipedia language community is aware of the importance of representing its own culture so that users of the other language editions can import and learn from it.
Every Wikipedia event and community-organized contest considers dedicating sections and activities to mitigating the cultural knowledge gaps and the inequalities derived from them.
Every Wikipedian has access to data on the world's languages without a Wikipedia, in order to communicate their importance and encourage efforts to create one.

Research and development goals:

Every Wikipedian has access to data visualization tools to browse the gaps and create valuable new articles.
Every Wikipedian has access to statistical analysis on the extent of the gaps and understands the priorities for bridging or covering them.
Every Wikipedian can get access to information on the needs and barriers that affect every other (potential) Wikipedian.
Every Wikipedian can see which groups are underrepresented and what their defining characteristics are, and can access content about them.

The Diversity Observatory also aims to foster debate on the different types of diversity and how to work on them. You can always contact us and engage in diversity-related strategic discussions.

Dissemination timeline

These are our latest actions to raise awareness of the cultural diversity problem in Wikipedia. They cover the dissemination of research results, concepts, and tools.

01/12/2021 | Miquel-Ribé, M., Kaltenbrunner, A., & Keefer, J. M. (2021). Bridging LGBT+ Content Gaps Across Wikipedia Language Editions. The International Journal of Information, Diversity, & Inclusion, 5(4), 90-131. https://www.jstor.org/stable/48641981
01/11/2021 | Miquel-Ribé, M., & Laniado, D. (2021). The Wikipedia Diversity Observatory: helping communities to bridge content gaps through interactive interfaces. Journal of Internet Services and Applications, 12(1), 1-25. https://jisajournal.springeropen.com/articles/10.1186/s13174-021-00141-y
14/04/2021 | WikiWorkWorkshop @ The Web Conference 2021 | Academic Paper/Presentation: Miquel-Ribé, M., Laniado, D., & Kaltenbrunner, A. Local Content Matters: Insights on Wikipedia Editor and Reader Engagement (PDF). Non-archival submission.
02/11/2020 | WikidataWorkshop @ ISWC | Academic Paper/Presentation: Miquel-Ribé, M. Diversity in a Language-Independent Wiki: Six Design Requirements and Goals to Embed a Diversity Mindset (PDF). In Proceedings of the 19th International Semantic Web Conference. (Video).
03/10/2020 | Knowledge equity and content diversity section in "Reading Wikipedia in the Classroom. Module 1. Using Wikipedia to foster media and information literacy skills. Teacher's Guide" by Wikimedia Education.
22/09/2020 | Feedback on the project and paper "Knowledge Gaps Taxonomy" led by the Wikimedia Research Team (presentation).
27/08/2020 | OpenSym | Academic Paper/Presentation: Miquel-Ribé, M., & Laniado, D. The Wikipedia Diversity Observatory: A Project to Identify and Bridge Content Gaps in Wikipedia (PDF). In Proceedings of the 16th International Symposium on Open Collaboration. (Video).
06/10/2019 | WikiArabia | Talk: The State of Cultural Diversity in Arabic Wikipedia: Insights and Challenges.
17/08/2019 | Wikimania | Poster: Wikipedia Cultural Diversity Dataset: helping editors to enrich cross-language coverage. This poster explained the dataset.
17/08/2019 | Wikimania | Poster: Maturity Levels for Cultural Diversity in Wikipedia Language Communities. This poster explained the different levels.
18/08/2019 | Wikimania | Diversity Talk: Wikipedia Cultural Diversity Observatory (WCDO): Empowering Communities to Bridge the Culture Content Gaps. This presentation explained the current state of the project with its new Missing CCC lists and also warned about the limited impact of Wikimania 2018 on bridging the African content gap (pdf slides and video).
18/08/2019 | Wikimania | Language Talk: Minoritized Languages and Missing Languages in Wikipedia: An Opportunity to Increase Cultural Diversity in Wikipedia. This presentation explained that making Wikipedia more culturally diverse requires more languages (it proposed a method to select them) and support for minoritized languages in creating their content (it suggested a method to propose new articles) (pdf slides).
18/08/2019 | Wikimania | Readership Talk: Increasing Wikipedia Readership By Creating Local Content In Language Editions. This presentation explained that local content is vital for increasing a language edition's readership and gave some numerical evidence (pdf slides).
16/08/2019 | Wikimania | Research Talk: Cultural Diversity Funnels: A Metaphor To Study Wikipedia Communities and Knowledge Gaps. This presentation explained that different barriers prevent the representation of cultural diversity and proposed the metaphor of a funnel to depict them.

05/07/2019 | Celtic Knot | Language Talk: Languages Matter to Cultural Diversity: Finding Missing Languages and Bridging the Gaps in Minority Languages.
12/06/2019 | ICWSM Conference | Academic Paper/Presentation: Miquel-Ribé, M., & Laniado, D. (2019). Wikipedia Cultural Diversity Dataset: A Complete Cartography for 300 Language Editions. Proceedings of the 13th International AAAI Conference on Web and Social Media. ICWSM, Munich June 11-13th (ICWSM. ACM.)
16/05/2019 | Chapter for the book “Wikipedia@20” | “The Sum of Human Knowledge? Not in One Wikipedia Language Edition”.

Get involved

The Observatory needs dissemination to reach all the Wikimedia events and activities where it could provide value. If you want to collaborate, get involved: leave your username and e-mail us at tools.wcdo@tools.wmflabs.org. If you have any questions, you can also message marcmiquel or other team members.

Getting involved is a good way to find a meeting point or a place to start working on diversity. If you want to code additional visualizations, you can find the project's code on its GitHub page.