Grants:Programs/Wikimedia Research Fund/Wikidata Gender Diversity (WiGeDi)

Wikidata Gender Diversity (WiGeDi)
start and end dates: 19.09.2022 – 31.08.2023
budget (USD): 40,000-50,000 USD
applicant(s): Mushroom



Applicant's Wikimedia username. If one is not provided, then the applicant's name will be provided for community review.


Project title

Wikidata Gender Diversity (WiGeDi)

Entity Receiving Funds

Provide the name of the individual or organization that would receive the funds.

Daniele Metilli

Research proposal


Description of the proposed project, including aims and approach. Be sure to clearly state the problem, why it is important, why previous approaches (if any) have been insufficient, and your methods to address it.

The Wikidata Gender Diversity (WiGeDi) project aims to investigate the issue of gender diversity in the Wikidata knowledge base, focusing in particular on the marginalized identities of trans, non-binary, and gender non-conforming people.

All previous studies about this subject in Wikimedia projects have focused on the gender gap, defined as the gap in the representation of women versus that of men. Some of these studies (e.g. the ones by Konieczny and Klein) have acknowledged the existence of trans and non-binary people, but no research has looked specifically at how marginalized gender identities are represented, or how accurate and complete the current representation is.

Our initial study about this subject (Metilli D. & Paolini C., Non-binary gender representation in Wikidata, to be published in Ethics in Linked Data, Litwin Books, 2022; publication draft available; presented at WikidataCon) shows that gender modeling in Wikidata has a very complex history, from which important lessons can be learned about how the representation of marginalized gender identities has been approached by the community, and about which steps remain to be taken to make Wikidata a more inclusive project.

The WiGeDi project aims to center marginalized gender identities by performing a broad analysis of gender diversity in Wikidata, from three different — and complementary — perspectives:

  • the modeling question, looking at how the Wikidata ontology has evolved to support a more inclusive representation of gender, e.g., by updating the properties that directly or indirectly express gender; we aim to analyze the Wikidata ontology to identify representational issues and potential areas of improvement;
  • the data question, computing statistics about non-binary gender representation in the knowledge base, and analyzing its effectiveness and accuracy from a quantitative point of view;
  • the community question, looking at how the Wikidata community has handled the evolution towards a more inclusive gender representation, with particular attention to user discussions about the topic.
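
As an illustration of the data question, the following is a minimal sketch of the kind of statistic the dashboard could report. It is not the project's actual implementation: the sample records, the FOCUS_LABELS grouping, and the function name gender_statistics are all hypothetical, and in practice the gender values would come from Wikidata's "sex or gender" statements rather than a hard-coded list.

```python
from collections import Counter

# Hypothetical sample of (item id, gender label) pairs; None means the item
# has no gender statement. Real data would be retrieved from Wikidata.
SAMPLE = [
    ("item-1", "female"),
    ("item-2", "male"),
    ("item-3", "non-binary"),
    ("item-4", "male"),
    ("item-5", "trans woman"),
    ("item-6", None),
]

# Assumption: an illustrative grouping of the labels the analysis centers on.
FOCUS_LABELS = {"non-binary", "trans woman", "trans man"}


def gender_statistics(records, focus_labels):
    """Count gender labels and compute the share of the focus labels
    among items that have a gender statement at all."""
    counts = Counter(label for _, label in records if label is not None)
    missing = sum(1 for _, label in records if label is None)
    with_value = sum(counts.values())
    focus_count = sum(n for label, n in counts.items() if label in focus_labels)
    focus_share = focus_count / with_value if with_value else 0.0
    return {
        "total_items": with_value + missing,
        "missing": missing,
        "counts": dict(counts),
        "focus_count": focus_count,
        "focus_share": focus_share,
    }


if __name__ == "__main__":
    stats = gender_statistics(SAMPLE, FOCUS_LABELS)
    print(stats["counts"])
    print(f"focus share: {stats['focus_share']:.1%}")
```

Reporting the number of items without any gender statement separately from the label counts keeps completeness and representation apart, which matches the distinction drawn above between the accuracy and the completeness of the current representation.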

Our project aims to answer all these questions by publishing a web application containing a real-time dashboard about gender diversity in Wikidata, an annotated timeline of gender modeling since the launch of Wikidata in 2012, and a browsable repository of gender-related user discussions (see section Dissemination).


Approximate amount requested in USD.


Budget Description

Briefly describe what you expect to spend money on (specific budgets and details are not necessary at this time).

We anticipate paying a part-time salary to two people who will help design and build the project's web application, annotate the timeline of gender modeling, and perform data analysis on the corpus of user discussions. The provisional budget (in USD) is as follows:
  • Salary or stipend 24,000
  • Benefits 3,500
  • Equipment 5,000
  • Software 1,000
  • Open access publishing costs 3,000
  • Institutional overhead 2,500
  • Conference and travel expenses 4,000
  • Server hosting 500

Total: $43,500


Address the impact and relevance to the Wikimedia projects, including the degree to which the research will address the 2030 Wikimedia Strategic Direction and/or support the work of Wikimedia user groups, affiliates, and developer communities. If your work relates to knowledge gaps, please directly relate it to the knowledge gaps taxonomy.

WiGeDi will be the first research project to focus on the representation of marginalized gender identities in Wikidata.

The project will address the Wikimedia Strategic Direction by fostering inclusion of marginalized gender identities (goal #3), improving user experience through the monitoring of project activities (goal #10) and the identification of potential improvements in gender modeling (goal #2), and identifying non-inclusive decision processes (goal #5). We plan to coordinate with existing related projects (e.g. WikiProject LGBT, Art+Feminism). Our research focuses on the Gender content representation gap, but also involves other content gaps (e.g. Language, Structured data) and the related contributor/reader representation gaps.


Plans for dissemination.

We plan to publish several open-access research papers, and a web application containing:
  • a dashboard on the current status of gender diversity in Wikidata, centered on marginalized gender identities
  • an annotated timeline of gender modeling in Wikidata, contextualized with real-world events
  • a browsable repository of gender-related Wikidata user discussions

These outcomes will enable further studies and comparisons with other knowledge bases and Wikimedia projects.

Past Contributions

Prior contributions to related academic and/or research projects and/or the Wikimedia and free culture communities. If you do not have prior experience, please explain your planned contributions.

Daniele Metilli is a postdoc fellow at ISTI-CNR and a former administrator of the English Wikipedia and Wikidata. They have over 25 publications, often involving Wikimedia projects.

Chiara Paolini is a PhD candidate in Linguistics at KU Leuven. With Daniele, she has been studying gender diversity in Wikidata.

Marta Fioravanti and Beatrice Melis are Master’s students in Digital Humanities at the University of Pisa. Marta's research interests involve the extraction and display of information from data and its transformation into knowledge. Beatrice's research interests concern the use of computational methods to preserve human cultural and emotional heritage.

I agree to license the information I entered in this form, excluding the pronouns, countries of residence, and email addresses, under the terms of Creative Commons Attribution-ShareAlike 4.0. I understand that the decision to fund this Research Fund application, the application itself, and all the information entered by me in this form, excluding the pronouns, countries of residence, and email addresses of the personnel, will be published on Wikimedia Foundation Funds pages on Meta-Wiki and will be made available to the public in perpetuity. To make the results of my research actionable and reusable by the Wikimedia volunteer communities, affiliates and Foundation, I agree that any output of my research will comply with the WMF Open Access Policy. I also confirm that I have read the privacy statement and agree to abide by the WMF Friendly Space Policy and Universal Code of Conduct.