Research:Wikidata Gender Diversity

Created: 00:11, 27 September 2022 (UTC)
Collaborators: Marta Fioravanti, Beatrice Melis
Duration: 2022-09 – 2024-03
Keywords: Wikidata, gender, modeling, data, community

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.

Wikidata Gender Diversity (WiGeDi) will study gender diversity in Wikidata, focusing in particular on marginalized gender identities. It will examine how the current Wikidata ontology model represents gender, and the extent to which this representation is fair and inclusive. It will analyse the data stored in the knowledge base to gather insights and identify possible gaps. Finally, it will look at how the community has handled the move towards the inclusion of a wider spectrum of gender identities. A web application will be created to share the results publicly in a user-friendly way.

Description

The Wikidata Gender Diversity (WiGeDi) project aims to investigate the issue of gender diversity in the Wikidata knowledge base, focusing in particular on the marginalized identities of trans, non-binary, and gender non-conforming people. All previous studies about this subject in Wikimedia projects have focused on the gender gap, defined as the gap in the representation of women versus that of men. Some of these studies (e.g. the ones by Konieczny and Klein) have acknowledged the existence of trans and non-binary people, but no research has looked specifically at how marginalized gender identities are represented, or how accurate and complete the current representation is.

Our initial study about this subject (Metilli D. & Paolini C., Non-binary gender representation in Wikidata, to be published in Ethics in Linked Data, Litwin Books, in press; publication draft: https://drive.google.com/file/d/1G_iW20w6iscbHdJr__-v4jzhyOw4_H-4/view; presentation at WikidataCon: https://pretalx.com/wdcon21/talk/7TRCWD/) shows that gender modeling in Wikidata has a very complex history. Important lessons can be learned from it about how the community has approached the representation of marginalized gender identities, and about which steps remain to be taken to make Wikidata a more inclusive project.

The WiGeDi project aims to center marginalized gender identities by performing a broad analysis of gender diversity in Wikidata, from three different — and complementary — perspectives:

  • the modeling question, looking at how the Wikidata ontology has evolved to support a more inclusive representation of gender, e.g., by updating the properties that directly or indirectly express gender; we aim to analyze the Wikidata ontology to identify representational issues and potential areas of improvement;
  • the data question, computing statistics about non-binary gender representation in the knowledge base, and analyzing its effectiveness and accuracy from a quantitative point of view;
  • the community question, examining how the Wikidata community has handled the evolution towards a more inclusive gender representation, with particular attention to user discussions about the topic.

Our project aims to answer all these questions by publishing a web application containing a real-time dashboard about gender diversity in Wikidata, an annotated timeline of gender modeling since the launch of Wikidata in 2012, and a corpus of gender-related user discussions.

News and updates

Meet us at Queer Data Days!
March 15–16
  • 15 March 2024: Queer Data Days project event —  NEW! 
  • 13 March 2024: Abstract accepted at Data Power Conference 2024 —  NEW! 
  • 6 March 2024: Panel accepted at Social Media & Society 2024 —  NEW! 
  • 5 March 2024: Abstract accepted at DH Conference 2024 —  NEW! 
  • 24 January 2024: Invited presentation at Queering Big Data, Algorithms and AI, Manchester
  • 8 January 2024: Paper proposal accepted for Bulletin of Applied Transgender Studies special issue
  • 1 January 2024: Project extended until 31 March 2024
  • 2 December 2023: Data Modelling Days presentation
  • 6 November 2023: Ethics in Linked Data book chapter officially published (preprint available here)
  • 29 October 2023: WikidataCon presentation
  • 1 September 2023: Project extended until 31 December 2023
  • 18 August 2023: Wikimania presentation
  • 12 July 2023: LD4 presentation
  • 4 July 2023: IGALA presentation
  • 21 June 2023: Wikimedia Research Showcase presentation
  • 20 June 2023: Paper proposal accepted for Communication, Culture & Critique special issue (currently under review)
  • 20 June 2023: Abstract and panel accepted at LD4 Conference
  • 19 June 2023: Panel accepted at Wikimania 2023
  • 13 June 2023: Challenging the Binary presentation
  • 17 April 2023: Abstract accepted at Wiki Workshop 2023
  • 10 April 2023: Abstract accepted at Queering Wikipedia 2023
  • 4 April 2023: Abstract accepted at Challenging the Binary Conference
  • 30 March 2023: Abstract accepted at Nordic STS Conference
  • 23–25 March 2023: Mid-project meeting in London
  • 9 March 2023: Abstract accepted at IGALA12 Conference
  • 2 March 2023: Abstract accepted at Data Justice Conference
  • 27 February 2023: Paper proposal accepted for Internet Histories special issue (currently under review)
  • January–March 2023: Early design and development phase
  • January 2023: Official start of contracts
  • October–December 2022: Planning, setup of contracts, data protection & ethics registration
  • 19 September 2022: Official project start date

Methods

Model

The Wikidata ontology model is analyzed with the tools of the Semantic Web. The goal is to analyse the model and its evolution over time in a critical way, and visualize the model to enable further analysis. The history of the model will be displayed in the Wikidata Gender Timeline.

Data

The biographical data is analyzed following a critical data studies approach. The data will be displayed through the Wikidata Gender Dashboard, which will offer statistics about gender identities in Wikidata, focusing in particular on marginalized gender identities.
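
As a rough illustration of the kind of statistic such a dashboard could report, the sketch below tallies items by their value for the "sex or gender" property (P21) and converts the tallies into shares. It is not the project's actual pipeline: the query text, the `gender_shares` helper, and the sample bindings (placeholder URIs, made-up counts) are assumptions for illustration only.

```python
from collections import Counter

# Assumed SPARQL sketch: count humans (P31 = Q5) per "sex or gender" (P21) value.
# An unrestricted count like this may time out on a public endpoint.
GENDER_COUNT_QUERY = """
SELECT ?gender (COUNT(?person) AS ?count) WHERE {
  ?person wdt:P31 wd:Q5 ;
          wdt:P21 ?gender .
}
GROUP BY ?gender
"""


def gender_shares(bindings):
    """Turn SPARQL JSON result bindings into {gender_uri: share_of_total}."""
    counts = Counter()
    for row in bindings:
        counts[row["gender"]["value"]] += int(row["count"]["value"])
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gender: n / total for gender, n in counts.items()}


# Hypothetical bindings in the SPARQL JSON results shape; not real statistics.
sample_bindings = [
    {"gender": {"value": "http://example.org/gender/A"}, "count": {"value": "70"}},
    {"gender": {"value": "http://example.org/gender/B"}, "count": {"value": "25"}},
    {"gender": {"value": "http://example.org/gender/C"}, "count": {"value": "5"}},
]

if __name__ == "__main__":
    for gender, share in gender_shares(sample_bindings).items():
        print(f"{gender}: {share:.1%}")
```

Reporting shares alongside raw counts keeps small categories visible, which matters when the focus is on identities that make up a small fraction of all statements.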

Community

The community discussions are analyzed using computational linguistics techniques such as topic modeling and critical discourse analysis. The anonymized discussions will be included in the Wikidata Gender Talk corpus. The corpus will be made available on request for other researchers to study.
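
One possible way to anonymize discussion text before analysis is to replace user links with stable pseudonyms. The sketch below is only an assumption about how this could be done, not a description of the project's method: the `pseudonymize` helper, the `anon-` prefix, and the salted-hash scheme are all hypothetical, and names mentioned in free text would need separate handling.

```python
import hashlib
import re

# Matches wikitext links to user or user talk pages, with or without a label,
# e.g. [[User:Example]] or [[User talk:Example|talk]].
USER_LINK = re.compile(r"\[\[(User(?: talk)?):([^\]|]+)(?:\|[^\]]*)?\]\]")


def pseudonymize(text, salt):
    """Replace user names in wikitext user links with salted-hash pseudonyms."""

    def replace(match):
        namespace = match.group(1)
        name = match.group(2).strip()
        digest = hashlib.sha256((salt + name).encode("utf-8")).hexdigest()[:8]
        # The same name and salt always map to the same pseudonym, so threads
        # can still be followed per participant after anonymization.
        return f"[[{namespace}:anon-{digest}]]"

    return USER_LINK.sub(replace, text)


if __name__ == "__main__":
    comment = "Support. [[User:Alice|Alice]] ([[User talk:Alice|talk]])"
    print(pseudonymize(comment, salt="keep-this-secret"))
```

Keeping the salt private makes it harder to recover names by hashing candidate user names, while the stable mapping preserves who replied to whom.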

Timeline

The following is the original Gantt chart for the project, amended after approval in September 2022. The chart will be updated to reflect the actual progress.

Policy, Ethics and Human Subjects Research

This section will be completed after we receive final ethics approval from University College London.

Results

Wikidata Gender Timeline

The Wikidata Gender Timeline displays the history of the Wikidata gender model, featuring significant events, changes in the model, and important discussions among the users. Development of the timeline is at an advanced stage. A prototype will be shown at the conferences listed below, starting in May 2023.

Wikidata Gender Dashboard

The Wikidata Gender Dashboard displays statistics about gender identities in Wikidata. The dashboard will offer several visualizations allowing the user to explore the data from different perspectives. A plan of features has been drawn up and development of the dashboard has begun.

Wikidata Gender Talk

The Wikidata Gender Talk corpus of user discussions supports computational linguistics analyses that help us understand how the community's perception and understanding of gender have changed over time. The anonymized corpus will be made available to researchers on request.

Project website

Development of the website that will host the above has begun, with an initial release set for May 2023. The website will also host general information about the project, the team, and our publications and presentations.

Publications and presentations

Completed

  • Metilli D. & Paolini C. (December 2023) "Non-binary gender representation in Wikidata". In: Provo A., Burlingame K. & Watson B.M. (eds.), Ethics in Linked Data, Litwin Books [1]
  • Metilli D. & Paolini C. (July 2023) "Ethics in Linked Data Book Panel". Panel at the LD4 Conference, online [2]
  • Metilli D., Melis B., Fioravanti M. & Paolini C. (July 2023) "How do you model my gender? Studying gender representation in the Wikidata knowledge base". Presentation at the LD4 Conference, online [3]
  • Paolini C., Metilli D., Melis B. & Fioravanti M. (July 2023). "Who decides my gender? A corpus-based analysis of Wikidata’s community discussions around trans and non-binary identities". Presentation at the Biennial Conference of the International Gender and Language Association, Brisbane, Australia [4]
  • Metilli D., Melis B., Fioravanti M. & Paolini C. (June 2023). "Early results from the Wikidata Gender Diversity project" (provisional title). Presentation at the Wikimedia Research Showcase, online.
  • Metilli D., Melis B., Fioravanti M. & Paolini C. (June 2023). "Can you model my gender? How the Wikidata community developed a shared ontology of gender". Presentation at the Data Justice Conference, Cardiff, United Kingdom [5]
  • Paolini C., Metilli D., Melis B. & Fioravanti M. (June 2023). "Are you discussing my gender? A corpus-based analysis of Wikidata’s community discussions around non-binary identities". Presentation at the Challenging the Binary Conference, London, United Kingdom [6]
  • Metilli D., Melis B., Paolini C. & Fioravanti M. (June 2023). "Who cares about my gender? Analysing practices of data care and repair in Wikidata". Presentation at the Nordic Science and Technology Studies Conference, Oslo, Norway [7]
  • Metilli D., Melis B., Fioravanti M. & Paolini C. (May 2023). "Queering Wikidata: Early insights from the Wikidata Gender Diversity Project". Presentation at the Queering Wikipedia Conference, online.
  • Metilli D., Melis B., Paolini C. & Fioravanti M. (May 2023) "How does Wikidata shape gender identities? Initial findings and developments from the WiGeDi project". Presentation at Wiki Workshop 2023, online [8]
  • Metilli D. & Paolini C. (October 2021) "Non-binary gender identities in Wikidata". Presentation at WikidataCon 2021, online.

Accepted

  • Metilli D., Paolini C., Melis B. & Fioravanti M. (August 2023) "How does Wikidata model our gender? Findings from the Wikidata Gender Diversity project". In: "Modelling, mapping and bridging knowledge gaps in gender and diversity". Panel at Wikimania 2023, online [9]
  • Melis B., Paolini C., Fioravanti M. & Metilli D. (expected 2024). "What does it mean to be queer in Wikidata? Practices of gender representation within a transnational online community". Paper proposal accepted for publication in Communication, Culture & Critique, Special Issue on Transnational Queer Cultures and Digital Media [10]
  • Melis B., Fioravanti M., Paolini C. & Metilli D. (expected 2024). "How have you modelled my gender? Reconstructing the history of gender in Wikidata". Paper proposal accepted for publication in Internet Histories, Special Issue on Gender and Internet/Web History [11]

Submitted

  • Metilli D., Melis B., Fioravanti M. & Paolini C. (October 2023). "Are you modeling my gender? Results from the Wikidata Gender Diversity project". Presentation at WikidataCon 2023, online [12]
  • Metilli D. & Paolini C. (November 2023) "Non-binary gender representation in Wikidata" [Previously Published Work Track]. Presentation at Wikidata Workshop, co-located with International Semantic Web Conference, Athens, Greece [13]

See also

External links