Grants:Programs/Wikimedia Research Fund/Reducing the gender gap in AfD discussions: a semi-supervised learning approach

From Meta, a Wikimedia project coordination wiki
Status: Funded
Title: Reducing the gender gap in AfD discussions: a semi-supervised learning approach
Start and end dates: July 2023 - July 2024
Budget (USD): 50,000 USD
Fiscal year: 2022-23
Applicant(s): Giovanni Luca Ciampaglia and Khandaker Tasnim Huq

Overview

Applicant(s)

Giovanni Luca Ciampaglia and Khandaker Tasnim Huq

Affiliation or grant type

University of Maryland

Author(s)

Giovanni Luca Ciampaglia and Khandaker Tasnim Huq

Wikimedia username(s)

Giovanni Luca Ciampaglia: User:Junkie.dolphin

Khandaker Tasnim Huq: User:Swad_Tasnim

Project title

Reducing the gender gap in AfD discussions: a semi-supervised learning approach

Research proposal

Description

Description of the proposed project, including aims and approach. Be sure to clearly state the problem, why it is important, why previous approaches (if any) have been insufficient, and your methods to address it.

A well-known bias in Wikipedia is gender bias: across all Wikimedia projects, only 18% of the content is about women, and on the English Wikipedia only 19% of biographies are about women. A large body of theoretical and empirical work examines the systematic factors behind Wikipedia's gender gap. Prior work using community surveys has found that women make up only a small fraction of editors, but editorial processes could perpetuate the gap even in the absence of such an imbalance. For example, Tripodi (2021) found that biographies of women end up in Articles for Deletion (AfD) discussions at a higher rate than biographies of men. What causes this disproportionate treatment? To study gender differences in AfD discussions (and similar consensus-based processes), researchers first need to determine whether an article is a biography and, if so, the gender of its subject. This is still largely a manual process, and it is especially hard for retrospective analyses, since the content of deleted articles is generally not accessible.

In this project, we propose to address Wikipedia's gender gap in two ways. First, we will address the need for retrospective data by developing a gender detection service for AfD discussions. To do so, we will apply natural language processing (NLP) techniques to the text of the discussion itself, which remains available even after an article has been deleted. Specifically, we will develop a machine learning (ML) pipeline that breaks the problem into two stages: a first stage that determines whether an article is a biography, and a second stage that performs gender detection on biographical content only. To train our model, we will use gender labels from Wikidata. Since this ground truth is incomplete, we will use semi-supervised learning to account for missing labels.
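A two-stage pipeline with semi-supervision could look roughly like the following minimal sketch. It is an illustration under stated assumptions, not the project's implementation: the proposal does not name a model, so this sketch swaps in a multinomial naive Bayes classifier trained with self-training (confident predictions on unlabeled examples are added as pseudo-labels and the model is refit), which is one common semi-supervised technique. The `NaiveBayes` class, the functions `tokenize`, `self_train`, and `classify`, the 0.75 confidence threshold, and the toy AfD snippets are all hypothetical. `None` stands in for a missing label (for stage 2, a missing Wikidata "sex or gender" (P21) value); labels are reduced to "female" and "male" only for brevity, although P21 admits other values.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a crude, hypothetical stand-in for real NLP preprocessing."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.prior = Counter(labels)
        self.n = len(labels)
        self.counts = {c: Counter() for c in self.prior}
        for doc, y in zip(docs, labels):
            self.counts[y].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(cnt.values()) for c, cnt in self.counts.items()}
        return self

    def predict_proba(self, doc):
        """Return {class: posterior probability} for a single document."""
        logp = {}
        for c in self.prior:
            denom = self.totals[c] + len(self.vocab)
            lp = math.log(self.prior[c] / self.n)
            for tok in tokenize(doc):
                lp += math.log((self.counts[c][tok] + 1) / denom)
            logp[c] = lp
        top = max(logp.values())  # subtract the max for numerical stability
        z = sum(math.exp(v - top) for v in logp.values())
        return {c: math.exp(v - top) / z for c, v in logp.items()}


def _fit_labeled(docs, labels):
    """Fit on the documents whose label is known (not None)."""
    pairs = [(d, y) for d, y in zip(docs, labels) if y is not None]
    return NaiveBayes().fit([d for d, _ in pairs], [y for _, y in pairs])


def self_train(docs, labels, threshold=0.75, max_rounds=5):
    """Self-training: pseudo-label confident unlabeled docs, refit, repeat."""
    labels = list(labels)
    for _ in range(max_rounds):
        model = _fit_labeled(docs, labels)
        added = 0
        for i, doc in enumerate(docs):
            if labels[i] is None:
                proba = model.predict_proba(doc)
                best = max(proba, key=proba.get)
                if proba[best] >= threshold:
                    labels[i] = best
                    added += 1
        if added == 0:  # nothing confident left to add
            break
    return _fit_labeled(docs, labels), labels


def classify(text, bio_model, gender_model):
    """Stage 1: biography or not; stage 2 (biographies only): gender."""
    p_bio = bio_model.predict_proba(text)
    if max(p_bio, key=p_bio.get) != "bio":
        return "other"
    p_gender = gender_model.predict_proba(text)
    return max(p_gender, key=p_gender.get)


# Hypothetical toy AfD snippets; None marks a missing label.
docs = [
    "Delete: she lacks coverage of her career",
    "Delete: he lacks coverage of his career",
    "Delete: the company lacks coverage",
    "Delete: the album lacks coverage",
    "Keep: she has coverage of her career",
    "Delete: the company album",
]
bio_model, bio_labels = self_train(docs, ["bio", "bio", "other", "other", None, None])

# Stage 2 is trained only on documents that stage 1 labeled as biographies.
bio_docs = [d for d, y in zip(docs, bio_labels) if y == "bio"]
gender_model, gender_labels = self_train(bio_docs, ["female", "male", None])
```

On this toy data, self-training fills in both missing stage-1 labels and the missing gender label, and `classify("Keep: her career has coverage", bio_model, gender_model)` returns `"female"`. A realistic version would replace the bag-of-words model with a stronger NLP model and would need held-out evaluation, since self-training can reinforce its own early mistakes.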
Our second contribution will be a retrospective analysis of AfD debates about biographical articles, in which we will compare debates about biographies of men and women. In prior work, we studied the role of editors with an “inclusionist” or “deletionist” stance in AfD debates and found that the number of deletionists in a debate is a strong predictor of its outcome. Here, using our ML service, we will investigate the effect of gender on the group composition and stances in AfD debates. One hypothesis is that deletionists may be over-represented in AfD debates about women, and that these debates may be shorter and less developed.
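As a minimal sketch of how such a comparison might be run once each debate has a gender label, the code below computes, per debate, the share of participants with a deletionist stance and the number of participants (used here as a rough proxy for debate length), then compares biographies of women and men with a two-sided permutation test. The record format, the stance labels, the helper names (`deletionist_share`, `permutation_test`, `by_gender`), the choice of a permutation test, and the four debates are all hypothetical: the proposal does not specify its statistical method, and the toy numbers are not findings.

```python
import random
from statistics import mean

# Hypothetical records: gender of the biography's subject (from the ML service)
# and the stance of each participant in the debate.
debates = [
    {"gender": "female", "stances": ["deletionist", "deletionist", "inclusionist"]},
    {"gender": "female", "stances": ["deletionist", "deletionist"]},
    {"gender": "male", "stances": ["deletionist", "inclusionist", "inclusionist", "inclusionist"]},
    {"gender": "male", "stances": ["inclusionist", "deletionist", "inclusionist"]},
]


def deletionist_share(debate):
    """Fraction of participants in one debate with a deletionist stance."""
    stances = debate["stances"]
    return sum(s == "deletionist" for s in stances) / len(stances)


def permutation_test(xs, ys, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = mean(xs) - mean(ys)
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:len(xs)]) - mean(pooled[len(xs):])
        if abs(diff) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


def by_gender(metric, gender):
    """Apply a per-debate metric to every debate about the given gender."""
    return [metric(d) for d in debates if d["gender"] == gender]


share_diff, share_p = permutation_test(
    by_gender(deletionist_share, "female"), by_gender(deletionist_share, "male"))
length_diff, length_p = permutation_test(
    by_gender(lambda d: len(d["stances"]), "female"),
    by_gender(lambda d: len(d["stances"]), "male"))
```

With only four toy debates the p-values carry no meaning; the sketch only shows the mechanics. A permutation test is a natural fit here because it makes no distributional assumptions about per-debate shares, which are bounded and may be far from normally distributed in small debates.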

Personnel

N/A

Budget

Approximate amount requested in USD.

50,000 USD

Budget Description

Briefly describe what you expect to spend money on (specific budgets and details are not necessary at this time).

Salary + fringe benefits: $43,478

- Summer salary support (2 weeks) for PI Ciampaglia

- Support for doctoral student Khandaker Tasnim Huq (9 months)

Overhead (15%): $6,522

Total: $50,000
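The budget lines are internally consistent: the overhead is 15% of the salary and fringe subtotal, rounded to the nearest dollar, and the two lines add up to the requested amount.

```latex
\begin{aligned}
\text{Overhead} &= 0.15 \times \$43{,}478 = \$6{,}521.70 \approx \$6{,}522 \\
\text{Total} &= \$43{,}478 + \$6{,}522 = \$50{,}000
\end{aligned}
```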

Impact

Address the impact and relevance to the Wikimedia projects, including the degree to which the research will address the 2030 Wikimedia Strategic Direction and/or support the work of Wikimedia user groups, affiliates, and developer communities. If your work relates to knowledge gaps, please directly relate it to the knowledge gaps taxonomy.

This project could help researchers and editors better understand AfD debates about biographies of women. Our ML gender detection service could enable researchers and community members to retrospectively identify biographies of women that have been nominated for deletion, as well as to monitor new nominations. For example, WikiProjects devoted to closing the gender gap, such as “Women in Red”, may benefit from being able to detect when an existing biography of a woman (not necessarily one created from a red link) is nominated for deletion. Our ML service could form the basis for additional tools of this kind. As a proof of concept, we will build a dashboard that tracks AfDs of biographies and reports relevant statistics broken down by gender.
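Much of what such a dashboard would display is simple aggregation over the service's output. The sketch below is hypothetical throughout: the record fields (`title`, `gender`, `outcome`), the helper name `gender_breakdown`, and the example records are assumptions for illustration. Real AfD closures include outcomes beyond "keep" and "delete" (for example "no consensus", "merge", or "redirect"); this sketch simply counts any outcome other than "delete" as not deleted.

```python
from collections import Counter


def gender_breakdown(afds):
    """Per-gender AfD nomination counts and share of debates closed as 'delete'.

    Records whose gender is None (not a biography, per stage 1) are skipped.
    """
    nominations = Counter()
    deletions = Counter()
    for afd in afds:
        gender = afd["gender"]
        if gender is None:
            continue
        nominations[gender] += 1
        if afd["outcome"] == "delete":
            deletions[gender] += 1
    return {
        g: {"nominations": n, "deletion_rate": deletions[g] / n}
        for g, n in nominations.items()
    }


# Hypothetical records as the ML service might emit them.
afds = [
    {"title": "Example A", "gender": "female", "outcome": "delete"},
    {"title": "Example B", "gender": "female", "outcome": "keep"},
    {"title": "Example C", "gender": "male", "outcome": "keep"},
    {"title": "Example D", "gender": None, "outcome": "delete"},
]
stats = gender_breakdown(afds)
```

Here `stats` maps each gender to its number of nominations and its deletion rate; the non-biography record is excluded before aggregation.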

Dissemination

Plans for dissemination.

We plan to publish one or two articles on the project described above, targeting venues such as CHI, CSCW, ICWSM, Nature Communications, Nature Human Behaviour, and Science Advances. We also have a track record of media dissemination: our research has been covered by the WSJ, NBC, NPR, Scientific American, The Conversation, and others. Finally, we will release all code from this research under an open-source license on GitHub or GitLab.

Past Contributions

Prior contributions to related academic and/or research projects and/or the Wikimedia and free culture communities. If you do not have prior experience, please explain your planned contributions.

Our team is based at the iSchool at the University of Maryland. Giovanni Luca Ciampaglia is an expert in social computing with a decade-long record of research on Wikipedia and peer production communities. He has published articles on AfDs, MoodBar, editor retention, article creation, and Wikipedia hoaxes. He worked at the WMF as a summer intern in 2011 and as a research analyst in 2012. Khandaker Tasnim Huq is a first-year PhD student in Information Science whose research interests are algorithmic bias and gender representation in online collaboration platforms. Her research was presented at Wiki Workshop 2021. Our team has not previously been funded by the Wikimedia Research Fund.


I agree to license the information I entered in this form excluding the pronouns, countries of residence, and email addresses under the terms of Creative Commons Attribution-ShareAlike 4.0. I understand that the decision to fund this Research Fund application, the application itself along with all the information entered by me in this form excluding the pronouns, countries of residence, and email addresses of the personnel will be published on Wikimedia Foundation Funds pages on Meta-Wiki and will be made available to the public in perpetuity. To make the results of your research actionable and reusable by the Wikimedia volunteer communities, affiliates and Foundation, I agree that any output of my research will comply with the WMF Open Access Policy. I also confirm that I have read the privacy statement and agree to abide by the WMF Friendly Space Policy and Universal Code of Conduct.

Yes