Research:Reducing the gender gap in AfD discussions: an evidence scoring approach

From Meta, a Wikimedia project coordination wiki
Created: 1 May 2024
Duration: June 2023 – July 2025
Keywords: Articles for Deletion, gender bias, Wikidata

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.


Introduction

A well-known form of bias in Wikipedia is gender bias: across all Wikimedia projects worldwide, only 18% of the content is about women, and on the English Wikipedia, only 19% of the biographies are about women [1].

There are many theoretical and empirical contributions on the systemic factors behind this gap in content representation [2]. For example, Tripodi [3] found that biographies of women are nominated for Articles for Deletion (AfD) discussions at a higher rate than those of men. This is puzzling, especially since earlier work on gender asymmetries in Wikipedia has found that the encyclopedia tends to cover women who are more notable than men [4]. What causes this differential treatment of women and men?

Within Wikipedia, AfDs offer an open forum for editors to decide whether content meets the criteria for inclusion in the project. The goal of an AfD discussion, like that of other deliberative processes in Wikipedia, is to reach a consensus about a specific editorial outcome. Owing to their deliberative nature and the fact that discussants are self-selected, the outcome of these discussions typically reflects the editorial stances toward content inclusion (often referred to as “inclusionism” and “deletionism”) of those taking part [5].

In recent work, we have studied the role of editors with an “inclusionist” and “deletionist” stance in AfD debates, and found that the number of deletionists in a debate is a strong predictor of its outcome [6]. The dynamics of the “polarized” groups in AfD discussions about biographies could perhaps explain the gender gap, especially when they are engaged in discussions about the notability of biography subjects.

Our Objectives

In this study, we are focusing on two major aspects of the AfD process:

  1. How quickly biographies of men and women are nominated for deletion, and what their chances of “survival” from these events are;
  2. The way editors assess the notability for biographies of men and women.

Survival from Nomination for Deletion

It is unknown how quickly biographies get nominated for deletion. Estimating the likelihood of nomination over the lifespan of an article matters for several reasons. First, articles mature over time due to the collaborative nature of Wikipedia editing, which typically enriches their content and references. Notability assessments are an integral part of this editorial process, but articles may face nomination for deletion at any stage of this development. Thus, because articles are continuously developed, early nominations shorten the window for further improvement.

Second, in recent years community interventions like “Women in Red” (WiR) have worked to increase the share of biographies on women. These projects often rely on edit drives to develop new articles about notable women. But articles typically receive higher scrutiny, and thus are at greater risk of deletion, when they are new, potentially hampering the effect of community interventions like WiR.

Thus, our objective is to investigate: how promptly are biographies nominated for deletion in the AfD process? Specifically, we aim to determine if the biographies of women face a disadvantage in terms of a shorter time between creation and nomination for deletion. We aim to address this focal question while accounting for factors that, in addition to gender, could influence considerations for deletion.

One such factor is the biographical status of individuals: notability debates are more challenging for living people due to concerns about the reliability of their entries and the need to prevent harm to their reputation. This may be especially relevant for the biographies of women. Societal awareness of gender equality has improved over the past century, leading to a higher representation of women among notable figures in encyclopedias. This could mean that women are proportionately more featured in articles about living people compared to the overall ratio of women in Wikipedia biographies.

Another factor is whether the person nominated is a historical figure. Wikipedia is affected by a well-known bias for recent events, which may lead to fewer historical figures being represented on the platform. This raises concerns about the susceptibility of female historical figures to deletion nominations in AfD.

Finally, Wikipedia has evolved significantly over the course of its history as a collaborative project, expanding its coverage and establishing stringent rules to maintain content quality and prevent vandalism. The proportion of biographies of women is still low despite efforts within the community (like the aforementioned “Women in Red” project) to reduce the gap in gender coverage. The AfD process may play a role in limiting the effectiveness of these interventions since within these deliberations it is often debated whether gender should be taken into account when gauging the notability of a subject. If this is the case, then we should observe a higher likelihood of nomination for biographies of women created later during the history of Wikipedia.

To address these questions, we employ a computational method to explore potential gender disparities in the timing of nomination for deletion, covering the entire history of the AfD process from January 15, 2001, to November 3, 2023. This involves the use of statistical models from survival analysis, which allow us to analyze biographies based on their age and estimate the likelihood of survival from nomination for deletion over time. Moreover, we contribute empirically to understanding nomination patterns by examining various factors, including characteristics of biography subjects such as gender, living status, and historical status, as well as the age of Wikipedia itself at the time an article was created. These analyses provide insights into the dynamics of deletion nominations and their associations with different factors.

Notability Assessment Identification

The second objective of this study is understanding whether biographies of women are assessed differently than those of men. In the case of AfD discussions, the editorial outcomes often hinge upon the notability (i.e., level of external attention) of the subject whose entry is under discussion. To gauge notability, discussants are expected to cite reliable evidence from independent sources outside of Wikipedia. Thus, when it comes to deliberating about the inclusion of biographical content on Wikipedia, gender bias may not only arise from the self-selection patterns of discussants to individual deliberations; it may also stem from biases in how individual discussants assess the notability of the subjects under discussion.

In summary, gender may act as a catalyst for the deletion of content about women. If this is the case, we would expect deletionists to be over-represented in AfD debates about biographies of women, and these debates to be shorter and less developed than those about biographies of men, with less evidence cited in support of their notability, and from sources of lower reliability and quality. Furthermore, we may expect content inclusion decisions in AfD to be a possible contributing factor for the observed gender gap in content representation on Wikipedia as a whole.

Thus, our research questions are the following:

  • RQ1. Are decisions about the inclusion of biographies of women deliberated differently from those of men? In particular, we are interested in comparing deliberations along both quantitative dimensions (e.g., group size, length of the discussion) and qualitative dimensions (e.g., degree of development, strength of consensus);
  • RQ2. Can differential outcomes between deliberations on (the biographies of) men and women be ascribed to different stances toward content inclusion? In particular, are AfD deliberations on biographies of women targeted by deletionists more than those of men?
  • RQ3. What is the role of notability assessments in determining differential outcomes between men and women in the AfD process? Can differences in notability assessments explain known gender-based asymmetries?

Because we are interested in characterizing the role of gender in determining deliberation outcomes of interest (e.g., deletion), we need to be able to match AfD discussions about subjects of different genders, so that the observed patterns cannot be attributed to other factors that could determine the same outcomes. In particular, besides describing key deliberation metrics, we want to compare AfDs of men and women with a similar stance composition among discussants, and with a comparable amount of external evidence for notability assessments.

To do so, we propose to develop machine learning methods for matching AfD discussions based on the type and amount of external evidence available to discussants. This evidence-scoring approach would form the basis for an AfD matching tool that discussants could use to review the outcomes of previous discussions and check whether they are consistent with the current one. In doing so, we want to study the information foraging practices of AfD participants and understand how the language-specific affordances of the AfD process affect notability assessments. For example, discussants are provided with direct links to external sources (e.g., Google, NYT, JSTOR). We will thus conduct a content analysis of a sample of AfDs to determine the most frequent external sources used in each community.

The above questions require an operational definition of the gender of the subjects of AfD discussions. Here, we choose to leverage Wikidata as the ground truth about gender in AfD discussions.

The gender gap in content representation is not limited to the English Wikipedia. It is reasonable to assume that the deliberative biases we seek to understand may act in addition to, or as a reflection of, any pre-existing gender bias of the broader cultural and social context in which these discussions take place. Thus, to make sure our results do not depend on the particular community under study, we propose to explore the above research questions in two smaller Wikipedia communities in addition to the English one: the Italian and Bengali Wikipedias. The choice of these languages is dictated both by practical considerations (our team is fluent in both languages) and by substantive reasons: they are both large projects (1,000,000+ articles for Italian, while Bengali recently crossed the 100,000-article threshold [8]) at different stages of development.

Methodology

Data Collection

We collected the following data:

  • The list of existing English Wikipedia biographies published from 2001 to 2023, with their creation dates, using Quarry and the MediaWiki API;
  • The list of biographies nominated in AfD and their dates of nomination, using Quarry;
  • The creation dates of nominated biographies that were deleted, from the “Archive” table, using Quarry;
  • The list of biographies about "Living people", using PetScan;
  • Gender, date of birth, and date of death, using Wikidata SPARQL and the Wikidata API;
  • Conversation logs of AfD discussions, including recommended actions, rationales, outcomes, contributors, and dates.
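
As an illustration of the Wikidata step, the sketch below builds a SPARQL query for a batch of items and flattens the standard SPARQL JSON results. It is not the project's actual query: the function names `build_query` and `parse_bindings` and the batching approach are assumptions made for this example. The property IDs are the standard Wikidata ones (P21 for sex or gender, P569 for date of birth, P570 for date of death).

```python
# Hypothetical helper names; only the Wikidata property IDs are standard.
SPARQL_TEMPLATE = """
SELECT ?item ?genderLabel ?birth ?death WHERE {{
  VALUES ?item {{ {values} }}
  OPTIONAL {{ ?item wdt:P21 ?gender. }}
  OPTIONAL {{ ?item wdt:P569 ?birth. }}
  OPTIONAL {{ ?item wdt:P570 ?death. }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
"""


def build_query(qids):
    """Return a SPARQL query string for a batch of Wikidata item IDs."""
    for qid in qids:
        if not (qid.startswith("Q") and qid[1:].isdigit()):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
    values = " ".join(f"wd:{qid}" for qid in qids)
    return SPARQL_TEMPLATE.format(values=values).strip()


def parse_bindings(response_json):
    """Flatten SPARQL JSON results into one dict per row.

    Missing optional values (e.g. no date of death) become None.
    """
    rows = []
    for b in response_json["results"]["bindings"]:
        rows.append({
            "qid": b["item"]["value"].rsplit("/", 1)[-1],
            "gender": b.get("genderLabel", {}).get("value"),
            "birth": b.get("birth", {}).get("value"),
            "death": b.get("death", {}).get("value"),
        })
    return rows
```

The query string would be sent to the Wikidata Query Service; keeping query construction and result parsing separate from the network call makes both easy to test offline.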

Methods

Survival Analysis

Our goal is to estimate the probability of “survival” from nomination in AfD as a function of article age (i.e., time since its creation). We use the Kaplan-Meier estimator (Kaplan and Meier, 1958) to estimate the probability of survival from nomination. We also employed the Cox proportional hazards model (Cox, 1972) to assess the risk of nomination considering the following three variables:

  • a) Gender – if the subject is a woman (1) or man (0);
  • b) Status – a variable with three levels:
      1. Historical – if the subject was born before 1907 (cutoff estimated as the year of birth of the verified oldest living person at study time),
      2. Contemporary Alive – if the subject was alive at the time of nomination/analysis, or
      3. Contemporary Dead – if the subject was not alive at that time;
  • c) Wikipedia age – the age of Wikipedia at the time of creation of the article.
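
To make the survival estimate concrete, here is a minimal pure-Python sketch of the Kaplan-Meier product-limit estimator. It is illustrative only and not the project's code; the function name and the input layout (one duration and one event flag per article) are assumptions, and a real analysis would normally rely on an established survival-analysis library.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Product-limit estimate of the survival function S(t).

    durations: time from article creation to nomination, or to the end of
               observation if the article was never nominated
    events:    1 if the article was nominated at that time, 0 if censored
    Returns a list of (time, survival_probability), one per distinct
    nomination time.
    """
    nominated = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # nominated or censored at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = nominated.get(t, 0)
        if d:
            # Fraction of articles still at risk that survive time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Everyone who was nominated or censored at t leaves the risk set
        at_risk -= leaving[t]
    return curve
```

Articles censored at the same time as a nomination are counted as still at risk at that time, which is the usual convention.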

Open Coding

We will perform a content analysis of biographical AfDs to understand how discussants evaluate external sources. Using an open-coding approach, we will train human annotators to identify notability assessments made by discussants within the AfD. These assessments will typically include citations to external sources, which will allow us to identify what sources AfD discussants use in practice.

Propensity Score

Based on the type and amount of external evidence available in the discussions, we will define metrics that operationalize the concept of notability. We will employ a propensity score approach to estimate the notability of the subjects in the AfD discussions, using either a semi-supervised model or a causal inference model to estimate the propensity score.
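
One common way to obtain a propensity score is logistic regression of the group indicator on the covariates. The sketch below shows that idea in pure Python under loud assumptions: the source does not specify the model, the choice of the subject's gender as the group indicator is ours, and the two evidence features (number of cited sources, share of reliable sources) and all data values are invented for illustration.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit P(y=1 | x) = sigmoid(b + w.x) by batch gradient descent."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(k):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def propensity(w, b, x):
    """Estimated probability that a discussion with evidence x is in group 1."""
    return _sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Invented toy data: [number of cited sources, share of reliable sources]
X = [[1, 0.2], [2, 0.5], [3, 0.4], [3, 0.4], [4, 0.8], [5, 0.9], [6, 0.7]]
y = [1, 1, 1, 0, 0, 0, 0]  # hypothetical group indicator (1 = woman)
w, b = fit_logistic(X, y)
scores = [propensity(w, b, x) for x in X]
```

Discussions with similar scores have a similar evidence profile, which is what makes the score usable for the matching step.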

Matching Algorithm

Our next step will be an analysis of AfD debates of biographical articles, in which we will compare debates about biographies by gender. To match AfDs based on the strength and type of external evidence, we will experiment with propensity score matching, though we may consider alternative matching methods, like Coarsened Exact Matching. Thanks to the scoring method, we will investigate the effect of the gender of the AfD subject on group composition and stance of the debates, and on how editors assess external sources to gauge the notability of the subject of a biographical article. We will pre-register our study on OSF or a similar repository.
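
As a rough illustration of the alternative mentioned above, the following sketch implements the core of Coarsened Exact Matching: covariates are binned, discussions are grouped by their bin signature, and only strata containing both groups are kept. The covariate names, the cutpoints, and the record layout are hypothetical; the project's actual matching criteria are not yet defined.

```python
from collections import defaultdict


def coarsen(value, cutpoints):
    """Index of the bin that value falls into, given sorted cutpoints."""
    return sum(value >= c for c in cutpoints)


def coarsened_exact_match(records, cutpoints):
    """Group records into strata of coarsened covariates; keep mixed strata.

    records:   list of dicts with a binary 'treated' key plus covariate keys
    cutpoints: dict mapping covariate name -> sorted list of bin edges
    Returns a dict mapping each stratum key to the records in it, restricted
    to strata that contain both treated and untreated records.
    """
    names = sorted(cutpoints)
    strata = defaultdict(list)
    for r in records:
        key = tuple(coarsen(r[name], cutpoints[name]) for name in names)
        strata[key].append(r)
    return {
        key: group
        for key, group in strata.items()
        if any(r["treated"] for r in group) and not all(r["treated"] for r in group)
    }
```

For example, with cutpoints `{"n_sources": [3, 6], "reliable_share": [0.5]}`, a discussion citing two sources and one citing a single source fall into the same stratum when both have a reliable share below 0.5, and can then be compared directly.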

Participatory Design

One risk stems from the possibility that our matching tool may be misused by AfD discussants in a way that perpetuates existing bias. To mitigate this risk our tool will not provide any recommendation on the outcome to take in the AfD under consideration and will not provide a “default” match or rely on default external sources. Instead, users will be provided with the opportunity to customize the matching criteria and the sources of external evidence used to score prior AfD discussions. Therefore, we will post invitations to engage in participatory design sessions with relevant WikiProjects (e.g., Women in Red) prior to committing to a particular design for matching.

Cross-Cultural Study

Finally, to achieve our goal of a cross-cultural study of AfD discussions, we will develop a suite of multi-lingual tools for AfD debates. Our initial goal will be to support three languages of interest (English, Italian, and Bengali) including a parser for AfD discussions, and a set of scrapers of core external sources used in each language community.

Timeline

Table 1: Estimated timeline of the overall study

Task                   Start date        End date          Status
Survival Analysis      July 2023         March 2024        Completed
Open Coding            July 2024         September 2024    Not started
Propensity Score       September 2024    October 2024      Not started
Matching Algorithm     September 2024    October 2024      Not started
Participatory Design   November 2024     January 2025      Not started
Cross-Cultural Study   July 2024         March 2025        Not started

Policy, Ethics and Human Subjects Research

This research presents minimal risk for AfD discussants since it will rely in its entirety on publicly available data. We will maintain IRB oversight throughout the duration of the project. There are also potential sources of societal risk associated with our tools. One possible risk is associated with reliance on Wikidata for ground truth on gender of subjects of AfD discussions, and in particular the risk of misgendering individuals by relying on labels that may be erroneous or vandalized. To mitigate this risk, which is admittedly low, we will make sure to periodically refresh the corpus of labels and manually review any change we see in it.

Community impact plan

This project could help researchers and Wikipedia contributors gain a better understanding of AfD debates on biographies of women and other genders not typically considered when dealing with the gender gap in content representation. Our AfD discussion matching service could enable researchers and contributors to compare the potential outcome of ongoing discussions with that of similar discussions based on the availability and type of external evidence. We envision such a tool could promote consistency in outcomes across debates regardless of gender. For example, WikiProjects devoted to closing the gender gap, like “Women in Red”, may benefit from the ability to identify gaps in outcomes between discussions of biographies of women and men.

As a proof of concept, we will build a dashboard keeping track of AfDs of biographies with relevant stats broken down by gender. We will take a number of steps to maximize the chances of adoption of our tools. We will list our project on the Wikimedia Research Index and post regular updates there. We will deploy our tools on Wikimedia Cloud Services and make the source available on GitHub. We will advertise our research on relevant WikiProjects related to gender and inclusion, like “Women in Red” and “LGBT Studies”, and invite their members to attend participatory design sessions. We will submit a proposal to run a demo or workshop on our tools at Wikimania 2025. Finally, we will submit a pitch for an article on The Conversation or a similar outlet for outreach to the broader public.

Results

Data

Our initial analysis was on biographies published in English Wikipedia.

  • 1.9 million biographies;
  • 84 thousand were nominated for deletion;
  • Dates of birth range from 7999 B.C. to 2022 A.D.

Table 2: Summary of the dataset

Gender   Published articles     Nominated
Women    390,962 (19.79%)       21,473 (25.37%)
Men      1,584,817 (80.13%)     62,893 (74.32%)
Other    1,805 (0.09%)          248 (0.29%)
Total    1,977,588              84,614
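
The percentages in Table 2 are shares of each column total. The short computation below derives a complementary quantity from the same counts: the fraction of each group's published biographies that were nominated. It is a back-of-the-envelope check rather than part of the reported analysis, and the variable names are ours.

```python
# Counts copied from Table 2
published = {"Women": 390_962, "Men": 1_584_817, "Other": 1_805}
nominated = {"Women": 21_473, "Men": 62_893, "Other": 248}

# Fraction of each group's biographies that were ever nominated
rates = {g: nominated[g] / published[g] for g in published}

# Crude ratio of nomination rates, women relative to men
risk_ratio = rates["Women"] / rates["Men"]

for gender, rate in rates.items():
    print(f"{gender}: {rate:.2%} of published biographies nominated")
print(f"Women-to-men ratio of nomination rates: {risk_ratio:.2f}")
```

By these counts, roughly 5.5% of biographies of women and 4.0% of biographies of men were nominated. This crude comparison ignores article age and censoring, which is what the survival models are meant to account for.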

Survival Analysis

In Figure 1, the Kaplan-Meier curves for men and women in biographies show early drops in survival curves, resembling an "infant mortality" pattern. After the initial drop, the trajectories of these two lines exhibit different slopes. Notably, the curve for women in biographies drops further down than that of men, suggesting that biographies of women receive nominations for deletion more rapidly than their male counterparts.

Figure 1: The probability of survival of the biographies from nomination for deletion. The shaded area corresponds to the 95% confidence intervals.

In Figure 2(a), the hazards model indicates that gender strongly influences the risk of being considered for deletion, with biographies of women nominated 34% faster than those of men. We also examined how the effect of gender on the risk of early nomination varies with status. Figure 2(b) shows the result of fitting the hazards model with interaction terms between gender and status. Adding these interaction terms yielded a statistically significant improvement over the baseline model. The interaction with ‘Historical’ (Figure 2(b)) has a positive coefficient, indicating that historical women face a deletion disadvantage compared to men. In both plots, error bars represent robust standard errors and are all smaller than the data points.

Figure 2(a): Results of Cox proportional hazards models on the full dataset
Figure 2(b): The model with interaction terms between gender and status.
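
For readers less familiar with hazards models, the two specifications in Figure 2 can be written out roughly as follows. This is a sketch under assumptions, not the project's exact parameterization: the symbols $g$ (gender indicator), $s$ (vector of status dummies), $w$ (Wikipedia age), and the coefficient names are introduced here for illustration. Reading "34% faster" as a hazard ratio of about 1.34 is likewise an assumption.

```latex
\begin{align}
  % Baseline model, Figure 2(a)
  h(t \mid g, s, w) &= h_0(t)\,
    \exp\!\left(\beta_g\, g + \boldsymbol{\beta}_s^{\top} s + \beta_w\, w\right) \\
  % Model with gender-by-status interaction terms, Figure 2(b)
  h(t \mid g, s, w) &= h_0(t)\,
    \exp\!\left(\beta_g\, g + \boldsymbol{\beta}_s^{\top} s + \beta_w\, w
      + \boldsymbol{\beta}_{gs}^{\top} (g\, s)\right) \\
  % Hazard ratio for gender in the baseline model
  \mathrm{HR}_g &= \exp(\beta_g), \qquad
    \mathrm{HR}_g \approx 1.34 \iff \beta_g \approx \ln 1.34 \approx 0.29
\end{align}
```

Here $h_0(t)$ is the unspecified baseline hazard of nomination at article age $t$; a positive coefficient multiplies that hazard by a factor greater than one.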

Moreover, Figure 3 illustrates the marginal effects of gender and status on the risk of nomination before and after adding the interaction terms, suggesting that living women face a deletion disadvantage compared to other groups. Figure 3 (right) also shows that historical women have a higher risk of nomination than contemporary deceased men.

Figure 3: Marginal effects of gender and status on the full dataset. Left: Baseline model; Right: the model with interaction terms between gender and status.

Finally, in a retrospective analysis shown in Figure 4, we observe how factors such as gender, Wikipedia age, and status evolve over the history of Wikipedia. Early on, the influence of gender on nomination risk was negative, but it steadily increased until 2006, and remained consistently positive thereafter. Also, both historical and contemporary deceased women are at a disadvantage from the very beginning and are still at risk.

Figure 4: Retrospective survival analysis. Each data point corresponds to the coefficients of the Cox proportional hazards model, fitted only on the data of articles created up to that year. Articles that were nominated after the observation window correspond to censored observations. The error bars represent robust standard errors. The black dash-dotted line corresponds to a coefficient value of zero.
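
The construction of each retrospective sample can be sketched as below: for a given cutoff, only articles created by then are kept, and nominations that happen after the cutoff are treated as censored. The function name and the input layout are assumptions for illustration; the sketch only prepares the (duration, event) pairs and does not fit the Cox model itself.

```python
from datetime import date


def retrospective_sample(articles, cutoff):
    """Build (duration_days, event) pairs as seen from a given cutoff date.

    articles: list of (created, nominated) dates; nominated is None if the
              article was never nominated
    cutoff:   end of the observation window
    """
    sample = []
    for created, nominated in articles:
        if created > cutoff:
            continue  # the article did not exist yet within this window
        if nominated is not None and nominated <= cutoff:
            # Nomination observed inside the window
            sample.append(((nominated - created).days, 1))
        else:
            # Never nominated, or nominated later: censored at the cutoff
            sample.append(((cutoff - created).days, 0))
    return sample
```

Repeating this for the end of each year and refitting the model on the resulting sample gives one set of coefficients per year.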

Resources

We presented the results of our first objective (survival analysis of nomination) at Wiki Workshop 2024. The video of the pre-recorded presentation is available on YouTube, and the slides are available on the workshop's website.

References

  1. Konieczny, P., & Klein, M. (2018). Gender gap through time and space: A journey through Wikipedia biographies via the Wikidata Human Gender Indicator. New Media & Society, 20(12), 4608-4633. https://doi.org/10.1177/1461444818779080
  2. Ferran-Ferrer, N., Boté-Vericad, J.-J., & Minguillón, J. (2023). Wikipedia gender gap: a scoping review. Profesional de la información, 32(6). https://doi.org/10.3145/epi.2023.nov.17
  3. Tripodi, F. (2023). Ms. Categorized: Gender, notability, and inequality on Wikipedia. New Media & Society, 25(7), 1687-1707.
  4. Wagner, C., Graells-Garrido, E., Garcia, D., & Menczer, F. (2016). Women through the glass ceiling: gender asymmetries in Wikipedia. EPJ Data Science, 5, 1-24.
  5. Taraborelli, D., & Ciampaglia, G. L. (2010, September). Beyond notability. Collective deliberation on content inclusion in Wikipedia. In 2010 Fourth IEEE International Conference on Self-Adaptive and Self-Organizing Systems Workshop (pp. 122-125). IEEE.
  6. Tasnim Huq, K., & Ciampaglia, G. L. (2021, April). Characterizing Opinion Dynamics and Group Decision Making in Wikipedia Content Discussions. In Companion Proceedings of the Web Conference 2021 (pp. 632-639).