Research:Social and Language Influence in Wikipedia Articles for Deletion Debates

From Meta, a Wikimedia project coordination wiki
Duration: June 2022 – August 2023

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.

Today, digital platforms are integral to group decision-making. This is especially evident on Wikipedia, where the curation of six million articles depends on online discussion. In such a setting, structural features (e.g., votes) and linguistic features (e.g., sentiment) can be especially salient. Both can increase the likelihood that a discussant agrees with the majority opinion, yet both can also be imperfect signals of a given discussion post’s credibility. Thus, by precisely characterizing how these features relate to individual users’ behaviors, we may gain insight into herding-like behaviors and support initiatives to improve user engagement and address knowledge gaps.

Project Goals

We define herding-like behaviors broadly as “the tendency for individuals to follow the lead of the group”.

Using a corpus composed of Wikipedia Articles for Deletion (AfD) debates, we attempt the following:

  • Check whether behavior consistent with herding is present in debates.
  • Construct a binary logistic model of user choice to assess the relative influence of structural versus linguistic features.
  • Assess how the influence of linguistic features changes in debates over articles about women.
  • [Potentially] Run a lab experiment to test for causality.

Related Work

Several works have linked the outcomes of AfD debates with vote sequences and voting language. For example, one study examined herding effects and voter heterogeneity within debates.[1] The authors found that an over- or under-expression of a particular vote type toward the start of a debate was associated with a corresponding over- or under-expression of that vote type toward the end of the debate. Another study encoded the rationale of each voting comment with a base BERT model, then trained a logistic regression classifier to predict a probability distribution over the possible outcomes of a debate. This model indicated that early votes were highly predictive of debate outcomes, with the effect being especially evident for “keep” votes.[2]

The outcomes of AfD debates have also been linked with gender differences. Prior work has found that biographies of women account for over 25% of the biographies nominated for deletion each month on the English-language Wikipedia, even though they comprise only around 18% of all biographies.[3] We hope to contribute to this literature by using a model to predict individual votes rather than debate outcomes; connecting structural and linguistic features of debates with gender gaps; and partially addressing the issues of selection into debates and endogeneity of user preferences.



We use the Wikipedia Articles for Deletion Corpus included in the Cornell Conversational Analysis Toolkit (ConvoKit).[4] This corpus contains 383,918 AfD debates that took place on the English-language Wikipedia between January 1, 2005 and December 31, 2018. To delete an article through this process, a user posts a nomination on AfD. Other users then post a “keep” or “delete” vote (among other options, which we omit for simplicity), along with a comment describing their rationale. Users may also post non-voting comments, which are generally replies to a preceding vote. The sequences of votes left by users on AfD pages, together with their rationales, provide a rich data source for examining social decision-making. Each user’s vote is likely influenced by the discussion that precedes it, and both the vote and that discussion are available to us. By examining the votes and comments, we can therefore measure the extent to which users follow the existing majority of the debate.
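Before any analysis, each debate has to be reduced to a chronological sequence of “keep” and “delete” votes, dropping the other options. The sketch below shows one way to do this; it does not reproduce ConvoKit’s field names and simply assumes the raw vote labels of a debate are already available as a list of strings. The rule of collapsing qualified votes (“Strong keep”, “Speedy delete”) onto their side and discarding ambiguous or other labels is a hypothetical simplification, not necessarily the project’s own preprocessing.

```python
import re

KEEP_RE = re.compile(r"\bkeep\b")
DELETE_RE = re.compile(r"\bdelete\b")

def normalize_vote(raw_label):
    """Map a raw AfD vote label to "keep", "delete", or None (hypothetical rule)."""
    label = raw_label.strip().lower()
    has_keep = KEEP_RE.search(label) is not None
    has_delete = DELETE_RE.search(label) is not None
    # Neither side mentioned (merge, redirect, comment, ...) or both mentioned: drop it.
    if has_keep == has_delete:
        return None
    return "keep" if has_keep else "delete"

def keep_delete_sequence(raw_labels):
    """Reduce a debate's chronological vote labels to a keep/delete sequence."""
    return [v for v in map(normalize_vote, raw_labels) if v is not None]
```

For example, `keep_delete_sequence(["Strong keep", "Delete", "Comment", "Merge", "Speedy delete"])` yields `["keep", "delete", "delete"]`. Matching on whole words keeps labels such as “Weak keep” while ignoring merge and redirect votes.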

Behavior Check

Across debates, we estimate the probability that the (k+1)th vote agrees with the majority of the preceding k votes, as a function of k. If herding-like behavior is present, we would expect this probability to increase with k.
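A minimal sketch of this check, assuming each debate has already been reduced to a chronological list of "keep"/"delete" labels. The text above does not say how ties among the first k votes are handled; this sketch skips tied prefixes (and k = 0, where there is no majority), which is an assumption.

```python
from collections import defaultdict

def agreement_by_k(debates):
    """Estimate P(vote k+1 agrees with the majority of the first k votes) for each k.

    debates: iterable of vote sequences, each a list of "keep"/"delete" labels.
    Prefixes with no strict majority (k = 0 or a tie) are skipped.
    """
    agree = defaultdict(int)
    total = defaultdict(int)
    for votes in debates:
        n_delete = 0  # number of "delete" votes among the k preceding votes
        for k, vote in enumerate(votes):
            n_keep = k - n_delete
            if n_keep != n_delete:
                majority = "delete" if n_delete > n_keep else "keep"
                total[k] += 1
                agree[k] += vote == majority
            n_delete += vote == "delete"
    return {k: agree[k] / total[k] for k in sorted(total)}
```

For instance, `agreement_by_k([["delete", "delete", "delete"], ["keep", "delete", "keep"]])` returns `{1: 0.5, 2: 1.0}`. A curve that rises with k would be consistent with, though not proof of, herding-like behavior.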

Baseline Model

We construct a binary logistic regression model that predicts each user’s vote from subsets of the following features of the preceding votes (the prefix):

  • Position in the debate of the vote to be predicted
  • Two sets of the following features, one computed over the first half of the prefix and the other over the second half:
    • Presence of previous delete votes
    • Presence of previous keep votes
    • Fraction of previous votes that are delete votes
    • Average values of the following features for prior “keep” and prior “delete” votes:
      • Length
      • Sentiment
      • Slang
      • External link references
      • Use of "per nom"
      • Politeness (TBD)
      • Concreteness (TBD)
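The feature list above can be sketched as code for a single prefix. This is an illustration under stated assumptions, not the project’s implementation: the `Vote` record is hypothetical, only three of the linguistic features (length in words, external links, and “per nom”) are computed, using simple regular expressions, and sentiment, slang, politeness, and concreteness are omitted because they would need lexicons or classifiers not specified here. When the prefix has odd length, the extra vote is assigned to the second half, which is also an assumption.

```python
import re
from dataclasses import dataclass
from statistics import mean

@dataclass
class Vote:
    label: str  # "keep" or "delete"; other vote types assumed already dropped
    text: str   # the voter's rationale comment

# Hypothetical, simplified detectors for three of the listed linguistic features.
URL_RE = re.compile(r"https?://\S+")
PER_NOM_RE = re.compile(r"\bper\s+nom\b", re.IGNORECASE)

def text_features(text):
    """Length (in words), number of external links, and use of "per nom"."""
    return {
        "length": len(text.split()),
        "links": len(URL_RE.findall(text)),
        "per_nom": int(PER_NOM_RE.search(text) is not None),
    }

def half_features(votes, tag):
    """Structural and averaged linguistic features for one half of the prefix."""
    keeps = [v for v in votes if v.label == "keep"]
    deletes = [v for v in votes if v.label == "delete"]
    feats = {
        f"{tag}_has_delete": int(bool(deletes)),
        f"{tag}_has_keep": int(bool(keeps)),
        f"{tag}_frac_delete": len(deletes) / len(votes) if votes else 0.0,
    }
    for side, group in (("keep", keeps), ("delete", deletes)):
        per_vote = [text_features(v.text) for v in group]
        for key in ("length", "links", "per_nom"):
            # Average over this half's votes of the given type; 0.0 if there are none.
            values = [f[key] for f in per_vote]
            feats[f"{tag}_{side}_avg_{key}"] = mean(values) if values else 0.0
    return feats

def prefix_features(prefix):
    """Feature dict for predicting the vote at 0-based position len(prefix)."""
    mid = len(prefix) // 2  # odd-length prefix: the second half gets the extra vote
    feats = {"index": len(prefix)}
    feats.update(half_features(prefix[:mid], "first"))
    feats.update(half_features(prefix[mid:], "second"))
    return feats
```

To fit the model, one option is to turn these dictionaries into a matrix with scikit-learn’s `DictVectorizer` and fit `LogisticRegression` on the observed votes; the project’s own estimation setup may differ.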

Baseline Model with Gender

We are currently extending the baseline model by adding gender features and re-estimating it on the subset of debates that nominate biographies for deletion. To construct the new features, we use article titles together with gender detection algorithms. However, article quality, a key feature for this kind of comparative analysis, is unavailable in our data; we are therefore also drawing on an additional data source, Wikidata, as well as on algorithmic assessments of article quality, to supplement the predictions of the enhanced model.
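Where a debated article can be matched to a Wikidata item, the item itself can indicate whether the article is a biography and which gender is recorded for its subject. The sketch below is one possible approach, not the project’s method: it assumes the item has already been fetched as JSON (for example from `Special:EntityData/<QID>.json`, passing in the inner entity object) and relies on the standard Wikidata identifiers P31 (instance of), Q5 (human), P21 (sex or gender), Q6581072 (female), and Q6581097 (male). Any other gender value is returned as its raw item ID rather than being forced into a binary category.

```python
P_INSTANCE_OF = "P31"
P_SEX_OR_GENDER = "P21"
Q_HUMAN = "Q5"
GENDER_LABELS = {"Q6581072": "female", "Q6581097": "male"}

def claim_ids(entity, prop):
    """Item IDs of all statements for `prop` in a Wikidata entity JSON dict."""
    ids = []
    for statement in entity.get("claims", {}).get(prop, []):
        # Statements with "no value" / "unknown value" snaks have no datavalue.
        value = statement.get("mainsnak", {}).get("datavalue", {}).get("value")
        if isinstance(value, dict) and "id" in value:
            ids.append(value["id"])
    return ids

def biography_genders(entity):
    """None if the item is not a human; otherwise the list of recorded P21 values."""
    if Q_HUMAN not in claim_ids(entity, P_INSTANCE_OF):
        return None
    return [GENDER_LABELS.get(q, q) for q in claim_ids(entity, P_SEX_OR_GENDER)]
```

An empty list means the item describes a person but records no gender, which would need separate handling alongside title-based gender detection.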


[In Progress] A lab experiment could address concerns about causality. It would build on work in experimental economics on causal narratives, mental models, and detection problems.

Recent News

  • Presented at Wiki Workshop 2023
  • Introduced the project at Wikimedia New York’s WikiWednesday Salon in June 2023
  • Presented at Wikimania 2023


  1. Dario Taraborelli and Giovanni Luca Ciampaglia. 2010. Beyond Notability. Collective Deliberation on Content Inclusion in Wikipedia. In 2010 Fourth IEEE International Conference on Self-Adaptive and Self-Organizing Systems Workshop, pages 122–125. IEEE.
  2. Elijah Mayfield and Alan W. Black. 2019. Analyzing Wikipedia Deletion Debates with a Group Decision-Making Forecast Model. Proc. ACM Hum.-Comput. Interact., 3(CSCW), November.
  3. Francesca Tripodi. 2021. Ms. Categorized: Gender, Notability, and Inequality on Wikipedia. New Media & Society, 0(0).
  4. Jonathan P. Chang, Caleb Chiam, Liye Fu, Andrew Wang, Justine Zhang, and Cristian Danescu-Niculescu-Mizil. 2020. ConvoKit: A Toolkit for the Analysis of Conversations. In Proceedings of SIGDIAL.