Research:Social and Language Influence in Wikipedia Articles for Deletion Debates

From Meta, a Wikimedia project coordination wiki
Duration: June 2022 – ??

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.

For an ever-growing range of decisions, individuals are explicitly or implicitly led to process others’ responses before responding themselves. For example, online shoppers looking up product specifications are routinely exposed to other customers’ reviews, and an investor typically observes industry experts’ opinions and other investors’ actions before deciding on his or her own investments. Given how common these group decision-making contexts are, it is important to clarify how an individual’s response to a decision can be affected by others’ responses to the same decision. We therefore focus on the phenomenon of herding. Broadly, “herding” occurs when everyone does what everyone else is doing, even when their own private information suggests otherwise. Because herding can lead to suboptimal outcomes, a better understanding of it can improve existing guidance on group decision-making.

To study herding, we focus on Wikipedia Articles for Deletion (AfD) debates. As a community-driven encyclopedia, Wikipedia offers a unique platform for studying group decision-making, especially through language. To have an article removed through this process, a Wikipedia user posts a nomination on AfD. Other users then post a “keep” or “delete” vote (among other options, which we omit for simplicity), along with a comment describing their rationale. Users may also post non-voting comments, which are generally replies to a previous vote. The sequence of votes on each AfD page, together with the accompanying rationales, provides a rich data source for examining social decision-making. Each user’s vote is likely influenced by the discussion that precedes it, and both the vote and that preceding discussion are available to us. By examining the votes and comments, we can estimate the extent to which users conform to the majority position expressed so far in the debate.


For this proposal, our data will come from the Wikipedia Articles for Deletion Corpus included in the Cornell Conversational Analysis Toolkit (ConvoKit). This corpus is a collection of approximately 400,000 AfD debates held on the English-language Wikipedia between January 1, 2005 and December 31, 2018. For some exploratory analyses, we will also perform additional cleaning and parsing using ConvoKit’s built-in functions.
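The step from raw corpus to per-debate vote sequences could look roughly like the following. This is a minimal sketch under stated assumptions: `Utterance` and `vote_sequences` are hypothetical names, and the `kind` and `label` values stand in for whatever the corpus metadata actually records. With ConvoKit itself, utterances would come from `corpus.iter_utterances()`, each exposing a `conversation_id`, a `timestamp`, and a `meta` dictionary; the exact metadata keys for the utterance type and the vote option should be checked against the corpus documentation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Utterance:
    """Hypothetical, simplified view of one AfD utterance (not ConvoKit's class)."""
    conversation_id: str   # the debate this utterance belongs to
    timestamp: float       # posting time; any sortable value works
    kind: str              # assumed labels: "nomination", "vote", "non-voting comment"
    label: Optional[str]   # vote option such as "keep" or "delete"; None for non-votes
    text: str              # rationale or comment text


def vote_sequences(utterances: Iterable[Utterance]) -> Dict[str, List[Utterance]]:
    """Group "keep"/"delete" votes by debate, sorted into posting order."""
    debates: Dict[str, List[Utterance]] = defaultdict(list)
    for utt in utterances:
        # Keep only votes, and only the two options analyzed here.
        if utt.kind == "vote" and utt.label in ("keep", "delete"):
            debates[utt.conversation_id].append(utt)
    for votes in debates.values():
        votes.sort(key=lambda u: u.timestamp)
    return dict(debates)


# Toy example: the nomination and the "merge" vote are dropped.
utts = [
    Utterance("afd-1", 3.0, "vote", "keep", "Keep, clearly notable."),
    Utterance("afd-1", 1.0, "nomination", None, "Fails notability guidelines."),
    Utterance("afd-1", 2.0, "vote", "delete", "Delete per nom."),
    Utterance("afd-1", 4.0, "vote", "merge", "Merge into the parent article."),
]
seqs = vote_sequences(utts)
print([u.label for u in seqs["afd-1"]])  # ['delete', 'keep']
```

Dropping options other than “keep” and “delete” mirrors the simplification described above; note that it also means vote positions are counted among keep/delete votes only.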


For the (k+1)th vote in a debate, let the prefix of the debate be the sequence of the k votes posted before it, i.e., the earlier votes that the (k+1)th voter can see before casting their own.
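In symbols, one way to write this definition is shown below. The notation, and the choice to leave the prefix majority undefined when keep and delete votes are tied, are illustrative assumptions rather than settled parts of the design.

```latex
% Votes in one debate, in posting order (other options omitted)
v_1, v_2, \dots, v_n \in \{\text{keep}, \text{delete}\}

% Prefix seen by the (k+1)th voter
P_k = (v_1, \dots, v_k), \qquad 1 \le k \le n - 1

% Fraction of "delete" votes in the prefix (the "keep" fraction is 1 - D_k)
D_k = \frac{1}{k} \sum_{i=1}^{k} \mathbf{1}[v_i = \text{delete}]

% Majority position of the prefix (undefined when D_k = 1/2)
m_k = \begin{cases}
  \text{delete} & \text{if } D_k > 1/2 \\
  \text{keep}   & \text{if } D_k < 1/2
\end{cases}
```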

Activity Statistics

  • Percentage of (k+1)th votes that are “delete” (“keep”), conditional on the percentage of “delete” (“keep”) votes in a prefix of length k.
  • Probability that the (k+1)th vote agrees with the majority of the prefix, as a function of k.
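Both statistics could be computed from per-debate vote sequences with a short sketch like the following. It assumes each debate has already been reduced to its ordered “keep”/“delete” votes, groups the prefix's delete share into equal-width bins (the bin count is an arbitrary choice), and skips tied prefixes when measuring agreement with the majority. The function names and these choices are illustrative assumptions, not part of the proposal.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple

Debate = Sequence[str]  # one debate's votes in order, each "keep" or "delete"


def delete_rate_by_prefix_share(
    debates: Sequence[Debate], n_bins: int = 5
) -> Dict[int, Tuple[float, int]]:
    """Share of (k+1)th votes that are "delete", grouped into equal-width
    bins of the delete share in the length-k prefix. Returns bin -> (rate, n)."""
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for votes in debates:
        n_delete = 0
        for k in range(1, len(votes)):
            n_delete += votes[k - 1] == "delete"  # delete votes among the first k
            share = n_delete / k
            b = min(int(share * n_bins), n_bins - 1)  # share == 1.0 joins the top bin
            totals[b] += 1
            hits[b] += votes[k] == "delete"  # votes[k] is the (k+1)th vote
    return {b: (hits[b] / totals[b], totals[b]) for b in sorted(totals)}


def majority_agreement_by_k(debates: Sequence[Debate]) -> Dict[int, Tuple[float, int]]:
    """Share of (k+1)th votes that match the prefix majority, for each k.
    Prefixes with a keep/delete tie have no majority and are skipped."""
    agree: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for votes in debates:
        n_delete = 0
        for k in range(1, len(votes)):
            n_delete += votes[k - 1] == "delete"
            if 2 * n_delete == k:
                continue  # tie: no majority to agree with
            majority = "delete" if 2 * n_delete > k else "keep"
            totals[k] += 1
            agree[k] += votes[k] == majority
    return {k: (agree[k] / totals[k], totals[k]) for k in sorted(totals)}


debates = [
    ["delete", "delete", "keep"],
    ["keep", "delete", "delete", "delete"],
]
print(majority_agreement_by_k(debates))  # {1: (0.5, 2), 2: (0.0, 1), 3: (1.0, 1)}
print(delete_rate_by_prefix_share(debates, n_bins=2))  # {0: (1.0, 1), 1: (0.75, 4)}
```

Because only two options are kept, the “keep” version of the first statistic is its mirror image: the prefix's keep share is one minus its delete share, and the next vote's keep rate is one minus its delete rate. Returning the count alongside each rate makes it easy to spot bins or prefix lengths with too few observations to interpret.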

Binary Logistic Model

[Tentative.] The following features will be used to predict individual user votes:

  • Index (position) in the debate of the vote to be predicted
  • Two sets of the following features, one computed over the first half of the prefix and the other over the second half:
    • Presence of previous delete votes
    • Presence of previous keep votes
    • Fraction of previous delete votes
    • Average values of the following features, computed separately for prior “keep” votes and prior “delete” votes:
      • Length
      • Sentiment
      • Slang
      • External link references
      • Use of “per nom”
      • Politeness (TBD)
      • Concreteness (TBD)
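A minimal sketch of how the feature list above could be turned into one feature vector per vote is given below. It implements only the comment features that need no external resources (length in words, number of external links, and use of “per nom”); sentiment, slang, politeness, and concreteness would require lexicons or classifiers and are left out. The function names, the regular expressions, the rule for splitting an odd-length prefix, and the use of 0.0 for an empty half or an absent vote type are all hypothetical choices.

```python
import re
from statistics import mean
from typing import Dict, Sequence, Tuple

# Hypothetical per-comment extractors (assumed patterns, not a settled design).
LINK_RE = re.compile(r"https?://")
PER_NOM_RE = re.compile(r"\bper\s+nom", re.IGNORECASE)


def comment_features(text: str) -> Dict[str, float]:
    return {
        "length": float(len(text.split())),          # length in words
        "links": float(len(LINK_RE.findall(text))),  # number of external links
        "per_nom": float(bool(PER_NOM_RE.search(text))),  # 1.0 if "per nom" appears
    }


def half_features(half: Sequence[Tuple[str, str]], tag: str) -> Dict[str, float]:
    """Features for one half of the prefix; each vote is a (label, text) pair."""
    labels = [label for label, _ in half]
    n_delete = labels.count("delete")
    feats = {
        f"{tag}_has_delete": float(n_delete > 0),
        f"{tag}_has_keep": float("keep" in labels),
        # An empty half (e.g. the first half of a length-1 prefix) gets 0.0.
        f"{tag}_frac_delete": n_delete / len(labels) if labels else 0.0,
    }
    for side in ("keep", "delete"):
        rows = [comment_features(text) for label, text in half if label == side]
        for name in ("length", "links", "per_nom"):
            # Average over this side's comments; 0.0 if the side is absent.
            feats[f"{tag}_{side}_avg_{name}"] = (
                mean(r[name] for r in rows) if rows else 0.0
            )
    return feats


def vote_features(votes: Sequence[Tuple[str, str]], k: int) -> Dict[str, float]:
    """Features for predicting the (k+1)th vote from the length-k prefix votes[:k]."""
    prefix = votes[:k]
    mid = k // 2  # an odd-length prefix puts the extra vote in the second half
    feats = {"index": float(k + 1)}
    feats.update(half_features(prefix[:mid], "first"))
    feats.update(half_features(prefix[mid:], "second"))
    return feats


votes = [
    ("delete", "Delete per nom."),
    ("keep", "Keep, see https://example.org and https://example.com"),
    ("delete", "Delete, not notable."),
]
feats = vote_features(votes, k=3)  # features for predicting the 4th vote
print(feats["second_frac_delete"])  # 0.5
```

The resulting dictionaries could be vectorized (for example with scikit-learn's `DictVectorizer`) and passed to any standard binary logistic regression implementation, such as scikit-learn's `LogisticRegression`, with the observed vote as the outcome. Keeping each feature under an explicit name makes it straightforward to compare the weight placed on the earlier and later halves of the prefix.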


[Tentative.] A lab experiment will be used to address causality concerns, since the observational AfD data alone cannot establish whether earlier votes cause later ones. This experiment will build on work in experimental economics on causal narratives, mental models, and detection problems.


Monthly Updates

  • June 2022: Proposal approved for funding.
  • July 2022: Reviewed literature on natural language processing and social interactions.
  • August 2022: Reviewed literature on belief updating.
  • September 2022: Started developing conceptual model of language and decision-making, as well as tentative designs for a lab experiment.