Research:Wikimedia France Research Award/nominated papers/Creating, destroying, and restoring in Wikipedia

Creating, destroying, and restoring value in Wikipedia

Creating, destroying, and restoring value in Wikipedia is a groundbreaking, classic paper published at the ACM Conference on Supporting Group Work (GROUP) in 2007.

See the full text here.

Summary

This paper heralds a quantitative approach to measuring the impact of an edit. The six authors (Reid Priedhorsky, Jilin Chen, Shyong (Tony) K. Lam, Katherine Panciera, Loren Terveen, and John Riedl) worked at the GroupLens research lab at the University of Minnesota.

They suggest quantifying the impact of a given edit by the number of times the edited version is viewed. The concept of persistent word view (PWV) builds on the notion of an article view: each time an article is viewed, each of its words is also viewed. When a word written by editor X is viewed, X is credited with one PWV.
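The PWV definition above can be illustrated with a minimal Python sketch. It assumes that word authorship has already been attributed, that each revision is given as a list of (word, author) pairs, and that the number of views each revision received while it was current is known. The function name and the data layout are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
from collections import Counter

def persistent_word_views(revisions, views):
    """Credit each author with one PWV per word per view.

    revisions: list of revisions, each a list of (word, author) pairs
               (hypothetical layout; authorship is assumed to be known).
    views:     number of views each revision received while it was current.
    """
    pwv = Counter()
    for words, n_views in zip(revisions, views):
        for _word, author in words:
            # Every view of the revision is also a view of each of its words.
            pwv[author] += n_views
    return pwv

# Toy data: bob replaces one of alice's three words in the second revision.
revisions = [
    [("Paris", "alice"), ("is", "alice"), ("nice", "alice")],
    [("Paris", "alice"), ("is", "alice"), ("lovely", "bob")],
]
views = [10, 5]
pwv = persistent_word_views(revisions, views)
```

In this toy example alice accumulates 3 words × 10 views plus 2 words × 5 views, and bob 1 word × 5 views, so words that persist through heavily viewed revisions earn their author more credit.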

They use a series of datasets spanning four years, analyzing 4.2 million editors and 58 million edits. The results highlight the importance of frequent editors, who dominate what people see when they visit Wikipedia, and show that this domination is increasing. For example, the top 10% of editors by number of edits contributed 86% of the PWV, and the top 0.1% contributed 44%.
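A statistic of the form "the top 10% of editors by number of edits contributed N% of the PWV" can be sketched as follows. This is only an illustration of the arithmetic: the function name, the (edit_count, pwv) layout, and the toy numbers are assumptions, not the paper's data or code.

```python
import math

def top_share(editors, fraction):
    """Share of total PWV contributed by the top `fraction` of editors,
    ranked by edit count.

    editors: list of (edit_count, pwv) pairs (hypothetical layout).
    """
    ranked = sorted(editors, key=lambda e: e[0], reverse=True)
    k = max(1, math.ceil(len(ranked) * fraction))  # at least one editor
    total = sum(p for _, p in ranked)
    return sum(p for _, p in ranked[:k]) / total

# Made-up toy data: one very active editor and nine occasional ones.
editors = [(500, 775)] + [(n, 25) for n in range(1, 10)]
share = top_share(editors, 0.1)
```

With these made-up numbers the single most active editor (the top 10% of ten editors) accounts for 775 of 1000 PWV.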

They implement a vandalism-detecting metric using only the comments associated with reverts. The metric is implemented through a sophisticated automated / human research protocol, and the results show how rapidly damage is repaired: 42% of damage incidents are repaired immediately, while 0.75% persist beyond 1000 views. Interestingly, they take into account not only how long articles remained in a damaged state, but also how many times they were viewed while in this state. While the overall impact of damage in Wikipedia is low, they show it is rising, although the appearance of vandalism-repair bots in early 2006 seems to have halted the exponential growth.
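The idea of flagging damage from revert comments and then weighting each incident by views can be sketched in Python. The regular expression below is a hypothetical stand-in, not one of the paper's own expressions, and treating "repaired immediately" as zero views while damaged is a simplifying assumption of this sketch. The data layout and names are likewise illustrative.

```python
import re

# Hypothetical pattern for illustration only; NOT the paper's expressions.
DAMAGE_COMMENT = re.compile(r"\b(vandal\w*|rvv)\b", re.IGNORECASE)

def is_damage_repair(comment):
    """True if a revert comment suggests the revert repaired damage."""
    return bool(DAMAGE_COMMENT.search(comment))

def damaged_views(incidents):
    """Views accumulated while damaged, for each flagged incident.

    incidents: list of (revert_comment, views_while_damaged) pairs
               (hypothetical layout).
    """
    return [v for comment, v in incidents if is_damage_repair(comment)]

# Made-up toy data.
incidents = [
    ("rvv", 0),
    ("Reverted vandalism by 10.0.0.1", 12),
    ("fix typo", 3),
    ("revert vandal edits", 1500),
]
flagged = damaged_views(incidents)
# Assumption: "immediate" repair means no views while damaged.
immediate = sum(v == 0 for v in flagged) / len(flagged)
long_lived = sum(v > 1000 for v in flagged) / len(flagged)
```

Measuring incidents in views rather than elapsed time means that brief damage to a heavily read article can count for more than long-lived damage to a rarely read one.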

Building on previous papers such as the 2004 "Studying cooperation and conflict between authors with history flow visualizations" (also a nominee for this award), they adopt its categories of damage to articles (nonsensical, offensive, false content…), but also refine them and add some of their own (Misinformation, Partial delete, Spam). Using human judgement, they show that most of the damage belongs to the “nonsense” category.

Jury comments

seminal ideas, quantitative approach, a lot of content

Vote

  1. Very useful measure of edit impact. Avenue (talk) 00:12, 25 February 2013 (UTC)
  2. Interesting work --PierreSelim (talk) 13:29, 26 February 2013 (UTC)
  3. Ypnypn (talk) 13:55, 6 March 2013 (UTC)
  4. Most definitely; the D_LOOSE and D_STRICT regular expressions alone would win this a prize. Ironholds (talk) 21:22, 9 March 2013 (UTC)
  5. Introduces several broadly relevant new metrics for assessing edit impact & article quality; shows the importance of Wikipedia's core editor base in creating and maintaining the value of the encyclopedia, refuting the commonsense notion that Wikipedia is somehow a product of a nameless, faceless "crowd". And exposes both the ways in which Wikipedia and open wikis in general are vulnerable to quality degradation, and the robust mechanisms that exist to combat it. Best of the five! Disclosure: I have co-authored a research paper with the last author. Jtmorgan (talk) 21:27, 10 March 2013 (UTC)
  6. Tbayer (WMF) (talk) 22:31, 10 March 2013 (UTC)
  7. "Persistent word view" is a interesting idea. Finn Årup Nielsen (fnielsen) (talk) 13:57, 11 March 2013 (UTC)