Research:Measuring Agreement among Wikipedians

From Meta, a Wikimedia project coordination wiki

This page documents a planned research project.
Information may be incomplete and change before the project starts.

This research proposal is a test case for a new MediaWiki gadget which enables qualitative coding of diffs by registered editors.

Key Personnel

Project Summary

The research question for this project is: How much do Wikipedians agree about what a diff contains and how to respond to it? In particular, we're interested in responses to edits in the main article space (Namespace 0) and potentially discussion spaces (Namespaces 1 and/or 3), since this is where most edits occur.


The Wikimedia Foundation features team is potentially interested in building enhanced curation tools within MediaWiki (such as for New Page Triage) that enable community reaction to new contributions with much greater speed and ease.

The risk with such tools is that, unlike our traditional mechanisms for generating consensus, the speed with which these tools might operate can potentially allow any overeager curator to unduly influence Wikipedia content and other editors.

Thus, it may be preferable that curatorial actions with these enhanced tools operate on an aggregative basis, where actions such as tagging for cleanup or notability problems would be recorded but would only appear to readers and editors if a certain number of Wikipedians agree. The first open question with such a system is, "How many Wikipedians need to agree before something is the implicit consensus?"
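As a rough illustration of the aggregative idea, the sketch below records every curatorial action but surfaces a tag only once enough distinct editors have applied it. The function name `visible_tags`, the `(editor, tag)` data layout, and the threshold of 3 are hypothetical choices made for this example; the proposal does not specify a design, and the right threshold is exactly the open question.

```python
from collections import Counter


def visible_tags(votes, threshold):
    """Return the tags applied by at least `threshold` distinct editors.

    votes: iterable of (editor, tag) pairs. Hypothetical data layout;
    a repeated vote by the same editor for the same tag counts once.
    """
    distinct_votes = set(votes)  # collapse duplicate (editor, tag) pairs
    counts = Counter(tag for _editor, tag in distinct_votes)
    return {tag for tag, n in counts.items() if n >= threshold}


# Example: every vote is recorded, but only "cleanup" reaches the threshold.
votes = [
    ("Alice", "cleanup"),
    ("Bob", "cleanup"),
    ("Bob", "cleanup"),       # repeated vote, counted once
    ("Carol", "notability"),
    ("Dave", "cleanup"),
]
print(visible_tags(votes, threshold=3))
```

With these sample votes, "cleanup" has three distinct supporters and would be shown, while "notability" has one and would remain recorded but hidden.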


The gadget enabled on a test wiki.

The study entails qualitative data collection on English Wikipedia via an opt-in MediaWiki gadget, the Diff Categorizer, followed by anonymized analysis of how much the qualitative coders, in this case Wikipedians, agreed in their coding. The ideal first codebook would be a very open-ended assessment of the quality of mainspace edits and of the desired reaction (e.g. revert or not). The tool could also be easily localized to other languages in order to measure agreement in other communities.
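The proposal does not name an agreement statistic. One common choice when several coders label the same items is Fleiss' kappa, which compares observed agreement with the agreement expected by chance. The sketch below is a minimal implementation under the assumption that every diff is coded by the same number of editors; the function name, the labels "revert" and "keep", and the sample data are illustrative only and are not drawn from the study.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Compute Fleiss' kappa for categorical codes.

    ratings: list of lists; each inner list holds the labels that the
    coders assigned to one diff. Every diff must have the same number
    of coders (at least two).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("each diff needs the same number (>= 2) of coders")

    counts = [Counter(r) for r in ratings]
    categories = sorted({label for r in ratings for label in r})

    # Observed agreement: share of coder pairs that agree, averaged over diffs.
    per_item = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: based on how often each category is used overall.
    total = n_items * n_raters
    p_expected = sum((sum(c[k] for c in counts) / total) ** 2 for k in categories)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Illustrative data: three coders label four diffs.
sample = [
    ["revert", "revert", "revert"],
    ["keep", "keep", "keep"],
    ["revert", "revert", "keep"],
    ["keep", "keep", "revert"],
]
print(round(fleiss_kappa(sample), 3))
```

A value of 1 indicates perfect agreement, values near 0 indicate agreement no better than chance, and negative values indicate less agreement than chance would produce.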


Invitations to enable the gadget and participate will be disseminated on the English Wikipedia Village Pump, and perhaps through other similar public avenues.

Wikimedia Policies, Ethics, and Human Subjects Protection

Because this study is run by Foundation staff and uses WMF resources, it is subject to the Wikimedia Foundation privacy and data retention policies.

Benefits for the Wikimedia community

Encoding diffs with qualitative metadata about what they contain and how Wikipedians would respond to them can inform not only basic research about what community members think of different kinds of edits, but also the design and implementation of software curation tools. Hard data about the degree of agreement between editors will give a clearer picture of how widely the workload might be distributed (e.g. do we need two editors or five to agree that a page needs cleanup before it is automatically flagged as such?).

Time Line

Data collection for this project is open-ended and depends largely on how many people enable the gadget and participate.