Research:Exploring systematic bias in ORES
ORES is a system developed by the Wikimedia Foundation to automatically classify the quality of edits and articles. It supports a range of applications that help Wikipedia editors filter edits they may wish to review and help Wikipedia researchers evaluate the quality of Wikipedia articles. Although ORES has been actively deployed on several Wikipedia language editions for some time, the role that it plays in these complex sociotechnical systems is not well understood. This project will evaluate the ORES model for edit quality in terms of how it reproduces, amplifies, or counteracts specific systemic biases already manifest in Wikipedia. We will first evaluate ORES' intrinsic bias and then analyze changes in the systemic bias of Wikipedia's collaboration dynamics following ORES' introduction.
Systemic bias is an active matter of concern both for Wikipedia projects and for the application of algorithmic classifiers in sociotechnical systems. Systemic biases against newcomers, articles about women, women sociologists, and non-Western topics have been clearly demonstrated in Wikipedia. Supervised classifiers such as ORES that learn from data labeled by humans are prone to learn and reproduce those humans' biases. Does ORES reproduce Wikipedia's systemic biases? The first step of this project will be to test ORES against a new sample of human-labeled Wikipedia edits in order to measure ORES' systemic bias. We will then compare our estimate of ORES' bias to the average bias of Wikipedia's quality controllers to assess whether ORES amplifies or dampens the systemic bias already present in Wikipedia.
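One way to make "measuring bias against a human-labeled sample" concrete is to compare error rates across groups of edits. The sketch below is only an illustration under stated assumptions: the proposal does not name a metric, so the choice of false positive rate, the group names, and the toy records are all hypothetical, and no ORES API is called.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """Compute the classifier's false positive rate for each group.

    `records` is an iterable of (group, human_says_damaging, model_flags_damaging).
    The record layout is an assumption made for this sketch.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, human_says_damaging, model_flags_damaging in records:
        # Only edits that humans judged NOT damaging can be false positives.
        if not human_says_damaging:
            negatives[group] += 1
            if model_flags_damaging:
                false_positives[group] += 1
    return {g: false_positives[g] / negatives[g] for g in negatives}


# Hypothetical toy sample, not real Wikipedia data.
sample = [
    ("newcomer", False, True),
    ("newcomer", False, False),
    ("newcomer", True, True),
    ("experienced", False, False),
    ("experienced", False, False),
    ("experienced", False, True),
    ("experienced", False, False),
    ("experienced", True, True),
]

rates = false_positive_rate_by_group(sample)
# A positive gap means good edits by newcomers are flagged more often.
gap = rates["newcomer"] - rates["experienced"]
print(rates, gap)
```

The same function could be applied to the decisions of human quality controllers in place of model flags, which would give the baseline that the classifier's gap is compared against.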
When an algorithmic classification system that amplifies systemic bias is deployed, one might predict corresponding changes in bias at the organizational and sociotechnical levels. For instance, if ORES reproduces Wikipedians' systemic bias against new articles about women, then its use by new page patrollers on Wikipedia might be expected to amplify this bias. While this direct correspondence between an algorithm's intrinsic bias and the consequences of its use seems intuitive, it is not a logical necessity but an empirical question that we seek to answer in this project. After we estimate the intrinsic bias of ORES, we will evaluate the outcomes of its introduction using a quasi-experimental regression discontinuity design. In doing so, we will find out whether the straightforward relationship between algorithmic bias and the social consequences of putting a biased system into use holds in the case of the ORES quality classifier in Wikipedia.
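The logic of a discontinuity design can be sketched in a few lines: fit a trend on each side of a cutoff and read the effect off the jump between the two fitted lines at the cutoff. The example below is a minimal sketch, not the project's analysis. The running variable, the cutoff, the bandwidth, and the data are hypothetical, and a real analysis would add standard errors and robustness checks.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def discontinuity_estimate(points, cutoff, bandwidth):
    """Estimate the jump in the outcome at `cutoff`.

    `points` is a list of (running_variable, outcome) pairs. Each side
    needs at least two distinct running-variable values in the bandwidth.
    """
    # Center the running variable so each intercept is the fitted value at the cutoff.
    left = [(x - cutoff, y) for x, y in points if cutoff - bandwidth <= x < cutoff]
    right = [(x - cutoff, y) for x, y in points if cutoff <= x <= cutoff + bandwidth]
    left_at_cutoff, _ = fit_line(*zip(*left))
    right_at_cutoff, _ = fit_line(*zip(*right))
    return right_at_cutoff - left_at_cutoff


# Hypothetical data: the same slope on both sides, with a jump of 3 at x = 10.
points = [(7, -5), (8, -3), (9, -1), (10, 4), (11, 6), (12, 8), (13, 10)]
jump = discontinuity_estimate(points, cutoff=10, bandwidth=3)
print(jump)
```

Fitting separate lines on each side, rather than comparing raw means, keeps an underlying trend in the running variable from being mistaken for an effect of the intervention.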
WMF internship/contract: April-June 2019
Outline/plan draft: Tuesday, April 9th
Outline shared with research team: Wednesday, April 17th, for discussion in the meeting on the 18th
Policy, Ethics and Human Subjects Research