Research:Detox

Contact: Ellery Wulczyn (Wikimedia Foundation)

Collaborators:
CJ Adams (Jigsaw)
Lucas Dixon (Jigsaw)
Patrick Earley (Wikimedia Foundation)
Dario Taraborelli (Wikimedia Foundation)
Nithum Thain (Jigsaw)
Camille Francois (Jigsaw)

Open access via arXiv
Open source via GitHub
Open data via Figshare

This page documents a completed research project.

Overview

Discussions on Wikipedia are a crucial mechanism for editors to coordinate their work of curating the world’s knowledge. Unfortunately, discussions are not only the locus of coordination and cooperation; they are also a major avenue by which editors experience toxicity and harassment. There has been growing concern among the community, the Wikimedia Foundation (WMF), and the WMF Board^[1] about the impact of harassment on community health, along with a commitment to engage seriously with the problem. In collaboration with Jigsaw, Wikimedia Research is developing tools for the automated detection of toxic comments using machine-learning models. These models allow us to analyze the dynamics and impact of comment-level harassment in talk page discussions at scale. They may also be used to build tools that visualize the problem and possibly intervene in it.

Research

The work we have done so far, outlined in a paper that will be presented at WWW 2017, includes:

Building machine-learning models for detecting personal attacks and aggressive tone in user and article talk page comments
Analyzing the prevalence of personal attacks, the characteristics of attackers, and the moderation of attacks
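The modelling work listed above can be illustrated with a minimal sketch of one common approach to comment classification: logistic regression over character n-gram counts, trained with stochastic gradient descent. This is not the project's actual model or code; the function names, the n-gram range, the learning rate, the regularization strength and the toy comments in the usage example are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Count character n-grams of a lower-cased comment, padded with spaces."""
    padded = f" {text.lower()} "
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(texts, labels, epochs=100, lr=0.05, l2=1e-4):
    """Fit an L2-regularized logistic regression with plain SGD.

    Labels may be hard (0/1) or soft (e.g. the fraction of annotators who
    flagged a comment); the log-loss gradient w.r.t. the logit is p - y
    in both cases. Hyperparameters here are illustrative assumptions.
    """
    weights = {}
    bias = 0.0
    feats = [char_ngrams(t) for t in texts]
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            z = bias + sum(weights.get(f, 0.0) * v for f, v in x.items())
            err = sigmoid(z) - y  # gradient of log-loss w.r.t. z
            bias -= lr * err
            for f, v in x.items():
                w = weights.get(f, 0.0)
                weights[f] = w - lr * (err * v + l2 * w)
    return weights, bias


def score(model, text):
    """Return the model's probability that a comment is an attack."""
    weights, bias = model
    x = char_ngrams(text)
    return sigmoid(bias + sum(weights.get(f, 0.0) * v for f, v in x.items()))
```

For example, `train_logistic(["you are an idiot", "thanks for fixing the citation"], [1, 0])` returns a model whose `score` is above 0.5 for the first comment and below 0.5 for the second. Character n-grams are a reasonable default for talk page text because they tolerate misspellings and deliberate obfuscation better than whole-word features.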

We are also investigating the following open questions:

What is the impact of personal attacks and aggressive tone on the retention of editors, especially good-faith newcomers?
What unintended and unfair biases do models contain, and how can we mitigate them?
How can machine-learned models be applied to help the community? For example, to triage issues, to measure harassment and toxic language accurately, and to encourage more open debate and a wider diversity of viewpoints.

Data Sets

All data collected or generated for this project is available under free licenses on Figshare, per our open access policy. There are currently two distinct types of data included:

A corpus of all 95 million user and article talk diffs made between 2001 and 2015, scored by the personal attack model.
A human-annotated dataset of 1 million crowd-sourced annotations covering 100,000 talk page diffs (10 judgements per diff).
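Because every diff in the annotated set carries 10 judgements, the per-annotator rows have to be collapsed into a per-diff label before they can be used for training or evaluation. The sketch below assumes the annotations are available as simple `(rev_id, attack)` pairs; these field names and the strict-majority rule (a tie counts as not an attack) are illustrative assumptions, not a description of the published files' schema.

```python
from collections import defaultdict


def aggregate(annotations):
    """Collapse per-annotator judgements into per-diff labels.

    annotations: iterable of (rev_id, attack) pairs, where attack is 1 if
    the annotator judged the diff to be a personal attack and 0 otherwise.
    Returns {rev_id: (attack_fraction, majority_label)}.
    """
    totals = defaultdict(lambda: [0, 0])  # rev_id -> [attack votes, all votes]
    for rev_id, attack in annotations:
        totals[rev_id][0] += attack
        totals[rev_id][1] += 1
    result = {}
    for rev_id, (votes, n) in totals.items():
        frac = votes / n
        # Strict majority: exactly half the votes is not enough.
        result[rev_id] = (frac, int(frac > 0.5))
    return result
```

The fraction can serve directly as a soft training target that preserves annotator disagreement, while the majority label gives a conventional binary label.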

See the dataset documentation page for more information.

Demo Tool/API

You can try out a demo of our models for personal attacks and aggressive tone. We will be updating this demo regularly. Please get in touch if you are interested in consuming model scores programmatically.

We are also exploring building a visualization of toxic comments on Wikipedia that would provide:

An overview of all comments from the last month, highlighting those flagged by one of the machine-learned models.
The ability to click on a machine-flagged comment and either send corrective feedback to the model or go to the Wikipedia page to intervene.
An indication of which toxic comments have been reverted.
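The first and third features of the proposed visualization amount to a simple filter over scored comments. A minimal sketch, assuming each comment record carries a model score and a reverted flag; the `Comment` class, the 0.5 threshold and the 30-day window are hypothetical choices, not part of any existing tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Comment:
    rev_id: int
    timestamp: datetime
    score: float    # hypothetical: model probability that the comment is toxic
    reverted: bool  # hypothetical: whether the edit was later reverted


def monthly_overview(comments, now, threshold=0.5, window_days=30):
    """Return rev_ids of (recent, flagged, flagged-and-reverted) comments."""
    start = now - timedelta(days=window_days)
    recent = [c for c in comments if start <= c.timestamp <= now]
    flagged = [c for c in recent if c.score >= threshold]
    return (
        [c.rev_id for c in recent],
        [c.rev_id for c in flagged],
        [c.rev_id for c in flagged if c.reverted],
    )
```

Keeping the threshold as a parameter lets a front end trade off between showing more borderline comments and showing only confident flags.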

Feel free to send us feedback on our experimental mock-up of how it might look.

Impact of Harassment on User Retention

We are in the early stages of quantifying the impact that personal attacks and aggressive tone have on editor retention. We are currently using observational data to study how newcomers behave after being harassed. In the future, we plan to extend the work to non-newcomers and to investigate the causal impact of harassment on retention. See our harassment and user retention documentation page for more information.
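At its simplest, an observational comparison of this kind contrasts the retention rate of newcomers who received an attack with that of newcomers who did not. The sketch assumes each newcomer has already been reduced to two booleans (whether they were attacked, whether they were retained); how those are defined, for instance a score threshold or an activity window, is left open and is hypothetical here. A gap between the two rates is a descriptive association, not a causal estimate.

```python
def retention_by_exposure(newcomers):
    """Compare retention rates of attacked vs. non-attacked newcomers.

    newcomers: iterable of (was_attacked, retained) boolean pairs.
    Returns {True: rate_if_attacked, False: rate_if_not_attacked};
    a group with no members gets NaN.
    """
    counts = {True: [0, 0], False: [0, 0]}  # group -> [retained, total]
    for attacked, retained in newcomers:
        group = bool(attacked)
        counts[group][0] += int(retained)
        counts[group][1] += 1
    return {g: (r / n if n else float("nan")) for g, (r, n) in counts.items()}
```

Because attacked and non-attacked newcomers may differ in other ways (topic area, editing volume), any difference in these rates would need further controls before it could be read as an effect of harassment.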

Fairness & Algorithmic biases

As with any project using machine learning, fairness and algorithmic bias are important concerns. A machine-learning model can only be as good as the data it is trained on: if the training data carry unfair biases, the training algorithm may inadvertently learn them. See the fairness documentation page for more information.
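One common way to check a model for this kind of bias is to compare its false positive rate on non-toxic comments that mention different identity terms: if harmless comments mentioning one group are flagged far more often than those mentioning another, the model has likely absorbed a bias from its training data. The function below is a minimal sketch of that check; the threshold, the whitespace tokenization and the example terms are assumptions, and it is not the project's documented fairness methodology.

```python
def false_positive_rate_by_term(examples, terms, threshold=0.5):
    """Per-term false positive rate on non-toxic comments.

    examples: iterable of (text, true_label, model_score), where
    true_label is 1 for toxic and 0 for non-toxic.
    Returns {term: FPR} for every term that occurs in at least one
    non-toxic comment (naive lower-case whitespace tokenization).
    """
    examples = list(examples)
    rates = {}
    for term in terms:
        negatives = [s for text, y, s in examples
                     if y == 0 and term in text.lower().split()]
        if negatives:
            rates[term] = sum(s >= threshold for s in negatives) / len(negatives)
    return rates
```

A large spread between terms, for example a non-toxic comment such as "I am gay" being flagged while "I am straight" is not, would indicate the kind of unintended bias described above.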

Resources and Related Work

There has been a lot of academic work on harassment in online communities and quite a bit of discussion among Wikipedians on how to address toxicity. We are collecting links to relevant community discussions, data sets, papers and code, and would welcome community additions and suggestions.

Getting Involved

There are lots of ways to get involved with this project. If you are interested, please consider:

Helping judge talk page comments via the Discussion quality WikiLabels campaign (Warning: comments may contain distressing content)
Signing up as a participant to receive periodic progress updates.
Checking out the Discussion-modeling Phabricator project.
Leaving a comment or suggestion on our project talk page.

Discussion in the press and on the net

Some press coverage of this work includes:

References

[1] "Research:Detox/Resources - Meta". meta.wikimedia.org. Retrieved 2017-01-17.