Discussions on Wikipedia are a crucial mechanism for editors to coordinate their work of curating the world’s knowledge. Unfortunately, discussions are not only the locus of coordination and cooperation; they are also a major avenue by which editors experience toxicity and harassment. The community, the Wikimedia Foundation (WMF), and the WMF Board have expressed growing concern about the impact of harassment on community health, along with a commitment to engage seriously with the problem. In collaboration with Jigsaw, Wikimedia Research is developing tools that use machine learning models to detect toxic comments automatically. These models allow us to analyze the dynamics and impact of comment-level harassment in talk page discussions at scale. They may also be used to build tools that visualize the problem and possibly intervene in it.
Our current work includes:
- building machine learning models that detect personal attacks and aggressive tone in user and article talk page comments
- analyzing the prevalence of personal attacks, the characteristics of attackers, and the moderation of attacks
We are also investigating the following open questions:
- What is the impact of personal attacks and aggressive tone on the retention of editors, especially good-faith newcomers?
- What unintended and unfair biases do models contain, and how can we mitigate them?
- How can machine-learnt models be applied to help the community? For example, they could help triage issues, provide accurate measurements of harassment and toxic language, and encourage more open debate and a wider diversity of viewpoints.
The project has produced the following datasets:
- A corpus of all 95 million user and article talk diffs made between 2001 and 2015, scored by the personal attack model.
- A human-annotated dataset of 1 million crowd-sourced annotations covering 100k talk page diffs (10 judgements per diff).
See the dataset documentation page for more information.
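Because each annotated diff carries 10 independent judgements, the individual annotations have to be combined into a per-diff label before they can be used for training or evaluation. The sketch below shows one common way to do this: it takes the fraction of annotators who flagged a diff as a soft score and applies a strict majority threshold for a hard label. This is an illustrative assumption, not necessarily the aggregation the project uses; the `(rev_id, is_attack)` record layout and the `aggregate_annotations` function are hypothetical names chosen for the example.

```python
from collections import defaultdict


def aggregate_annotations(annotations, threshold=0.5):
    """Combine per-annotator judgements into one score and label per diff.

    Hypothetical input layout: an iterable of (rev_id, is_attack) pairs,
    where is_attack is 1 if that annotator judged the diff a personal
    attack and 0 otherwise.

    Returns a dict mapping rev_id -> (fraction_flagged, label), where
    label is True only if strictly more than `threshold` of annotators
    flagged the diff.
    """
    votes_by_diff = defaultdict(list)
    for rev_id, is_attack in annotations:
        votes_by_diff[rev_id].append(is_attack)

    result = {}
    for rev_id, votes in votes_by_diff.items():
        # Soft label: share of annotators who flagged this diff.
        fraction = sum(votes) / len(votes)
        # Hard label: strict majority, so a 5/10 tie counts as not an attack.
        result[rev_id] = (fraction, fraction > threshold)
    return result


if __name__ == "__main__":
    # Two hypothetical diffs, each with 10 judgements.
    sample = [(1001, 1)] * 7 + [(1001, 0)] * 3 + [(1002, 1)] * 2 + [(1002, 0)] * 8
    print(aggregate_annotations(sample))
```

Keeping the fraction alongside the hard label preserves information about annotator disagreement, which a single majority vote discards.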
You can try out a demo of our models for personal attacks and aggressive tone. We will be updating this demo regularly. Please get in touch if you are interested in consuming model scores programmatically.
We are also exploring building a visualization of toxic comments on Wikipedia that would provide:
- An overview of all comments from the last month, highlighting those flagged by one of the machine-learnt models.
- The ability to click on a machine-flagged comment and either send corrective feedback to the model or go to the Wikipedia page to intervene.
- An indication of which toxic comments have been reverted.
Feel free to send us feedback on our experimental mock-up of how it might look.
Impact of Harassment on User Retention
We are in the early stages of quantifying the impact that personal attacks and aggressive tone have on editor retention. We are currently focusing on using observational data to study how newcomers behave after being harassed. In the future, we plan to extend the work to non-newcomers and to investigate the causal impact of harassment on retention. See our harassment and user retention documentation page for more information.
Fairness & Algorithmic biases
As with any project using machine learning, fairness and algorithmic biases are important concerns. Machine learning can only be as good as the data it is trained on. If the input data carries unfair biases, the training algorithm may inadvertently learn them. See the fairness documentation page for more information.
Resources and Related Work
There has been a lot of academic work on harassment in online communities and quite a bit of discussion among Wikipedians on how to address toxicity. We are collecting links to relevant community discussions, data sets, papers and code, and would welcome community additions and suggestions.
There are lots of ways to get involved with this project. If you are interested, please consider:
- Helping judge talk page comments via the Discussion quality WikiLabels campaign (Warning: comments may contain distressing content).
- Signing up as a participant to get periodic progress updates.
- Checking out the Discussion-modeling Phabricator project.
- Leaving a comment or a suggestion on our project talk page.
Discussion in the press and on the net
Some press coverage of this work includes:
- Ars Technica: THE QUANTIFIED TROLL
- Bleeping Computer
- NYMag: Are Anonymous Users Really the Worst Trolls?
- Vice Motherboard: Inside Wikipedia’s Attempt to Use Artificial Intelligence to Combat Harassment
- Wikimedia Research Blog: Algorithms and insults: Scaling up our understanding of harassment on Wikipedia (Jigsaw version on Medium)
- heise.de: Böse Wikipedia-Postings sollen Filter befeuern ("Nasty Wikipedia posts are meant to fuel filters")