Research talk:Understanding content moderation on English Wikipedia

Using Perspective API

Catilton and team: I'm excited to see you're tackling this important issue. I have one caveat w/r/t your planned use of Perspective API to measure 'toxicity'. The toxicity scores that the API provides are not always particularly accurate and often miss a lot of nuance! See for example this study from 2017^[1] ("the system has high false alarm rate in scoring high toxicity to benign phrases") as well as this (admittedly less scientific) assessment by journalist Violet Blue^[2] ("'I am a black trans woman with HIV' got a toxicity rank of 77 percent"). There's even some evidence that the Perspective model may be less accurate at predicting content that women are more likely (than men) to find toxic.^[3] I'll also say that my own semiformal audits of Perspective (I use it as a teaching tool for a data science class) on various Twitter datasets align with the findings above: Perspective is good at detecting words related to profanity and insults, but is often fooled/confused by contextual features (who is speaking, what the topic is, sarcasm, negation, local norms for acceptable speech) and tends to score emotion-laden speech as more toxic than speech that doesn't contain words related to affective state or emotional valence, irrespective of content or context.

At the same time, I recognize that Perspective may still be the best tool we have for detecting lexical cues related to inappropriate speech on Wikipedia at scale, and that the model is under active development and may have improved substantially; I haven't tested it for a while. If you use Perspective, perhaps you could hand-code a stratified random sample of utterances across each topic/dimension to provide some validation of the model predictions. This could also have the benefit of helping you further refine your taxonomy. Hope you find this feedback helpful; thanks again for taking on the project. Cheers, Jmorgan (WMF) (talk) 20:38, 1 May 2019 (UTC)
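A rough Python sketch of how the stratified sample suggested above might be drawn, in case it is useful as a starting point. Everything named here is an assumption for illustration: the record fields (`topic`, `score`), the three score bins, and the helper names are hypothetical, and the toxicity scores are assumed to have been collected already (nothing below calls Perspective itself).

```python
import random
from collections import defaultdict


def score_bin(score):
    """Bucket a 0-1 toxicity score into three coarse bins (cut-offs are arbitrary)."""
    if score < 0.33:
        return "low"
    if score < 0.66:
        return "mid"
    return "high"


def stratum_of(comment):
    """Hypothetical stratum: taxonomy topic crossed with score bin."""
    return (comment["topic"], score_bin(comment["score"]))


def stratified_sample(items, strata_key, per_stratum, seed=0):
    """Draw up to `per_stratum` items uniformly at random from each stratum."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    groups = defaultdict(list)
    for item in items:
        groups[strata_key(item)].append(item)

    sample = []
    for key in sorted(groups):  # sorted for a stable stratum order
        members = groups[key]
        k = min(per_stratum, len(members))  # small strata are taken whole
        sample.extend(rng.sample(members, k))
    return sample
```

Crossing topic with score bin means that high-scoring and low-scoring utterances both get hand-coded within each topic, so false positives and false negatives can both show up in the validation set. The human labels could then be compared against the model's bins per stratum.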

+1 to what Jonathan is saying about Perspective API and being careful to vet its outputs (or really the outputs of any API that automatically classifies text along nuanced dimensions like toxicity). If you're looking for additional APIs that might provide another view, ORES editquality is an obvious choice as well. Unfortunately I don't know of many more APIs or open-source models that are aimed at this challenge. --Isaac (WMF) (talk) 20:15, 2 May 2019 (UTC)

Thanks, Jmorgan (WMF) and Isaac (WMF), for reviewing our research page and for the helpful feedback. I'll make sure the rest of my team, who may not be checking this page frequently, see your comments as well. --Catilton (talk) 14:03, 7 May 2019 (UTC)

References

Comparing removed content against...

I wanted to raise the question of how to make meaningful comparisons between content that was removed and content that was not removed. If you look just at the content that was removed, that'll be interesting but incomplete. To understand whether it is feasible to accurately identify this content at scale, you'll also need some insight into whether tools that identify this content would be liable to capture a lot of content that looks similar but is acceptable -- i.e. some indication of an expected false positive rate for each area in your taxonomy. I don't have an obvious solution for identifying this control group, but you might consider things like:

For each revert, go through the history of the article where it occurred and pull the most similar edit that was not reverted
For each revert, find a prior or subsequent edit by that user that was not reverted

Then, if quantitative measures (Perspective API's toxicity scores, whether or not the content included a citation, etc.) cannot distinguish between the removed content and the similar but not-removed content, that should help us understand how contextual or nuanced these removal decisions are. In general, I'm looking forward to seeing what your team comes up with, and don't hesitate to reach out! --Isaac (WMF) (talk) 20:12, 2 May 2019 (UTC)
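The first matching idea above (pair each revert with the most similar non-reverted edit on the same article) could be sketched roughly as follows. This is only an illustration under stated assumptions: the edit records and their fields (`article`, `text`, `reverted`) are hypothetical, and character-level similarity from Python's standard `difflib` stands in for whatever similarity measure the team would actually choose.

```python
from difflib import SequenceMatcher


def most_similar_kept_edit(reverted_text, kept_edits):
    """Return the non-reverted edit whose text is closest to `reverted_text`, or None."""
    best, best_ratio = None, -1.0
    for edit in kept_edits:
        # ratio() is a 0-1 similarity based on matching character runs
        ratio = SequenceMatcher(None, reverted_text, edit["text"]).ratio()
        if ratio > best_ratio:
            best, best_ratio = edit, ratio
    return best


def build_matched_pairs(edits):
    """Pair each reverted edit with its closest non-reverted edit on the same article."""
    by_article = {}
    for edit in edits:
        by_article.setdefault(edit["article"], []).append(edit)

    pairs = []
    for article_edits in by_article.values():
        kept = [e for e in article_edits if not e["reverted"]]
        for edit in article_edits:
            if edit["reverted"]:
                match = most_similar_kept_edit(edit["text"], kept)
                if match is not None:  # skip reverts with no candidate control
                    pairs.append((edit, match))
    return pairs
```

With pairs like these, any per-edit measure (a toxicity score, presence of a citation) can be computed for both members and the differences examined; if the measure rarely separates the two, that would point to the removal decisions being contextual.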

Related Work

J. Nathan Matias just published an article that should provide some excellent background and insight into moderation on Reddit: https://journals.sagepub.com/doi/pdf/10.1177/2056305119836778 --Isaac (WMF) (talk) 20:19, 2 May 2019 (UTC)

Thanks Casey for connecting

I have related research at

University of Virginia/Automatic Detection of Online Abuse

I am in the process of wrapping up this round of on-wiki reporting. You have the pre-print; I am getting this into wiki soon.

If I can help or offer support, ping me. Blue Rasberry (talk) 18:09, 10 May 2019 (UTC)

[1] https://arxiv.org/abs/1702.08138

[2] https://www.engadget.com/2017/09/01/google-perspective-comment-ranking-system/

[3] https://link.springer.com/chapter/10.1007/978-3-319-67256-4_32
