
Research talk:Automated classification of edit quality/Work log/2017-07-24


Monday, July 24, 2017


Labels' validity test


Cross-posted to https://phabricator.wikimedia.org/T171497

I looked through 50 random edits from the training set labeled either as Damaging Goodfaith (DG) or Damaging Badfaith (DB). In fact, I did it twice: once for enwiki and once for ruwiki (both languages I claim to know). My review showed that, out of 100 edits, 38 were mislabeled (assuming I really know the languages I claim to know). Let's start with English.



This is the line that retrieves 50 edits labeled either as Damaging/Goodfaith or Damaging/Badfaith in 25/25 proportion:

$ (cat enwiki.labeled_revisions.20k_2015.json | grep '"damaging": true' | grep '"goodfaith": true' | json2tsv rev_id |  sed -r "s#([0-9]+)#https://en.wikipedia.org/wiki/?diff=\0 (Damaging, Goodfaith)#" | shuf -n 25; cat enwiki.labeled_revisions.20k_2015.json | grep '"damaging": true' | grep '"goodfaith": false' | json2tsv rev_id | sed -r "s#([0-9]+)#https://en.wikipedia.org/wiki/?diff=\0 (Damaging, Badfaith)#" | shuf -n 25) | shuf

In the English set, 20 out of the 50 random edits are mislabeled and 30 are labeled correctly. Moreover, 8 of the edits turned out to be neither Damaging/Goodfaith (DG) nor Damaging/Badfaith (DB), but Not-Damaging/Goodfaith (NG). Here is a confusion matrix:

            true DG   true DB   true NG
labeled DG       11         9         5
labeled DB        3        19         3
labeled NG        -         -         -

The raw data: https://etherpad.wikimedia.org/p/enwiki_damage_faith_review


$ (cat ruwiki.labeled_revisions.20k_2015.json | grep '"damaging": true' | grep '"goodfaith": true' | json2tsv rev_id |  sed -r "s#([0-9]+)#https://ru.wikipedia.org/wiki/?diff=\0 (Damaging, Goodfaith)#" | shuf -n 25; cat ruwiki.labeled_revisions.20k_2015.json | grep '"damaging": true' | grep '"goodfaith": false' | json2tsv rev_id | sed -r "s#([0-9]+)#https://ru.wikipedia.org/wiki/?diff=\0 (Damaging, Badfaith)#" | shuf -n 25) | shuf

In the Russian set, 18 out of the 50 random edits are mislabeled and 32 are labeled correctly. Again, 9 of the edits turned out to be Not-Damaging/Goodfaith (NG):

            true DG   true DB   true NG
labeled DG       15         4         6
labeled DB        5        17         3
labeled NG        -         -         -

The raw data: https://etherpad.wikimedia.org/p/ruwiki_damage_faith_review
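For reference, the agreement counts in the two matrices above can be tallied mechanically. Here is a minimal Python sketch; the (labeled, reviewed) pair encoding is my own shorthand for illustration, not the actual etherpad format:

```python
from collections import Counter

def confusion(reviews):
    """Tally (original label, reviewed label) pairs into a confusion matrix."""
    return Counter(reviews)

def summarize(matrix):
    """Return (correct, mislabeled): a label counts as correct when the
    reviewed class agrees with the original label."""
    correct = sum(n for (labeled, true), n in matrix.items() if labeled == true)
    return correct, sum(matrix.values()) - correct

# The enwiki review, encoded as (labeled, reviewed) pairs:
enwiki = ([("DG", "DG")] * 11 + [("DG", "DB")] * 9 + [("DG", "NG")] * 5 +
          [("DB", "DG")] * 3 + [("DB", "DB")] * 19 + [("DB", "NG")] * 3)

correct, mislabeled = summarize(confusion(enwiki))  # (30, 20)
```

Running the same tally on the ruwiki pairs gives (32, 18).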

Down the rabbit hole


Although it is always hard to judge someone else's edit, this rule of thumb should help: when in doubt, assume good faith.

"Assume good faith" helped me in re-labeling this particular edit: https://en.wikipedia.org/wiki/?diff=620783533 It was initially labeled as Damaging and Badfaith. A registered user, who now edits the wiki as if it were a full-time job, replaced "better" with "worser." That word is obviously not part of modern English, and what is allowed to Shakespeare is not allowed to a Wikipedia editor. At the same time, the idea of the edit was to replace "better" with its opposite, and that substitution was semantically correct. Maybe the editor acted in good faith and introduced the grammar issue accidentally? Their contribution history shows that they take Wikipedia seriously, so I would label this edit as Damaging and Goodfaith. In any case, this edit is a good illustration of why we need to set up a productive discussion of some labels with Meta ORES: https://phabricator.wikimedia.org/T171496.

So, the most interesting findings from this review of the training set labels are:

  • Many damaging badfaith edits are reverted within the following 1-2 minutes by the abusers themselves. This type of vandal is like a curious kid who wants to check whether Wikipedia is really editable by anyone. If a vandal edit is not removed by its creator, it takes patrollers anywhere from several minutes up to, in some cases, several days to find and revert it (and it would take even longer without helpful counter-vandalism bots). I believe the quick self-reverts of these "curious" badfaith edits skew the median reversion time of damaging edits. Funny, but when we say that 50% of all vandalism is detected and reverted within an estimated four minutes of appearance, we should also thank some vandals for such a great statistic!
  • To deduce whether a particular edit is goodfaith, a reviewer needs to see a few edits before and after it, and perhaps take a look at the editor's contributions.
  • Quite often, vandals use registered accounts rather than reveal their IP addresses, especially in the English Wikipedia. So whether the user is anonymous does not always give a robust signal: not all vandals are anons, and not all anons are vandals. In the English subset we reviewed, the proportion of IP vandalism to registered vandalism was 18 damaging badfaith edits committed under an IP versus 10 under a registered account. For Russian Wikipedia, however, this proportion is drastically different: 20 IPs to 1 registered account.
  • One of the typical vandal behavioral patterns is making several edits to different places in one or two articles within a span of 5-10 minutes. A goodfaith editor would take more time to compose an edit and would try to make all the changes in one commit; if there are several commits, a goodfaith editor is most likely refining their own edit. A badfaith editor would rather take several scattered shots to make reverting all their malicious edits harder.
  • Sometimes vandals lie in their edit summary, claiming the edit was "minor" (there is a flag for that) or some sort of necessary correction. So the edit summary is not a reliable feature either.
  • One popular vandal use of Wikipedia is the "shameless plug," when a user plugs in either a link to their business or some promotional content. I have seen at least one case where such content was copy-pasted into three different articles while being irrelevant to all of them. As for the shameless links, they are often placed into articles on relevant topics.
  • Some labels just don't make sense, unless they are misclicks. Take those eight not-damaging goodfaith edits in our enwiki subset that were mislabeled as damaging. It is hard to tell what informed this mislabeling: five of these edits were made by registered users, which should have made them look legitimate; in three cases a spelling or punctuation mistake was being corrected; and one case was just a registered user developing his own article in the sandbox. One explanation could be a misleading Wiki Labels UI. To test whether that is the case, Aaron Halfaker fetched the edits labeled "not damaging/badfaith," a category that makes no sense. And yet our training set contained a lot of such labels, which makes a strong case against the current UI of the Wiki Labels tool.
  • There is one more explanation for the existence of the "not damaging/badfaith" category. A reviewer may actually hit "not damaging" and "unknown," after which the label-aggregation code deems all unknowns to be "badfaith" by default. This turned out to be true: the fetch_labels.py script considered all nulls to be "badfaith." After we changed the default to "goodfaith," the AUC of the English Wikipedia Goodfaith/Badfaith classifier improved from 0.909 to 0.928. Interestingly, for Russian Wikipedia this change did not result in any significant improvement in accuracy.
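The fix to the aggregation default can be illustrated with a short sketch. This is not the actual fetch_labels.py code; the function name and JSON shape here are assumptions for illustration only:

```python
import json

def coerce_goodfaith(label_json, default=True):
    """Parse one labeled revision; treat a null goodfaith value (the
    reviewer answered 'unknown') as goodfaith, instead of the old
    behavior that silently defaulted every unknown to badfaith."""
    label = json.loads(label_json)
    if label.get("goodfaith") is None:
        label["goodfaith"] = default
    return label

# Old default=False manufactured the nonsensical "not damaging / badfaith"
# class out of every "not damaging" + "unknown" review:
fixed = coerce_goodfaith('{"damaging": false, "goodfaith": null}')
# fixed["goodfaith"] is now True
```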

Can the Reverted model help improve the Damaging model?


So far, we have only tuned the Goodfaith model. What can be done about the Damaging one? I looked through another 50 edits from our training set: those labeled as Reverted but Not Damaging.

$ cat enwiki.labeled_revisions.20k_2015.json | grep '"damaging": false' | grep '"reverted_for_damage": true' | json2tsv rev_id |  sed -r "s#([0-9]+)#https://en.wikipedia.org/wiki/?diff=\0 (Reverted, Not Damaging)#" | shuf -n 50

I would label 32 of them as Damaging, although not all of them were badfaith. It seems reviewers often miss promo links and deletions of massive chunks of content, which may mislead the Damaging model. On the one hand, we could improve the accuracy of the Damaging model by taking the "would-be-reverted" score assigned by the Reverted model as a feature. On the other hand, this is a risky move, and before we make it, we need to thoroughly study the Reverted model's output.
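Here is a sketch of that "would-be-reverted" feature idea, assuming a scikit-learn-style predict_proba interface. StubRevertedModel below stands in for the real Reverted model and is not part of the actual ORES pipeline:

```python
import numpy as np

class StubRevertedModel:
    """Stand-in for the Reverted model. predict_proba returns one row of
    [P(not reverted), P(would be reverted)] per edit; here it is a toy
    logistic score on the first feature, for illustration only."""
    def predict_proba(self, X):
        p = 1.0 / (1.0 + np.exp(-X[:, 0]))
        return np.column_stack([1.0 - p, p])

def stack_reverted_score(X, reverted_model):
    """Append the Reverted model's P(would be reverted) as one extra
    column, so a Damaging classifier can train on it as a feature."""
    p_reverted = reverted_model.predict_proba(X)[:, [1]]
    return np.hstack([X, p_reverted])

X = np.zeros((4, 3))  # 4 edits, 3 original features
X_stacked = stack_reverted_score(X, StubRevertedModel())
# X_stacked has one extra column holding the Reverted model's score
```

The Damaging model would then be trained on X_stacked instead of X; the risk is that any systematic errors in the Reverted model's output leak straight into the Damaging model.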

Here you can enjoy comments on all 50 edits, along with the edits themselves.