Research talk:Revision scoring as a service/Work log/2016-02-01

Monday, February 1, 2016

February! And time to look at the Wikidata reverted edits that probably need review (because they might be vandalism).

> SELECT rev_id FROM wikidata_nonbot_reverted_sample WHERE NOT (trusted_edits OR trusted_user OR client_edit OR merge_edit) AND reverted ORDER BY RAND() LIMIT 100;

Here's the etherpad that I'll work from: https://etherpad.wikimedia.org/p/wikidata_reverted_edits_in_need_of_review Will post here when I'm done. --EpochFail (talk) 21:19, 1 February 2016 (UTC)

While I was working, I became curious about anonymous editors and how much more often they are reverted than registered editors under this new definition of "edits needing review".

> SELECT NOT (trusted_edits OR trusted_user OR client_edit OR merge_edit) AS needs_review, anon_user, COUNT(*) AS edits, SUM(reverted) AS reverted, SUM(reverted)/COUNT(*) AS prop FROM wikidata_nonbot_reverted_sample GROUP BY needs_review, anon_user;
+--------------+-----------+--------+----------+--------+
| needs_review | anon_user | edits  | reverted | prop   |
+--------------+-----------+--------+----------+--------+
|            0 |         0 | 466054 |     1260 | 0.0027 |
|            0 |         1 |     22 |        0 | 0.0000 |
|            1 |         0 |  15546 |      123 | 0.0079 |
|            1 |         1 |   6914 |      499 | 0.0722 |
+--------------+-----------+--------+----------+--------+
4 rows in set (0.33 sec)

So, among edits needing review, those by non-trusted registered editors seem to be reverted about 1/10th as often as those by anons (0.79% vs. 7.22%). That's a pretty substantial gap. I wonder if we can attribute it entirely to vandalism or if registered user edits are just reviewed with less scrutiny. Let's find out. :) --EpochFail (talk) 21:24, 1 February 2016 (UTC)
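As a quick check of the gap, here is a minimal sketch that recomputes the revert rates from the counts in the query output above. The dictionary layout and the `revert_rate` helper are illustrative names, not part of the original analysis.

```python
# Counts copied from the query output above:
# (needs_review, anon_user) -> (edits, reverted)
counts = {
    (0, 0): (466054, 1260),
    (0, 1): (22, 0),
    (1, 0): (15546, 123),
    (1, 1): (6914, 499),
}


def revert_rate(needs_review, anon_user):
    """Proportion of edits in a group that were reverted (hypothetical helper)."""
    edits, reverted = counts[(needs_review, anon_user)]
    return reverted / edits


# Compare the two "needs review" groups.
registered = revert_rate(1, 0)
anon = revert_rate(1, 1)
ratio = anon / registered  # how many times more often anon edits are reverted

print(f"registered: {registered:.4f}  anon: {anon:.4f}  anon/registered: {ratio:.1f}")
```

The ratio comes out a little above 9, which is where the rough "1/10th as often" figure comes from.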

Reverted edits needing review

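+-----------+--------------------+-----------+---------------+
| Vandalism | Good faith mistake | Good edit | Not mainspace |
+-----------+--------------------+-----------+---------------+
|        61 |                 20 |        17 |             2 |
+-----------+--------------------+-----------+---------------+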

OK. 61% of this is vandalism and 81% (vandalism plus good-faith mistakes) was clearly damaging. If we filter out the edits that look like they were reverted because of site-link deletions (11), that increases the proportions to 61/89 = 69% for vandalism and 81/89 = 91% clearly damaging. --EpochFail (talk) 23:26, 1 February 2016 (UTC)
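The filtering arithmetic can be sketched as follows. This assumes that all 11 site-link-deletion reverts fall outside the vandalism and good-faith-mistake categories, so only the denominator shrinks; that is the reading consistent with the 91% figure, but it is an assumption rather than something the labels state directly. Variable names are illustrative.

```python
# Hand-coded labels for the 100 sampled reverted edits (from the table above).
labels = {
    "vandalism": 61,
    "good_faith_mistake": 20,
    "good_edit": 17,
    "not_mainspace": 2,
}
# Assumption: none of these 11 edits were labelled as damaging.
site_link_deletions = 11

total = sum(labels.values())
damaging = labels["vandalism"] + labels["good_faith_mistake"]
remaining = total - site_link_deletions

# Proportions before and after dropping the site-link deletions.
vandalism_before = labels["vandalism"] / total
damaging_before = damaging / total
vandalism_after = labels["vandalism"] / remaining
damaging_after = damaging / remaining

print(f"before: vandalism {vandalism_before:.0%}, damaging {damaging_before:.0%}")
print(f"after:  vandalism {vandalism_after:.0%}, damaging {damaging_after:.0%}")
```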

Non-reverted edits needing review

Finally. This is the last set that seems to need review. I want to look at a random sample of non-reverted edits that don't fit in the "don't need review" groups so that we can see how often vandalism and other types of damage are missed.

> SELECT rev_id FROM wikidata_nonbot_reverted_sample WHERE NOT (trusted_edits OR trusted_user OR client_edit OR merge_edit) AND NOT reverted ORDER BY RAND() LIMIT 100;

OK. Here it is! https://etherpad.wikimedia.org/p/wikidata_non-reverted_edits_in_need_of_review --EpochFail (talk) 23:31, 1 February 2016 (UTC)


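+-----------+--------------------+-----------+
| Vandalism | Good faith mistake | Good edit |
+-----------+--------------------+-----------+
|         1 |                  3 |        94 |
+-----------+--------------------+-----------+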

OK. In this set, we get 1/98 = 1.0% vandalism; it ended up in the non-reverted set because fixing it took a long time and the fix wasn't a revert or a rollback. We get 3/98 = 3.1% good-faith mistakes (one of which we are looking into because it is hard to figure out what is going on). And the rest is good. That's 94/98 = 96%. --EpochFail (talk) 03:35, 2 February 2016 (UTC)
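For reference, a small sketch of the proportions for this set. The denominator is the 98 labelled edits rather than the 100 sampled by the query; the log does not say what happened to the other two, so they are simply left out here. Names are illustrative.

```python
# Hand-coded labels for the sampled non-reverted edits (from the table above).
labels = {"vandalism": 1, "good_faith_mistake": 3, "good_edit": 94}

# Only labelled edits are counted; the query sampled 100, the table sums to 98.
labelled = sum(labels.values())
shares = {name: count / labelled for name, count in labels.items()}

for name, share in shares.items():
    print(f"{name}: {labels[name]}/{labelled} = {share:.1%}")
```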