Research talk:STiki 1 million reverts review/Work log/2016-11-02

Wednesday, November 2, 2016

Hey folks! Just sitting down with this dataset. *cracks knuckles* Let's have a quick look!

> sum(mr$reverts)
[1] 895344

Hmm... looks like I'm missing about 100k reverts. I wonder why that is. Maybe my regex has some imperfections. I'll have to consult West.andrew.g.
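
For reference, the matching I'm doing is roughly along these lines. The summary strings below are just made-up examples, and the real regex (and STiki's older summary formats) may well differ:

# Sketch of the comment-matching step; the example summaries are invented
comments <- c(
  "Reverted 1 edit by Example identified as vandalism using STiki",
  "Undid revision 12345 by Example",
  "(GF) Reverted good faith edits by Example (STiki)")
grepl("using STiki|\\(STiki\\)", comments)
# [1]  TRUE FALSE  TRUE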

Took a look at your SQL query. Seems like you are trying to glean STiki usage purely from edit comments? I'm sure most of the missing reverts are from the early days before we converged on the current format. The code wasn't even under version control in those early days. I can figure out some of the old strings, though, by joining the STiki feedback table against a metadata one to get some of those older comments. I imagine it will be of most benefit to put critical STiki tables onto WMF infrastructure to do quick joins on RIDs. West.andrew.g (talk) 04:00, 2 November 2016 (UTC)

Anyway, forward!

So, first I want to look at how STiki's usage has been changing over time. There seem to be some yearly patterns here, so I'll break the graphs down by year and month.
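
For anyone following along, the plots below are produced roughly like this. The year and month column names are assumptions about the monthly table's schema; only reverts is confirmed above:

library(ggplot2)

# One line per year, months on the x axis
# ('year' and 'month' are guesses about mr's columns)
monthly <- aggregate(reverts ~ year + month, data = mr, FUN = sum)
ggplot(monthly, aes(x = month, y = reverts, group = year, color = factor(year))) +
  geom_line()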

STiki reverts. The monthly STiki revert counts are plotted for each year that the tool was active.

We can definitely see a dip in May/June, which is probably all the vandals going on summer vacation ;) I'm surprised not to see the same pattern in 2016, though. OK, next I want to look at the proportion of reverts that were of anons' edits.
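
A sketch of how that proportion can be computed. The anon_reverts column name (and year/month again) are guesses about the table's layout:

library(ggplot2)

# Anon proportion over the whole timeline ('anon_reverts', 'year', 'month' are guesses)
mr$date <- as.Date(paste(mr$year, mr$month, 1, sep = "-"))
ggplot(mr, aes(x = date, y = anon_reverts / reverts)) +
  geom_line()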

STiki anon reverts. The proportion of reverts of anon editors is plotted over the time that STiki was active.

Here, we can see a general decline in the overall proportion of anons who were reverted (compared to registered editors). When STiki first came out, it was 100% anons; then it dropped to about 90% anons in 2011. Then, midway through 2013, we see it drop again to the low 80%s. I don't know what to think of the two dips in 2014 and the beginning of 2015. Each dip spans more than one month, though, so they seem like they might be real.

I'll pull the table description text and the "STiki timeline" I started into this work log. I think if I curate that timeline well enough, it will begin to answer these questions and suggest additional variables we might want to isolate. I know my very first interface didn't process edits by registered users (thus 100% anon). At some point the interface shifted its default queue from my metadata model to that of CBNG. We might assume their model penalizes anons less and relies more on edit language. The queue an RID was chosen from is annotated in STiki's tables. West.andrew.g (talk) 04:27, 2 November 2016 (UTC)

OK, next I was curious about the rate at which reverts were flagged as "good-faith".
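
A sketch of that computation. All four of the split-count column names here are guesses about how the monthly table breaks reverts down, so adjust to the real schema:

library(ggplot2)

# Good-faith rate for anon vs. registered reverts; the count columns are guesses
mr$date <- as.Date(paste(mr$year, mr$month, 1, sep = "-"))
ggplot(mr, aes(x = date)) +
  geom_line(aes(y = anon_gf_reverts / anon_reverts, color = "anon")) +
  geom_line(aes(y = registered_gf_reverts / registered_reverts, color = "registered")) +
  labs(y = "proportion flagged good-faith", color = NULL)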

STiki good-faith reverts. The proportion of good-faith reverts is plotted for reverts of anons and registered users over time.

It's interesting that registered editors' reverts get flagged as "good-faith" at about a 10% higher rate, and that this gap remains consistent over time. It's also really interesting that the two proportions track each other pretty closely.

OK. One last thing. Let me link to the dataset! https://github.com/halfak/STiki-revert-analysis/blob/master/datasets/enwiki.monthly_stiki_reverts.tsv
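
If you want to follow along, a minimal way to load it from a local checkout of that repo (the column layout is whatever the TSV actually provides):

# Load the monthly reverts table from a clone of the repo linked above
mr <- read.delim("datasets/enwiki.monthly_stiki_reverts.tsv")
str(mr)  # inspect the columns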

That's all for today. I'll have a good think about all this and regroup with some new analysis soon. --EpochFail (talk) 01:03, 2 November 2016 (UTC)