Research:STiki 1 million reverts review

From Meta, a Wikimedia project coordination wiki
14:00, 13 October 2016 (UTC)
no affiliation

This page is an incomplete draft of a research project.
Information is incomplete and is likely to change substantially before the project starts.

STiki has reverted 1 million edits. What's behind that number? Finding out is the goal of this project: we'll explore and describe the role STiki has played in English Wikipedia.


On 2016-OCT-16, STiki will complete its first full backup since crossing the 1M revert threshold (it's a MySQL DB). I will pull that file and post it somewhere in case you want a local copy. I'll probably also get that static snapshot running on my server and can give you server/database access (though the machine is no powerhouse). This should give us a consistent copy to work from, one that doesn't interfere with STiki's live operation and whose tables don't keep growing while we try to analyze the data. It's around 20GB compressed. The vast majority of that is the metadata/features stored alongside every NS0 (article namespace) edit, plus hyperlink data from when STiki tried to play the anti-spam game for a bit (this could probably be omitted). I don't recall the uncompressed size. West.andrew.g (talk) 21:26, 14 October 2016 (UTC)

Explaining critical tables and columns

TABLE: "feedback" : Every press of a "classification" button (except "pass") is recorded here. The column whose meaning is not self-evident is "LABEL", which can take the following values:

Queue        Innocent   Guilty   Good-faith
STiki           -1         1          5
CBNG            -2         2         10
WikiTrust       -3         3         15
Spam            -4         4         20
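The LABEL codes follow a regular pattern: for a queue index q (STiki = 1, CBNG = 2, WikiTrust = 3, Spam = 4), innocent is -q, guilty is q, and good-faith is 5q. The sketch below decodes LABEL on that basis. The pattern is inferred from the table rather than taken from STiki's source, and the names `decode_label` and `Label` are hypothetical; values outside the table raise an error instead of being guessed.

```python
from typing import NamedTuple

# Queue index inferred from the LABEL table: innocent = -q, guilty = q, good-faith = 5q.
QUEUES = {1: "STiki", 2: "CBNG", 3: "WikiTrust", 4: "Spam"}


class Label(NamedTuple):
    queue: str
    classification: str


def decode_label(label: int) -> Label:
    """Map a feedback.LABEL integer to (queue, classification)."""
    if label < 0 and -label in QUEUES:
        return Label(QUEUES[-label], "innocent")
    if label in QUEUES:
        return Label(QUEUES[label], "guilty")
    if label > 0 and label % 5 == 0 and label // 5 in QUEUES:
        return Label(QUEUES[label // 5], "good-faith")
    # Anything else (0, 25, -5, ...) is not in the table.
    raise ValueError(f"unknown LABEL value: {label}")
```

For example, `decode_label(10)` yields a CBNG good-faith classification, and `decode_label(-3)` a WikiTrust innocent one.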

TABLE: "log_client" : Any client action that requires a database change is performed through a stored procedure (SP), and every such call is recorded here. Only the name of the stored procedure is recorded, not its parameters. The "user" in this table is the database user, not the Wikipedia user; the two can be linked through session matching. The table was originally meant for audits: making sure no one was using the exposed API/procedures to mass-classify edits as innocent, or to place a "hold/reservation" on so many edits that the queue became useless to others. It's important to realize that STiki's queues are "synchronized": any action in one queue performs the same action in all other queues. The client actions that change the database are:

SP name          SP description
queue_fetch_*    The client saying "I need some edits!" to a queue. The server returns the 10 available RIDs (revision IDs) with the highest priority that the user hasn't ignored, and places a TTL "reservation" on them so they are unavailable to others.
feedback_insert  The user has pressed a classification button other than "pass", and the press must be recorded in the "feedback" table.
queue_delete     Because the user has classified an edit, it is dequeued so no one else gets it.
queue_wipe       Releases the user's reservations on some RIDs. This happens when a user switches queues mid-session or, more often, when STiki shuts down cleanly.
oe_insert        If the classification is "vandalism", we have an "offending edit" (OE). OEs are recorded separately because they are used in calculating reputation metrics for articles and editors.
queue_ignore     The user has pressed the "pass" button for an edit. This is the only place passes are recorded.
queue_resurrect  An odd corner case that captures some uses of the "back" button. If a user classified an edit as "innocent", went "back", and then classified it as "pass", some database changes must be unwound; this procedure handles that.
leaderboard      The user has generated the leaderboard from inside the client.
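To make the reservation semantics concrete, here is a minimal in-memory sketch of what queue_fetch_*, queue_delete, queue_ignore, and queue_wipe do. It is not STiki's implementation (the real logic lives in MySQL stored procedures), and `SyncedQueues`, its method names, and the `RESERVATION_TTL` value are all hypothetical; the actual TTL is not stated here. Queue synchronization is modeled by keying reservations and passes by RID alone and by deleting a classified RID from every queue.

```python
from __future__ import annotations

from dataclasses import dataclass, field

FETCH_SIZE = 10          # "the 10 available RIDs" described above
RESERVATION_TTL = 600.0  # seconds; hypothetical, the real TTL is not stated


@dataclass
class SyncedQueues:
    # queue name -> {RID: priority score}
    scores: dict[str, dict[int, float]] = field(default_factory=dict)
    # RID -> (holder, expiry time); keyed by RID only, so shared by all queues
    reserved: dict[int, tuple[str, float]] = field(default_factory=dict)
    # user -> RIDs that user has passed on
    ignored: dict[str, set[int]] = field(default_factory=dict)

    def _available(self, rid: int, user: str, now: float) -> bool:
        hold = self.reserved.get(rid)
        if hold is not None and hold[1] > now:
            return False  # someone (possibly this user) still holds it
        return rid not in self.ignored.get(user, set())

    def fetch(self, queue: str, user: str, now: float) -> list[int]:
        """queue_fetch_*: reserve and return the highest-priority available RIDs."""
        pool = self.scores.get(queue, {})
        ranked = sorted(
            (rid for rid in pool if self._available(rid, user, now)),
            key=lambda rid: pool[rid],
            reverse=True,
        )
        batch = ranked[:FETCH_SIZE]
        for rid in batch:
            self.reserved[rid] = (user, now + RESERVATION_TTL)
        return batch

    def delete(self, rid: int) -> None:
        """queue_delete: a classified RID leaves every queue at once."""
        for pool in self.scores.values():
            pool.pop(rid, None)
        self.reserved.pop(rid, None)

    def ignore(self, user: str, rid: int) -> None:
        """queue_ignore: record a "pass" so this user is not offered the RID again."""
        self.ignored.setdefault(user, set()).add(rid)

    def wipe(self, user: str) -> None:
        """queue_wipe: release every reservation this user holds."""
        for rid in [r for r, (holder, _) in self.reserved.items() if holder == user]:
            del self.reserved[rid]
```

In this model, an edit reserved from the STiki queue is also unavailable in the CBNG queue, and an expired reservation quietly returns the edit to the pool; both follow from storing reservations per RID rather than per queue.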

A timeline of STiki's history

Date          UNIX TS      Description
2010-FEB-26   1267209791   First STiki classification (testing by west.andrew.g)
2010-DEC-25   1293253809   The "log_client" table comes online
2016-OCT-21   1477023316   Last STiki classification per snapshot
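The UNIX TS column is seconds since 1970-01-01 UTC, so the dates above can be checked (and the time of day recovered) directly from the database values. A small sketch, using only the standard library; the `MILESTONES` dictionary and `to_utc` helper are illustrative names, not part of the STiki schema.

```python
from datetime import datetime, timezone

# UNIX timestamps copied from the timeline table (seconds since 1970-01-01 UTC).
MILESTONES = {
    "first_classification": 1267209791,
    "log_client_online": 1293253809,
    "last_classification": 1477023316,
}


def to_utc(ts: int) -> datetime:
    """Interpret a UNIX timestamp as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


for name, ts in MILESTONES.items():
    print(f"{name:22} {to_utc(ts).isoformat()}")

# Length of the period covered by the snapshot, in whole days.
span = to_utc(MILESTONES["last_classification"]) - to_utc(MILESTONES["first_classification"])
print(f"snapshot covers {span.days} days")
```

Passing `tz=timezone.utc` matters here: without it, `fromtimestamp` converts to the analyst's local time zone and the dates near midnight could shift.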

Research questions

  • How long does the average anti-vandal session last? How many classifications is that? Does the vandalism hit-rate affect session length or frequency?
  • Who are the anti-vandals using the tool? How are they distributed geographically? Where are they in their wiki careers? When STiki use stops, are they quitting Wikipedia, or have they found other tasks (is STiki a gateway drug?)?
  • Has awarding barnstars at classification thresholds achieved anything? Can gamification increase anti-vandal throughput?
  • STiki briefly hosted an anti-spam queue. No one used it, and reviews took a long time. Does this tell us anything?
  • At some point STiki added a "good faith revert" function alongside vandalism/innocent/pass. How did this affect "vandalism" presses?
  • Can the history of STiki and CBNG edit scoring tell us anything about how Wikipedia's vandalism propensities change over time?
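For the session-length and hit-rate questions, classifications first have to be grouped into sessions. One common approach, sketched below, splits a user's actions wherever the gap between consecutive actions exceeds an inactivity cutoff. The one-hour cutoff is an assumption in line with the edit-session threshold proposed by Geiger & Halfaker (2013); treat it as a tunable parameter. The input rows `(user, unix_ts, label)` are hypothetical, since the "feedback" table's user and timestamp columns are not described on this page, and a "hit" is assumed to mean a guilty LABEL (1 to 4), not a good-faith one.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import groupby

SESSION_GAP = 3600  # seconds of inactivity that ends a session (assumption)


@dataclass
class Session:
    user: str
    start: int
    end: int
    classifications: int
    vandalism: int

    @property
    def duration(self) -> int:
        # A single-classification session has duration 0 under this definition.
        return self.end - self.start

    @property
    def hit_rate(self) -> float:
        return self.vandalism / self.classifications


def sessionize(rows: list[tuple[str, int, int]]) -> list[Session]:
    """Group (user, unix_ts, label) feedback rows into per-user sessions."""
    sessions: list[Session] = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for user, group in groupby(ordered, key=lambda r: r[0]):
        current: Session | None = None
        for _, ts, label in group:
            guilty = 1 if 1 <= label <= 4 else 0  # "guilty" codes in the LABEL table
            if current is None or ts - current.end > SESSION_GAP:
                current = Session(user, ts, ts, 0, 0)
                sessions.append(current)
            current.end = ts
            current.classifications += 1
            current.vandalism += guilty
    return sessions
```

Sorting by user and timestamp before `groupby` is required, because `groupby` only merges adjacent rows with the same key.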




  • AGW's PhD dissertation (I know, right?) -- The content of Chapter 6 has never been published at a conference or in a journal and is fair game. It was an 11th-hour attempt to squeeze a bit of STiki data into the document. Obviously we need to go much deeper.
  • Geiger, R. S., & Halfaker, A. (2013, February). Using edit sessions to measure participation in Wikipedia. CSCW (pp. 861-870). ACM. -- For comparing STiki anti-vandal sessions with sessions project-wide.
  • Geiger, R. S., & Halfaker, A. (2013, August). When the levee breaks: without bots, what happens to Wikipedia's quality control processes? OpenSym (p. 6). ACM. -- STiki users review edits that make it past ClueBot NG and Huggle.
  • - The mainstream media looks at STiki scoring in the context of the 2016 US presidential election.