Research:Monthly wikimedia editor activity dataset
The page documents a dataset that describes the activity levels of registered editors across Wikimedia's projects. This dataset contains a row for every (wiki, user, month) that contains a count of all 'revisions' saved and a count of those revisions that were 'archived' when the page was deleted.
Halfaker, Aaron (2015): Wikimedia editor activity (monthly). figshare.
dx.doi.org/10.6084/m9.figshare.1553296
Methods
[edit]The Wikimedia internal database replicas were queried. Two queries were deployed against all wikis and then joined. The last complete month in the dataset is June, 2015.
Editor months
|
---|
SELECT wiki, month, user_id, user_name, user_registration, SUM(revisions * archived) AS archived, SUM(revisions) AS revisions FROM ( SELECT LEFT(rev_timestamp, 6) AS month, DATABASE() AS wiki, rev_user AS user_id, FALSE AS archived, COUNT(*) AS revisions FROM revision WHERE rev_timestamp >= "201503" GROUP BY LEFT(rev_timestamp, 6), rev_user UNION ALL SELECT LEFT(ar_timestamp, 6) AS month, DATABASE() AS wiki, ar_user AS user_id, TRUE AS archived, COUNT(*) AS revisions FROM archive WHERE ar_timestamp >= "201503" GROUP BY LEFT(ar_timestamp, 6), ar_user ) AS editor_months INNER JOIN user USING (user_id) GROUP BY wiki, month, user_id ORDER BY wiki, month; |
Local user info (attached_method)
|
---|
SELECT DATABASE() AS wiki, user_id AS user_id, user_registration AS user_registration, gu_id AS globaluser_id, lu_attached_timestamp AS user_attached, lu_attached_method AS attached_method FROM user LEFT JOIN centralauth.localuser ON lu_wiki = DATABASE() AND lu_name = user_name LEFT JOIN centralauth.globaluser ON gu_name = lu_name GROUP BY user_id; |
Sample
[edit]- wiki -- The dbname of the wiki in question ("enwiki" == English Wikipedia, "commonswiki" == Commons)
- month -- YYYYMM
- user_id -- The user's identifier in the local wiki
- user_name -- The user name in the local wiki (from the 'user' table)
- user_registration -- The recorded registration date for the user in the 'user' table
- archived -- The count of deleted revisions saved in this month by this user
- revisions -- The count of all revisions saved in this month by this user (archived or not)
- attached_method -- The method by which this user attached this account to their global account
wiki | month | user_id | user_name | user_registration | archived | revisions | attached_method |
---|---|---|---|---|---|---|---|
aawiki | 200404 | 3 | Angela | NULL | 2 | 2 | password |
aawiki | 200406 | 2 | Tim Starling | NULL | 10 | 10 | password |
aawiki | 200408 | 2 | Tim Starling | NULL | 2 | 2 | password |
aawiki | 200409 | 14 | Davidcannon | NULL | 2 | 2 | primary |
aawiki | 200412 | 2 | Tim Starling | NULL | 12 | 12 | password |
aawiki | 200501 | 21 | Rich Farmbrough | NULL | 16 | 16 | password |
aawiki | 200502 | 23 | ` | NULL | 2 | 2 | primary |
aawiki | 200508 | 62 | Afar god | NULL | 6 | 10 | primary |
... | |||||||
enwiki | 200101 | 9161929 | KlausSeistrup | 20090308075444 | 0 | 1 | new |
enwiki | 200101 | 9167505 | ALittleLuck | 20090309011538 | 0 | 1 | new |
enwiki | 200101 | 10164500 | IvoryRing | 20090722031343 | 0 | 1 | new |
enwiki | 200101 | 10164531 | StasK | 20090722031938 | 1 | 2 | primary |
enwiki | 200101 | 10164597 | ChessyPig | 20090722033150 | 0 | 1 | new |
enwiki | 200101 | 11327310 | Dhcp058.246.lvcm.com | 20091230114723 | 2 | 3 | new |
Note that some users who registered their accounts before 2006 may have strange 'user_registration' dates because that field was not tracked in a reasonable way until 2006. See bugzilla:22097.
Summary analysis
[edit]Coming soon