Research:Monthly wikimedia editor activity dataset
The page documents a dataset that describes the activity levels of registered editors across Wikimedia's projects. This dataset contains a row for every (wiki, user, month) that contains a count of all 'revisions' saved and a count of those revisions that were 'archived' when the page was deleted.
Halfaker, Aaron (2015): Wikimedia editor activity (monthly). figshare.
dx.doi.org/10.6084/m9.figshare.1553296
Methods
[edit]The Wikimedia internal database replicas were queried. Two queries were deployed against all wikis and then joined. The last complete month in the dataset is June, 2015.
Editor months
|
|---|
SELECT
wiki,
month,
user_id,
user_name,
user_registration,
SUM(revisions * archived) AS archived,
SUM(revisions) AS revisions
FROM (
SELECT
LEFT(rev_timestamp, 6) AS month,
DATABASE() AS wiki,
rev_user AS user_id,
FALSE AS archived,
COUNT(*) AS revisions
FROM revision
WHERE rev_timestamp >= "201503"
GROUP BY LEFT(rev_timestamp, 6), rev_user
UNION ALL
SELECT
LEFT(ar_timestamp, 6) AS month,
DATABASE() AS wiki,
ar_user AS user_id,
TRUE AS archived,
COUNT(*) AS revisions
FROM archive
WHERE ar_timestamp >= "201503"
GROUP BY LEFT(ar_timestamp, 6), ar_user
) AS editor_months
INNER JOIN user USING (user_id)
GROUP BY wiki, month, user_id
ORDER BY wiki, month;
|
Local user info (attached_method)
|
|---|
SELECT
DATABASE() AS wiki,
user_id AS user_id,
user_registration AS user_registration,
gu_id AS globaluser_id,
lu_attached_timestamp AS user_attached,
lu_attached_method AS attached_method
FROM user
LEFT JOIN centralauth.localuser ON
lu_wiki = DATABASE() AND
lu_name = user_name
LEFT JOIN centralauth.globaluser ON
gu_name = lu_name
GROUP BY user_id;
|
Sample
[edit]- wiki -- The dbname of the wiki in question ("enwiki" == English Wikipedia, "commonswiki" == Commons)
- month -- YYYYMM
- user_id -- The user's identifier in the local wiki
- user_name -- The user name in the local wiki (from the 'user' table)
- user_registration -- The recorded registration date for the user in the 'user' table
- archived -- The count of deleted revisions saved in this month by this user
- revisions -- The count of all revisions saved in this month by this user (archived or not)
- attached_method -- The method by which this user attached this account to their global account
| wiki | month | user_id | user_name | user_registration | archived | revisions | attached_method |
|---|---|---|---|---|---|---|---|
| aawiki | 200404 | 3 | Angela | NULL | 2 | 2 | password |
| aawiki | 200406 | 2 | Tim Starling | NULL | 10 | 10 | password |
| aawiki | 200408 | 2 | Tim Starling | NULL | 2 | 2 | password |
| aawiki | 200409 | 14 | Davidcannon | NULL | 2 | 2 | primary |
| aawiki | 200412 | 2 | Tim Starling | NULL | 12 | 12 | password |
| aawiki | 200501 | 21 | Rich Farmbrough | NULL | 16 | 16 | password |
| aawiki | 200502 | 23 | ` | NULL | 2 | 2 | primary |
| aawiki | 200508 | 62 | Afar god | NULL | 6 | 10 | primary |
| ... | |||||||
| enwiki | 200101 | 9161929 | KlausSeistrup | 20090308075444 | 0 | 1 | new |
| enwiki | 200101 | 9167505 | ALittleLuck | 20090309011538 | 0 | 1 | new |
| enwiki | 200101 | 10164500 | IvoryRing | 20090722031343 | 0 | 1 | new |
| enwiki | 200101 | 10164531 | StasK | 20090722031938 | 1 | 2 | primary |
| enwiki | 200101 | 10164597 | ChessyPig | 20090722033150 | 0 | 1 | new |
| enwiki | 200101 | 11327310 | Dhcp058.246.lvcm.com | 20091230114723 | 2 | 3 | new |
Note that some users who registered their accounts before 2006 may have strange 'user_registration' dates because that field was not tracked in a reasonable way until 2006. See bugzilla:22097.
Summary analysis
[edit]Coming soon