Research:Rolling new active editor

From Meta, a Wikimedia project coordination wiki
Rolling new active editor
Specification
A is a newly registered user who both registered and completed edits to pages in any namespace of a Wikimedia project between and .
WMF Standard
  • = 5 edits
  • = 30 days
Related metrics
New editorNew active editorNew wikipedian
Status
SQL
SET @n = 5; /* edits threshold */
SET @u = 30; /* activity unit in days */
SET @T = "20140102"; /* February 1st, 2014 before midnight */
 
/* Results in a set of "new editors" */
SELECT
  user_id,
  user_name,
  user_registration
FROM
  (
    /* Get revisions to content pages that are still visible */
    SELECT
      user_id,
      user_name,
      user_registration,
      SUM(rev_id IS NOT NULL) AS revisions
    FROM user
    INNER JOIN logging ON /* Filter users not created manually */
      log_user = user_id AND
      log_type = "newusers" AND
      log_action = "create"
    LEFT JOIN revision ON
        rev_user = user_id
    WHERE 
        user_registration BETWEEN DATE_FORMAT(DATE_SUB(@T, INTERVAL @u DAY), "%Y%m%d%H%i%S") AND @T AND
        rev_timestamp BETWEEN DATE_FORMAT(DATE_SUB(@T, INTERVAL @u DAY), "%Y%m%d%H%i%S") AND @T
    GROUP BY 1,2,3
 
    UNION ALL
 
    /* Get revisions to content pages that have been archived */
    SELECT
      user_id,
      user_name,
      user_registration,
      SUM(ar_id IS NOT NULL) AS revisions /* Note that ar_rev_id is sometimes set to NULL :( */
    FROM user
    INNER JOIN logging ON /* Filter users not created manually */
      log_user = user_id AND
      log_type = "newusers" AND
      log_action = "create"
    LEFT JOIN archive ON 
      ar_user = user_id
    WHERE 
        user_registration BETWEEN DATE_FORMAT(DATE_SUB(@T, INTERVAL @u DAY), "%Y%m%d%H%i%S") AND @T AND
        ar_timestamp BETWEEN DATE_FORMAT(DATE_SUB(@T, INTERVAL @u DAY), "%Y%m%d%H%i%S") AND @T
    GROUP BY 1,2,3
  ) AS user_content_revision_count
WHERE user_id NOT IN (SELECT ug_user FROM user_groups WHERE ug_group = "bot")
GROUP BY 1,2,3
HAVING SUM(revisions) >= @n;

Rolling new active editors are non-bot editors who registered recently and saved a minimum threshold of edits. Specifically, editors who registered within days and saved at least edits are classified. Note that rolling new active editors have the potential to be counted as rolling surviving new active editors if they continue to be active more than days after their registration.


Discussion[edit]

The n edits threshold[edit]

The edits necessary to cross this threshold, the fewer editors will meet the criteria. Historically, "active editors" were considered to be editors who made 5 edits to specific types of pages over the course of a calendar month. So for historical purposes, we recommend .

The u activity unit[edit]

The wider this unit is set, the more casually editors will be considered "active". Historically, "active editors" were considered during calendar months. So for historical purposes, we recommend .

Time lag[edit]

This metric has a built-in time lag of .

Bot filtering[edit]

Bots are filtered using the bot flag method.

Analysis[edit]

Discussion[edit]

Notes[edit]