Research:Rolling recurring old active editor

From Meta, a Wikimedia project coordination wiki
Rolling recurring old active editor
Specification
A is a user registered before , completed edits between and and continued to complete edits between and .
WMF Standard
  • = 5 edits
  • = 30 days
Measures
Editor retention
Status
SQL
SET @n = 5; /* edits threshold */
SET @u = 30; /* activity unit in days */
SET @T = "20140102"; /* February 1st, 2014 before midnight */

/* Build a table of active editors from last month */
CREATE TEMPORARY TABLE staging.last_active
SELECT
    rev_user
FROM (
    SELECT
        rev_user
    FROM revision
    WHERE rev_timestamp BETWEEN
        DATE_FORMAT(DATE_SUB(@T, INTERVAL @u*2 DAY), "%Y%m%d%H%i%S") AND
        DATE_FORMAT(DATE_SUB(@T, INTERVAL @u DAY), "%Y%m%d%H%i%S")
    UNION ALL
    SELECT
        ar_user AS rev_user
    FROM archive
    WHERE ar_timestamp BETWEEN
        DATE_FORMAT(DATE_SUB(@T, INTERVAL @u*2 DAY), "%Y%m%d%H%i%S") AND
        DATE_FORMAT(DATE_SUB(@T, INTERVAL @u DAY), "%Y%m%d%H%i%S")
) AS split_revisions
GROUP BY rev_user
HAVING COUNT(*) >= @n;

/* Build a table of active editors from current month */
CREATE TEMPORARY TABLE staging.current_active
SELECT
    rev_user
FROM (
    SELECT
        rev_user
    FROM revision
    WHERE rev_timestamp BETWEEN
        DATE_FORMAT(DATE_SUB(@T, INTERVAL @u DAY), "%Y%m%d%H%i%S") AND @T
    UNION ALL
    SELECT
        ar_user AS rev_user
    FROM archive
    WHERE ar_timestamp BETWEEN
        DATE_FORMAT(DATE_SUB(@T, INTERVAL @u DAY), "%Y%m%d%H%i%S") AND @T
) AS split_revisions
GROUP BY rev_user
HAVING COUNT(*) >= @n;

/* Find the intersection and filter the newbies & bots out */
SELECT
    user_id,
    user_name,
    user_registration
FROM staging.current_active
LEFT JOIN staging.last_active USING (rev_user)
INNER JOIN user ON rev_user = user_id
WHERE 
    user_registration <
        DATE_FORMAT(DATE_SUB(@T, INTERVAL @u*2 DAY), "%Y%m%d%H%i%S") AND
    rev_user NOT IN (SELECT ug_user FROM user_groups WHERE ug_group = "bot");

Rolling recurring old active editors are non-bot editors who registered more than 2 time periods ago and completed a minimum number of edits in the last two time periods.

Discussion[edit]

The n edits threshold[edit]

The edits necessary to cross this threshold, the fewer editors will meet the criteria. Historically, "active editors" were considered to be editors who made 5 edits to specific types of pages over the course of a calendar month. So for historical purposes, we recommend .

The u activity unit[edit]

The wider this unit is set, the more casual editors will be considered "active". Historically, "active editors" were considered during calendar months. So for historical purposes, we recommend .

Time lag[edit]

This metric has a built-in time lag of for the time period in question.

Bot filtering[edit]

Bots are filtered using the bot flag method.

Discussion[edit]

The n edits threshold[edit]

The u activity unit[edit]

Time lag[edit]

Plain word definition[edit]

In plain words (using the default values) a '

Analysis[edit]

Discussion[edit]

Notes[edit]

In plain words (using the default values) a 'Rolling recurrent old active editor' is an editor that registered before 2 months ago and completed 5 edits in the last month and another 5 in the month prior to that one. Month is defined not as a calendar month but rather 30 calendar days.