# Research:Surviving new editor

Standard metric: Surviving new editor

Specification: A $\text{surviving new editor}(n, m, t_1, t_2, t_3)$ is a new editor who completes at least $n$ edits within $t_1$ of registering (at time $T$) and at least $m$ edits during the survival period $[T+t_2,\ T+t_2+t_3]$.

WMF Standard:
• $n$ = 1 edit
• $m$ = 1 edit
• $t_1$ = 1 day
• $t_2$ = 30 days (~ one month)
• $t_3$ = 30 days (~ one month)

Measures: Editor retention

Aliases: Retained editor

Related metrics: New editor

Status: draft
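SQL:

```sql
SET @activation_period = 1; /* t1: one day */
SET @n = 1;                 /* n: one activation edit */
SET @trial_period = 30;     /* t2: 30 days */
SET @survival_period = 30;  /* t3: 30 days */
SET @m = 1;                 /* m: one survival edit */
SET @start_date = "20140101"; /* January 1st, 2014 after midnight */
SET @end_date = "20140201";   /* February 1st, 2014 before midnight */

SELECT
  user_id,
  user_name,
  user_registration,
  /* "At least n / m edits", as in the specification; COALESCE handles users with no edits */
  COALESCE(SUM(activation_edits), 0) >= @n AS activated,
  COALESCE(SUM(activation_edits), 0) >= @n AND COALESCE(SUM(surviving_edits), 0) >= @m AS surviving,
  (
    /* The survival period [T + t2, T + t2 + t3] has not ended yet, so the outcome is unknown */
    UNIX_TIMESTAMP(NOW()) <
    UNIX_TIMESTAMP(DATE_ADD(user_registration, INTERVAL (@trial_period + @survival_period) DAY))
  ) AS censored
FROM (
  /* Edits to pages that still exist */
  SELECT
    user_id,
    user_name,
    user_registration,
    SUM(
      rev_timestamp BETWEEN
        user_registration AND
        DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @activation_period DAY), "%Y%m%d%H%i%s")
    ) AS activation_edits,
    SUM(
      rev_timestamp BETWEEN
        DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period DAY), "%Y%m%d%H%i%s") AND
        DATE_FORMAT(DATE_ADD(user_registration, INTERVAL (@trial_period + @survival_period) DAY), "%Y%m%d%H%i%s")
    ) AS surviving_edits
  FROM user
  LEFT JOIN revision ON
    user_id = rev_user AND
    (
      rev_timestamp BETWEEN
        user_registration AND
        DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @activation_period DAY), "%Y%m%d%H%i%s") OR
      rev_timestamp BETWEEN
        DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period DAY), "%Y%m%d%H%i%s") AND
        DATE_FORMAT(DATE_ADD(user_registration, INTERVAL (@trial_period + @survival_period) DAY), "%Y%m%d%H%i%s")
    )
  WHERE user_registration BETWEEN @start_date AND @end_date
  GROUP BY user_id, user_name, user_registration
  UNION ALL
  /* Edits to pages that have since been deleted */
  SELECT
    user_id,
    user_name,
    user_registration,
    SUM(
      ar_timestamp BETWEEN
        user_registration AND
        DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @activation_period DAY), "%Y%m%d%H%i%s")
    ) AS activation_edits,
    SUM(
      ar_timestamp BETWEEN
        DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period DAY), "%Y%m%d%H%i%s") AND
        DATE_FORMAT(DATE_ADD(user_registration, INTERVAL (@trial_period + @survival_period) DAY), "%Y%m%d%H%i%s")
    ) AS surviving_edits
  FROM user
  LEFT JOIN archive ON
    user_id = ar_user AND
    (
      ar_timestamp BETWEEN
        user_registration AND
        DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @activation_period DAY), "%Y%m%d%H%i%s") OR
      ar_timestamp BETWEEN
        DATE_FORMAT(DATE_ADD(user_registration, INTERVAL @trial_period DAY), "%Y%m%d%H%i%s") AND
        DATE_FORMAT(DATE_ADD(user_registration, INTERVAL (@trial_period + @survival_period) DAY), "%Y%m%d%H%i%s")
    )
  WHERE user_registration BETWEEN @start_date AND @end_date
  GROUP BY user_id, user_name, user_registration
) split_edit_counts
GROUP BY user_id, user_name, user_registration;
```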
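The classification the query performs can also be expressed outside the database. Below is a minimal Python sketch that follows the specification ("at least $n$" and "at least $m$" edits). It assumes the edit timestamps are already loaded as `datetime` objects. The function names `is_surviving_new_editor` and `is_censored` are hypothetical and not part of any Wikimedia library. The windows are closed intervals, which mirrors SQL `BETWEEN`.

```python
from datetime import datetime, timedelta
from typing import Iterable


def is_surviving_new_editor(
    registration: datetime,
    edit_times: Iterable[datetime],
    n: int = 1,
    m: int = 1,
    t1: timedelta = timedelta(days=1),
    t2: timedelta = timedelta(days=30),
    t3: timedelta = timedelta(days=30),
) -> bool:
    """True if >= n edits fall in [T, T + t1] and >= m edits in [T + t2, T + t2 + t3]."""
    edits = list(edit_times)
    activation_end = registration + t1
    survival_start = registration + t2
    survival_end = survival_start + t3
    # Closed intervals, like SQL BETWEEN.
    activation_edits = sum(registration <= e <= activation_end for e in edits)
    surviving_edits = sum(survival_start <= e <= survival_end for e in edits)
    return activation_edits >= n and surviving_edits >= m


def is_censored(registration: datetime, now: datetime,
                t2: timedelta = timedelta(days=30),
                t3: timedelta = timedelta(days=30)) -> bool:
    """True if the survival period has not ended yet, so survival cannot be judged."""
    return now < registration + t2 + t3


# Hypothetical example: registered at noon, edits 3 hours later and again 45 days later.
reg = datetime(2014, 1, 1, 12, 0)
print(is_surviving_new_editor(reg, [reg + timedelta(hours=3), reg + timedelta(days=45)]))  # True
print(is_surviving_new_editor(reg, [reg + timedelta(hours=3)]))  # False
print(is_censored(reg, now=datetime(2014, 2, 15)))  # True
```

Censored users should be excluded from retention rates rather than counted as non-surviving. Their survival period is still open, so they may yet edit.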


Surviving new editor is a standardized user class used to measure the number of first-time editors in a wiki project who continue to edit for a substantial period of time. It's used as a proxy for editor retention.

## Discussion

### The $t_1$ activation period

The activation period selects users whose retention needs to be measured:

• setting $t_1 = 0$ measures the retention (or rather the delayed activation) of newly registered users, regardless of when they started editing;
• setting $t_1 > 0$ restricts the measurement of retention to the subset of users who edited within a given activation period after registration;
• setting $t_1 = 1$ day measures the retention of new editors, based on the proposed definition of a new editor: surviving new editors are then effectively a proper subset of new editors.
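The sketch below shows how $t_1$, together with $n$, selects the cohort whose retention is measured. It assumes that $t_1 = 0$ is paired with $n = 0$, so that the activation requirement is always met; the text above implies this but does not state it. The three users and their timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def activated(registration, edit_times, n, t1):
    """True if at least n edits fall in the activation period [T, T + t1]."""
    end = registration + t1
    return sum(registration <= e <= end for e in edit_times) >= n


reg = datetime(2014, 1, 1)
# Hypothetical users: one edits 2 hours after registering, one after 3 days, one never.
users = {
    "same_day": [reg + timedelta(hours=2)],
    "late_starter": [reg + timedelta(days=3)],
    "never_edited": [],
}


def cohort(n, t1):
    """Names of the users selected for retention measurement."""
    return sorted(name for name, edits in users.items() if activated(reg, edits, n, t1))


print(cohort(n=0, t1=timedelta(0)))       # every newly registered user
print(cohort(n=1, t1=timedelta(days=1)))  # new editors only: ['same_day']
print(cohort(n=1, t1=timedelta(days=7)))  # also admits the late starter
```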

### The $t_2$ trial period

During the trial period, new editors are presumed to be testing out Wikipedia and Wikipedians are testing out the editor. This is the time when non-retained editors tend to leave Wikipedia and when retained editors decide to stick around. The longer the duration of this period, the longer an editor will need to remain active in order to be counted.
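Because the survival period starts at $T + t_2$, lengthening the trial period pushes the whole survival window later. The hypothetical sketch below models an editor who stops editing 40 days after registering. It checks only the survival condition ($m$ edits in $[T+t_2, T+t_2+t_3]$), and the helper name `survived` is illustrative.

```python
from datetime import datetime, timedelta


def survived(registration, edit_times, m=1,
             t2=timedelta(days=30), t3=timedelta(days=30)):
    """True if at least m edits fall in [T + t2, T + t2 + t3]."""
    start = registration + t2
    end = start + t3
    return sum(start <= e <= end for e in edit_times) >= m


reg = datetime(2014, 1, 1)
# Hypothetical editor whose last edit is 40 days after registration.
edits = [reg + timedelta(days=d) for d in (0, 5, 20, 40)]

for days in (30, 60):
    print(days, survived(reg, edits, t2=timedelta(days=days)))
# 30 True   (day 40 falls in [30, 60])
# 60 False  (no edits in [60, 90])
```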

### The $t_3$ survival period

During the survival period, retained new editors are expected to show some activity that indicates their survival. The longer the survival period, the more likely we are to notice activity from editors who are less consistently active. Longer survival periods are also more likely to count users who left Wikipedia and later reactivated their accounts.
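One way to see the effect of $t_3$ is to ask how long the survival period must be before a given editor is noticed. The following is a minimal sketch for $m = 1$: the shortest qualifying $t_3$ is the gap between $T + t_2$ and the first edit at or after it. The function name and the intermittent editor's history are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def min_survival_period(registration, edit_times,
                        t2=timedelta(days=30)) -> Optional[timedelta]:
    """Shortest t3 for which at least one edit falls in [T + t2, T + t2 + t3] (m = 1).

    Returns None if the user never edits at or after T + t2.
    """
    start = registration + t2
    later = [e for e in edit_times if e >= start]
    return min(later) - start if later else None


reg = datetime(2014, 1, 1)
# Hypothetical intermittent editor: edits on day 0, then not again until day 75.
edits = [reg, reg + timedelta(days=75)]
needed = min_survival_period(reg, edits)
print(needed)  # 45 days, 0:00:00
print(needed <= timedelta(days=30), needed <= timedelta(days=60))  # False True
```

With the WMF standard ($t_3$ = 30 days) this editor is not counted, but a 60-day survival period would count them.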

## Analysis

### Wikis

#### German

Survival rate comparison (dewiki). The proportion of surviving newly registered users is plotted by registration date for a set of different trial and survival periods.

#### English

Survival rate comparison (enwiki). The proportion of surviving newly registered users is plotted by registration date for a set of different trial and survival periods.
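Rates like those plotted above could be computed per registration month from the query's `user_registration`, `surviving` and `censored` columns. The sketch below assumes that those columns have already been converted to `datetime` and `bool` values. It also assumes the denominator is all non-censored registrants in the month, and it leaves censored users out because their outcome is not yet known. The rows are hypothetical toy data, not measurements.

```python
from collections import defaultdict
from datetime import datetime


def survival_rate_by_month(rows):
    """rows: (user_registration, surviving, censored) tuples.

    Returns {"YYYY-MM": share of non-censored registrants who survived}.
    """
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for registration, surviving, censored in rows:
        if censored:
            continue  # survival period still open; counting it would bias the rate down
        month = registration.strftime("%Y-%m")
        totals[month] += 1
        survivors[month] += int(surviving)
    return {month: survivors[month] / totals[month] for month in sorted(totals)}


rows = [  # hypothetical toy data
    (datetime(2014, 1, 3), True, False),
    (datetime(2014, 1, 9), False, False),
    (datetime(2014, 1, 20), False, False),
    (datetime(2014, 1, 28), True, False),
    (datetime(2014, 2, 2), False, True),
]
print(survival_rate_by_month(rows))  # {'2014-01': 0.5}
```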

### Sensitivity

#### Trial period duration

Trial period factor. The factor of difference between the proportions of surviving new editors for different trial periods is plotted relative to a 3-month trial period, with the survival period fixed at 3 months.

The trial period factor figure plots the ratio between the number of users who edit after a 3-month trial period (the horizontal line at $1$) and the number of users who edit after trial periods of 1, 2, 4, 5 and 6 months. Both enwiki and dewiki show a slight trend over registration date in the ratio between users who survive a 1- or 2-month trial period and those who survive 3 months or more. The trend is not extreme and therefore might not matter, but it does suggest that even users who survive 1-2 months are becoming less likely to survive 3.
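The factor can be sketched as follows, assuming that a "month" is approximated as 30 days and that the WMF standard $n = m = 1$ and $t_1$ = 1 day apply. All proportions share the same cohort as their denominator, so the ratio of proportions equals the ratio of survivor counts. The function names and user histories are hypothetical.

```python
from datetime import datetime, timedelta

MONTH = timedelta(days=30)  # assumption: a "month" is approximated as 30 days


def is_survivor(reg, edits, t2, t3, t1=timedelta(days=1)):
    """>= 1 edit in [T, T + t1] and >= 1 edit in [T + t2, T + t2 + t3]."""
    active = any(reg <= e <= reg + t1 for e in edits)
    survived = any(reg + t2 <= e <= reg + t2 + t3 for e in edits)
    return active and survived


def trial_period_factors(users, months=(1, 2, 4, 5, 6), baseline=3, survival_months=3):
    """Survivor count for each trial period divided by the count at the baseline trial period."""
    def count(trial_months):
        return sum(is_survivor(reg, edits, trial_months * MONTH, survival_months * MONTH)
                   for reg, edits in users)

    base = count(baseline)
    if base == 0:
        raise ValueError("no survivors at the baseline trial period")
    return {k: count(k) / base for k in months}


reg = datetime(2014, 1, 1)
users = [  # hypothetical toy histories, edits given in days since registration
    (reg, [reg + timedelta(days=d) for d in (0, 40)]),
    (reg, [reg + timedelta(days=d) for d in (0, 100)]),
    (reg, [reg + timedelta(days=d) for d in (0, 200)]),
]
print(trial_period_factors(users))  # {1: 2.0, 2: 1.0, 4: 1.0, 5: 1.0, 6: 1.0}
```

Changing $t_2$ shifts the survival window instead of widening it. As a result, the factor is not bounded in either direction: the third user survives only the longer trial periods.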

#### Survival period duration

Survival period factor. The factor of difference between the proportions of surviving new editors for different survival periods is plotted relative to a 3-month survival period, with the trial period fixed at 3 months.

The survival period factor figure plots the ratio between the number of users who edit within a 3-month survival window (the horizontal line at $1$) and the number of users who edit within 1-, 2-, 4-, 5- and 6-month windows. For the survival period duration, we see no meaningful change in these ratios over time.
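Unlike the trial period, varying $t_3$ with $t_2$ fixed only moves the end of the survival window, so the windows are nested. Every survivor at a shorter $t_3$ is therefore also a survivor at a longer one. This means the factors are at most 1 for 1 and 2 months and at least 1 for 4 to 6 months by construction. The figure's question is whether they drift over time. The minimal sketch below demonstrates the nesting. It assumes 30-day months and $n = m = 1$, and the names and histories are hypothetical.

```python
from datetime import datetime, timedelta

MONTH = timedelta(days=30)  # assumption: a "month" is approximated as 30 days


def count_survivors(users, t2, t3, t1=timedelta(days=1)):
    """Users with >= 1 edit in [T, T + t1] and >= 1 edit in [T + t2, T + t2 + t3]."""
    count = 0
    for reg, edits in users:
        active = any(reg <= e <= reg + t1 for e in edits)
        survived = any(reg + t2 <= e <= reg + t2 + t3 for e in edits)
        count += int(active and survived)
    return count


def survival_period_factors(users, months=(1, 2, 4, 5, 6), baseline=3, trial_months=3):
    """Survivor count for each survival period divided by the count at the baseline."""
    t2 = trial_months * MONTH
    base = count_survivors(users, t2, baseline * MONTH)
    if base == 0:
        raise ValueError("no survivors at the baseline survival period")
    return {k: count_survivors(users, t2, k * MONTH) / base for k in months}


reg = datetime(2014, 1, 1)
users = [  # hypothetical toy histories, edits given in days since registration
    (reg, [reg + timedelta(days=d) for d in (0, 100)]),
    (reg, [reg + timedelta(days=d) for d in (0, 140)]),
    (reg, [reg + timedelta(days=d) for d in (0, 250)]),
]
factors = survival_period_factors(users)
print(factors)  # {1: 0.5, 2: 1.0, 4: 1.0, 5: 1.0, 6: 1.5}
```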