Research:Productive new editor
- n = 1 productive edit
- t = 1 day
Productive new editor is a standardized user class used to measure the number of first-time editors in a wiki project over time who make productive contributions. It's used as a proxy for editor productivity, and to a lesser extent, editor activation. A "productive new editor" is a new editor who saves revisions to content namespace pages that are not reverted.
Excluding edits to deleted content
Spammers and other non-productive new editors tend to create non-productive articles, and those articles tend to be deleted rather than having their edits reverted. A revert check alone would therefore fail to exclude these edits from the productive edit criteria. For this reason, edits to articles that are deleted by the end of a new editor's first week since registration are not included in counts of productive edits.
The n productive edits threshold
Like choosing an n for any metric based on counts (e.g. new editor and active editor), choosing a threshold is somewhat arbitrary. Choosing a higher threshold will result in a smaller proportion of newly registered users being considered productive.
The t time cutoff
The timespan for identifying productive edits can be drawn in a few ways. The two most common are time-bounded and event-based. A time-bounded approach uses a cutoff to limit observations to a fixed amount of time after a user registered their account. An event-based approach uses some event as the starting point for counting user contributions. Another candidate timespan covers the edits that a newcomer performed in their first edit session. Since productive new editor qualifies the activity of a new editor, we set t = 1 day, which effectively makes the class of productive new editors a proper subset of new editors. We analyze the effect of choosing a different value for t below.
Time to revert cutoff
Because a revert can theoretically occur years after the original edit (see en:Censoring_(statistics)), reverts are only counted if they occurred within 48 hours of the original edit. For more details, see Research:Revert#Time to revert cutoff.
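The criteria described above can be combined into a single check. The following is a minimal sketch, not the implementation used to produce this metric: the `Edit` record, its field names, and the function names are hypothetical, and the inclusive boundary handling (an edit exactly at the cutoff counts) is an assumption. The defaults n = 1 and t = 1 day, the 48-hour revert cutoff, and the one-week deletion window follow the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional

REVERT_CUTOFF = timedelta(hours=48)   # time-to-revert cutoff from the text
DELETION_WINDOW = timedelta(days=7)   # "first week since registration"


@dataclass
class Edit:
    """Hypothetical record for one saved revision."""
    timestamp: datetime                          # when the revision was saved
    in_content_namespace: bool                   # True for content pages
    reverted_at: Optional[datetime] = None       # None if never reverted
    page_deleted_at: Optional[datetime] = None   # None if the page still exists


def is_productive_edit(edit: Edit, registered: datetime) -> bool:
    """Apply the namespace, revert, and deletion criteria to one edit."""
    if not edit.in_content_namespace:
        return False
    # Reverts only count if they happened within the cutoff.
    if edit.reverted_at is not None and edit.reverted_at - edit.timestamp <= REVERT_CUTOFF:
        return False
    # Edits to pages deleted within the editor's first week are excluded.
    if edit.page_deleted_at is not None and edit.page_deleted_at <= registered + DELETION_WINDOW:
        return False
    return True


def is_productive_new_editor(
    registered: datetime,
    edits: Iterable[Edit],
    n: int = 1,
    t: timedelta = timedelta(days=1),
) -> bool:
    """True if the user made at least n productive edits within t of registering."""
    count = sum(
        1
        for e in edits
        if registered <= e.timestamp <= registered + t
        and is_productive_edit(e, registered)
    )
    return count >= n
```

Note that an edit reverted after the 48-hour cutoff still counts as productive under this sketch, which mirrors the censoring rule described above.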
- This metric represents productivity as a binary attribute of a user; it does not measure how productive a new editor is. New editors who make many productive edits and contribute substantial amounts of content will look identical (under this metric) to new editors who fix a few typos.
- The cleverest vandalism may go unnoticed for more than 48 hours, in which case it is counted as productive.
Generation of this metric will need to be delayed by t + 48 hours after users' registration dates in order to allow t days for newly registered users to make edits and an additional 48 hours for other editors to have a chance to revert them. In the case of the WMF Standard parameterization (t = 1 day), this works out to 3 days.
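As a worked example of this delay, the sketch below adds the editing window and the revert window to a registration date. The function name is hypothetical; the defaults assume t = 1 day and the 48-hour time-to-revert cutoff described earlier.

```python
from datetime import datetime, timedelta


def earliest_measurement_time(
    registered: datetime,
    t: timedelta = timedelta(days=1),               # editing window (assumed t = 1 day)
    revert_cutoff: timedelta = timedelta(hours=48),  # time-to-revert cutoff
) -> datetime:
    """Earliest time at which the metric can be computed for a user."""
    return registered + t + revert_cutoff


# A user who registered at midnight on 1 January 2014:
print(earliest_measurement_time(datetime(2014, 1, 1)))  # 2014-01-04 00:00:00
```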
Besides the variables describing a productive edit, there are two variables used in this metric:
- The value of n (the productive edits threshold)
- The value of t (the time cutoff)
Given that the raw number of productive new editors is highly dependent on the raw number of new editors, this metric is best examined as a proportion. Given that identifying reverts is computationally difficult, the following plots were generated by randomly sampling newly registered users stratified by registration month.
Figure: Factor comparison of n and t
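The sampling and proportion steps described above might be sketched as follows. This is an illustration only: the function names, the `(user_id, registration_month)` input shape, and the fixed per-month sample size are assumptions, and the productive edit counts are taken as already computed for a given t.

```python
import random
from collections import defaultdict


def stratified_sample(users, per_month, seed=0):
    """Sample up to per_month user ids from each registration month.

    users: iterable of (user_id, registration_month) pairs (hypothetical shape).
    """
    rng = random.Random(seed)
    by_month = defaultdict(list)
    for user_id, month in users:
        by_month[month].append(user_id)
    return {
        month: rng.sample(ids, min(per_month, len(ids)))
        for month, ids in sorted(by_month.items())
    }


def productive_proportion(productive_edit_counts, n=1):
    """Share of users with at least n productive edits within t."""
    counts = list(productive_edit_counts)
    if not counts:
        return 0.0
    return sum(c >= n for c in counts) / len(counts)
```

For a fixed set of users, raising n can only lower or maintain the proportion, since every user who meets a higher threshold also meets a lower one.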
User interface experiments at the Wikimedia Foundation apply the productive new editor metric across an entire cohort to generate the productive newcomer proportion. For example, in an A/B test, we might compare the proportion of productive new editors in a control group to a test group. This allows us to have a very basic understanding of whether the A/B test led to more new editors making productive contributions.
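A comparison of this kind could look like the sketch below. The source does not say which statistical test is used, so the pooled two-proportion z-test here is an assumption chosen for illustration, and the function name and argument names are hypothetical.

```python
import math


def compare_proportions(prod_control, total_control, prod_test, total_test):
    """Compare productive newcomer proportions between two groups.

    Returns (p_control, p_test, z, two_sided_p_value) using a pooled
    two-proportion z-test (an assumed choice of test).
    """
    p_control = prod_control / total_control
    p_test = prod_test / total_test
    pooled = (prod_control + prod_test) / (total_control + total_test)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_control + 1 / total_test))
    z = (p_test - p_control) / se if se > 0 else 0.0
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_control, p_test, z, p_value
```

With made-up counts, `compare_proportions(100, 1000, 150, 1000)` compares a 10% productive proportion in the control group against 15% in the test group.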
See the following research reports for examples of this type of usage:
- Article feedback/Stage 3/Conversion and newcomer quality
- VisualEditor's effect on newly registered editors/Results
- Onboarding new Wikipedians/OB6