Research:Asking anonymous editors to register

From Meta, a Wikimedia project coordination wiki
18:22, 15 July 2014‎ (UTC)
  • Matt Flaschen
  • Kaity Hammerstein
  • Rob Moen
  • Sam Smith
  • Moiz Syed
  • Steven Walling
Duration:  2014-03 – 2014-08
This page documents a completed research project.

Our research into the volume and impact of anonymous editing suggests that, at least on English Wikipedia and likely on our other large projects, users who edit anonymously just before registration are more productive.[1] We also know that anonymous editors are a large source of new registrations on many Wikipedias,[2] and that many community members recall editing anonymously prior to signing up.[3]

The preceding facts lead us to hypothesize that anonymous editors are likely a key group to focus on for new user acquisition. In this experiment, we'll explore the effectiveness of presenting a call to action (CTA) to register when anonymous editors attempt to edit a page. In this study, we'll try to answer one overarching research question: how does asking anonymous editors to register affect their behavior?

In this document, we first describe the variants of a CTA that we designed for anonymous users. Then we'll state and justify a set of hypotheses around these variants in relation to a control condition that reflects the current state of Wikipedia. We'll then introduce a methodology that would allow us to test these hypotheses and report the results of a week-long experiment. Finally, we'll conclude with a discussion of the results and the implications that they have for future work in this area.


Measuring user behavior via random token[edit]

In order to distinguish between distinct clients as they move between IP addresses and registered accounts, we will be assigning users randomly generated tokens that will identify a unique browser/device during the observation and experiment periods. This token will be assigned when a user edits, either via wikitext or VisualEditor. Due to the potential sensitivity of this data, it will be retained for up to 90 days (per the Data retention guidelines) and only used to measure aggregate patterns (e.g. number of signups per experimental bucket), after which all tokens will be purged from analytics databases.
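The assignment logic can be sketched as follows. This is an illustrative model only, not the deployed instrumentation: the cookie name and function are hypothetical, but the key property matches the design above -- the token is pure randomness, minted on a client's first edit action and reused thereafter, so it identifies a browser/device rather than an IP or account.

```python
import secrets

TOKEN_BYTES = 16  # 128 bits of randomness per browser/device

def get_or_assign_token(cookies):
    """Return the client's tracking token, minting one on its first
    edit action. The cookie name here is hypothetical."""
    token = cookies.get("editTrackingToken")
    if token is None:
        # Random, not derived from IP or user agent, so it carries no
        # identifying information beyond linking this client's own events.
        token = secrets.token_hex(TOKEN_BYTES)
        cookies["editTrackingToken"] = token
    return token

# Two edits from the same cookie jar share one token; a fresh jar
# (new device, or cleared cookies) gets an unrelated token.
jar_a, jar_b = {}, {}
assert get_or_assign_token(jar_a) == get_or_assign_token(jar_a)
assert get_or_assign_token(jar_a) != get_or_assign_token(jar_b)
```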

Note: There are limitations in the assumption that one token equals one user. In reality, it is closer to one device. Possible complications include shared computers (some reset cookies between sessions and some do not, with different effects on the data), users who habitually use private browsing or automatically clear cookies when the browser closes, and users who switch between browsers and/or computers.


  • Represents activation of edit links (tab or section) and the "create account" link. Always the first event in a "flow". Helps us understand how the UI is being used.
  • Represents a view of a CTA -- both pre- and post-edit CTAs included.
  • Represents activation of buttons (edit/signup/dismiss) in the CTA.
  • Represents a visit to the account creation form. This will allow us to associate account creation attempts with experimental buckets.
  • Represents a successful account creation. This will allow us to associate account creations with experimental buckets.
  • Represents a successful change to a page. This will allow us to associate edits with our experimental buckets.

Experimental periods[edit]

Chi-squared power analysis. 

Figure #Chi-squared power analysis shows the expected p-value for various expected proportions and differences ("change") between conditions. Assuming we're using boolean/proportion metrics (e.g. the proportion of anons who registered an account during the experiment) we can use this figure to reason about how many observations will be needed to detect effects of different sizes.

To project the potential pool of anonymous editors, we make use of the cu_changes table to check how many unique IPs and IP/user-agent pairs made edits in a one-week period. Our estimates suggest that 92k-97k anonymous editors saved changes during the last week of March 2014 (depending on whether distinct IPs or IP & UA pairs were assumed to identify a user). Given that #Chi-squared power analysis suggests that we should attain significance for even sub-1% changes, it's clear that we should be able to proceed with a week-long experiment.
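The same reasoning can be reproduced numerically. As a stand-in for reading the figure, the sketch below uses the standard two-proportion sample-size formula (normal approximation to the chi-squared/z-test); the baseline rate and lift in the example are illustrative, not measured values from this study.

```python
from statistics import NormalDist

def n_per_condition(p, change, alpha=0.05, power=0.8):
    """Approximate observations needed per experimental bucket to detect
    an absolute difference `change` from a baseline proportion `p`,
    via the standard two-proportion z-test sample-size formula."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_b = NormalDist().inv_cdf(power)          # desired power
    p2 = p + change
    pbar = (p + p2) / 2
    num = (z_a * (2 * pbar * (1 - pbar)) ** 0.5 +
           z_b * (p * (1 - p) + p2 * (1 - p2)) ** 0.5) ** 2
    return int(num / change ** 2) + 1

# Illustrative: detecting a 0.5 percentage-point lift over a 5% baseline
# needs on the order of tens of thousands of observations per bucket --
# comfortably within the 92k-97k weekly anonymous editors estimated above.
print(n_per_condition(0.05, 0.005))
```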

Observation periods[edit]

Timeline. The timeline of experimentation and observation is presented.

We track anonymous users for two observation periods -- one period before the experiment and one period after the experiment.

Pre-experiment observation. One of the primary concerns raised against an active CTA is that it might discourage editors who prefer to edit anonymously. We'd like to look for evidence that regular anonymous editors are demotivated or confused by either signup CTA. In order to identify regular anonymous editors (and other, unforeseen anonymous editing patterns) before we intervene, we'll measure anonymous user behavior for up to two weeks before deploying the signup CTAs.

Post-experiment observation. In order to observe the effect that the experimental CTAs have on the activities of editors, we'll track their contribution patterns for at least a week (and up to two weeks) after the experimental period. After this observation period is complete, we'll cease to store identifying tokens with user actions on the site and analysis of the completed dataset will begin.


In order to answer our research questions above, and to ensure that we're not inadvertently discouraging valuable contributions to the encyclopedia, we will generate activity metrics before and after recorded impressions of signup CTAs. For the control, this pivoting event will occur when the anonymous user first clicks 'edit'.

  • Signup rate - proportion of logged-out users who click "edit" and go on to sign up
  • Edit completion rate - edits saved per "edit" click
  • Edit volume - average number of edits completed/day
  • Productive edits - the number of productive edits completed/day
  • Time spent editing - time spent editing/day
  • Activation rate - proportion of new signups that complete 1+ article edits within 24 hours of registering
  • Active editor rate - proportion of new signups that complete 5+ edits within a week of registering
  • Revert rate - proportion of article revisions reverted (in NS0)
  • Block rate - proportion of newly registered editors blocked within a short time of registering
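The per-token event stream described earlier is enough to compute the funnel metrics above. A minimal sketch, assuming a simple token-to-events mapping with illustrative event names (the real schema's names may differ):

```python
# Hypothetical per-token event logs; event names mirror the funnel
# (edit-link click, saved edit, account creation) but are illustrative.
events = {
    "tok1": ["edit_click", "edit_save", "account_create"],
    "tok2": ["edit_click"],
    "tok3": ["edit_click", "edit_click", "edit_save"],
}

def signup_rate(logs):
    """Proportion of edit-clicking tokens that went on to register."""
    clickers = [ev for ev in logs.values() if "edit_click" in ev]
    return sum("account_create" in ev for ev in clickers) / len(clickers)

def edit_completion_rate(logs):
    """Saved edits per edit-link click, pooled over all tokens."""
    clicks = sum(ev.count("edit_click") for ev in logs.values())
    saves = sum(ev.count("edit_save") for ev in logs.values())
    return saves / clicks

print(signup_rate(events))           # 1 of the 3 clicking tokens registered
print(edit_completion_rate(events))  # 2 saves over 4 clicks
```

Computing each metric per experimental bucket, rather than pooled as here, is what allows the bucket-vs-control comparisons reported in the studies below.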

Pure anons[edit]

In order to explore the effect that the intervention would have on the target population -- anonymous editors who had not yet registered an account before the start of the experiment -- we filtered our dataset of token'd users to remove those who performed any action with a previously registered account during the observation period (during or before the experimental period). We also filtered out tokens with no edit link click event, since these users did not even try to edit during the experimental period. The remaining token'd users we refer to as "pure anon editors". The entirety of the analysis presented in this report deals only with these pure anon editors.
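The two-step filter can be expressed compactly. The field names below are illustrative placeholders for whatever per-token summary the real dataset carries:

```python
def pure_anons(tokens):
    """Reduce a token -> summary mapping to 'pure anon editors': drop
    tokens that ever acted under a pre-existing registered account, and
    tokens that never clicked an edit link. Field names are hypothetical."""
    return {
        tok: rec for tok, rec in tokens.items()
        if not rec["used_existing_account"] and rec["edit_link_clicks"] > 0
    }

sample = {
    "a": {"used_existing_account": False, "edit_link_clicks": 3},  # kept
    "b": {"used_existing_account": True,  "edit_link_clicks": 2},  # logged in
    "c": {"used_existing_account": False, "edit_link_clicks": 0},  # never tried
}
assert set(pure_anons(sample)) == {"a"}
```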


Study 1: Pre-edit vs. post-edit CTAs[edit]

Pre-edit CTAs gathered far more new registrations (+200%) and new active editors than post-edit CTAs; however, overall productivity fell by 25% due to the decreased probability of completing edits. Given that the CTAs did not make it clear that anonymous editors do not need to register an account in order to edit, the decrease in productivity might be due to a misunderstanding -- that registration was now necessary to edit.

Hypothesis 6: Anonymous editors who are shown a CTA that makes "continue editing" more clear will be more likely to complete edits.

Study 2: Increased prominence of "continue editing"[edit]

The CTAs gathered slightly more registrations than the control (+30% for v1, +15% for v2), but showed a substantially smaller improvement over the control than was seen in the first study. The productivity level of editors in the pre-edit v2 condition was significantly higher than in the pre-edit v1 condition, but both UIs saw lower productivity rates than the control (-30% for v1 and -9% for v2).

Study 3: Moving the intervention later in the workflow[edit]

On hold.