Research:Onboarding new Wikipedians/Rollout

From Meta, a Wikimedia project coordination wiki
This page documents a completed research project.


On February 11th, 2014, Extension:GettingStarted was deployed on 29 wikis. It was later extended to the current 30 wikis, which include all of the top 10 Wikipedias by pageviews.

The purpose of this study is to measure the scale at which GettingStarted operates (e.g. how many newcomers on Wikimedia Projects received a GettingStarted intervention?) and to get a sense for the impact that the new features have on newcomer behavior.

Research questions

RQ 1: How is GettingStarted being used?

  1. How is GettingStarted being used?
    1. How many newly registered users saw each CTA?
    2. How many of those editors edit -- through GS or otherwise?
  2. Are GettingStarted edits reverted more often than non-GettingStarted edits?

RQ 2: How has GettingStarted affected newcomer activation and productivity?

  1. How did the proportion of new editors (editor activation) change after GettingStarted was deployed?
  2. How did the proportion of productive new editors (editor productivity) change after GettingStarted was deployed?

Methods

Code repository: https://github.com/halfak/Measuring-the-impact-of-GettingStarted

Deployment wikis

Based on the configuration and the Server admin log, we can determine when GettingStarted was deployed on each wiki.

GettingStarted deployments
wiki        GettingStarted deployment  Suggestions deployed
astwiki     2014-02-11 18:13:00        NA
bswiki      2014-02-11 18:13:00        NA
cawiki      2014-02-11 18:13:00        NA
dawiki      2014-02-11 18:13:00        NA
dewiki      2014-02-11 18:13:00        NA
elwiki      2014-02-11 18:13:00        NA
enwiki      2014-02-11 18:13:00        2014-02-11 18:13:00
eswiki      2014-02-11 18:13:00        2014-02-11 18:13:00
fawiki      2014-02-11 18:13:00        NA
frwiki      2014-02-11 18:13:00        NA
fowiki      2014-02-11 18:13:00        NA
glwiki      2014-02-11 18:13:00        NA
hewiki      2014-02-11 18:13:00        NA
huwiki      2014-02-11 18:13:00        NA
iswiki      2014-02-11 18:13:00        NA
itwiki      2014-02-11 18:13:00        NA
kowiki      2014-02-11 18:13:00        NA
lbwiki      2014-02-11 18:13:00        NA
mkwiki      2014-02-11 18:13:00        NA
mlwiki      2014-02-11 18:13:00        NA
nlwiki      2014-02-11 18:13:00        NA
plwiki      2014-02-11 18:13:00        NA
ptwiki      2014-02-11 18:13:00        NA
ruwiki      2014-02-11 18:13:00        NA
simplewiki  2014-02-11 18:13:00        NA
svwiki      2014-02-11 18:13:00        2014-02-27 00:18:00
viwiki      2014-02-11 18:13:00        NA
ukwiki      2014-02-11 18:13:00        2014-02-11 18:13:00
zhwiki      2014-02-11 18:13:00        2014-02-11 18:13:00
jawiki      2014-02-27 00:18:00        NA

Measuring usage

In order to measure the usage of GettingStarted, we observe and compare the number of newly registered users across Wikimedia projects with the number of users with a recorded impression of GettingStarted (see Schema:GettingStartedRedirectImpression). We also observe the number of edits made via GettingStarted through the application of a change tag: "gettingstarted edit".

Assuming a natural experiment

In order to address RQ 2, we'll be assuming that a natural experiment took place immediately after GettingStarted was deployed. We take advantage of this by comparing metrics of new user activation and productivity before and after deployment. Since the only way to take advantage of GettingStarted's functionality is to be served a CTA immediately after registering an account, there shouldn't be substantial concern about measuring those editors who registered immediately before GettingStarted's deployment.

Unlike controlled experiments, natural experiments leave room for confounds to affect inference about causation. A trend that was already taking place in a wiki, independent of the deployment of GettingStarted, will look like an effect of GettingStarted in the analysis. It is therefore important to keep this potential issue in mind when viewing the results.

Sample periods

Power analysis. The p-value of a chi-squared test is plotted against the number of observations for differing levels of baseline and change in proportion. A horizontal line is plotted at p = 0.05 and a vertical line is plotted at the number of observations to be sampled.

In order to compare new editor fitness before and after deployment, we sampled newly registered users from the two weeks immediately before and the two weeks immediately after the deployment dates. The figure "Natural experiment sample periods" depicts these sample periods visually.

Natural experiment sample periods. A conceptual diagram depicts the sample periods before and after deployment of mw:Extension:GettingStarted.
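
The sample periods can be sketched in a few lines of Python. This is only an illustration, not the code from the study repository: the function names are hypothetical, the timestamps are copied from the deployment table and assumed to be UTC, and treating each period as a half-open interval is an assumption.

```python
from datetime import datetime, timedelta

# Deployment timestamps copied from the deployment table (assumed to be UTC).
DEPLOYMENTS = {
    "enwiki": datetime(2014, 2, 11, 18, 13, 0),
    "jawiki": datetime(2014, 2, 27, 0, 18, 0),
}

WINDOW = timedelta(weeks=2)


def sample_periods(deployed_at, window=WINDOW):
    """Return the (start, end) bounds of the 'before' and 'after' periods."""
    before = (deployed_at - window, deployed_at)
    after = (deployed_at, deployed_at + window)
    return before, after


def assign_period(registered_at, deployed_at, window=WINDOW):
    """Label a registration as 'before', 'after', or None (outside both periods)."""
    before, after = sample_periods(deployed_at, window)
    if before[0] <= registered_at < before[1]:
        return "before"
    if after[0] <= registered_at < after[1]:
        return "after"
    return None


for wiki, deployed_at in DEPLOYMENTS.items():
    before, after = sample_periods(deployed_at)
    print(wiki, "before:", before[0], "to", before[1], "| after:", after[0], "to", after[1])
```

A registration that falls exactly on the deployment timestamp is counted in the "after" period under this convention.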

In order to determine how many observations would need to be sampled, we performed a power analysis for several baseline rates and expected changes. The figure "Power analysis" plots the p-value of a chi-squared test for various baselines and changes. We set the minimum number of observations at 500, since that was the smallest number that would still let us identify significance for large effects. We define "large effects" as twice the observed effect of GettingStarted in English Wikipedia (which ranged from 1.5% to 3% depending on the metric[1]), so we settled on 5%. 16 wikis had at least 500 newly registered users in the sample periods (es, fr, zh, ru, de, pt, it, fa, nl, pl, vi, sv, uk, ko, hu, he, el). We set the maximum number of observations at 2000, since most changes would appear significant at that number of observations and an upper bound reduces the processing time required.
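
A minimal sketch of this kind of power analysis follows, using only the Python standard library. It is not the code behind the figure. It assumes a Pearson chi-squared test without continuity correction, that the number of observations is per period, and that each period hits its expected proportion exactly; the function names and the baselines chosen are illustrative.

```python
import math


def chi2_p_value(successes_a, n_a, successes_b, n_b):
    """Pearson chi-squared test (1 df, no continuity correction) on a 2x2 table."""
    a, b = successes_a, n_a - successes_a
    c, d = successes_b, n_b - successes_b
    n = n_a + n_b
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, the survival function is erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def expected_p_value(baseline, change, n_per_group):
    """p-value when both groups land exactly on their expected proportions."""
    before = round(baseline * n_per_group)
    after = round((baseline + change) * n_per_group)
    return chi2_p_value(before, n_per_group, after, n_per_group)


# Illustrative baselines; the 5% change and the 500/2000 bounds come from the text.
for baseline in (0.05, 0.2, 0.5):
    for n in (500, 1000, 2000):
        p = expected_p_value(baseline, 0.05, n)
        print(f"baseline={baseline:.2f} change=0.05 n={n}: p={p:.4f}")
```

Because the expected counts are used directly, the output shows where the p-value would sit for a typical sample rather than the probability of detecting the effect across repeated samples.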

Comparison

Boolean measures

Differences in proportions between the before and after periods are identified using a chi-squared test.

Scalar measures

Differences in expected values between the before and after periods are identified using a t-test on log-transformed values.
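
As an illustration of a t-test on logged values, the sketch below compares two samples of per-user counts. Several details are assumptions rather than the study's method: the log(x + 1) transform (chosen so that zero counts are defined), Welch's unequal-variance form of the statistic, and a normal approximation for the p-value, which is reasonable only with hundreds of observations per group. The sample data is made up.

```python
import math
from statistics import NormalDist, mean, variance


def log_welch_test(before, after):
    """Welch's t statistic on log(x + 1) values, with a normal-approximation p-value."""
    log_before = [math.log(x + 1) for x in before]
    log_after = [math.log(x + 1) for x in after]
    diff = mean(log_after) - mean(log_before)
    se = math.sqrt(variance(log_before) / len(log_before)
                   + variance(log_after) / len(log_after))
    t = diff / se
    # With hundreds of observations per group the t distribution is close to
    # normal, so a standard normal is used here as an approximation.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    ci = (diff - 1.96 * se, diff + 1.96 * se)
    return diff, ci, p


# Hypothetical edit counts for illustration only; not data from the study.
before = [0, 0, 1, 0, 2, 0, 5, 1, 0, 3] * 50
after = [0, 1, 1, 0, 2, 0, 6, 1, 0, 4] * 50
diff, ci, p = log_welch_test(before, after)
print(f"difference in log mean: {diff:.3f}, 95% CI: ({ci[0]:.3f}, {ci[1]:.3f}), p={p:.4f}")
```

The returned difference and interval correspond to the kind of quantity plotted with 95% CI error bars in the scalar-measure figures.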

Results

RQ 1: How is GettingStarted being used?

What proportion of users saw/used a GettingStarted CTA?

Group funnel proportions. A proportional funnel is shown for the flow from newly registered users on all projects to wikis with GettingStarted installed (30 wikis) to making edits with GS.

In order to get a sense of what proportion of newly registered users were affected by the deployment of GettingStarted, we ran a set of queries to count the newly registered users across all Wikimedia projects and tracked their activities as they navigated the various funnels that GettingStarted provides. The figure "Group funnel proportions" displays the proportion and raw count of users who made it to each step in the funnel.

Who saw GettingStarted's CTA? The GettingStarted experience is currently available only to desktop users. Of the 336,310 newly registered users who registered during our 30-day period after deployment, 273,169 (81.23%) registered through the desktop interface. 218,968 of these desktop users registered on one of the 30 wikis where GettingStarted was deployed. 143,627 of the desktop users who registered on GettingStarted wikis saw a GettingStarted CTA. In other words:

42.7% of newly registered users across all projects had the opportunity to take advantage of GettingStarted.
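
The percentages above follow directly from the reported counts. The short sketch below reproduces the arithmetic; the step labels are paraphrased from the text and the variable names are illustrative.

```python
# Counts reported in the text for the 30-day period after deployment.
funnel = [
    ("newly registered users (all projects)", 336310),
    ("registered via the desktop interface", 273169),
    ("registered on a GettingStarted wiki (desktop)", 218968),
    ("saw a GettingStarted CTA", 143627),
]

total = funnel[0][1]
previous = total
for step, count in funnel:
    of_total = 100 * count / total
    of_previous = 100 * count / previous
    print(f"{step}: {count:,} ({of_total:.2f}% of all, {of_previous:.2f}% of previous step)")
    previous = count
```

The "of all" column gives each step as a share of all newly registered users, while "of previous step" gives the conversion from one funnel step to the next.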

Which CTAs did they see? Of the users who saw a change to their post-registration experience, the plurality (46.49%) saw the CTA that only asked whether they would like to see suggested tasks to perform (see Suggest only CTA). Most often, the "Edit this page" option was not available because the redirect page was a protected article (54.55%) or a page in the Project namespace. The next most common CTA was the combined "Edit this page or Find easy tasks" (see Edit & Suggest CTA); 39.6% of users who saw any CTA saw this one. Finally, 13.91% saw the CTA with only the option to "Edit this page" (see Edit only CTA). These users were predominantly on wikis that lacked suggested tasks (98.9%).


Reverts of GettingStarted edits

Comparison of revert rates. 

One of our concerns with tagging edits "via Getting Started edit suggestions" was that the tag might draw additional attention from Wikipedians and encourage extra scrutiny of edits made through GettingStarted. If GS-tagged edits are receiving extra scrutiny, then we'd expect the revert rate for these edits to be higher. To check this hypothesis, we gathered all of the first edits performed by newcomers who registered during our 30-day period and detected which revisions were reverted within 48 hours.

The figure "Comparison of revert rates" plots the difference between the revert rate of first edits not made through GettingStarted and the revert rate of first edits made through GettingStarted. Note that in all but a few cases, the error bars of the 95% confidence interval cross the zero line. This means that there's no significant difference between the revert rate for GettingStarted and non-GettingStarted edits on those wikis. However, three wikis did see significant differences: viwiki and cawiki saw higher revert rates for GS edits, and enwiki saw lower revert rates for GS edits.
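
The "error bars cross zero" reading can be illustrated with a difference of two proportions and its 95% interval. The sketch below assumes a Wald interval and takes the difference as GettingStarted minus non-GettingStarted; the study may have used a different interval or sign convention, and the counts are hypothetical.

```python
import math


def revert_rate_difference(gs_reverted, gs_total, other_reverted, other_total, z=1.96):
    """Difference in revert rates (GS minus non-GS) with a Wald 95% interval."""
    p_gs = gs_reverted / gs_total
    p_other = other_reverted / other_total
    diff = p_gs - p_other
    se = math.sqrt(p_gs * (1 - p_gs) / gs_total
                   + p_other * (1 - p_other) / other_total)
    lower, upper = diff - z * se, diff + z * se
    # The difference is called significant when the interval excludes zero.
    significant = not (lower <= 0 <= upper)
    return diff, (lower, upper), significant


# Hypothetical counts for illustration only; not data from the study.
diff, ci, significant = revert_rate_difference(30, 400, 450, 5000)
print(f"difference: {diff:+.3f}, 95% CI: ({ci[0]:+.3f}, {ci[1]:+.3f}), significant: {significant}")
```

With few GettingStarted edits on a wiki, the standard error is dominated by the smaller group, which is why most per-wiki intervals are wide enough to include zero.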

It's important to note that, with such a high number of tests at a 95% confidence level, we should expect 1-2 wikis to report a Type I error. With this in mind, the significant differences observed for viwiki and cawiki should be taken with a grain of salt. However, with English Wikipedia, we had such a large number of observations that the result is clearly significant. It appears that, on English Wikipedia, GettingStarted edits are reverted significantly less often than non-GettingStarted edits.

RQ 2: How has GettingStarted affected newcomer activation and productivity?

In order to look for evidence of changes in the activation and productivity due to the introduction of GettingStarted, we used an array of metrics to measure newcomer performance before and after the deployment of GettingStarted.

The figures below plot the difference in each metric between the periods before and after the deployment. A plotted value above zero means that an increase in the metric was observed. Overall, the results fail to demonstrate a clear difference between the before and after states of these wikis.

While some wikis show significant differences under some metrics, this type of statistical error is expected to happen with 95% confidence intervals in about 1 in 20 tests. Here, we see 10 instances of significant results out of 112 tests:

  • Dewiki showed a significant drop in the rate of new editors
  • Plwiki showed a significant increase in the rate of returning new editors
  • Eswiki, Itwiki and Plwiki showed a significant increase in the number of productive edits newcomers performed in their first day.
  • Plwiki saw a significant increase in the number of newcomer edit sessions while Frwiki saw a significant decrease
  • Plwiki and Ukwiki saw a significant increase in the amount of time spent editing while Frwiki saw a significant decrease
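
The "about 1 in 20" reasoning can be made concrete with a small calculation. The sketch below treats the 112 tests as independent, which is an assumption that does not strictly hold here because the same newcomers contribute to several metrics on each wiki, so the result is only a rough guide.

```python
from math import comb


def prob_at_least(k, n, alpha):
    """P(X >= k) for X ~ Binomial(n, alpha), assuming independent tests."""
    return sum(comb(n, i) * alpha**i * (1 - alpha)**(n - i) for i in range(k, n + 1))


# 112 tests and 10 significant results are taken from the text; alpha = 0.05
# corresponds to the 95% confidence intervals.
n_tests, alpha, observed = 112, 0.05, 10
expected_false_positives = n_tests * alpha
chance = prob_at_least(observed, n_tests, alpha)
print(f"expected false positives: {expected_false_positives:.1f}")
print(f"P(at least {observed} significant results by chance): {chance:.3f}")
```

The first number is the count of spurious significant results expected if GettingStarted had no effect at all; the second is how often at least the observed count would arise under that same assumption.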

Given the lack of a clear trend across wikis and the lack of an obvious correlation between the availability of suggested tasks in the user experience and performance outcomes, it's not clear from these results that GettingStarted is having a measurable effect in the short term. Future work may reduce noise and potential confounds by running a controlled experiment on these wikis.

Boolean measures

Difference in new editor rates (ns0). The difference in the proportions of new editors (main NS only) before and after the deployment of mw:Extension:GettingStarted is plotted with 95% CI error bars.
Difference in productive editor rates. The difference in the proportions of productive new editors before and after the deployment of mw:Extension:GettingStarted is plotted with 95% CI error bars.
Difference in returning editor rates. The difference in the proportions of returning editors before and after the deployment of mw:Extension:GettingStarted is plotted with 95% CI error bars.

Scalar measures

Difference in 24h article edits. The difference in the log mean number of article revisions saved in newcomers' first day before and after the deployment of mw:Extension:GettingStarted is plotted with 95% CI error bars.
Difference in 24h productive edits. The difference in the log mean number of productive edits saved in newcomers' first day before and after the deployment of mw:Extension:GettingStarted is plotted with 95% CI error bars.
Difference in edit sessions (first week). The difference in the log mean number of edit sessions in newcomers' first week before and after the deployment of mw:Extension:GettingStarted is plotted with 95% CI error bars.
Difference in time spent editing (first week). The difference in the log mean approximate time spent editing in newcomers' first week before and after the deployment of mw:Extension:GettingStarted is plotted with 95% CI error bars.

References

  1. Research:OB4