It's been nearly two years since we ran an initial study of VisualEditor's effect on newly registered editors. While most of the results of that study were positive (e.g. workload on Wikipedians did not increase), we still saw a significant decrease in newcomer productivity. In the meantime, the Editing team has made substantial improvements to performance and functionality. In this study, I analyze a new experiment designed to test the effects of enabling this improved VisualEditor software by default for newly registered users.
As in the previous experiment, I focus on three aspects of Wikipedia work that we expect VisualEditor to affect.
- The productivity & survival of new editors
- The quality control burden on current Wikipedians
- The ease of editing
Summary of findings (if VE were enabled for all newcomers, I expect):
- no difference in newcomer productivity or short-term survival
- small reductions in burden on current Wikipedians
- newcomers will be less likely to save edits they start and they'll spend ~18 seconds longer editing before hitting save when they do
I suspect that the negative result for ease of editing reflects newcomers experimenting with a fancy new editor rather than real struggle; otherwise, we would have seen decreases in the productivity measurements and critical issues raised during user testing.
In order to test the effects of VisualEditor on newly registered users, I worked with the mw:Editing team to run a controlled test on the English Wikipedia. During the test, 50% of newly registered users were bucketed into a condition where VisualEditor (VE) was enabled by default (VE enabled), and the other 50% received the current experience (control).
- experimental: VisualEditor enabled by default (user preference switched on at account creation time)
- control: no change. The wikitext editor is the only default editor.
In this study, I explore three aspects of VisualEditor's hypothesized effects.
Newcomer productivity and survival
To look for evidence of effects that VisualEditor may have on a newcomer's success and engagement, I measure a few different aspects of productivity and engagement:
- productive edits per user: based on productive new editors
- time spent editing: based on edit session duration estimation (note that "edit session" here refers to a collection of edits while "intra-edit session" below refers to a set of actions performed in the service of a single edit)
- short-term survival rate: based on surviving new editors. I use trial and survival periods of 3 and 4 days to look for evidence of short-term retention effects
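The survival measure can be illustrated with a minimal sketch. It assumes that "surviving" means at least one edit in the 4-day window that starts once the 3-day trial period ends (that is, 3 to 7 days after registration). The function name, the half-open window boundaries, and the example timestamps are illustrative assumptions, and the sketch ignores any activity requirement during the trial period itself.

```python
from datetime import datetime, timedelta

TRIAL_PERIOD = timedelta(days=3)     # trial period length stated in the text
SURVIVAL_PERIOD = timedelta(days=4)  # survival period length stated in the text

def is_surviving(registration, edit_times):
    """Return True if any edit falls inside the survival window.

    The window is [registration + 3 days, registration + 7 days).
    Treating the window as half-open is an assumption of this sketch.
    """
    window_start = registration + TRIAL_PERIOD
    window_end = window_start + SURVIVAL_PERIOD
    return any(window_start <= t < window_end for t in edit_times)

# Hypothetical editor who registers, edits an hour later, and returns on day 5.
registered = datetime(2015, 5, 28, 12, 0)
edits = [registered + timedelta(hours=1), registered + timedelta(days=5)]

print(is_surviving(registered, edits))      # True: the day-5 edit is in the window
print(is_surviving(registered, edits[:1]))  # False: only a first-day edit
```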
Burden on current Wikipedians
To look for evidence of change in the amount of burden inflicted on Wikipedians, I examine how much quality control work Wikipedians must do for new editors.
- block rate: # newcomers who are blocked / # newcomers who edit
- reverts per editor: # of reverted edits / # newcomers who edit
- These metrics measure the cost in blocks and reverts for each new editor. I also explore the proportion of reverts that are performed with automated tools like en:User:Cluebot NG, en:WP:Huggle and en:WP:Stiki, since these tools are specifically designed to reduce the effort necessary to revert damaging edits.
Ease of editing
To look for evidence that newcomers find editing easier, I use Schema:Edit to observe two metrics of their "intra-edit sessions" – discrete interactions with a particular editing interface.
- edit duration: Time taken to complete edit ("ready" --> "saveAttempt")
- edit completion rate: Proportion of edit attempts where the user attempts to save. ("ready" --> "saveAttempt" | abort.type != "nochange")
- In this case, I filter out loads of the edit pane (VE or wikitext) where the user aborts without making a change, since copying (Ctrl+C) from the edit pane without intending to make an edit is a common usage pattern of the wikitext editor.
- May 28th — Start of experiment bucketing. 50% of newly registered users on English Wikipedia have VE enabled by default
- June 4th — End of experimental bucketing. VE no longer enabled by default for newly registered users
- June 11th — Observations complete. Analysis begins.
- June 16th — Analysis complete. Preliminary results posted here.
Filtered out mobile-registered editors
In my analysis, I used Schema:ServerSideAccountCreation to filter out users who signed up via the mobile interface since they tend to also edit on mobile and would therefore not have the chance to notice that VE was enabled. These users represent a smaller proportion of editors and edits (~25%) and mostly made no use of VE even when bucketed in the experimental condition. See my notes from 2015-06-15.
Newcomer productivity and survival
- TL;DR: No significant difference.
|measure||experimental||control|
|New editor prop||0.348||0.343|
|1+ article edit||2502||2551|
|1+ article prop||0.257||0.260|
|Productive new editors||1778||1772|
|Productive new editor prop||0.183||0.181|
|>= 1 hour editing||338||343|
|>= 1 hour prop||0.0347||0.035|
|short-term surv. prop||0.030||0.029|
|VE enabled prop||0.996||0.011|
|Newly registered users||9728||9794|
In order to look for effects that enabling VE would have on newcomer productivity and survival, I compare both the proportion of editors who reach a threshold (e.g. productive: makes at least one article edit that is not reverted) and the raw counts per editor. The distribution of editor activity is highly skewed towards zero, which means a simple t-test is not appropriate for comparing the two conditions. So, I opted to use a non-parametric Wilcoxon test for the counts. For proportions, I make use of the χ² test.
- Total edits: Wilcoxon = 91061594, p-value = 0.802
- At least one edit: χ² = 0.455, p-value = 0.500
- Article edits: Wilcoxon = 91474930, p-value = 0.260
- At least one article edit: χ² = 0.255, p-value = 0.613
- Productive edits: Wilcoxon = 91023213, p-value = 0.819
- At least one productive edit: χ² = 0.100, p-value = 0.7524
Time spent editing
- Total seconds in session: Wilcoxon = 90916937, p-value = 0.982
- At least one hour: χ² = 0.004, p-value = 0.947
- One edit 3-7 days after registration: χ² = 0.001, p-value = 0.968
In no case does the test show a significant difference between conditions. Given the breadth of measures, this suggests that enabling VE had no effect on productivity or survival in newly registered users' first week of editing.
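The proportion comparisons above can be sketched with a χ² test on a 2x2 table. This is a minimal standard-library sketch, not the analysis code used for the study. It assumes a continuity (Yates) correction, which the text does not state, and the function name `chi2_2x2` is hypothetical. The counts are the "Productive new editors" and "Newly registered users" rows of the table above.

```python
import math

def chi2_2x2(success_a, total_a, success_b, total_b, yates=True):
    """Chi-squared test of equal proportions for two groups (1 degree of freedom).

    Returns (statistic, p_value). The continuity correction is an assumption
    of this sketch; pass yates=False for the uncorrected statistic.
    """
    a, b = success_a, total_a - success_a
    c, d = success_b, total_b - success_b
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    statistic = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Productive new editors: experimental 1778 of 9728, control 1772 of 9794.
chi2, p = chi2_2x2(1778, 9728, 1772, 9794)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

With the correction applied, the statistic and p-value come out at roughly 0.100 and 0.752, in line with the "At least one productive edit" result listed above.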
Burden on current Wikipedians
- TL;DR: Slight decrease in burden on current Wikipedians
To examine the potential burden that enabling VE would have on current Wikipedians, we examined two aspects of burdensome activity: reverted edits and block rates.
To examine revert rates, we compared the raw counts of reverted edits per new editor using a Wilcoxon test and found that editors in the experimental condition produced slightly fewer revisions that would need to be reverted (W = 5869081, p-value = 0.007). While this effect is statistically significant, it's also quite small.
We saw similar results when examining block rates. New editors in the experimental condition were slightly less likely to be blocked than users in the control condition. However, unlike reverts, this small difference was not significant (χ² = 2.014, p-value = 0.156).
|condition||blocked (prop)||blocked for damage (prop)||editing users|
|experimental||406 (0.120)||259 (0.076)||3386|
|control||415 (0.123)||290 (0.086)||3363|
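For skewed per-editor counts such as reverted edits, a rank-based comparison avoids assuming normality. The sketch below is a two-sample Wilcoxon rank-sum (Mann-Whitney) test using midranks for ties and a tie-corrected normal approximation without continuity correction. It is an illustration under those assumptions, not the procedure used for the results above. The function name and the sample data are hypothetical.

```python
import math
from collections import Counter

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via a normal approximation.

    Returns (w, p_value), where w is the rank sum of x minus its minimum
    possible value (the Mann-Whitney U statistic for x).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    counts = Counter(list(x) + list(y))

    # Assign each distinct value the average of the ranks it occupies.
    midrank = {}
    next_rank = 1
    for value in sorted(counts):
        c = counts[value]
        midrank[value] = next_rank + (c - 1) / 2
        next_rank += c

    w = sum(midrank[v] for v in x) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    tie_term = sum(c ** 3 - c for c in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return w, 1.0  # every observation is identical
    z = (w - mean) / math.sqrt(variance)
    return w, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical reverted-edit counts per editor (mostly zeros, a few large).
experimental = [0, 0, 1, 5]
control = [0, 2, 3, 9]
w, p = rank_sum_test(experimental, control)
print(w, round(p, 3))
```

The normal approximation is only reasonable for large samples; the tiny lists here are just to show the mechanics.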
Ease of editing
- TL;DR: Lower completion rates and more time to save
In order to look for evidence that VE affected ease of editing, we measure both the edit completion rate and the time spent making an edit, using Schema:Edit. One hurdle to performing this analysis is that, in order to keep the volume of logged events at a reasonable rate, wikitext sessions are sampled at a 25% rate, but visualeditor sessions are not. In order to make sure that this did not bias the results, I re-sampled visualeditor sessions at 25%. See my notes from 2015-05-28 for how I did this and checked that it worked correctly. There's another issue: some editors have a very large number of intra-edit sessions while most have few. If I were to naively analyze the dataset, the editors with many sessions would dominate the results. So, I sampled the first 5 recorded sessions per user.
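The two sampling steps described above can be sketched as follows. This is not the procedure from the 2015-05-28 notes, which is not reproduced here. The dictionary fields (`user_id`, `editor`, `start`), the function names, and the use of a seeded pseudo-random draw are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def downsample_visualeditor(sessions, rate=0.25, seed=0):
    """Keep every wikitext session and a random `rate` share of VE sessions."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return [s for s in sessions
            if s["editor"] != "visualeditor" or rng.random() < rate]

def first_sessions_per_user(sessions, limit=5):
    """Keep at most `limit` of each user's earliest sessions."""
    by_user = defaultdict(list)
    for s in sorted(sessions, key=lambda s: s["start"]):
        if len(by_user[s["user_id"]]) < limit:
            by_user[s["user_id"]].append(s)
    return [s for user_sessions in by_user.values() for s in user_sessions]

# Hypothetical data: user 1 has 8 sessions, user 2 has 2.
demo = [{"user_id": 1, "editor": "wikitext", "start": t} for t in range(8)]
demo += [{"user_id": 2, "editor": "visualeditor", "start": t} for t in (3, 9)]

kept = first_sessions_per_user(demo)
print(len(kept))  # 7: five sessions for user 1, two for user 2
```

Capping sessions per user keeps a handful of very active editors from dominating the session-level proportions.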
Edit completion rates
To examine edit completion rates, I measured the proportion of intra-edit sessions that resulted in a "saveAttempt". I was careful to filter out intra-edit sessions where the user made no change before exiting (action.abort.type = "nochange"), which I expect is common for editors using the wikitext editor to copy-paste some markup, or where the user was simply switching between editors (action.abort.type in ("switchwith", "switchwithout")). A χ² test shows that the rates at which editors attempted and successfully saved edits were both significantly lower when VE was enabled (attempt/session: χ² = 31.1, p-value < 0.001; success/session: χ² = 32.9, p-value < 0.001). This is a very surprising result given that we did not see productivity loss when VE was enabled.
|condition||users||intra-edit sessions||"visualeditor" (prop)||attempted save (prop)||successful save (prop)|
|control||3421||4677||51 (0.011)||3204 (0.685)||2977 (0.637)|
|experimental||3471||4245||2396 (0.564)||2669 (0.629)||2449 (0.577)|
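The completion-rate calculation can be sketched as below. The abort types excluded are the ones named in the text. The flat session dictionaries (`actions`, `abort_type`), the function name, and the `"other"` placeholder abort type are hypothetical and do not reflect the actual Schema:Edit layout.

```python
# Abort types filtered out of the denominator, as described in the text.
EXCLUDED_ABORTS = {"nochange", "switchwith", "switchwithout"}

def completion_rate(sessions):
    """Share of counted intra-edit sessions that include a "saveAttempt"."""
    counted = [s for s in sessions
               if s.get("abort_type") not in EXCLUDED_ABORTS]
    if not counted:
        return 0.0
    attempted = sum(1 for s in counted if "saveAttempt" in s["actions"])
    return attempted / len(counted)

# Hypothetical sessions; "other" stands in for any abort type that is kept.
demo_sessions = [
    {"actions": ["ready", "saveAttempt"], "abort_type": None},
    {"actions": ["ready", "abort"], "abort_type": "nochange"},    # excluded
    {"actions": ["ready", "abort"], "abort_type": "other"},       # counted
    {"actions": ["ready", "abort"], "abort_type": "switchwith"},  # excluded
]
print(completion_rate(demo_sessions))  # 0.5: one attempt in two counted sessions

# The proportions in the table follow directly from its counts.
print(round(3204 / 4677, 3), round(2669 / 4245, 3))  # 0.685 0.629
```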
Time to save
To examine the time that it took editors to complete their edits, I look at the time between the start of an intra-edit session and the first "saveAttempt". Luckily, this measurement manifests as a simple log-normal shape (see #Time to save (by bucket) and #Time to save (by editor)), so I can use a parametric t-test to check for a difference in the log-scaled values. As the plots imply and the t-test confirms, editors with VE enabled took substantially longer to save their edits (t = -4.9083, p-value < 0.001). On average, VE-enabled newcomers took ~18 seconds longer to save their edits than users in the control condition.
In order to dig into this, I also looked at the save timing within the experimental condition. #Time to save (by editor) shows two clearly different modes. Editors who used wikitext to make their edits usually took about 35 seconds, while users of VisualEditor usually took about 2 minutes.
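The log-scale comparison of time to save can be sketched with a two-sample t statistic computed on log-transformed durations. The text does not say which t-test variant was used, so the unequal-variance (Welch) form here is an assumption, as are the function name and the made-up durations. The p-value uses a normal approximation that is only reasonable for large samples.

```python
import math
from statistics import mean, variance

def welch_t_on_logs(seconds_a, seconds_b):
    """Welch t statistic for the difference in mean log(duration).

    Durations must be positive. Returns (t, approximate two-sided p-value).
    """
    log_a = [math.log(s) for s in seconds_a]
    log_b = [math.log(s) for s in seconds_b]
    standard_error = math.sqrt(variance(log_a) / len(log_a)
                               + variance(log_b) / len(log_b))
    t = (mean(log_a) - mean(log_b)) / standard_error
    p_approx = math.erfc(abs(t) / math.sqrt(2))  # normal approximation
    return t, p_approx

# Hypothetical seconds-to-save samples, not data from the experiment.
control_seconds = [20, 35, 40, 60, 90]
ve_enabled_seconds = [60, 120, 130, 200, 400]

t, p = welch_t_on_logs(control_seconds, ve_enabled_seconds)
print(t < 0)  # True: the control sample has the smaller mean log-duration
```

Working on the log scale means the test compares geometric means, which suits a log-normal measurement better than comparing raw seconds.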
Given the lack of a drop in productivity, I suspect that these two measures (completion rate and time to save) are insufficient for exploring ease of editing. The implied lackluster performance of VE in these measures, but not in productivity, is likely due to users exploring VE's functionality, which would result in more time spent with the editor open and more intra-edit sessions where pressing "save" was never the user's intention.