Research talk:VisualEditor's effect on newly registered editors/May 2015 study/Work log/2015-06-15

From Meta, a Wikimedia project coordination wiki

Monday, June 15, 2015

Time to look at the editing sessions. I'm not expecting to see much of a difference here given that all of our productivity measures were dead even.

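changed_and_noswitch counts sessions whose abort type was not "switchwith", "switchwithout" or "nochange". In other words, the editor neither switched editing interfaces nor abandoned the session without making a change. This is our best denominator when looking at proportions. In the table below, the .k columns are session counts and the .p columns are those counts as a proportion of all sessions (n).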

bucket        via_mobile  users.n  ve.k  ve.p         attempted.k  attempted.p  successful.k  successful.p  changed_and_noswitch.n  changed.n  n
control       0           3421     53    0.007391911  3207         0.4472803    2980          0.4156206     4683                    4692       7170
experimental  0           3459     2412  0.3404856    2668         0.3766234    2452          0.3461321     4260                    4671       7084
control       1           219      0     0            119          0.3190349    110           0.2949062     281                     281        373
experimental  1           211      78    0.2154696    89           0.2458564    83            0.2292818     240                     252        362

It looks like 34% of desktop edit sessions in the experimental bucket used VE (2412 of 7084). We also see a difference in the overall proportion of successful sessions (41.6% for control vs. 34.6% for experimental). Even if we filter out sessions that ended with no change or with a switch between editors, we see a success rate of 2452/4260 = 57.6% for experimental and 2980/4683 = 63.6% for control.
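A minimal Python sketch of the two denominators being compared, using the desktop (via_mobile = 0) counts from the table above. The analysis itself was done in R; the dictionary layout and the session_rates name are illustrative only.

```python
# Desktop (via_mobile = 0) session counts copied from the table above.
DESKTOP = {
    "control": {"ve": 53, "successful": 2980, "changed_and_noswitch": 4683, "n": 7170},
    "experimental": {"ve": 2412, "successful": 2452, "changed_and_noswitch": 4260, "n": 7084},
}


def session_rates(row):
    """Return (VE share, success over all sessions, success over filtered sessions)."""
    return (
        row["ve"] / row["n"],
        row["successful"] / row["n"],
        row["successful"] / row["changed_and_noswitch"],
    )


if __name__ == "__main__":
    for bucket, row in DESKTOP.items():
        ve, overall, filtered = session_rates(row)
        print(f"{bucket}: VE {ve:.1%}, success {overall:.1%} of all sessions, "
              f"{filtered:.1%} of changed, no-switch sessions")
```

Dividing by changed_and_noswitch.n rather than n removes sessions that could not have ended in a save of the editor under test, which is why the filtered rates are much higher for both buckets.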

> prop.test(c(2452,2980), c(4260, 4683))

	2-sample test for equality of proportions with continuity correction

data:  c(2452, 2980) out of c(4260, 4683)
X-squared = 34.2779, df = 1, p-value = 4.778e-09
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.08123271 -0.04028203
sample estimates:
   prop 1    prop 2 
0.5755869 0.6363442

That difference is significant. This is surprising since we did not see a significant difference in overall productivity. I think we might be seeing the result of people *playing* with VE. I can imagine people suddenly noticing the two edit links and spending some time checking out what it looks like to edit a few different articles with VE. If it were me, I'd spend some time typing and copy-pasting around an article to see what it looked like and then not hit save. It's difficult to know whether my experience and intuition are like others', but I think it's safe to conclude that something more complex than "VE doesn't work as well as Wikitext for people" is going on here. --Halfak (WMF) (talk) 15:50, 15 June 2015 (UTC)
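For readers without R at hand, the following Python sketch mirrors the formulas R's prop.test() uses for two groups with correct = TRUE: a Yates-corrected chi-squared statistic on the pooled 2x2 table of successes and failures, and an interval for the difference in proportions widened by the same correction. It is a sketch written against R's behavior, not a drop-in replacement; prop_test_2sample is a hypothetical name.

```python
import math
from statistics import NormalDist


def prop_test_2sample(x, n, conf_level=0.95, correct=True):
    """Two-sample test for equality of proportions (Yates-corrected chi-squared)."""
    (x1, x2), (n1, n2) = x, n
    p1, p2 = x1 / n1, x2 / n2
    delta = p1 - p2
    inv_n = 1 / n1 + 1 / n2
    # Continuity correction, capped so it never exceeds |delta| / (1/n1 + 1/n2)
    yates = min(0.5, abs(delta) / inv_n) if correct else 0.0

    # Chi-squared on the 2x2 table, expected counts from the pooled proportion
    pooled = (x1 + x2) / (n1 + n2)
    observed = [x1, n1 - x1, x2, n2 - x2]
    expected = [n1 * pooled, n1 * (1 - pooled), n2 * pooled, n2 * (1 - pooled)]
    stat = sum((abs(o - e) - yates) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-squared with 1 df: P(X > s) = erfc(sqrt(s / 2))
    p_value = math.erfc(math.sqrt(stat / 2))

    # Wald interval for p1 - p2, widened by the continuity correction
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    width = z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) + yates * inv_n
    ci = (max(delta - width, -1.0), min(delta + width, 1.0))
    return stat, p_value, ci, (p1, p2)


if __name__ == "__main__":
    stat, p, ci, est = prop_test_2sample((2452, 2980), (4260, 4683))
    print(f"X-squared = {stat:.4f}, p-value = {p:.3e}")
    print(f"95% CI: {ci[0]:.8f} {ci[1]:.8f}; estimates: {est[0]:.7f} {est[1]:.7f}")
```

With the experimental and control counts it should closely reproduce the X-squared, p-value and interval in the R output above, and the same function applies to the mostly-VE vs. mostly-WT comparison later in this log.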


Additional questions

OK. That roughly concludes my planned evaluation. Now I'd like to do some descriptive statistics. Since productivity held roughly constant, I want to look at the distribution of productivity for new editors and see how productive the newcomers who choose to use VE generally are.

> mean(user_metrics[week_revisions > 0 & bucket=="experimental",]$prop.ve > .5)
[1] 0.4057274

40.6% of experimental-bucket editors who made at least one revision (week_revisions > 0) used VE for more than half of their edits. That means we should have a good set of observations.
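A Python sketch of the grouping used here, assuming each editor record carries a bucket label, a revision count (week_revisions) and the share of their edits made with VE (prop.ve in R, renamed prop_ve). The EditorMetrics type and both function names are hypothetical. As in the R expression, an editor at exactly 50% VE counts as mostly Wikitext.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EditorMetrics:
    """Hypothetical per-editor record mirroring columns of user_metrics."""
    bucket: str          # "control" or "experimental"
    week_revisions: int  # revisions counted in the measurement window
    prop_ve: float       # share of the editor's edits saved with VE


def primary_editor(m: EditorMetrics) -> str:
    # Strictly greater than 0.5, as in `prop.ve > .5`; exactly 0.5 counts as WT
    return "mostly VE" if m.prop_ve > 0.5 else "mostly WT"


def share_mostly_ve(metrics: list[EditorMetrics]) -> float:
    """Share of experimental editors with at least one revision who mostly used VE."""
    active = [m for m in metrics if m.bucket == "experimental" and m.week_revisions > 0]
    if not active:
        raise ValueError("no experimental editors with revisions")
    return sum(primary_editor(m) == "mostly VE" for m in active) / len(active)


if __name__ == "__main__":
    sample = [
        EditorMetrics("experimental", 3, 0.8),
        EditorMetrics("experimental", 2, 0.5),
        EditorMetrics("experimental", 0, 1.0),  # no revisions: excluded
        EditorMetrics("control", 5, 0.0),       # control bucket: excluded
    ]
    print(share_mostly_ve(sample))
```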

Productive edit density by primary editor. The density of productive edits by the primary editor (VE/Wikitext) is plotted for the experimental bucket.

It looks like the main difference between the mostly-Wikitext and mostly-VE groups is that the mostly-VE group has a larger share of editors who make at least one productive edit. I'll need to run a test to be sure. --Halfak (WMF) (talk) 16:00, 15 June 2015 (UTC)


       group productive editing     n
1: mostly WT       1086    2332 11203
2: mostly VE       1138    2033  2304
> prop.test(c(1086, 1138), c(2332, 2033))

	2-sample test for equality of proportions with continuity correction

data:  c(1086, 1138) out of c(2332, 2033)
X-squared = 38.0831, df = 1, p-value = 6.779e-10
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.12411876 -0.06401967
sample estimates:
   prop 1    prop 2 
0.4656947 0.5597639 

Yes. It looks like mostly-VE editors are about 9 percentage points more likely than mostly-WT editors to make at least one productive edit (56.0% vs. 46.6%). This does not suggest that VE increases productive editing, only that editors who were likely to be productive anyway were more likely to use VE for most of their edits. --Halfak (WMF) (talk) 16:31, 15 June 2015 (UTC)

Time to completion

I had forgotten that I planned to measure the time between the start and completion of an edit. So, let's do that!

Time to save. The density of time-to-save is plotted for edit sessions during the VisualEditor A/B test.
Time to save (by bucket). The density of time-to-save is plotted for edit sessions during the VisualEditor A/B test by experimental condition.

A t-test on the log-transformed times suggests that this difference is significant, with an expected difference in average edit time of about 20 seconds.

t = -4.9302, df = 5253.173, p-value = 8.468e-07
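The fractional degrees of freedom are consistent with Welch's unequal-variance t-test, which is the default in R's t.test(). Below is a Python sketch of that test on log-transformed save times; the function name and inputs are hypothetical. It also returns the two geometric means (the back-transformed log means), which is one way to express a log-scale difference in seconds; whether the ~20 second figure above was derived this way is an assumption.

```python
import math
from statistics import mean, variance


def welch_t_on_logs(a_seconds, b_seconds):
    """Welch's t-test on log-transformed durations.

    Returns (t, df, geometric mean of a, geometric mean of b).
    """
    la = [math.log(s) for s in a_seconds]
    lb = [math.log(s) for s in b_seconds]
    # Squared standard error of each mean: sample variance / n
    se2_a = variance(la) / len(la)
    se2_b = variance(lb) / len(lb)
    t = (mean(la) - mean(lb)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(la) - 1) + se2_b ** 2 / (len(lb) - 1)
    )
    return t, df, math.exp(mean(la)), math.exp(mean(lb))


if __name__ == "__main__":
    # Hypothetical save times in seconds, for illustration only
    control = [30, 45, 60, 90, 120, 40]
    experimental = [50, 70, 110, 150, 200, 65]
    t, df, g_c, g_e = welch_t_on_logs(control, experimental)
    print(f"t = {t:.3f}, df = {df:.1f}, geometric means: {g_c:.0f}s vs {g_e:.0f}s")
```

Testing on the log scale suits durations like these, which are strongly right-skewed; a difference in log means corresponds to a ratio of geometric means rather than a difference of arithmetic means.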

Let's look within the experimental condition at edits saved via the visual editor vs. wikitext.

Time to save (by editor). The density of time-to-save is plotted for edit sessions during the VisualEditor A/B test by editor, within the experimental condition.

It seems like editors who use wikitext make substantially faster edits. The mode of the wikitext distribution is around 35 seconds, while the mode of the VisualEditor distribution is closer to 2 minutes. --Halfak (WMF) (talk) 19:22, 15 June 2015 (UTC)
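One way to read a mode off a density plot is to evaluate a kernel density estimate on a grid and take the peak. The Python sketch below does this with a Gaussian kernel on log10 seconds, assuming the time-to-save densities were estimated on a log scale. The bandwidth rule (1.06 * sd * n^(-1/5), a normal-reference rule of thumb) and the function name are assumptions, not the settings behind the plots above. A mode found on the log scale need not match the mode of the raw-seconds distribution.

```python
import math
from statistics import stdev


def density_mode_seconds(seconds, grid_points=512):
    """Peak of a Gaussian KDE fitted to log10(seconds), returned in seconds."""
    logs = [math.log10(s) for s in seconds]
    # Normal-reference rule-of-thumb bandwidth (assumed; needs >= 2 distinct values)
    bw = 1.06 * stdev(logs) * len(logs) ** (-1 / 5)
    lo, hi = min(logs) - 3 * bw, max(logs) + 3 * bw
    step = (hi - lo) / (grid_points - 1)
    best_x, best_density = lo, -1.0
    for i in range(grid_points):
        x = lo + i * step
        # Unnormalized density: the constant factor does not move the peak
        density = sum(math.exp(-0.5 * ((x - v) / bw) ** 2) for v in logs)
        if density > best_density:
            best_x, best_density = x, density
    return 10 ** best_x


if __name__ == "__main__":
    # Hypothetical wikitext save times in seconds, for illustration only
    wikitext = [12, 20, 28, 33, 35, 36, 40, 55, 90, 300]
    print(f"mode ~ {density_mode_seconds(wikitext):.0f} s")
```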

So, I wonder whether newcomers perform different types of edits when presented with VE. We might also be seeing save delays caused by time spent waiting for the editor to load, or by newcomers spending more time exploring VE and its complex menus. --Halfak (WMF) (talk) 19:25, 15 June 2015 (UTC)