Research talk:VisualEditor's effect on newly registered editors/May 2015 study/Work log/2015-06-15

Monday, June 15, 2015

Time to look at the editing sessions. I'm not expecting to see much of a difference here given that all of our productivity measures were dead even.

changed_and_noswitch counts sessions whose abort was not "switchwith", "switchwithout" or "nochange". These sessions are our best denominator when looking at proportions.

bucket	via_mobile	users.n	ve.k	ve.p	attempted.k	attempted.p	successful.k	successful.p	changed_and_noswitch.n	changed.n	n
control	0	3421	53	0.007391911	3207	0.4472803	2980	0.4156206	4683	4692	7170
experimental	0	3459	2412	0.3404856	2668	0.3766234	2452	0.3461321	4260	4671	7084
control	1	219	0	0	119	0.3190349	110	0.2949062	281	281	373
experimental	1	211	78	0.2154696	89	0.2458564	83	0.2292818	240	252	362
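
The proportion columns above can be recomputed from the counts. The following is a minimal sketch in base R that copies a few columns of the table by hand; the column name successful.filtered.p is a hypothetical label for the success rate over changed_and_noswitch.n and does not appear in the original data.

```r
# Counts copied by hand from the table above.
sessions <- data.frame(
  bucket = c("control", "experimental", "control", "experimental"),
  via_mobile = c(0, 0, 1, 1),
  successful.k = c(2980, 2452, 110, 83),
  changed_and_noswitch.n = c(4683, 4260, 281, 240),
  n = c(7170, 7084, 373, 362)
)

# Success rate over all sessions (matches the successful.p column).
sessions$successful.p <- sessions$successful.k / sessions$n

# Success rate over sessions that changed something and did not switch editors.
# "successful.filtered.p" is a hypothetical column name used only in this sketch.
sessions$successful.filtered.p <-
  sessions$successful.k / sessions$changed_and_noswitch.n

print(sessions)
```

The two denominators answer different questions: n includes sessions that were abandoned without any change, while changed_and_noswitch.n keeps only sessions where the user did something and stayed in the same editor.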

It looks like 34% of desktop edit sessions in the experimental bucket were VE. We also see a bit of a difference in the overall proportion of successful sessions (41.6% for control vs. 34.6% for experimental). Even if we filter out nochange and switching sessions, we see 2452/4260 = 57.6% for experimental and 2980/4683 = 63.6% for control.

> prop.test(c(2452,2980), c(4260, 4683))

	2-sample test for equality of proportions with continuity correction

data:  c(2452, 2980) out of c(4260, 4683)
X-squared = 34.2779, df = 1, p-value = 4.778e-09
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.08123271 -0.04028203
sample estimates:
   prop 1    prop 2 
0.5755869 0.6363442
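
For reference, the X-squared value reported above can be rebuilt by hand. This is a sketch of the Pearson chi-squared statistic with Yates' continuity correction on the 2x2 table of successes and failures, checked against prop.test itself; the variable names are chosen for this illustration only.

```r
# Successful sessions and filtered denominators from the table above.
successes <- c(experimental = 2452, control = 2980)
totals <- c(experimental = 4260, control = 4683)

# Observed 2x2 table: successes and failures per bucket.
observed <- rbind(success = successes, failure = totals - successes)

# Expected counts under the null hypothesis of one shared success rate.
pooled <- sum(successes) / sum(totals)
expected <- rbind(success = totals * pooled, failure = totals * (1 - pooled))

# Yates' continuity correction: shrink each |O - E| by at most 0.5.
yates <- min(0.5, abs(observed - expected))
x_squared <- sum((abs(observed - expected) - yates)^2 / expected)
p_value <- pchisq(x_squared, df = 1, lower.tail = FALSE)

# Compare with the built-in test.
reference <- prop.test(successes, totals)
print(c(manual = x_squared, prop.test = unname(reference$statistic)))
print(p_value)
```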

That difference is significant. This is surprising since we did not see a significant difference in overall productivity. I think that we might be seeing the result of people *playing* with VE. I can imagine people suddenly noticing the two edit links and spending some time checking out what it looks like to edit with VE on a few different articles. If it were me, I'd spend some time typing and copy-pasting around an article to see what it looked like and then not hit save. It's difficult to know if my experience and intuition are like others', but I think it's safe to conclude that something more complex than "VE doesn't work as well as Wikitext for people" is going on here. --Halfak (WMF) (talk) 15:50, 15 June 2015 (UTC)

Additional questions

OK. So that roughly concludes my planned evaluation. Now, I'd like to do some descriptive statistics. Since productivity held roughly constant, I want to look at the distribution of productivity for new editors and see what level of productivity newcomers who choose to use VE are generally at.

> mean(user_metrics[week_revisions > 0 & bucket=="experimental",]$prop.ve > .5)
[1] 0.4057274

40.6% of experimental editors mostly used VE. That means we should have a good set of observations.
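
The one-liner above works because the mean of a logical vector is the share of TRUE values. Here is a small base R sketch of the same idea. The rows are hypothetical toy data, since the real user_metrics table is not shown in this log, and base subset() stands in for the data.table-style indexing used above.

```r
# Hypothetical toy data; only the column names come from the log.
user_metrics <- data.frame(
  bucket = c("experimental", "experimental", "experimental",
             "experimental", "control"),
  week_revisions = c(3, 0, 1, 5, 2),
  prop.ve = c(0.8, 1.0, 0.2, 0.6, 0.0)
)

# Keep experimental users who saved at least one revision in the week.
active <- subset(user_metrics, week_revisions > 0 & bucket == "experimental")

# TRUE where more than half of a user's edits went through VE.
mostly_ve <- active$prop.ve > 0.5

# Averaging TRUE/FALSE values gives the proportion of TRUE.
share_mostly_ve <- mean(mostly_ve)
print(share_mostly_ve)
```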

Productive edit density by primary editor. The density of productive edits by the primary editor (VE/Wikitext) is plotted for the experimental bucket.

It looks like the primary difference between mostly Wikitext and mostly VE is that mostly VE has more editors who make at least one productive edit. I'll need to run a test to be sure. --Halfak (WMF) (talk) 16:00, 15 June 2015 (UTC)

       group productive editing     n
1: mostly WT       1086    2332 11203
2: mostly VE       1138    2033  2304
> prop.test(c(1086, 1138), c(2332, 2033))

	2-sample test for equality of proportions with continuity correction

data:  c(1086, 1138) out of c(2332, 2033)
X-squared = 38.0831, df = 1, p-value = 6.779e-10
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.12411876 -0.06401967
sample estimates:
   prop 1    prop 2 
0.4656947 0.5597639

Yes. It looks like mostly-VE users are roughly 9 percentage points more likely to make at least one productive edit than mostly-WT users (56.0% vs. 46.6%). This does not suggest that VE increases productive editing -- just that editors who were likely to be productive are more likely to mostly use VE. --Halfak (WMF) (talk) 16:31, 15 June 2015 (UTC)
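
Since "percent more" can be read as either an absolute or a relative difference, a short sketch that separates the two may help. It reuses the counts from the test above; the variable names are made up for this illustration.

```r
# Productive editors and editing users, from the table above.
productive <- c(mostly_wt = 1086, mostly_ve = 1138)
editing <- c(mostly_wt = 2332, mostly_ve = 2033)

result <- prop.test(productive, editing)
rates <- productive / editing

# Absolute difference, in percentage points (VE minus WT).
diff_pp <- 100 * unname(rates["mostly_ve"] - rates["mostly_wt"])

# Relative difference: how much higher the VE rate is as a share of the WT rate.
relative <- unname(rates["mostly_ve"] / rates["mostly_wt"]) - 1

# prop.test reports the interval for WT minus VE; flip it to VE minus WT.
ci_pp <- -100 * rev(as.numeric(result$conf.int))

print(diff_pp)
print(relative)
print(ci_pp)
```

The absolute gap and the relative gap are different quantities, so it is worth stating which one a summary sentence refers to.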

Time to completion

I had forgotten that I planned to measure the time between the start and completion of an edit. So, let's do that!

Time to save. The density of time-to-save is plotted for edit sessions during the VisualEditor A/B test.

Time to save (by bucket). The density of time-to-save is plotted for edit sessions during the VisualEditor A/B test by experimental condition.

A t-test of the log values suggests this difference is significant, with an expected difference in the average edit time of ~20 seconds.

t = -4.9302, df = 5253.173, p-value = 8.468e-07
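
The call that produced this output is not recorded here, but the fractional degrees of freedom are what R's default Welch t-test reports. Below is a minimal sketch of that kind of test on log-transformed times. The data are simulated with made-up parameters, so none of the numbers it prints correspond to the study.

```r
set.seed(42)

# Simulated, hypothetical time-to-save samples in seconds (log-normal).
control_secs <- rlnorm(3000, meanlog = log(60), sdlog = 1.2)
experimental_secs <- rlnorm(2500, meanlog = log(75), sdlog = 1.3)

# t.test() defaults to var.equal = FALSE, i.e. Welch's unequal-variance test.
test <- t.test(log(control_secs), log(experimental_secs))
print(test)

# A difference of log means back-transforms to a ratio of geometric means.
ratio <- exp(unname(test$estimate[2] - test$estimate[1]))
print(ratio)
```

Working on the log scale keeps the long right tail of save times from dominating the comparison. The back-transformed result is a ratio of geometric means, so converting it to a difference in seconds requires picking a baseline time.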

Let's look within the experimental condition at edits saved via the visual editor vs. wikitext.

Time to save (by editor). The density of time-to-save is plotted for edit sessions during the VisualEditor A/B test by editor within the experimental condition.

It seems like editors who use wikitext are making substantially faster edits. The mode of the wikitext distribution is around 35 seconds, while the mode of the VisualEditor distribution is more like 2 minutes. --Halfak (WMF) (talk) 19:22, 15 June 2015 (UTC)

So, I wonder if, when presented with VE, newcomers will perform different types of edits. We might also be seeing save delays due to the time spent waiting for the editor to load or even the newcomers spending more time exploring VE and its complex menus. --Halfak (WMF) (talk) 19:25, 15 June 2015 (UTC)
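
The modes quoted above were read off the density plots. A mode can also be estimated numerically from a kernel density estimate. The sketch below assumes the plots use a log-scaled time axis, which this log does not confirm. It runs on simulated data, and density_mode is a hypothetical helper written for this illustration.

```r
set.seed(7)

# Simulated, hypothetical wikitext time-to-save samples in seconds.
wikitext_secs <- rlnorm(2000, meanlog = log(35), sdlog = 0.5)

# Hypothetical helper: locate the peak of a kernel density estimate
# computed on the log scale, then convert back to seconds.
density_mode <- function(seconds) {
  d <- density(log(seconds))
  exp(d$x[which.max(d$y)])
}

estimated_mode <- density_mode(wikitext_secs)
print(estimated_mode)
```

Note that the peak on a log-scaled axis is not the same as the peak of the raw-seconds density, so the scale should be stated whenever a mode is reported.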