Research talk:VisualEditor's effect on newly registered editors/May 2015 study/Work log/2015-06-12

Friday, June 12, 2015

Observation period complete. Starting my final analysis. First, the measures of productivity and survival.

bucket	via_mobile	editing.k	week_editing.k	main_editing.k	week_main_editing.k	talk_editing.k	week_talk_editing.k	user_editing.k	week_user_editing.k	wp_editing.k	week_wp_editing.k	productive.k	week_productive.k	surviving.k	gt_one_hour.k	enabled.k	n
control	1	927	953	794	811	85	94	126	144	15	17	413	435	71	57	6	3670
experimental	0	3237	3386	2387	2502	383	439	812	904	106	133	1659	1778	287	338	9693	9728
control	0	3211	3363	2448	2551	425	499	778	876	86	111	1671	1772	287	343	110	9794
experimental	1	949	979	817	840	93	101	151	165	14	16	430	446	58	52	3775	3779

OK. Time to look for significance.
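The comparisons in this log use the desktop rows only (via_mobile = 0). As a minimal sketch, the counts can be copied from the summary table into a small data frame and passed through one helper, so the same call covers each measure. The names `desktop` and `compare_buckets` are hypothetical and not part of the original analysis.

```r
# Desktop-only (via_mobile == 0) counts copied from the summary table above.
desktop <- data.frame(
    bucket = c("experimental", "control"),
    productive.k = c(1659, 1671),
    week_productive.k = c(1778, 1772),
    surviving.k = c(287, 287),
    n = c(9728, 9794)
)

# Hypothetical helper: run prop.test on one count column and return
# the two sample proportions together with the p-value.
compare_buckets <- function(counts, column) {
    test <- prop.test(counts[[column]], counts$n)
    c(
        experimental = unname(test$estimate[1]),
        control = unname(test$estimate[2]),
        p.value = test$p.value
    )
}

# One column of output per measure.
results <- sapply(
    c("productive.k", "week_productive.k", "surviving.k"),
    function(column) compare_buckets(desktop, column)
)
print(round(results, 4))
```

Keeping the counts in one place makes it harder to mix up a numerator from one row with a denominator from another.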

R:Productive new editor

24h

> prop.test(c(1659, 1671), c(9728, 9794))
...
X-squared = 0, df = 1, p-value = 1

Well, that's about as insignificant as it can get.
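A statistic of exactly 0 looks odd, so here is a sketch of where it probably comes from. `prop.test` applies a continuity correction by default, and as far as I understand its documentation the correction is not allowed to exceed the observed difference between the sample proportions. When the gap is smaller than the correction, the corrected statistic collapses to 0. Turning the correction off with `correct = FALSE` shows the raw statistic. The variable names below are mine.

```r
x <- c(1659, 1671)   # productive within 24h: experimental, control
n <- c(9728, 9794)   # desktop bucket sizes

corrected <- prop.test(x, n)                     # default: correct = TRUE
uncorrected <- prop.test(x, n, correct = FALSE)

# Observed gap between the two sample proportions.
gap <- abs(x[1] / n[1] - x[2] / n[2])
# Full continuity correction (0.5 counts per group) on the proportion scale.
max_correction <- 0.5 * sum(1 / n)

print(c(gap = gap, max_correction = max_correction))
print(c(
    corrected = unname(corrected$statistic),
    uncorrected = unname(uncorrected$statistic)
))
```

Either way, the conclusion is unchanged: the two proportions are nearly identical.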

Full week

> prop.test(c(1778, 1772), c(9728, 9794))
...
X-squared = 0.0995, df = 1, p-value = 0.7524

Same. If there is any difference, it's too small to measure with 20k observations.
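To put a rough number on "too small to measure", `power.prop.test` can solve for the smallest experimental rate that would be detected. This is only a sketch: the 80% power and 5% significance level are conventional choices I am assuming here, not settings taken from the study design.

```r
# Smallest 24h productive-editor rate that a two-sided test at the 5% level
# would detect with 80% power, starting from the control rate (about 17%).
# n is per group; 9728 is the smaller of the two desktop buckets.
mde <- power.prop.test(n = 9728, p1 = 1671 / 9794, power = 0.8)

# Minimum detectable difference in proportions.
print(mde$p2 - mde$p1)
```

Under these assumptions the detectable difference is on the order of one to two percentage points, so any real effect on the productive-editor rate would have to be smaller than that to go unnoticed.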

Before we move on, let's measure the total number of productive edits and see if there's a difference there.

> with(
+     user_metrics,
+     wilcox.test(
+         day_productive_edits[bucket == "control"],
+         day_productive_edits[bucket == "experimental"]
+     )
+ )

	Wilcoxon rank sum test with continuity correction

data:  day_productive_edits[bucket == "control"] and day_productive_edits[bucket == "experimental"]
W = 91088984, p-value = 0.6909
alternative hypothesis: true location shift is not equal to 0

> with(
+     user_metrics,
+     wilcox.test(
+         week_productive_edits[bucket == "control"],
+         week_productive_edits[bucket == "experimental"]
+     )
+ )

	Wilcoxon rank sum test with continuity correction

data:  week_productive_edits[bucket == "control"] and week_productive_edits[bucket == "experimental"]
W = 91023213, p-value = 0.8194
alternative hypothesis: true location shift is not equal to 0

With p-values around .69 and .82, we're not seeing any difference we can say is real.

Survival

> prop.test(c(287, 287), c(9728, 9794))
...
X-squared = 0.0016, df = 1, p-value = 0.9682

No significant difference there either. --Halfak (WMF) (talk) 19:27, 12 June 2015 (UTC)

Burden

Now to look into changes in burden.

bucket	via_mobile	blocked.k	blocked.p	reverted.k	reverted.p	blocked_for_damage.k	blocked_for_damage.p	n
control	1	92	0.02506812	475	0.1294278	57	0.01553134	3670
experimental	0	406	0.0417352	919	0.09446957	259	0.02662418	9728
control	0	415	0.04237288	1001	0.1022054	290	0.02960997	9794
experimental	1	81	0.02143424	474	0.12543	66	0.01746494	3779

Reverts

First, let's look at reverted edits per editor. We'll use the Wilcoxon test again.

> with(
+     user_metrics,
+     wilcox.test(
+         day_reverted_main_revisions[bucket == "control"],
+         day_reverted_main_revisions[bucket == "experimental"]
+     )
+ )

	Wilcoxon rank sum test with continuity correction

data:  day_reverted_main_revisions[bucket == "control"] and day_reverted_main_revisions[bucket == "experimental"]
W = 91583600, p-value = 0.05568
alternative hypothesis: true location shift is not equal to 0

Wow. Marginal significance here (p = 0.056). Rigor tells us we can't trust this result: either there is a real but very small effect, or none at all. Still, we can talk about the potential implications. If the result is real, it would mean that current Wikipedians need to revert slightly fewer revisions when VE is enabled than when it is not.

Oops! I forgot to filter out editors who registered via mobile.

> with(
+     user_block_metrics,
+     wilcox.test(
+         day_reverted_main_revisions[!via_mobile & bucket == "control"],
+         day_reverted_main_revisions[!via_mobile & bucket == "experimental"]
+     )
+ )

	Wilcoxon rank sum test with continuity correction

data:  day_reverted_main_revisions[!via_mobile & bucket == "control"] and day_reverted_main_revisions[!via_mobile & bucket == "experimental"]
W = 48045048, p-value = 0.04534
alternative hypothesis: true location shift is not equal to 0

So it looks like we have crossed the significance threshold (p = 0.045), but even if this is a real effect, it's very small.
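Since the mobile filter is easy to forget, one option is to apply it once inside a helper instead of inside every subscript. The sketch below runs on synthetic data, because the real `user_metrics` data frame is not reproduced in this log. `fake_metrics` and `compare_desktop` are hypothetical names, and the Poisson counts are made up purely so the code runs.

```r
# Synthetic stand-in for the per-user metrics table (NOT the study data).
set.seed(1)
fake_metrics <- data.frame(
    bucket = rep(c("control", "experimental"), each = 500),
    via_mobile = rep(c(TRUE, FALSE), times = 500),
    day_reverted_main_revisions = rpois(1000, lambda = 0.3)
)

# Hypothetical helper: drop mobile registrations once, then compare
# the two buckets with a Wilcoxon rank sum test.
compare_desktop <- function(metrics, column) {
    desktop <- metrics[!metrics$via_mobile, ]
    wilcox.test(
        desktop[[column]][desktop$bucket == "control"],
        desktop[[column]][desktop$bucket == "experimental"]
    )
}

result <- compare_desktop(fake_metrics, "day_reverted_main_revisions")
print(result)
```

The same helper could be called with any other per-user column, which keeps the filter consistent across tests.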

Block rate

Next, we're going to look at the proportion of users who are blocked for spam/vandalism.

> prop.test(c(259, 290), c(9728, 9794))

	2-sample test for equality of proportions with continuity correction

data:  c(259, 290) out of c(9728, 9794)
X-squared = 1.4845, df = 1, p-value = 0.2231
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.007725427  0.001753852
sample estimates:
    prop 1     prop 2 
0.02662418 0.02960997

VE-enabled users get slightly fewer spam/vandalism blocks (2.7% vs. 3.0%), but the difference is not significant. --Halfak (WMF) (talk) 20:12, 12 June 2015 (UTC)