Research talk:VisualEditor's effect on newly registered editors/May 2015 study/Work log/2015-06-12

Friday, June 12, 2015

Observation period complete. Starting my final analysis. First, the measures of productivity and survival.

bucket        via_mobile  editing.k  week_editing.k  main_editing.k  week_main_editing.k  talk_editing.k  week_talk_editing.k  user_editing.k  week_user_editing.k  wp_editing.k  week_wp_editing.k  productive.k  week_productive.k  surviving.k  gt_one_hour.k  enabled.k  n
control       1           927        953             794             811                  85              94                   126             144                  15            17                 413           435                71           57             6          3670
experimental  0           3237       3386            2387            2502                 383             439                  812             904                  106           133                1659          1778               287          338            9693       9728
control       0           3211       3363            2448            2551                 425             499                  778             876                  86            111                1671          1772               287          343            110        9794
experimental  1           949        979             817             840                  93              101                  151             165                  14            16                 430           446                58           52             3775       3779

OK. Time to look for significance.

R:Productive new editor

24h

> prop.test(c(1659, 1671), c(9728, 9794))
...
X-squared = 0, df = 1, p-value = 1

Well, that's about as insignificant as it can get.

Full week

> prop.test(c(1778, 1772), c(9728, 9794))
...
X-squared = 0.0995, df = 1, p-value = 0.7524

Same. If there is any difference, it's too small to measure with 20k observations.
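
To put a rough number on "too small to measure", base R's power.prop.test can solve for the smallest difference that groups of this size would reliably detect. This is only a sketch and not part of the original analysis: the 80% power target is an assumption, and the baseline rate and group size are taken from the desktop rows of the table above.

```r
# Sketch: smallest detectable difference in the 24h productive rate.
# Assumptions (not from the original analysis): 80% power, two-sided
# alpha = 0.05, and equal groups of 9728 (the smaller desktop bucket).
baseline <- 1671 / 9794   # control desktop productive.k / n

mdd <- power.prop.test(
    n = 9728,             # per-group sample size
    p1 = baseline,
    power = 0.80,
    sig.level = 0.05
)

# p2 is the solved-for rate in the other group; the gap between p2 and
# p1 is the minimum detectable difference in proportion.
min_detectable <- mdd$p2 - mdd$p1
print(min_detectable)
```

Under those assumptions the detectable gap is on the order of a percentage point or two, far larger than the differences observed here.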

Before we move on, let's measure the total number of productive edits and see if there's a difference there.

> with(
+     user_metrics,
+     wilcox.test(
+         day_productive_edits[bucket == "control"],
+         day_productive_edits[bucket == "experimental"]
+     )
+ )

	Wilcoxon rank sum test with continuity correction

data:  day_productive_edits[bucket == "control"] and day_productive_edits[bucket == "experimental"]
W = 91088984, p-value = 0.6909
alternative hypothesis: true location shift is not equal to 0

> with(
+     user_metrics,
+     wilcox.test(
+         week_productive_edits[bucket == "control"],
+         week_productive_edits[bucket == "experimental"]
+     )
+ )

	Wilcoxon rank sum test with continuity correction

data:  week_productive_edits[bucket == "control"] and week_productive_edits[bucket == "experimental"]
W = 91023213, p-value = 0.8194
alternative hypothesis: true location shift is not equal to 0

With p-values around .69 and .82, we're not seeing any difference we can say is real.

Survival
> prop.test(c(287, 287), c(9728, 9794))
...
X-squared = 0.0016, df = 1, p-value = 0.9682

No significant difference there either. --Halfak (WMF) (talk) 19:27, 12 June 2015 (UTC)
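
For reference, the three proportion tests above can be rerun directly from the desktop (via_mobile = 0) rows of the summary table. This is a minimal sketch; the `desktop` data frame and the loop over measures are illustrative and were not part of the original session.

```r
# Desktop (via_mobile == 0) rows copied from the summary table above.
# The data frame name and layout are illustrative, not the original code.
desktop <- data.frame(
    bucket = c("experimental", "control"),
    productive.k = c(1659, 1671),
    week_productive.k = c(1778, 1772),
    surviving.k = c(287, 287),
    n = c(9728, 9794)
)

# Run the same two-sample proportion test for each measure and
# collect the p-values in a named vector.
measures <- c("productive.k", "week_productive.k", "surviving.k")
p_values <- sapply(measures, function(m) {
    prop.test(desktop[[m]], desktop$n)$p.value
})
print(round(p_values, 4))
```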

Burden

Now to look into changes in burden.

bucket        via_mobile  blocked.k  blocked.p   reverted.k  reverted.p  blocked_for_damage.k  blocked_for_damage.p  n
control       1           92         0.02506812  475         0.1294278   57                    0.01553134            3670
experimental  0           406        0.0417352   919         0.09446957  259                   0.02662418            9728
control       0           415        0.04237288  1001        0.1022054   290                   0.02960997            9794
experimental  1           81         0.02143424  474         0.12543     66                    0.01746494            3779

Reverts

First, let's look at reverted edits per editor. We'll use the Wilcoxon rank sum test again.

> with(
+     user_metrics,
+     wilcox.test(
+         day_reverted_main_revisions[bucket == "control"],
+         day_reverted_main_revisions[bucket == "experimental"]
+     )
+ )

	Wilcoxon rank sum test with continuity correction

data:  day_reverted_main_revisions[bucket == "control"] and day_reverted_main_revisions[bucket == "experimental"]
W = 91583600, p-value = 0.05568
alternative hypothesis: true location shift is not equal to 0

Wow. Marginal significance here (p = 0.056). Rigor tells us we can't treat this result as established: either there is a real but very small effect, or none at all. However, we can still talk about the potential implications. If this result is real, it would mean that current Wikipedians need to revert slightly fewer revisions when VE is enabled than when it is not.

Oops! I forgot to filter out editors who registered via mobile.

> with(
+     user_block_metrics,
+     wilcox.test(
+         day_reverted_main_revisions[!via_mobile & bucket == "control"],
+         day_reverted_main_revisions[!via_mobile & bucket == "experimental"]
+     )
+ )

	Wilcoxon rank sum test with continuity correction

data:  day_reverted_main_revisions[!via_mobile & bucket == "control"] and day_reverted_main_revisions[!via_mobile & bucket == "experimental"]
W = 48045048, p-value = 0.04534
alternative hypothesis: true location shift is not equal to 0

So, it looks like we have crossed the significance threshold (p = 0.045), but even if this is, in fact, a real effect, it's very small.
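
One hedged way to show how small: the W statistic reported by wilcox.test counts the (control, experimental) pairs in which the control editor has more reverted revisions, with ties counted as one half. Dividing W by the number of pairs gives the probability that a randomly chosen control editor has more reverted revisions than a randomly chosen experimental editor, where 0.5 means no difference. The sketch below assumes the filtered groups match the desktop n values in the burden table (9794 control, 9728 experimental), which the test output does not confirm, and it reads reverted.k as the number of editors with at least one reverted edit.

```r
# Sketch: turn the reported W into a "probability of superiority".
# Assumption: the filtered groups match the desktop n values in the
# burden table (the wilcox.test output does not print group sizes).
w <- 48045048          # W reported for the desktop-only test
n_control <- 9794
n_experimental <- 9728

# W counts (control, experimental) pairs where the control editor has
# more reverted revisions, with ties counted as one half.
prob_superiority <- w / (n_control * n_experimental)
print(prob_superiority)

# Gap in the share of reverted editors (reverted.k / n), taken from the
# same desktop rows. Assumes reverted.k counts editors, not revisions.
reverted_gap <- 1001 / 9794 - 919 / 9728
print(reverted_gap)
```

A probability of about 0.504 and a gap of under one percentage point are both consistent with a shift that is real at best but very small.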

Block rate

Next, we're going to look at the proportion of users who are blocked for spam/vandalism.

> prop.test(c(259, 290), c(9728, 9794))

	2-sample test for equality of proportions with continuity correction

data:  c(259, 290) out of c(9728, 9794)
X-squared = 1.4845, df = 1, p-value = 0.2231
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.007725427  0.001753852
sample estimates:
    prop 1     prop 2 
0.02662418 0.02960997 

Users with VE enabled get slightly fewer spam/vandalism blocks (2.66% vs. 2.96%), but the difference is not significant (p = 0.22). --Halfak (WMF) (talk) 20:12, 12 June 2015 (UTC)