Research talk:Onboarding new Wikipedians/Rollout/Work log/2014-03-13


Thursday, March 13th

I have two goals for today.

  1. Extend my analysis to other metrics.
  2. Increase sample sizes in order to detect significant effects around a 2% change in proportions.

I plan to work on these in parallel, but #2 is going to take a lot of processing time, so I'll start with it. First, I need to know how many observations I'll need in order to detect significance. Time for a power analysis. I expect around a 2% change based on Research:OB#Overview_of_results.

Proportions power analysis. The p-value of a χ² test of proportions is plotted by number of observations for differing levels of baseline and change in proportion. A horizontal line is plotted at p = 0.05 and a vertical line at the number of observations to be sampled.

It looks like 2,000 observations is a sort of sweet spot where we can detect significant differences for most changes.
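For reference, here's roughly how a table like tests can be built with data.table and prop.test. This is a minimal sketch of the approach -- not necessarily the script that produced the figure -- assuming observed counts that exactly match baseline and baseline + change:

library(data.table)

# Sketch: for each (n, baseline, change) combination, run a two-sample
# proportions test on counts that exactly match the hypothesized rates.
tests = CJ(n = c(500, 1000, 2000, 4000),
           baseline = c(0.05, 0.15, 0.25),
           change = c(0.01, 0.02, 0.03, 0.04, 0.05))
tests[, p.value := prop.test(x = c(round(baseline * n),
                                   round((baseline + change) * n)),
                             n = c(n, n))$p.value,
      by = list(n, baseline, change)]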

tests[n == 2000,list(baseline, change, p.value = round(p.value, 2), signif=p.value < 0.05),]
    baseline change p.value signif
 1:     0.05   0.01    0.19  FALSE
 2:     0.05   0.02    0.01   TRUE
 3:     0.05   0.03    0.00   TRUE
 4:     0.05   0.04    0.00   TRUE
 5:     0.05   0.05    0.00   TRUE
 6:     0.15   0.01    0.41  FALSE
 7:     0.15   0.02    0.09  FALSE
 8:     0.15   0.03    0.01   TRUE
 9:     0.15   0.04    0.00   TRUE
10:     0.15   0.05    0.00   TRUE
11:     0.25   0.01    0.49  FALSE
12:     0.25   0.02    0.16  FALSE
13:     0.25   0.03    0.03   TRUE
14:     0.25   0.04    0.00   TRUE
15:     0.25   0.05    0.00   TRUE

It looks like we'll need another 500 observations to identify significant effects at 2% for a baseline of about 15%. We'd have to push another 1,500 (4k total) observations to have significance at a baseline of 25%.

I've bumped the sample up to 2k per wiki and kicked off the stats generation process again. --Halfak (WMF) (talk) 19:00, 13 March 2014 (UTC)


OK. Based on the old sample, here's the proportion of activated editors:

Difference in activation rates. The difference in the proportions of new editors activated before and after the deployment of mw:Extension:GettingStarted is plotted with 95% CI error bars.
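The per-wiki comparison behind a plot like this is essentially a two-sample proportions test. A minimal sketch, assuming a new_editors table with wiki, condition ("before"/"after"), and a logical activated column -- these names are my guesses, not the actual schema:

library(data.table)

# Per wiki: difference in activation proportion (after - before) with a 95% CI.
activation_deltas = new_editors[, {
    test = prop.test(x = c(sum(activated[condition == "after"]),
                           sum(activated[condition == "before"])),
                     n = c(sum(condition == "after"),
                           sum(condition == "before")))
    list(delta = unname(test$estimate[1] - test$estimate[2]),
         lower = test$conf.int[1],
         upper = test$conf.int[2],
         p.value = test$p.value)
}, by = wiki]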

Same story here as before, except that no wikis saw a significantly lower activation rate, while ukwiki and itwiki saw significantly higher activation rates post-deployment. --Halfak (WMF) (talk) 19:12, 13 March 2014 (UTC)


Now, I'd like to perform a similar test for my log-normally distributed data -- e.g., the # of edits.

Density of 1st day revisions. The density of 1st-day revisions is plotted by wiki and "condition" (before and after deployment of mw:Extension:GettingStarted).

Now to look at the differences and compare significance.
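Per wiki, that comparison amounts to a Welch t-test on log-transformed 1st-day revision counts. A sketch under the same assumed schema as above (day_revisions is a hypothetical column name):

library(data.table)

# Per wiki: difference in mean log(revisions + 1) (after - before) with a 95% CI.
# The +1 keeps editors with zero first-day revisions in the comparison.
edit_deltas = new_editors[, {
    test = t.test(log(day_revisions[condition == "after"] + 1),
                  log(day_revisions[condition == "before"] + 1))
    list(delta = unname(test$estimate[1] - test$estimate[2]),
         lower = test$conf.int[1],
         upper = test$conf.int[2],
         p.value = test$p.value)
}, by = wiki]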

Delta new user edits. The difference in the logged count of revisions made per editor within 24h of registering, between editors who registered before and after deployment, is plotted by wiki with 95% CI error bars.

The plot above is the result of t-tests performed on the logged data, so the y-axis reports differences in log space. The story here is similar to that of activation rates: trends seem to be dominated by the proportion of new editors who make at least one edit. --Halfak (WMF) (talk) 19:26, 13 March 2014 (UTC)


I just realized that I forgot to check what happens if I limit activation to editing in the main namespace only. Here we go:

Article editing activation rate. The difference in the proportions of new editors activated (main namespace edits only) before and after the deployment of mw:Extension:GettingStarted is plotted with 95% CI error bars.
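If the revision data carries a namespace column, restricting activation to article edits is just an extra filter when computing the activation flag. A sketch with assumed table and column names (revisions holding each new editor's first-day edits):

library(data.table)

# An editor counts as activated (main NS) if they made at least one
# namespace-0 revision within 24h of registering.
new_editors[, activated_main := user_id %in%
                revisions[page_namespace == 0, user_id]]

The same prop.test comparison sketched above then runs on activated_main instead of activated.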

Here we see a lot more wikis showing positive trends, but again we lack significance with this limited dataset. Looks like I should hold off until I have that larger sample to work with. --Halfak (WMF) (talk) 20:18, 13 March 2014 (UTC)