Research talk:VisualEditor's effect on newly registered editors/May 2015 study/Work log/2015-09-30
Wednesday, September 30, 2015
Today I want to look at the short, mid and long-term survival trends of editors in the experimental conditions. I was recently working on a similar analysis for an experiment with the Teahouse. I'm going to replicate the same methodology here. If you want to read the details, see Research_talk:Teahouse_long_term_new_editor_retention/Work_log/2015-09-28.
The experiment ended on June 4th, 2015, so that means I have nearly 4 months to work with. I want to look at the following trial and survival periods:
- 1 week, 1 week
- 1 month, 1 month
- 2 months, 1 month
So now the query:
SELECT
    user_id,
    SUM(revisions_1_to_2_weeks) AS revisions_1_to_2_weeks,
    SUM(revisions_1_to_2_months) AS revisions_1_to_2_months,
    SUM(revisions_2_to_3_months) AS revisions_2_to_3_months
FROM (
    -- Live revisions saved at least 7 days after registration
    (SELECT
        user_id,
        SUM(
            rev_timestamp IS NOT NULL AND
            DATEDIFF(rev_timestamp, user_registration) BETWEEN 7 AND 14
        ) AS revisions_1_to_2_weeks,
        SUM(
            rev_timestamp IS NOT NULL AND
            DATEDIFF(rev_timestamp, user_registration) BETWEEN 30 AND 60
        ) AS revisions_1_to_2_months,
        SUM(
            rev_timestamp IS NOT NULL AND
            DATEDIFF(rev_timestamp, user_registration) BETWEEN 60 AND 90
        ) AS revisions_2_to_3_months
    FROM staging.ve2_experimental_users
    INNER JOIN user USING (user_id)
    LEFT JOIN revision ON
        rev_user = user_id AND
        rev_timestamp >= DATE_FORMAT(
            DATE_ADD(user_registration, INTERVAL 7 DAY),
            "%Y%m%d%H%i%S"
        )
    GROUP BY 1)
    -- UNION ALL keeps both rows when a user's live and archived counts are identical
    UNION ALL
    -- Archived (deleted) revisions, using the same 7-day cutoff
    (SELECT
        user_id,
        SUM(
            ar_timestamp IS NOT NULL AND
            DATEDIFF(ar_timestamp, user_registration) BETWEEN 7 AND 14
        ) AS revisions_1_to_2_weeks,
        SUM(
            ar_timestamp IS NOT NULL AND
            DATEDIFF(ar_timestamp, user_registration) BETWEEN 30 AND 60
        ) AS revisions_1_to_2_months,
        SUM(
            ar_timestamp IS NOT NULL AND
            DATEDIFF(ar_timestamp, user_registration) BETWEEN 60 AND 90
        ) AS revisions_2_to_3_months
    FROM staging.ve2_experimental_users
    INNER JOIN user USING (user_id)
    LEFT JOIN archive ON
        ar_user = user_id AND
        ar_timestamp >= DATE_FORMAT(
            DATE_ADD(user_registration, INTERVAL 7 DAY),
            "%Y%m%d%H%i%S"
        )
    GROUP BY 1)
) user_span_revisions
GROUP BY user_id;
Here's a sample of the output:
user_id | revisions_1_to_2_weeks | revisions_1_to_2_months | revisions_2_to_3_months |
---|---|---|---|
2532<snip> | 6 | 0 | 0 |
2532<snip> | 0 | 0 | 0 |
2532<snip> | 0 | 0 | 0 |
2532<snip> | 0 | 0 | 0 |
2532<snip> | 4 | 1 | 0 |
2532<snip> | 2 | 0 | 0 |
2532<snip> | 0 | 0 | 0 |
2532<snip> | 0 | 0 | 0 |
2532<snip> | 0 | 0 | 0 |
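To sanity-check the windowing, here is a minimal Python sketch of what the query counts for a single user. It is an illustration, not the code that was run: `count_window_revisions` and `WINDOWS` are hypothetical names, and the timestamps in the example call are made up. It assumes MediaWiki-style `YYYYMMDDHHMMSS` timestamps, mirrors `DATEDIFF` by comparing calendar dates only, and applies a 7-day cutoff like the one in the join on `revision`.

```python
from datetime import datetime, timedelta

MW_FORMAT = "%Y%m%d%H%M%S"  # MediaWiki timestamp layout (assumed)

# Inclusive day ranges, mirroring the BETWEEN clauses in the query.
WINDOWS = {
    "revisions_1_to_2_weeks": (7, 14),
    "revisions_1_to_2_months": (30, 60),
    "revisions_2_to_3_months": (60, 90),
}


def count_window_revisions(registration, revision_timestamps):
    """Count one user's revisions per survival window (hypothetical helper)."""
    reg = datetime.strptime(registration, MW_FORMAT)
    cutoff = reg + timedelta(days=7)  # join condition: at least 7 days after registration
    counts = {name: 0 for name in WINDOWS}
    for ts in revision_timestamps:
        rev = datetime.strptime(ts, MW_FORMAT)
        if rev < cutoff:
            continue  # filtered out by the join, never reaches the SUMs
        days = (rev.date() - reg.date()).days  # DATEDIFF ignores the time of day
        for name, (low, high) in WINDOWS.items():
            if low <= days <= high:
                counts[name] += 1
    return counts


# Made-up example: one early edit, one edit on day 8, one edit on day 60.
example = count_window_revisions(
    "20150528120000",
    ["20150601000000", "20150605130000", "20150727090000"],
)
```

In this made-up example the day-60 revision is counted in both the 1-to-2-month and the 2-to-3-month columns, because `BETWEEN` is inclusive at both ends and the two ranges share day 60.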
1+ edits survival
First, I'll consider an editor "surviving" if they make at least 1 edit in the survival period.
bucket | 1 to 2 weeks.k | 1 to 2 months.k | 2 to 3 months.k | n | 1 to 2 weeks.p | 1 to 2 months.p | 2 to 3 months.p |
---|---|---|---|---|---|---|---|
control | 321 | 294 | 211 | 13464 | 0.02384135 | 0.02183601 | 0.01567142 |
experimental | 311 | 327 | 230 | 13507 | 0.0230251 | 0.02420967 | 0.01702821 |
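The aggregation behind a table like this can be sketched as follows. This is a hedged illustration rather than the R code that produced the numbers: `bucket_survival` is a hypothetical helper, the rows are toy data, and it assumes each user's revision counts have already been joined to a `bucket` label. For each bucket, `k` is the number of users with at least `min_edits` revisions in the period, `n` is the number of users, and `p = k / n`.

```python
from collections import defaultdict


def bucket_survival(rows, column, min_edits=1):
    """Return {bucket: {"k", "n", "p"}} for one survival period (hypothetical helper)."""
    totals = defaultdict(lambda: {"k": 0, "n": 0})
    for row in rows:
        cell = totals[row["bucket"]]
        cell["n"] += 1
        if row[column] >= min_edits:  # the user "survived" this period
            cell["k"] += 1
    return {
        bucket: {"k": c["k"], "n": c["n"], "p": c["k"] / c["n"]}
        for bucket, c in totals.items()
    }


# Toy rows, not real experiment data.
toy_rows = [
    {"bucket": "control", "revisions_1_to_2_weeks": 6},
    {"bucket": "control", "revisions_1_to_2_weeks": 0},
    {"bucket": "experimental", "revisions_1_to_2_weeks": 4},
    {"bucket": "experimental", "revisions_1_to_2_weeks": 2},
]
one_plus = bucket_survival(toy_rows, "revisions_1_to_2_weeks", min_edits=1)
five_plus = bucket_survival(toy_rows, "revisions_1_to_2_weeks", min_edits=5)
```

Raising `min_edits` is the only change needed to move from a 1+ edits definition of survival to a stricter threshold.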
Chi^2 tests
> prop.test(bucket.survival$revisions_1_to_2_weeks.k, bucket.survival$n)
2-sample test for equality of proportions with continuity correction
data: bucket.survival$revisions_1_to_2_weeks.k out of bucket.survival$n
X-squared = 0.1623, df = 1, p-value = 0.6871
alternative hypothesis: two.sided
95 percent confidence interval:
-0.002868681 0.004501194
sample estimates:
prop 1 prop 2
0.02384135 0.02302510

> prop.test(bucket.survival$revisions_1_to_2_months.k, bucket.survival$n)
2-sample test for equality of proportions with continuity correction
data: bucket.survival$revisions_1_to_2_months.k out of bucket.survival$n
X-squared = 1.585, df = 1, p-value = 0.208
alternative hypothesis: two.sided
95 percent confidence interval:
-0.006027302 0.001279978
sample estimates:
prop 1 prop 2
0.02183601 0.02420967

> prop.test(bucket.survival$revisions_2_to_3_months.k, bucket.survival$n)
2-sample test for equality of proportions with continuity correction
data: bucket.survival$revisions_2_to_3_months.k out of bucket.survival$n
X-squared = 0.6897, df = 1, p-value = 0.4063
alternative hypothesis: two.sided
95 percent confidence interval:
-0.004457761 0.001744186
sample estimates:
prop 1 prop 2
0.01567142 0.01702821
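The VE (experimental) cohort is slightly ahead in the 1-to-2-month and 2-to-3-month periods and slightly behind in the 1-to-2-week period, but none of the differences is significant (p > 0.2 in all three tests).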
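For readers without R at hand, the test statistic can be cross-checked with a short Python sketch. This is my own reimplementation of a two-sample test of proportions with Yates' continuity correction, written to match what `prop.test` reports for two groups; `prop_test_2sample` is a hypothetical name, and the sketch leaves out the confidence interval. The p-value uses the fact that the upper tail of a chi-squared distribution with 1 degree of freedom equals `erfc(sqrt(x / 2))`.

```python
import math


def prop_test_2sample(k, n, correct=True):
    """Chi-squared test for equality of two proportions (hypothetical helper).

    k: (successes in group 1, successes in group 2)
    n: (group 1 size, group 2 size)
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    k1, k2 = k
    n1, n2 = n
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    # Yates' correction, capped so it never exceeds the observed difference.
    yates = min(0.5, abs(p1 - p2) / (1 / n1 + 1 / n2)) if correct else 0.0
    statistic = 0.0
    for ki, ni in ((k1, n1), (k2, n2)):
        cells = ((ki, ni * pooled), (ni - ki, ni * (1 - pooled)))
        for observed, expected in cells:
            statistic += (abs(observed - expected) - yates) ** 2 / expected
    p_value = math.erfc(math.sqrt(statistic / 2))  # chi-squared upper tail, df = 1
    return statistic, p_value


# 1-to-2-week survivors (k) and bucket sizes (n) from the 1+ edits table.
stat_weeks, p_weeks = prop_test_2sample((321, 311), (13464, 13507))
```

With the 1-to-2-week counts from the table, this gives a statistic of about 0.162 and a p-value of about 0.687, in line with the R output.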
5+ edits survival
Now let's count an editor as surviving only if they save at least 5 edits in the survival period.
bucket | 1 to 2 weeks.k | 1 to 2 months.k | 2 to 3 months.k | n | 1 to 2 weeks.p | 1 to 2 months.p | 2 to 3 months.p |
---|---|---|---|---|---|---|---|
control | 108 | 116 | 83 | 13464 | 0.00802139 | 0.008615567 | 0.006164587 |
experimental | 102 | 129 | 79 | 13507 | 0.00755164 | 0.009550603 | 0.005848819 |
Chi^2 tests
> prop.test(bucket.survival5$revisions_1_to_2_weeks.k, bucket.survival5$n)
2-sample test for equality of proportions with continuity correction
data: bucket.survival5$revisions_1_to_2_weeks.k out of bucket.survival5$n
X-squared = 0.1366, df = 1, p-value = 0.7117
alternative hypothesis: two.sided
95 percent confidence interval:
-0.001702440 0.002641941
sample estimates:
prop 1 prop 2
0.00802139 0.00755164

> prop.test(bucket.survival5$revisions_1_to_2_months.k, bucket.survival5$n)
2-sample test for equality of proportions with continuity correction
data: bucket.survival5$revisions_1_to_2_months.k out of bucket.survival5$n
X-squared = 0.5552, df = 1, p-value = 0.4562
alternative hypothesis: two.sided
95 percent confidence interval:
-0.003273534 0.001403462
sample estimates:
prop 1 prop 2
0.008615567 0.009550603

> prop.test(bucket.survival5$revisions_2_to_3_months.k, bucket.survival5$n)
2-sample test for equality of proportions with continuity correction
data: bucket.survival5$revisions_2_to_3_months.k out of bucket.survival5$n
X-squared = 0.0659, df = 1, p-value = 0.7974
alternative hypothesis: two.sided
95 percent confidence interval:
-0.001602756 0.002234292
sample estimates:
prop 1 prop 2
0.006164587 0.005848819
Again we don't see significance, but unlike above, we don't see a clear advantage for either group. --Halfak (WMF) (talk) 18:32, 30 September 2015 (UTC)