Research talk:VisualEditor's effect on newly registered editors/May 2015 study/Work log/2015-06-05
Friday, June 5, 2015
Bucketing just ended. I'm going to take the opportunity to get a head-start on the analysis. Time to gather our bucketed users.
First, let's look at the time bounds:
> SELECT
    ->   LEFT(user_registration, 10) AS hour,
    ->   SUM(ve.up_user IS NOT NULL)/COUNT(*) AS ve_prop
    -> FROM user
    -> LEFT JOIN user_properties AS ve ON
    ->   up_user = user_id AND
    ->   up_property = 'visualeditor-enable'
    -> WHERE
    ->   user_registration BETWEEN "20150528" AND "20150605" AND
    ->   user_id > 25241662
    -> GROUP BY 1;
+------------+---------+
| hour       | ve_prop |
+------------+---------+
| 2015052800 |  0.0061 |
<... snip ...>
| 2015052822 |  0.0000 |
| 2015052823 |  0.2964 |
| 2015052900 |  0.3030 |
| 2015052901 |  0.3891 |
| 2015052902 |  0.3196 |
| 2015052903 |  0.3969 |
| 2015052904 |  0.3706 |
| 2015052905 |  0.3234 |
| 2015052906 |  0.3603 |
| 2015052907 |  0.3246 |
| 2015052908 |  0.3404 |
| 2015052909 |  0.3237 |
<... snip ...>
| 2015060417 |  0.3683 |
| 2015060418 |  0.3634 |
| 2015060419 |  0.3564 |
| 2015060420 |  0.3366 |
| 2015060421 |  0.3455 |
| 2015060422 |  0.3517 |
| 2015060423 |  0.1958 |
+------------+---------+
192 rows in set (0.99 sec)
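It looks like "2015052823" through "2015060423" will work as the time bounds: VE-enabled registrations first show up in volume at 2015052823 and taper off during 2015060423.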
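The bounds were read off the table by eye. As a rough cross-check, here is a minimal Python sketch that picks the first and last hour whose VE-enabled proportion clears a cutoff. It uses only rows shown in the table (the snipped rows are omitted). The 0.1 cutoff and the name `active_window` are assumptions made for illustration, not part of the original analysis.

```python
# Hourly proportions copied from rows shown in the table; snipped rows are omitted.
hourly_ve_prop = {
    "2015052800": 0.0061,
    "2015052822": 0.0000,
    "2015052823": 0.2964,
    "2015052900": 0.3030,
    "2015060422": 0.3517,
    "2015060423": 0.1958,
}

THRESHOLD = 0.1  # assumed cutoff separating "bucketing on" from "bucketing off"


def active_window(props, threshold=THRESHOLD):
    """Return (first_hour, last_hour) among hours whose proportion exceeds threshold.

    Hours are YYYYMMDDHH strings, so sorting them as text sorts them by time.
    This does not check that the hours in between are also above the cutoff.
    """
    active = sorted(hour for hour, prop in props.items() if prop > threshold)
    if not active:
        return None
    return active[0], active[-1]


print(active_window(hourly_ve_prop))
```

With the rows above this selects the same two hours as the eyeballed bounds. A lower proportion in the final hour would be consistent with bucketing ending partway through it.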
Time to get the sample -- same as the pilot with different date bounds:
SELECT
    event_userId AS user_id,
    IF(event_userId % 2 = 0, "experimental", "control") AS bucket,
    timestamp AS registration,
    event_displayMobile AS via_mobile,
    ve.up_user IS NOT NULL AS ve_enabled
FROM log.ServerSideAccountCreation_5487345
LEFT JOIN enwiki.user_properties ve ON
    event_userId = up_user AND
    up_property = 'visualeditor-enable'
WHERE
    wiki = "enwiki" AND
    event_isSelfMade AND
    timestamp BETWEEN "2015052823" AND "2015060423";
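To make the two selection rules in this query concrete, here is a small Python sketch of the parity-based bucket assignment and the time-window filter. The function names `bucket` and `in_window` are hypothetical. The window check assumes the `timestamp` column is compared as a string, which may not match how the database actually stores it. Under that assumption, a 14-digit timestamp inside hour 2015060423 sorts after the 10-character upper bound and is excluded.

```python
def bucket(user_id: int) -> str:
    """Mirror IF(event_userId % 2 = 0, "experimental", "control")."""
    return "experimental" if user_id % 2 == 0 else "control"


def in_window(timestamp: str, start: str = "2015052823", end: str = "2015060423") -> bool:
    """Mirror BETWEEN on the bounds, assuming plain string comparison.

    A longer string that begins with `end` compares greater than `end`,
    so timestamps within the final hour fall outside the window.
    """
    return start <= timestamp <= end


# User IDs and registration timestamps from the first rows of the sample.
print(bucket(25324895), in_window("20150528230028"))
print(bucket(25324896), in_window("20150528230034"))
# Hypothetical timestamp at the very start of hour 2015060423.
print(in_window("20150604230000"))
```

If the column is a numeric or datetime type instead, the upper bound could behave differently, so the last line is worth checking against the real data.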
$ head -n 3 experimental_users.tsv; wc experimental_users.tsv
user_id	bucket	registration	via_mobile	ve_enabled
25324895	control	20150528230028	1	0
25324896	experimental	20150528230034	0	0
  26972  134860 1038541 experimental_users.tsv
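And there we have it. About 27k users in the experiment (26,972 lines in the file, including the header row), covering both the experimental and control buckets. --Halfak (WMF) (talk) 13:55, 5 June 2015 (UTC)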
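The `wc` line count covers both buckets together. A per-bucket tally could be sketched in Python as follows. The function name `count_buckets` is hypothetical, and the inline sample holds only the header and the two rows shown in the `head` output, not the real file.

```python
import csv
import io
from collections import Counter

# Header and first two rows copied from the `head` output; not the full file.
SAMPLE_TSV = (
    "user_id\tbucket\tregistration\tvia_mobile\tve_enabled\n"
    "25324895\tcontrol\t20150528230028\t1\t0\n"
    "25324896\texperimental\t20150528230034\t0\t0\n"
)


def count_buckets(tsv_file):
    """Count rows per bucket in a tab-separated file with a header row."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return Counter(row["bucket"] for row in reader)


counts = count_buckets(io.StringIO(SAMPLE_TSV))
print(dict(counts))
```

Against the real file, the same function could be called as `count_buckets(open("experimental_users.tsv", newline=""))`. Since assignment is by user ID parity, the two counts should come out roughly equal.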
I just updated the rest of the queries, but I ran out of time. We should be in good shape for when I actually get to do a preliminary analysis. I'm guessing that's going to be Monday morning. For now, I'm off to work on other things. --Halfak (WMF) (talk) 14:54, 5 June 2015 (UTC)