Research talk:VisualEditor's effect on newly registered editors/May 2015 study/Work log/2015-06-05

Friday, June 5, 2015

Bucketing just ended. I'm going to take the opportunity to get a head start on the analysis. Time to gather our bucketed users.

First, let's look at the time bounds by checking, hour by hour, what proportion of newly registered users have VisualEditor enabled:

> SELECT 
    ->   LEFT(user_registration, 10) AS hour,
    ->   SUM(ve.up_user IS NOT NULL)/COUNT(*) AS ve_prop
    -> FROM user
    -> LEFT JOIN user_properties AS ve ON
    ->   up_user = user_id AND
    ->   up_property = 'visualeditor-enable'
    -> WHERE
    ->   user_registration BETWEEN "20150528" AND "20150605" AND
    ->   user_id > 25241662
    -> GROUP BY 1;

+------------+---------+
| hour       | ve_prop |
+------------+---------+
| 2015052800 |  0.0061 |

<... snip ...>

| 2015052822 |  0.0000 |
| 2015052823 |  0.2964 |
| 2015052900 |  0.3030 |
| 2015052901 |  0.3891 |
| 2015052902 |  0.3196 |
| 2015052903 |  0.3969 |
| 2015052904 |  0.3706 |
| 2015052905 |  0.3234 |
| 2015052906 |  0.3603 |
| 2015052907 |  0.3246 |
| 2015052908 |  0.3404 |
| 2015052909 |  0.3237 |

<... snip ...>

| 2015060417 |  0.3683 |
| 2015060418 |  0.3634 |
| 2015060419 |  0.3564 |
| 2015060420 |  0.3366 |
| 2015060421 |  0.3455 |
| 2015060422 |  0.3517 |
| 2015060423 |  0.1958 |
+------------+---------+
192 rows in set (0.99 sec)

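It looks like "2015052823" through "2015060423" will work as bounds: the VE-enabled proportion jumps from 0.0000 to 0.2964 at hour 2015052823 and stays above 0.3 until it falls to 0.1958 in hour 2015060423.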
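The bounds above were read off the table by eye. As a rough illustration, the same check could be sketched in Python. The rows below are copied from the table (snipped rows are left out), and the `threshold` cutoff and the `bucketing_window` helper are assumptions made for this example, not part of the study.

```python
# Hourly VE-enabled proportions copied from the table above
# (rows hidden behind "<... snip ...>" are not reproduced here).
HOURLY_VE_PROP = [
    ("2015052800", 0.0061),
    ("2015052822", 0.0000),
    ("2015052823", 0.2964),
    ("2015052900", 0.3030),
    ("2015060422", 0.3517),
    ("2015060423", 0.1958),
]


def bucketing_window(rows, threshold=0.1):
    """Return (first_hour, last_hour) whose proportion meets the threshold.

    `threshold` is an assumed cutoff chosen for illustration only.
    Returns None when no hour qualifies.
    """
    active = [hour for hour, prop in rows if prop >= threshold]
    if not active:
        return None
    # Hours are fixed-width YYYYMMDDHH strings, so string order is time order.
    return min(active), max(active)


print(bucketing_window(HOURLY_VE_PROP))
# ('2015052823', '2015060423')
```

With a cutoff of 0.1, the sketch picks the same two edge hours as the manual reading; a different cutoff could move the bounds.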

Time to get the sample -- same as the pilot with different date bounds:

SELECT
  event_userId AS user_id,
  IF(event_userId % 2 = 0, "experimental", "control") AS bucket,
  timestamp AS registration,
  event_displayMobile AS via_mobile,
  ve.up_user IS NOT NULL AS ve_enabled
FROM log.ServerSideAccountCreation_5487345
LEFT JOIN enwiki.user_properties ve ON
  event_userId = up_user AND
  up_property = 'visualeditor-enable'
WHERE
  wiki = "enwiki" AND
  event_isSelfMade AND
  timestamp BETWEEN "2015052823" AND "2015060423";

$ head -n 3 experimental_users.tsv; wc experimental_users.tsv
user_id	bucket	registration	via_mobile	ve_enabled
25324895	control	20150528230028	1	0
25324896	experimental	20150528230034	0	0
  26972  134860 1038541 experimental_users.tsv

And there we have it. About 27k users in the experiment (26,971 rows after the header), control and experimental buckets combined. --Halfak (WMF) (talk) 13:55, 5 June 2015 (UTC)
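A quick sanity check on the sample file could look something like the Python sketch below. It mirrors the query's parity rule (even user IDs are "experimental") and counts users per bucket. The helper names `bucket_for` and `summarize` are made up for this example, and the inline sample is just the header and two rows shown by `head` above, not the full dataset.

```python
import csv
import io
from collections import Counter


def bucket_for(user_id):
    """Mirror the query's IF(event_userId % 2 = 0, ...) bucketing rule."""
    return "experimental" if user_id % 2 == 0 else "control"


def summarize(tsv_file):
    """Count users per bucket and collect IDs whose bucket disagrees with parity."""
    counts = Counter()
    mismatches = []
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        counts[row["bucket"]] += 1
        if bucket_for(int(row["user_id"])) != row["bucket"]:
            mismatches.append(row["user_id"])
    return counts, mismatches


# Header plus the first two rows from the `head` output above.
SAMPLE = (
    "user_id\tbucket\tregistration\tvia_mobile\tve_enabled\n"
    "25324895\tcontrol\t20150528230028\t1\t0\n"
    "25324896\texperimental\t20150528230034\t0\t0\n"
)

counts, mismatches = summarize(io.StringIO(SAMPLE))
print(dict(counts), mismatches)
# {'control': 1, 'experimental': 1} []
```

To run it against the real file, something like `with open("experimental_users.tsv", newline="") as f: summarize(f)` should work, assuming the file keeps the tab-separated layout shown above.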


I just updated the rest of the queries, but I ran out of time. We should be in good shape for when I actually get to do a prelim analysis. I'm guessing that's going to be Monday morning. For now, I'm off to work on other things. --Halfak (WMF) (talk) 14:54, 5 June 2015 (UTC)