Research talk:Anonymous editor acquisition/Signup CTA experiment/Work log/2014-05-29

From Meta, a Wikimedia project coordination wiki

Thursday, May 29th[edit]

OK. So the analysis I ran [Research_talk:Anonymous_editor_acquisition/Signup_CTA_experiment/Work_log/2014-05-28|yesterday]] has some substantial issues. For example, I was counting the number of anons who showed up before the experiment even began. In order to make a good comparison between the conditions, I need to limit my observations to token'd users who had an opportunity to be involved in the experiment. Luckily, there are a few events that were only active during the experiment and one that specifically represents the beginning of the flow of users through the experiment funnel: Schema:SignupExpPageLinkClick

So, I think that all I need to do is run the last query and limit the set of users to those with at least one page link click.

SELECT
    wiki,
    bucket,
    IF(first_user_id IS NULL,
        "pure anon",
        IF(first_user_registration IS NULL OR first_user_registration <= 20140502000000,
        "old user",
        IF(first_user_registration <= 20140519180800,
        "tracked user",
        "experiment user"))) AS editor_class,
    COUNT(*) AS tokens
FROM staging.token_info
WHERE link_clicks > 0
GROUP BY 1,2,3;

... bah. Hold the press. I'm not getting any link clicks for users in the control condition. This is wrong.

> SELECT
    ->     wiki,
    ->     IF(ORD(RIGHT(event_token, 1)) <= ORD("J"), "pre-edit", IF(ORD(RIGHT(event_token, 1)) <= ORD("d"), "post-edit", "control")) AS bucket,
    ->     CONCAT("link click ", event_link) AS event,
    ->     count(*)
    -> FROM
    ->     log.SignupExpPageLinkClick_8101692
    -> WHERE wiki IN ("enwiki", "dewiki", "itwiki", "frwiki")
    -> AND event_token IS NOT NULL
    -> GROUP BY 1,2,3;
+--------+-----------+---------------------------+----------+
| wiki   | bucket    | event                     | count(*) |
+--------+-----------+---------------------------+----------+
| dewiki | control   | link click create account |     5252 |
| dewiki | control   | link click edit page      |    16729 |
| dewiki | control   | link click edit section   |    37069 |
| dewiki | post-edit | link click create account |     4674 |
| dewiki | post-edit | link click edit page      |    14299 |
| dewiki | post-edit | link click edit section   |    33477 |
| dewiki | pre-edit  | link click create account |     4779 |
| dewiki | pre-edit  | link click edit page      |    13423 |
| dewiki | pre-edit  | link click edit section   |    32620 |
| enwiki | control   | link click create account |    35746 |
| enwiki | control   | link click edit page      |   111620 |
| enwiki | control   | link click edit section   |   175675 |
| enwiki | post-edit | link click create account |    33119 |
| enwiki | post-edit | link click edit page      |   100428 |
| enwiki | post-edit | link click edit section   |   155819 |
| enwiki | pre-edit  | link click create account |    32515 |
| enwiki | pre-edit  | link click edit page      |   107158 |
| enwiki | pre-edit  | link click edit section   |   162989 |
| frwiki | control   | link click create account |     3804 |
| frwiki | control   | link click edit page      |    17112 |
| frwiki | control   | link click edit section   |    25838 |
| frwiki | post-edit | link click create account |     3636 |
| frwiki | post-edit | link click edit page      |    15940 |
| frwiki | post-edit | link click edit section   |    25340 |
| frwiki | pre-edit  | link click create account |     3571 |
| frwiki | pre-edit  | link click edit page      |    17418 |
| frwiki | pre-edit  | link click edit section   |    25642 |
| itwiki | control   | link click create account |     1282 |
| itwiki | control   | link click edit page      |    10943 |
| itwiki | control   | link click edit section   |    21152 |
| itwiki | post-edit | link click create account |     1235 |
| itwiki | post-edit | link click edit page      |     9824 |
| itwiki | post-edit | link click edit section   |    18759 |
| itwiki | pre-edit  | link click create account |     1235 |
| itwiki | pre-edit  | link click edit page      |    10798 |
| itwiki | pre-edit  | link click edit section   |    20608 |
+--------+-----------+---------------------------+----------+
36 rows in set (7.18 sec)

There they are. The problem must be in my stats query.

> SELECT
    ->     wiki,
    ->     IF(ORD(RIGHT(token, 1)) <= ORD("J"), "pre-edit", IF(ORD(RIGHT(token, 1)) <= ORD("d"), "post-edit", "control")) AS bucket,
    ->     link_clicks > 0,
    ->     count(*)
    -> FROM
    ->     token_stats
    -> GROUP BY 1,2,3;
+--------+-----------+-----------------+----------+
| wiki   | bucket    | link_clicks > 0 | count(*) |
+--------+-----------+-----------------+----------+
| dewiki | control   |               0 |    67700 |
| dewiki | control   |               1 |        8 |
| dewiki | post-edit |               0 |    61399 |
| dewiki | post-edit |               1 |      134 |
| dewiki | pre-edit  |               0 |    60402 |
| dewiki | pre-edit  |               1 |     2255 |
| enwiki | control   |               0 |   348731 |
| enwiki | control   |               1 |       21 |
| enwiki | post-edit |               0 |   317645 |
| enwiki | post-edit |               1 |      910 |
| enwiki | pre-edit  |               0 |   309677 |
| enwiki | pre-edit  |               1 |    18641 |
| frwiki | control   |               0 |    52340 |
| frwiki | control   |               1 |        3 |
| frwiki | post-edit |               0 |    47222 |
| frwiki | post-edit |               1 |      202 |
| frwiki | pre-edit  |               0 |    46622 |
| frwiki | pre-edit  |               1 |     2363 |
| itwiki | control   |               0 |    30968 |
| itwiki | control   |               1 |        1 |
| itwiki | post-edit |               0 |    27938 |
| itwiki | post-edit |               1 |      154 |
| itwiki | pre-edit  |               0 |    28832 |
| itwiki | pre-edit  |               1 |     1156 |
+--------+-----------+-----------------+----------+
24 rows in set (2.87 sec)

Sure enough. I'm not getting them. Why? Maybe I'm just not counting right. Let's check total events.

> SELECT
    ->     wiki,
    ->     IF(ORD(RIGHT(token, 1)) <= ORD("J"), "pre-edit", IF(ORD(RIGHT(token, 1)) <= ORD("d"), "post-edit", "control")) AS bucket,
    ->     total_events > 0,
    ->     count(*)
    -> FROM
    ->     token_stats
    -> GROUP BY 1,2,3;
+--------+-----------+------------------+----------+
| wiki   | bucket    | total_events > 0 | count(*) |
+--------+-----------+------------------+----------+
| dewiki | control   |                0 |    67575 |
| dewiki | control   |                1 |      133 |
| dewiki | post-edit |                0 |    61413 |
| dewiki | post-edit |                1 |      120 |
| dewiki | pre-edit  |                0 |    23827 |
| dewiki | pre-edit  |                1 |    38830 |
| enwiki | control   |                0 |   348337 |
| enwiki | control   |                1 |      415 |
| enwiki | post-edit |                0 |   318202 |
| enwiki | post-edit |                1 |      353 |
| enwiki | pre-edit  |                0 |   133217 |
| enwiki | pre-edit  |                1 |   195101 |
| frwiki | control   |                0 |    52251 |
| frwiki | control   |                1 |       92 |
| frwiki | post-edit |                0 |    47355 |
| frwiki | post-edit |                1 |       69 |
| frwiki | pre-edit  |                0 |    19302 |
| frwiki | pre-edit  |                1 |    29683 |
| itwiki | control   |                0 |    30907 |
| itwiki | control   |                1 |       62 |
| itwiki | post-edit |                0 |    28044 |
| itwiki | post-edit |                1 |       48 |
| itwiki | pre-edit  |                0 |     9998 |
| itwiki | pre-edit  |                1 |    19990 |
+--------+-----------+------------------+----------+
24 rows in set (2.16 sec)

Ha! Now that's even worse. Not a single one of these should be false (0).


I still don't know why this isn't working. I ran the following and events were counted as expected.

SELECT
    wiki,
    token,
    MIN(timestamp) AS first_event,
    SUM(event LIKE "CTA impression%") AS cta_impressions,
    SUM(event = "CTA impression pre-edit") AS pre_cta_impressions,
    SUM(event = "CTA impression post-edit") AS post_cta_impressions,
    SUM(event LIKE "link click%") AS link_clicks,
    SUM(event LIKE "link click edit%") AS edit_link_clicks,
    SUM(event LIKE "link click create%") AS registration_link_clicks,
    COUNT(*) AS total_events
FROM (
        SELECT
            wiki,
            event_token AS token,
            timestamp,
            CONCAT("CTA impression ", event_cta) AS event
        FROM
            log.SignupExpCTAImpression_8101716
        WHERE wiki IN ("enwiki", "dewiki", "itwiki", "frwiki")
        AND event_token IS NOT NULL
    UNION ALL
        SELECT
            wiki,
            event_token AS token,
            timestamp,
            CONCAT("link click ", event_link) AS event
        FROM
            log.SignupExpPageLinkClick_8101692
        WHERE wiki IN ("enwiki", "dewiki", "itwiki", "frwiki")
        AND event_token IS NOT NULL
) AS token_events
GROUP BY 1,2
LIMIT 10

OK. Current theory is that I missed a column when specifying the table and MySQL didn't raise any warnings about too many/few columns when loading the data in. I just changed the query so that it will load the token_stats directly from the query into a new table. Let's see how that goes. --Halfak (WMF) (talk) 00:45, 30 May 2014 (UTC)[reply]


Got it! Woo!

> SELECT
    ->     wiki,
    ->     IF(ORD(RIGHT(token, 1)) <= ORD("J"), "pre-edit", IF(ORD(RIGHT(token, 1)) <= ORD("d"), "post-edit", "control")) AS bucket,
    ->     total_events > 0,
    ->     count(*)
    -> FROM
    ->     token_stats
    -> GROUP BY 1,2,3;
+--------+-----------+------------------+----------+
| wiki   | bucket    | total_events > 0 | count(*) |
+--------+-----------+------------------+----------+
| dewiki | control   |                1 |    67747 |
| dewiki | post-edit |                1 |    61578 |
| dewiki | pre-edit  |                1 |    62690 |
| enwiki | control   |                1 |   349371 |
| enwiki | post-edit |                1 |   319176 |
| enwiki | pre-edit  |                1 |   328935 |
| frwiki | control   |                1 |    52375 |
| frwiki | post-edit |                1 |    47466 |
| frwiki | pre-edit  |                1 |    49023 |
| itwiki | control   |                1 |    30989 |
| itwiki | post-edit |                1 |    28111 |
| itwiki | pre-edit  |                1 |    30007 |
+--------+-----------+------------------+----------+
12 rows in set (4.63 sec)

OK. Back to the thing I was originally going to do -- limit my counts to tokens that showed up during the experiment.

SELECT
    wiki,
    bucket,
    IF(first_user_id IS NULL,
        "pure anon",
        IF(first_user_registration IS NULL OR first_user_registration <= 20140502000000,
        "old user",
        IF(first_user_registration <= 20140519180800,
        "tracked user",
        "experiment user"))) AS editor_class,
    COUNT(*) AS tokens
FROM staging.token_info
WHERE link_clicks > 0
GROUP BY 1,2,3;

Generating an intermediary table just got a lot slower. I'll just let the table loading query run for a bit and come back if it isn't too late. --Halfak (WMF) (talk) 01:23, 30 May 2014 (UTC)[reply]