Research talk:Anonymous editor acquisition/Signup CTA experiment/Work log/2014-05-29
Add topicThursday, May 29th
[edit]OK. So the analysis I ran [Research_talk:Anonymous_editor_acquisition/Signup_CTA_experiment/Work_log/2014-05-28|yesterday]] has some substantial issues. For example, I was counting the number of anons who showed up before the experiment even began. In order to make a good comparison between the conditions, I need to limit my observations to token'd users who had an opportunity to be involved in the experiment. Luckily, there are a few events that were only active during the experiment and one that specifically represents the beginning of the flow of users through the experiment funnel: Schema:SignupExpPageLinkClick
So, I think that all I need to do is run the last query and limit the set of users to those with at least one page link click.
SELECT
wiki,
bucket,
IF(first_user_id IS NULL,
"pure anon",
IF(first_user_registration IS NULL OR first_user_registration <= 20140502000000,
"old user",
IF(first_user_registration <= 20140519180800,
"tracked user",
"experiment user"))) AS editor_class,
COUNT(*) AS tokens
FROM staging.token_info
WHERE link_clicks > 0
GROUP BY 1,2,3;
... bah. Hold the press. I'm not getting any link clicks for users in the control condition. This is wrong.
> SELECT
-> wiki,
-> IF(ORD(RIGHT(event_token, 1)) <= ORD("J"), "pre-edit", IF(ORD(RIGHT(event_token, 1)) <= ORD("d"), "post-edit", "control")) AS bucket,
-> CONCAT("link click ", event_link) AS event,
-> count(*)
-> FROM
-> log.SignupExpPageLinkClick_8101692
-> WHERE wiki IN ("enwiki", "dewiki", "itwiki", "frwiki")
-> AND event_token IS NOT NULL
-> GROUP BY 1,2,3;
+--------+-----------+---------------------------+----------+
| wiki | bucket | event | count(*) |
+--------+-----------+---------------------------+----------+
| dewiki | control | link click create account | 5252 |
| dewiki | control | link click edit page | 16729 |
| dewiki | control | link click edit section | 37069 |
| dewiki | post-edit | link click create account | 4674 |
| dewiki | post-edit | link click edit page | 14299 |
| dewiki | post-edit | link click edit section | 33477 |
| dewiki | pre-edit | link click create account | 4779 |
| dewiki | pre-edit | link click edit page | 13423 |
| dewiki | pre-edit | link click edit section | 32620 |
| enwiki | control | link click create account | 35746 |
| enwiki | control | link click edit page | 111620 |
| enwiki | control | link click edit section | 175675 |
| enwiki | post-edit | link click create account | 33119 |
| enwiki | post-edit | link click edit page | 100428 |
| enwiki | post-edit | link click edit section | 155819 |
| enwiki | pre-edit | link click create account | 32515 |
| enwiki | pre-edit | link click edit page | 107158 |
| enwiki | pre-edit | link click edit section | 162989 |
| frwiki | control | link click create account | 3804 |
| frwiki | control | link click edit page | 17112 |
| frwiki | control | link click edit section | 25838 |
| frwiki | post-edit | link click create account | 3636 |
| frwiki | post-edit | link click edit page | 15940 |
| frwiki | post-edit | link click edit section | 25340 |
| frwiki | pre-edit | link click create account | 3571 |
| frwiki | pre-edit | link click edit page | 17418 |
| frwiki | pre-edit | link click edit section | 25642 |
| itwiki | control | link click create account | 1282 |
| itwiki | control | link click edit page | 10943 |
| itwiki | control | link click edit section | 21152 |
| itwiki | post-edit | link click create account | 1235 |
| itwiki | post-edit | link click edit page | 9824 |
| itwiki | post-edit | link click edit section | 18759 |
| itwiki | pre-edit | link click create account | 1235 |
| itwiki | pre-edit | link click edit page | 10798 |
| itwiki | pre-edit | link click edit section | 20608 |
+--------+-----------+---------------------------+----------+
36 rows in set (7.18 sec)
There they are. The problem must be in my stats query.
> SELECT
-> wiki,
-> IF(ORD(RIGHT(token, 1)) <= ORD("J"), "pre-edit", IF(ORD(RIGHT(token, 1)) <= ORD("d"), "post-edit", "control")) AS bucket,
-> link_clicks > 0,
-> count(*)
-> FROM
-> token_stats
-> GROUP BY 1,2,3;
+--------+-----------+-----------------+----------+
| wiki | bucket | link_clicks > 0 | count(*) |
+--------+-----------+-----------------+----------+
| dewiki | control | 0 | 67700 |
| dewiki | control | 1 | 8 |
| dewiki | post-edit | 0 | 61399 |
| dewiki | post-edit | 1 | 134 |
| dewiki | pre-edit | 0 | 60402 |
| dewiki | pre-edit | 1 | 2255 |
| enwiki | control | 0 | 348731 |
| enwiki | control | 1 | 21 |
| enwiki | post-edit | 0 | 317645 |
| enwiki | post-edit | 1 | 910 |
| enwiki | pre-edit | 0 | 309677 |
| enwiki | pre-edit | 1 | 18641 |
| frwiki | control | 0 | 52340 |
| frwiki | control | 1 | 3 |
| frwiki | post-edit | 0 | 47222 |
| frwiki | post-edit | 1 | 202 |
| frwiki | pre-edit | 0 | 46622 |
| frwiki | pre-edit | 1 | 2363 |
| itwiki | control | 0 | 30968 |
| itwiki | control | 1 | 1 |
| itwiki | post-edit | 0 | 27938 |
| itwiki | post-edit | 1 | 154 |
| itwiki | pre-edit | 0 | 28832 |
| itwiki | pre-edit | 1 | 1156 |
+--------+-----------+-----------------+----------+
24 rows in set (2.87 sec)
Sure enough. I'm not getting them. Why? Maybe I'm just not counting right. Let's check total events.
> SELECT
-> wiki,
-> IF(ORD(RIGHT(token, 1)) <= ORD("J"), "pre-edit", IF(ORD(RIGHT(token, 1)) <= ORD("d"), "post-edit", "control")) AS bucket,
-> total_events > 0,
-> count(*)
-> FROM
-> token_stats
-> GROUP BY 1,2,3;
+--------+-----------+------------------+----------+
| wiki | bucket | total_events > 0 | count(*) |
+--------+-----------+------------------+----------+
| dewiki | control | 0 | 67575 |
| dewiki | control | 1 | 133 |
| dewiki | post-edit | 0 | 61413 |
| dewiki | post-edit | 1 | 120 |
| dewiki | pre-edit | 0 | 23827 |
| dewiki | pre-edit | 1 | 38830 |
| enwiki | control | 0 | 348337 |
| enwiki | control | 1 | 415 |
| enwiki | post-edit | 0 | 318202 |
| enwiki | post-edit | 1 | 353 |
| enwiki | pre-edit | 0 | 133217 |
| enwiki | pre-edit | 1 | 195101 |
| frwiki | control | 0 | 52251 |
| frwiki | control | 1 | 92 |
| frwiki | post-edit | 0 | 47355 |
| frwiki | post-edit | 1 | 69 |
| frwiki | pre-edit | 0 | 19302 |
| frwiki | pre-edit | 1 | 29683 |
| itwiki | control | 0 | 30907 |
| itwiki | control | 1 | 62 |
| itwiki | post-edit | 0 | 28044 |
| itwiki | post-edit | 1 | 48 |
| itwiki | pre-edit | 0 | 9998 |
| itwiki | pre-edit | 1 | 19990 |
+--------+-----------+------------------+----------+
24 rows in set (2.16 sec)
Ha! Now that's even worse. Not a single one of these should be false (0).
I still don't know why this isn't working. I ran the following and events were counted as expected.
SELECT
wiki,
token,
MIN(timestamp) AS first_event,
SUM(event LIKE "CTA impression%") AS cta_impressions,
SUM(event = "CTA impression pre-edit") AS pre_cta_impressions,
SUM(event = "CTA impression post-edit") AS post_cta_impressions,
SUM(event LIKE "link click%") AS link_clicks,
SUM(event LIKE "link click edit%") AS edit_link_clicks,
SUM(event LIKE "link click create%") AS registration_link_clicks,
COUNT(*) AS total_events
FROM (
SELECT
wiki,
event_token AS token,
timestamp,
CONCAT("CTA impression ", event_cta) AS event
FROM
log.SignupExpCTAImpression_8101716
WHERE wiki IN ("enwiki", "dewiki", "itwiki", "frwiki")
AND event_token IS NOT NULL
UNION ALL
SELECT
wiki,
event_token AS token,
timestamp,
CONCAT("link click ", event_link) AS event
FROM
log.SignupExpPageLinkClick_8101692
WHERE wiki IN ("enwiki", "dewiki", "itwiki", "frwiki")
AND event_token IS NOT NULL
) AS token_events
GROUP BY 1,2
LIMIT 10
OK. Current theory is that I missed a column when specifying the table and MySQL didn't raise any warnings about too many/few columns when loading the data in. I just changed the query so that it will load the token_stats directly from the query into a new table. Let's see how that goes. --Halfak (WMF) (talk) 00:45, 30 May 2014 (UTC)
Got it! Woo!
> SELECT
-> wiki,
-> IF(ORD(RIGHT(token, 1)) <= ORD("J"), "pre-edit", IF(ORD(RIGHT(token, 1)) <= ORD("d"), "post-edit", "control")) AS bucket,
-> total_events > 0,
-> count(*)
-> FROM
-> token_stats
-> GROUP BY 1,2,3;
+--------+-----------+------------------+----------+
| wiki | bucket | total_events > 0 | count(*) |
+--------+-----------+------------------+----------+
| dewiki | control | 1 | 67747 |
| dewiki | post-edit | 1 | 61578 |
| dewiki | pre-edit | 1 | 62690 |
| enwiki | control | 1 | 349371 |
| enwiki | post-edit | 1 | 319176 |
| enwiki | pre-edit | 1 | 328935 |
| frwiki | control | 1 | 52375 |
| frwiki | post-edit | 1 | 47466 |
| frwiki | pre-edit | 1 | 49023 |
| itwiki | control | 1 | 30989 |
| itwiki | post-edit | 1 | 28111 |
| itwiki | pre-edit | 1 | 30007 |
+--------+-----------+------------------+----------+
12 rows in set (4.63 sec)
OK. Back to the thing I was originally going to do -- limit my counts to tokens that showed up during the experiment.
SELECT
wiki,
bucket,
IF(first_user_id IS NULL,
"pure anon",
IF(first_user_registration IS NULL OR first_user_registration <= 20140502000000,
"old user",
IF(first_user_registration <= 20140519180800,
"tracked user",
"experiment user"))) AS editor_class,
COUNT(*) AS tokens
FROM staging.token_info
WHERE link_clicks > 0
GROUP BY 1,2,3;
Generating an intermediary table just got a lot slower. I'll just let the table loading query run for a bit and come back if it isn't too late. --Halfak (WMF) (talk) 01:23, 30 May 2014 (UTC)