Jump to content

Research talk:Anonymous editor acquisition/Signup CTA experiment/Work log/2014-05-28

Add topic
From Meta, a Wikimedia project coordination wiki

Wednesday, May 28th

[edit]

Today I'm bringing together the token work and generating the first funnel bits. First I need to write a query that combines the info I have on token'd users into user classes.

SELECT
    wiki,
    token,
    IF(RIGHT(token, 1) <= "J", "pre-edit", 
      IF(RIGHT(token, 1) <= "d", "post-edit", "control")) AS bucket,
    first_event,
    token_stats.revisions AS tokened_revisions,
    token_stats.account_creations,
    token_stats.creation_impressions,
    token_stats.button_clicks,
    token_stats.cta_impressions,
    token_stats.link_clicks,
    SUM(DISTINCT user_id) AS user_accounts,
    MIN(user_id) AS first_user_id,
    MIN(registration) AS first_user_registration
FROM staging.token_stats 
LEFT JOIN staging.user_token USING (wiki, token)
GROUP BY 1,2;

--Halfak (WMF) (talk) 16:24, 28 May 2014 (UTC)Reply


Looks like I made a mistake with the queries above. I didn't include an old version of the Schema:TrackedPageContentSaveComplete. So I need to include that now and re-run the queries (that took forever yesterday. Ugh.). --Halfak (WMF) (talk) 16:25, 28 May 2014 (UTC)Reply


I have data! OK. Time to pick through it.

> select count(*) from token_info;
+----------+
| count(*) |
+----------+
|  1384785 |
+----------+
1 row in set (0.00 sec)

About 1.4 million tokens appeared in since we started tracking data. Let's get the experiment bounds so that we can draw some cutoffs.

Query
> select left(timestamp, 10) as hour, count(*) from SignupExpCTAImpression_8101716 group by 1;
+------------+----------+
| hour       | count(*) |
+------------+----------+
| 2014041618 |        1 |
| 2014041720 |        1 |
| 2014041721 |        1 |
| 2014041722 |        1 |
| 2014041816 |        9 |
| 2014042120 |        6 |
| 2014042121 |        7 |
| 2014042505 |       36 |
| 2014042516 |        4 |
| 2014050204 |        4 |
| 2014050205 |        1 |
| 2014050206 |       10 |
| 2014050207 |       10 |
| 2014050323 |        5 |
| 2014050401 |       24 |
| 2014050502 |        8 |
| 2014050503 |        9 |
| 2014050504 |        7 |
| 2014051400 |        1 |
| 2014051402 |        3 |
| 2014051417 |        1 |
| 2014051520 |        2 |
| 2014051603 |        1 |
| 2014051604 |        4 |
| 2014051918 |      908 |
| 2014051919 |     1154 |
| 2014051920 |     1639 |
| 2014051921 |     1596 |
| 2014051922 |     1433 |
| 2014051923 |     1201 |
| 2014052000 |     1197 |
| 2014052001 |     1268 |
| 2014052002 |     1241 |
| 2014052003 |     1139 |
| 2014052004 |     1194 |
| 2014052005 |     1409 |
| 2014052006 |     1724 |
| 2014052007 |     2136 |
| 2014052008 |     2473 |
| 2014052009 |     2510 |
| 2014052010 |     2511 |
| 2014052011 |     2584 |
| 2014052012 |     2928 |
| 2014052013 |     3433 |
| 2014052014 |     3603 |
| 2014052015 |     3500 |
| 2014052016 |     3366 |
| 2014052017 |     3424 |
| 2014052018 |     3386 |
| 2014052019 |     3035 |
| 2014052020 |     2790 |
| 2014052021 |     2414 |
| 2014052022 |     1951 |
| 2014052023 |     1707 |
| 2014052100 |     1642 |
| 2014052101 |     1524 |
| 2014052102 |     1554 |
| 2014052103 |     1510 |
| 2014052104 |     1481 |
| 2014052105 |     1581 |
| 2014052106 |     1882 |
| 2014052107 |     2132 |
| 2014052108 |     2508 |
| 2014052109 |     2642 |
| 2014052110 |     2610 |
| 2014052111 |     2563 |
| 2014052112 |     3063 |
| 2014052113 |     3529 |
| 2014052114 |     3483 |
| 2014052115 |     3314 |
| 2014052116 |     3219 |
| 2014052117 |     3283 |
| 2014052118 |     3187 |
| 2014052119 |     2975 |
| 2014052120 |     2675 |
| 2014052121 |     2271 |
| 2014052122 |     1782 |
| 2014052123 |     1691 |
| 2014052200 |     1527 |
| 2014052201 |     1540 |
| 2014052202 |     1572 |
| 2014052203 |     1353 |
| 2014052204 |     1420 |
| 2014052205 |     1503 |
| 2014052206 |     1920 |
| 2014052207 |     2207 |
| 2014052208 |     2583 |
| 2014052209 |     2639 |
| 2014052210 |     2503 |
| 2014052211 |     2550 |
| 2014052212 |     2933 |
| 2014052213 |     3298 |
| 2014052214 |     3352 |
| 2014052215 |     3437 |
| 2014052216 |     3233 |
| 2014052217 |     3070 |
| 2014052218 |     2876 |
| 2014052219 |     2594 |
| 2014052220 |     2512 |
| 2014052221 |     2143 |
| 2014052222 |     1703 |
| 2014052223 |     1541 |
| 2014052300 |     1534 |
| 2014052301 |     1384 |
| 2014052302 |     1340 |
| 2014052303 |     1162 |
| 2014052304 |     1150 |
| 2014052305 |     1329 |
| 2014052306 |     1658 |
| 2014052307 |     1894 |
| 2014052308 |     2026 |
| 2014052309 |     2174 |
| 2014052310 |     2149 |
| 2014052311 |     2154 |
| 2014052312 |     2435 |
| 2014052313 |     2859 |
| 2014052314 |     2796 |
| 2014052315 |     2775 |
| 2014052316 |     2686 |
| 2014052317 |     2463 |
| 2014052318 |     2368 |
| 2014052319 |     2369 |
| 2014052320 |     2292 |
| 2014052321 |     1918 |
| 2014052322 |     1503 |
| 2014052323 |     1325 |
| 2014052400 |     1122 |
| 2014052401 |     1122 |
| 2014052402 |     1060 |
| 2014052403 |     1110 |
| 2014052404 |     1178 |
| 2014052405 |     1230 |
| 2014052406 |     1293 |
| 2014052407 |     1527 |
| 2014052408 |     1747 |
| 2014052409 |     1819 |
| 2014052410 |     1751 |
| 2014052411 |     1869 |
| 2014052412 |     2049 |
| 2014052413 |     2227 |
| 2014052414 |     2292 |
| 2014052415 |     2366 |
| 2014052416 |     2201 |
| 2014052417 |     2117 |
| 2014052418 |     2008 |
| 2014052419 |     2026 |
| 2014052420 |     1888 |
| 2014052421 |     1869 |
| 2014052422 |     1542 |
| 2014052423 |     1397 |
| 2014052500 |     1288 |
| 2014052501 |     1266 |
| 2014052502 |     1170 |
| 2014052503 |     1086 |
| 2014052504 |     1184 |
| 2014052505 |     1179 |
| 2014052506 |     1321 |
| 2014052507 |     1480 |
| 2014052508 |     1745 |
| 2014052509 |     1871 |
| 2014052510 |     1905 |
| 2014052511 |     1970 |
| 2014052512 |     2125 |
| 2014052513 |     2246 |
| 2014052514 |     2424 |
| 2014052515 |     2679 |
| 2014052516 |     2506 |
| 2014052517 |     2600 |
| 2014052518 |     2446 |
| 2014052519 |     2273 |
| 2014052520 |     2266 |
| 2014052521 |     1913 |
| 2014052522 |     1578 |
| 2014052523 |     1581 |
| 2014052600 |     1429 |
| 2014052601 |     1419 |
| 2014052602 |     1353 |
| 2014052603 |     1267 |
| 2014052604 |     1314 |
| 2014052605 |     1334 |
| 2014052606 |     1705 |
| 2014052607 |     1996 |
| 2014052608 |     2317 |
| 2014052609 |     2419 |
| 2014052610 |     2463 |
| 2014052611 |     2478 |
| 2014052612 |     2526 |
| 2014052613 |     2489 |
| 2014052614 |     2688 |
| 2014052615 |     2968 |
| 2014052616 |     3017 |
| 2014052617 |     2929 |
| 2014052618 |     2787 |
| 2014052619 |     2654 |
| 2014052620 |     2463 |
| 2014052621 |     2127 |
| 2014052622 |     1851 |
| 2014052623 |     1662 |
| 2014052700 |     1609 |
| 2014052701 |     1521 |
| 2014052702 |     1551 |
| 2014052703 |     1320 |
| 2014052704 |     1482 |
| 2014052705 |     1497 |
| 2014052706 |     1909 |
| 2014052707 |     2085 |
| 2014052708 |     2230 |
| 2014052709 |     2620 |
| 2014052710 |     2559 |
| 2014052711 |     2486 |
| 2014052712 |     2720 |
| 2014052713 |     2794 |
| 2014052714 |     2810 |
| 2014052715 |     3158 |
| 2014052716 |     3060 |
| 2014052717 |     3077 |
| 2014052718 |     2898 |
| 2014052719 |     2942 |
| 2014052720 |     2899 |
| 2014052721 |     2520 |
| 2014052722 |     1979 |
| 2014052723 |     1077 |
| 2014052800 |      847 |
| 2014052801 |      764 |
| 2014052802 |      626 |
| 2014052803 |      518 |
| 2014052804 |      490 |
| 2014052805 |      467 |
| 2014052806 |      391 |
| 2014052807 |      366 |
| 2014052808 |      254 |
| 2014052809 |      194 |
| 2014052810 |      136 |
| 2014052811 |      139 |
| 2014052812 |      173 |
| 2014052813 |      169 |
| 2014052814 |      199 |
| 2014052815 |      198 |
| 2014052816 |      165 |
| 2014052817 |      160 |
| 2014052818 |      148 |
| 2014052819 |       58 |
+------------+----------+
242 rows in set (0.93 sec)

It looks like the experiment started in the 18th hour of 2014-05-19 and began to ramp down in the 23rd hour of 2014-05-27.

> select left(timestamp, 11) as ten_minutes, count(*) from SignupExpCTAImpression_8101716 where timestamp LIKE "2014051918%" group by 1;
+-------------+----------+
| ten_minutes | count(*) |
+-------------+----------+
| 20140519180 |       30 |
| 20140519181 |      171 |
| 20140519182 |      179 |
| 20140519183 |      198 |
| 20140519184 |      155 |
| 20140519185 |      175 |
+-------------+----------+
6 rows in set (0.00 sec)

Looks like we're looking for the first 10 minutes of the hour.

> select left(timestamp, 12) as minutes, count(*) from SignupExpCTAImpression_8101716 where timestamp LIKE "20140519180%" group by 1;
+--------------+----------+
| minutes      | count(*) |
+--------------+----------+
| 201405191805 |        1 |
| 201405191806 |        1 |
| 201405191807 |        6 |
| 201405191808 |        9 |
| 201405191809 |       13 |
+--------------+----------+
5 rows in set (0.00 sec)

I think that "201405191807" is the closest we're going to get. It's just aim for the next minute to be sure and say the experiment started at 20140519180800 or 2014-05-19 18:08:00 UTC.

So, for the sake of consistency and dealing with the periodic nature of the work week, I need to limit this to 7 days, so, my assumed end date will be 20140526180800 or 2014-05-16 18:00:00 UTC.

Let me make this bold.

Assumed start and end timestamps

[edit]
start
20140519180800
end
20140526180800

--Halfak (WMF) (talk) 22:42, 28 May 2014 (UTC)Reply


First results

[edit]

OK, here's the money. These represent counts of tokens based on user classes.

  • pure anon -- A token that was never associated with a registered account.
  • old user -- A token that is associated with an account that was registered before tracking began
  • tracked user -- A token that is associated with an account that was registered after tracking began, but before the experiment
  • experimental user -- A token that is associated with an account that was registered during the experiment


> SELECT
    ->     wiki,
    ->     bucket,
    ->     IF(first_user_id IS NULL,
    ->         "pure anon",
    ->         IF(first_user_registration IS NULL OR first_user_registration <= 20140502000000,
    ->         "old user",
    ->         IF(first_user_registration <= 20140519180800,
    ->         "tracked user",
    ->         "experiment user"))) AS editor_class,
    ->     COUNT(*) AS tokens
    -> FROM staging.token_info
    -> GROUP BY 1,2,3;
+--------+-----------+-----------------+--------+
| wiki   | bucket    | editor_class    | tokens |
+--------+-----------+-----------------+--------+
| dewiki | control   | experiment user |    723 |
| dewiki | control   | old user        |   4577 |
| dewiki | control   | pure anon       |  60441 |
| dewiki | control   | tracked user    |    630 |
| dewiki | post-edit | experiment user |    711 |
| dewiki | post-edit | old user        |   4342 |
| dewiki | post-edit | pure anon       |  54728 |
| dewiki | post-edit | tracked user    |    566 |
| dewiki | pre-edit  | experiment user |    998 |
| dewiki | pre-edit  | old user        |   4415 |
| dewiki | pre-edit  | pure anon       |  55410 |
| dewiki | pre-edit  | tracked user    |    589 |
| enwiki | control   | experiment user |  11633 |
| enwiki | control   | old user        |  22739 |
| enwiki | control   | pure anon       | 296263 |
| enwiki | control   | tracked user    |   6924 |
| enwiki | post-edit | experiment user |  11047 |
| enwiki | post-edit | old user        |  20737 |
| enwiki | post-edit | pure anon       | 270362 |
| enwiki | post-edit | tracked user    |   6179 |
| enwiki | pre-edit  | experiment user |  14520 |
| enwiki | pre-edit  | old user        |  21651 |
| enwiki | pre-edit  | pure anon       | 275562 |
| enwiki | pre-edit  | tracked user    |   6386 |
| frwiki | control   | experiment user |   1041 |
| frwiki | control   | old user        |   3293 |
| frwiki | control   | pure anon       |  46145 |
| frwiki | control   | tracked user    |    747 |
| frwiki | post-edit | experiment user |   1143 |
| frwiki | post-edit | old user        |   3052 |
| frwiki | post-edit | pure anon       |  41450 |
| frwiki | post-edit | tracked user    |    718 |
| frwiki | pre-edit  | experiment user |   1491 |
| frwiki | pre-edit  | old user        |   3144 |
| frwiki | pre-edit  | pure anon       |  42603 |
| frwiki | pre-edit  | tracked user    |    696 |
| itwiki | control   | experiment user |    382 |
| itwiki | control   | old user        |   1604 |
| itwiki | control   | pure anon       |  27973 |
| itwiki | control   | tracked user    |    309 |
| itwiki | post-edit | experiment user |    470 |
| itwiki | post-edit | old user        |   1546 |
| itwiki | post-edit | pure anon       |  25205 |
| itwiki | post-edit | tracked user    |    252 |
| itwiki | pre-edit  | experiment user |    688 |
| itwiki | pre-edit  | old user        |   1616 |
| itwiki | pre-edit  | pure anon       |  26844 |
| itwiki | pre-edit  | tracked user    |    240 |
+--------+-----------+-----------------+--------+
48 rows in set, 65535 warnings (3.71 sec)

Here, we can already see some differences between the experimental and control conditions become apparent. Let's take itwiki for example.

condition pure anon old user tracked user experiment user
control 27973 1604 309 382
pre-edit 26844 1616 240 688
post-edit 25205 1546 252 470

Here, we see that about 33% more users registered during the experiment in the pre-edit condition than the control. About 25% more user registered during the experiment in the post-edit condition than the control.

OK. That's about it for today. I'll be limiting the pure-anons and experimental users to the experimental time bounds tomorrow to do a more accurate comparison. --Halfak (WMF) (talk) 23:50, 28 May 2014 (UTC)Reply