Research talk:Anonymous editor acquisition/Signup CTA experiment/Work log/2014-06-11

From Meta, a Wikimedia project coordination wiki

Wednesday, June 11th

Picking up where I left off with revision data. To get some basic figures, I'll need to limit my analysis to the week of the experiment. Unfortunately, I left the timestamp field out of the last dataset, so I'll need to regenerate it.

SELECT
    wiki,
    event_token as token,
    timestamp,
    event_revId as rev_id
FROM log.TrackedPageContentSaveComplete_8535426
UNION
SELECT
    wiki,
    event_token as token,
    timestamp,
    event_revId as rev_id
FROM log.TrackedPageContentSaveComplete_7872558;

--Halfak (WMF) (talk) 16:56, 11 June 2014 (UTC)


Time to get some ballpark estimates before I go to R and make some visualizations. I'll need to limit the query to users (tokens) who were not registered before the experimental period.

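SELECT
    wiki,
    bucket,
    -- Geometric mean of revisions with a +1 offset. Clients with no revisions
    -- in the window are NULL after the LEFT JOIN and AVG() skips NULLs, so
    -- this is the mean among editing clients only.
    ROUND(EXP(AVG(LOG(experimental_revisions+1)))-1, 3) AS geom_mean_revisions,
    SUM(experimental_revisions > 0) AS editing_clients,
    SUM(experimental_revisions > 0)/COUNT(*) AS editing_prop,
    COUNT(*) AS relevant_tokened_clients
FROM token_info
LEFT JOIN (
    -- Revisions saved per token during the experimental week
    SELECT
        wiki,
        token,
        COUNT(rev_id) AS experimental_revisions
    FROM staging.token_revision
    WHERE timestamp BETWEEN "20140519180800" AND "20140526180800"
    GROUP BY wiki, token
) AS token_revision_count USING (wiki, token)
-- Keep only tokens with no account registered before the experiment started
WHERE (first_user_id IS NULL OR first_user_registration > "20140519180800")
AND link_clicks > 0
GROUP BY wiki, bucket;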
+--------+-----------+---------------------+-----------------+--------------+--------------------------+
| wiki   | bucket    | geom_mean_revisions | editing_clients | editing_prop | relevant_tokened_clients |
+--------+-----------+---------------------+-----------------+--------------+--------------------------+
| dewiki | control   |               1.849 |            4093 |       0.0874 |                    46835 |
| dewiki | post-edit |               1.708 |            3788 |       0.0892 |                    42454 |
| dewiki | pre-edit  |               1.896 |            3107 |       0.0717 |                    43319 |
| enwiki | control   |               1.946 |           27036 |       0.1139 |                   237262 |
| enwiki | post-edit |               1.899 |           24738 |       0.1145 |                   216138 |
| enwiki | pre-edit  |               2.043 |           20759 |       0.0921 |                   225354 |
| frwiki | control   |               1.836 |            3915 |       0.1124 |                    34821 |
| frwiki | post-edit |               1.817 |            3601 |       0.1142 |                    31543 |
| frwiki | pre-edit  |               1.966 |            3007 |       0.0912 |                    32984 |
| itwiki | control   |               2.060 |            2615 |       0.1227 |                    21306 |
| itwiki | post-edit |               2.099 |            2381 |       0.1235 |                    19280 |
| itwiki | pre-edit  |               2.199 |            2116 |       0.0997 |                    21232 |
+--------+-----------+---------------------+-----------------+--------------+--------------------------+
12 rows in set (1 min 23.37 sec)
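
To make the geom_mean_revisions column easier to read, here is a minimal Python sketch of the same calculation, EXP(AVG(LOG(x+1)))-1. The revision counts are made up for illustration and are not taken from the dataset. None stands in for the NULL that the LEFT JOIN produces for a client with no revisions in the window. As far as I can tell from the query, AVG() skips those NULLs, so the geometric mean covers editing clients only, while the proportion is taken over all clients.

```python
import math


def shifted_geometric_mean(counts):
    """Return exp(mean(log(x + 1))) - 1 over the non-None values.

    None values are skipped, which mirrors how SQL AVG() ignores NULLs.
    Returns None when there is nothing to average.
    """
    values = [c for c in counts if c is not None]
    if not values:
        return None
    mean_log = sum(math.log(c + 1) for c in values) / len(values)
    return math.exp(mean_log) - 1


# Hypothetical per-client revision counts (illustration only).
# None = client with no matching row in the revision subquery.
counts = [1, 1, 3, 7, None, None, None, None]

geom_mean = shifted_geometric_mean(counts)
editing_clients = sum(1 for c in counts if c is not None and c > 0)
editing_prop = editing_clients / len(counts)

print(round(geom_mean, 3), editing_clients, editing_prop)
```

The +1 offset keeps LOG() defined if a count were ever zero, and subtracting 1 afterwards puts the result back on the original scale.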

So, it looks like we get fewer people editing in the pre-edit condition than in the control condition on all four wikis (e.g. 9.2% vs. 11.4% of clients on enwiki), but we get more edits per editing client (geometric mean of 2.043 vs. 1.946 on enwiki). Now to visualize this with some error bars and do some statistical tests to see if the differences are real. --Halfak (WMF) (talk) 17:26, 11 June 2014 (UTC)
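
As a rough check on the editing proportions before moving to R, one option is a pooled two-proportion z-test comparing pre-edit against control on each wiki. This is a sketch of one possible test, not necessarily the one that will be used in the analysis. The counts are copied from the editing_clients and relevant_tokened_clients columns in the table above. The function name is made up for this example. It only addresses the proportion of clients who edit, not the difference in edits per person.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Pooled two-proportion z-test of group b against group a.

    Returns (z, two_sided_p_value). A negative z means group b has the
    lower proportion. Uses the normal approximation.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# (editing_clients, relevant_tokened_clients) per wiki and bucket,
# copied from the query output.
table = {
    "dewiki": {"control": (4093, 46835), "pre-edit": (3107, 43319)},
    "enwiki": {"control": (27036, 237262), "pre-edit": (20759, 225354)},
    "frwiki": {"control": (3915, 34821), "pre-edit": (3007, 32984)},
    "itwiki": {"control": (2615, 21306), "pre-edit": (2116, 21232)},
}

results = {}
for wiki, buckets in table.items():
    control_edits, control_n = buckets["control"]
    pre_edits, pre_n = buckets["pre-edit"]
    results[wiki] = two_proportion_z_test(
        control_edits, control_n, pre_edits, pre_n
    )

for wiki, (z, p_value) in results.items():
    print(f"{wiki}: z = {z:.2f}, p = {p_value:.3g}")
```

With samples this large, the normal approximation should be reasonable. Since four wikis are tested at once, a multiple-comparison correction may be worth considering when reading the p-values.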