Research talk:Onboarding new Wikipedians/Rollout/Work log/2014-03-20

From Meta, a Wikimedia project coordination wiki

Thursday, March 20th[edit]

First thing first, I need to move my analysis from yesterday to the main page. Then I need to pick up the analysis of the first month of GettingStarted. My primary goal with the first 30 days analysis is to figure out the scale at which GettingStarted operates. It will be very similar to the two weeks funnel analysis that I performed earlier, but I'll focus on keeping it simple and generating cross-wiki comparisons. OK. To main document. --Halfak (WMF) (talk) 14:04, 20 March 2014 (UTC)[reply]


I just added a bunch of content and fleshed out some of the methods section. Also this figure explaining the natural experiment:

A conceptual diagram depicts the sample periods before and after deployment of mw:Extension:GettingStarted.
Natural experiment sample periods. A conceptual diagram depicts the sample periods before and after deployment of mw:Extension:GettingStarted.

--Halfak (WMF) (talk) 19:06, 20 March 2014 (UTC)[reply]


Now it's time to think about what will help us best understand how many newcomers GS is affecting.

I want to know:

  • How many newly registered users showed up on all projects in the month post-deployment?
  • How of those users registered via desktop?
  • How many of those users registered on a GS wiki? (This will be tricky because jawiki came late)
  • How many of those users saw a CTA and which one?
    • No CTA
    • Suggest only
    • Edit only
    • Edit or suggest
  • How many of those users made an article edit in the first 24h

First things first. How many users registered cross-projects? We can use Schema:ServerSideAccountCreation to figure that out.

SET @deployment = "20140211181300";

SELECT
    COUNT(*)
FROM (
    SELECT DISTINCT
        wiki,
        event_userId as user_id
    FROM ServerSideAccountCreation_5487345
    WHERE 
        event_isSelfMade AND
        timestamp BETWEEN 
            @deployment AND
            DATE_FORMAT(DATE_ADD(@deployment, INTERVAL 30 DAY), "%Y%m%d%H%i%S")
) AS uniques
+----------+
| COUNT(*) |
+----------+
|   336310 |
+----------+
1 row in set (1 min 11.51 sec)

OK. 336,310 newly registered users cross-project.


> SELECT
    ->     not mobile as desktop,
    ->     COUNT(*)
    -> FROM (
    ->     SELECT
    ->         wiki,
    ->         event_userId as user_id,
    ->         event_displayMobile as mobile
    ->     FROM ServerSideAccountCreation_5487345
    ->     WHERE 
    ->         event_isSelfMade AND
    ->         timestamp BETWEEN 
    ->             @deployment AND
    ->             DATE_FORMAT(DATE_ADD(@deployment, INTERVAL 30 DAY), "%Y%m%d%H%i%S")
    ->     GROUP BY 1,2
    -> ) AS uniques
    -> GROUP BY 1;
+---------+----------+
| desktop | COUNT(*) |
+---------+----------+
|       0 |    63141 |
|       1 |   273169 |
+---------+----------+
2 rows in set (27.24 sec)

273169/336310 = 0.8123 or 81.23% desktop user registrations.

> SELECT
    ->     wiki IN ("astwiki",    "bswiki", "cawiki", "dawiki", "dewiki", "elwiki",
    ->              "enwiki",     "eswiki", "fawiki", "frwiki", "fowiki", "glwiki",
    ->              "hewiki",     "huwiki", "iswiki", "itwiki", "kowiki", "lbwiki",
    ->              "mkwiki",     "mlwiki", "nlwiki", "plwiki", "ptwiki", "ruwiki",
    ->              "simplewiki", "svwiki", "viwiki", "ukwiki", "zhwiki", "jawiki") AS gs_wiki,
    ->     COUNT(*)
    -> FROM (
    ->     SELECT DISTINCT
    ->         wiki,
    ->         event_userId as user_id
    ->     FROM ServerSideAccountCreation_5487345
    ->     WHERE 
    ->         event_isSelfMade AND
    ->         NOT event_displayMobile AND
    ->         timestamp BETWEEN 
    ->             @deployment AND
    ->             DATE_FORMAT(DATE_ADD(@deployment, INTERVAL 30 DAY), "%Y%m%d%H%i%S")
    -> ) AS uniques
    -> GROUP BY 1;
+---------+----------+
| gs_wiki | COUNT(*) |
+---------+----------+
|       0 |    54201 |
|       1 |   218968 |
+---------+----------+
2 rows in set (20.88 sec)

That's 218968/273169 = 0.8016 or 80.16% of desktop users registered on a GettingStarted wiki.

Now, I'm going to assume that users saw only one version of the CTA. This isn't necessarily true. I'm also going to let jawiki's missing registrations due to the two week delay slip, which will be OK just so long as we don't try to draw proportions across this point in the funnel.


CREATE TEMPORARY TABLE staging.month_gs_potentials
SELECT
    ssac.wiki,
    ssac.event_userId AS user_id
FROM ServerSideAccountCreation_5487345 ssac
WHERE 
    event_isSelfMade AND
    NOT event_displayMobile AND
    ssac.timestamp BETWEEN 
        @deployment AND
        DATE_FORMAT(DATE_ADD(@deployment, INTERVAL 30 DAY), "%Y%m%d%H%i%S") AND
    ssac.wiki IN ("astwiki",    "bswiki", "cawiki", "dawiki", "dewiki", "elwiki",
                  "enwiki",     "eswiki", "fawiki", "frwiki", "fowiki", "glwiki",
                  "hewiki",     "huwiki", "iswiki", "itwiki", "kowiki", "lbwiki",
                  "mkwiki",     "mlwiki", "nlwiki", "plwiki", "ptwiki", "ruwiki",
                  "simplewiki", "svwiki", "viwiki", "ukwiki", "zhwiki", "jawiki")
GROUP BY 1,2;
CREATE UNIQUE INDEX wiki_user ON staging.month_gs_potentials (wiki, user_id);

CREATE TEMPORARY TABLE staging.month_gs_impressions
SELECT 
    wiki,
    event_userId AS user_id,
    event_ctaType AS cta_type
FROM GettingStartedRedirectImpression_7355552
WHERE timestamp BETWEEN 
    @deployment AND
    DATE_FORMAT(DATE_ADD(@deployment, INTERVAL 30 DAY), "%Y%m%d%H%i%S")
GROUP BY 1,2;
CREATE UNIQUE INDEX wiki_user ON staging.month_gs_impressions (wiki, user_id);

+---------------------------+----------+
| cta_type                  | COUNT(*) |
+---------------------------+----------+
| NULL                      |    75341 |
| edit current              |    19982 |
| edit current or suggested |    56878 |
| suggested                 |    66767 |
+---------------------------+----------+
4 rows in set (2.00 sec)
  • No CTA: 75341/218968 = 0.3441 or 34.41%
  • Just edit: 19982/218968 = 0.0913 or 9.13%
  • Just suggest: 66767/218968 = 0.3049 or 30.49%
  • Both edit & suggest: 56878/218968 = 0.2598 or 25.98%

That's all I have time for today. --Halfak (WMF) (talk) 21:06, 20 March 2014 (UTC)[reply]