Research talk:Onboarding new Wikipedians/Rollout/Work log/2014-03-24

From Meta, a Wikimedia project coordination wiki

Monday, March 24th[edit]

Back at it. Today, I'd like to finish off my analysis of the first 30 days of GS deployment. The last thing I did was produce a breakdown of the % of editors making it to various places in the funnel. To help me make sense of this, I'll start by making a sketch of what I found so far.

[Registration: 336,310] --> [Desktop: 273,169 (81.23%)] --> [GS wiki: 218,968 (80.16%)] --> [No CTA: 75,341 (34.41%)]
                        \                               \                              '--> [Edit current: 19,982 (9.13%)]
                         '--> (18.77%)                   '--> (19.84%)                 '--> [Suggest only: 66,767 (30.49%)]
                                                                                       '--> [Edit & suggest: 56,878 (25.98%)]

Now, I need to figure out:

  • How many of those users made an article edit in the first 24h?
  • How many of those article editors made a GS edit?
  • What proportion of those GS edits were reverted?

--Halfak (WMF) (talk) 15:40, 24 March 2014 (UTC)[reply]


I have the following query kicked off to gather data on users cross-wiki. I'll have to pull this into our server the contains the ServerSideAccountCreation log in order to filter it down to eligible users.

SELECT
    DATABASE() as wiki,
    user_id,
    SUM(rev_id IS NOT NULL) AS day_revisions,
    SUM(rev_id IS NOT NULL AND page_namespace = 0) AS day_main_revisions,
    SUM(ct_rev_id IS NOT NULL) AS day_gs_revisions
FROM user
LEFT JOIN revision ON
    rev_user = user_id AND
    rev_timestamp BETWEEN 
        user_registration AND 
        DATE_FORMAT(DATE_ADD(user_registration, INTERVAL 1 DAY), "%Y%m%d%H%i%S")
LEFT JOIN page ON
    rev_page = page_id
LEFT JOIN change_tag ON
    ct_rev_id = rev_id
WHERE
    user_registration BETWEEN "20140211181300" AND "20140313181300";

OK. Here's the query that's pulling in eligible users so that I can do a set intersection between the two:

SET @deployment = "20140211181300";

CREATE TEMPORARY TABLE staging.month_gs_impression
SELECT 
    wiki,
    event_userId AS user_id,
    event_ctaType AS cta_type
FROM log.GettingStartedRedirectImpression_7355552
WHERE timestamp BETWEEN 
    @deployment AND
    DATE_FORMAT(DATE_ADD(@deployment, INTERVAL 30 DAY), "%Y%m%d%H%i%S")
GROUP BY 1,2;
CREATE UNIQUE INDEX wiki_user ON staging.month_gs_impression (wiki, user_id);

CREATE TABLE staging.month_gs_user
SELECT
    ssac.wiki,
    ssac.event_userId AS user_id,
    cta_type,
    timestamp AS user_registration_approx
FROM log.ServerSideAccountCreation_5487345 ssac
LEFT JOIN staging.month_gs_impression impression ON
    ssac.wiki = impression.wiki AND
    ssac.event_userId = impression.user_id
WHERE 
    event_isSelfMade AND
    NOT event_displayMobile AND
    ssac.timestamp BETWEEN 
        @deployment AND
        DATE_FORMAT(DATE_ADD(@deployment, INTERVAL 30 DAY), "%Y%m%d%H%i%S") AND
    ssac.wiki IN ("astwiki",    "bswiki", "cawiki", "dawiki", "dewiki", "elwiki",
                  "enwiki",     "eswiki", "fawiki", "frwiki", "fowiki", "glwiki",
                  "hewiki",     "huwiki", "iswiki", "itwiki", "kowiki", "lbwiki",
                  "mkwiki",     "mlwiki", "nlwiki", "plwiki", "ptwiki", "ruwiki",
                  "simplewiki", "svwiki", "viwiki", "ukwiki", "zhwiki", "jawiki")
GROUP BY 1,2;
CREATE UNIQUE INDEX wiki_user ON staging.month_gs_user (wiki, user_id);

SELECT NOW(), COUNT(*) FROM staging.month_gs_user;

After I intersect the two, then I can answer all but the revert question. --Halfak (WMF) (talk) 23:55, 24 March 2014 (UTC)[reply]