Research talk:Onboarding new Wikipedians/Rollout/Work log/2014-03-25

From Meta, a Wikimedia project coordination wiki

Tuesday, March 25th[edit]

Back to hacking today. I should actually have some plots this time. First I need to check my data for gs_users.

> SELECT
    ->     cta_type,
    ->     COUNT(*) AS users,
    ->     SUM(day_revisions > 0) AS day_editor,
    ->     SUM(day_revisions > 0)/COUNT(*) AS proportion
    -> FROM month_gs_user
    -> INNER JOIN gs_user_day_edits USING (wiki, user_id)
    -> GROUP BY 1;
+---------------------------+-------+------------+------------+
| cta_type                  | users | day_editor | proportion |
+---------------------------+-------+------------+------------+
| NULL                      | 75341 |      22533 |     0.2991 |
| edit current              | 19982 |       6064 |     0.3035 |
| edit current or suggested | 56878 |      19241 |     0.3383 |
| suggested                 | 66767 |      18351 |     0.2749 |
+---------------------------+-------+------------+------------+
4 rows in set (2.63 sec)

--Halfak (WMF) (talk) 15:16, 25 March 2014 (UTC)[reply]

... --> [GS wiki: 218,968 (80.16%)] --> [No CTA:         75,341 (34.41%)] --> [24h edit: 22,533 (29.91%)]
                                   '--> [Edit current:   19,982  (9.13%)] --> [24h edit:  6,064 (30.35%)]
                                   '--> [Suggest only:   66,767 (30.49%)] --> [24h edit: 18,351 (27.49%)]
                                   '--> [Edit & suggest: 56,878 (25.98%)] --> [24h edit: 19,241 (33.83%)]

Now to check who makes a GS edit.

> SELECT
    ->     cta_type,
    ->     COUNT(*) AS users,
    ->     SUM(day_gs_revisions > 0) AS gs_editor,
    ->     SUM(day_gs_revisions > 0)/COUNT(*) AS proportion
    -> FROM month_gs_user
    -> INNER JOIN gs_user_day_edits USING (wiki, user_id)
    -> WHERE day_revisions > 0
    -> GROUP BY 1;
+---------------------------+-------+-----------+------------+
| cta_type                  | users | gs_editor | proportion |
+---------------------------+-------+-----------+------------+
| NULL                      | 22533 |      5374 |     0.2385 |
| edit current              |  6064 |      4064 |     0.6702 |
| edit current or suggested | 19241 |     12444 |     0.6467 |
| suggested                 | 18351 |      3328 |     0.1814 |
+---------------------------+-------+-----------+------------+
4 rows in set (1.02 sec)
... --> [GS wiki: 218,968 (80.16%)] --> [No CTA:         75,341 (34.41%)] --> [24h edit: 22,533 (29.91%)] --> [24h gs edit:  5,374 (23.85%)]
                                   '--> [Edit current:   19,982  (9.13%)] --> [24h edit:  6,064 (30.35%)] --> [24h gs edit:  4,064 (67.02%)]
                                   '--> [Suggest only:   66,767 (30.49%)] --> [24h edit: 18,351 (27.49%)] --> [24h gs edit:  3,328 (18.14%)]
                                   '--> [Edit & suggest: 56,878 (25.98%)] --> [24h edit: 19,241 (33.83%)] --> [24h gs edit: 12,444 (64.67%)]

OK. Time to wrap it up into a figure.

--Halfak (WMF) (talk) 15:26, 25 March 2014 (UTC)[reply]


A proportional funnel is shown for the flow from newly registered users on all projects to wikis with GettingStarted installed (30 wikis) to making edits with GS.
Group funnel proportions. A proportional funnel is shown for the flow from newly registered users on all projects to wikis with GettingStarted installed (30 wikis) to making edits with GS.

It looks like a substantial amount of editors who do not see a CTA are still making an edit that is labeled as "gettingstarted edit". I have two hypotheses:

  1. We flag edits to any page that is redirected back to as a GS edit -- that means when we redirect a user back to a talk page, we'll show them no CTA, but we'll still record edits they make to that talk page as "gettingstarted edit"s.
  2. Some users don't have javascript enabled. We could have shown them a CTA, but we couldn't. However, we still somehow managed to record the fact that they edited the page they were redirected back to.

What's interesting is that the users who saw no CTA were more likely to make a GS edit than those who say only the option to be suggested work. --Halfak (WMF) (talk) 16:01, 25 March 2014 (UTC)[reply]

--Halfak (WMF) (talk) 16:01, 25 March 2014 (UTC)[reply]


Alright, let's look at some examples of a GS edit coming from an editor who saw no CTA. Sadly, I'll have to look at enwiki because it is on the same database server as our logs.

> SELECT
    ->     page_namespace,
    ->     COUNT(*)
    -> FROM staging.month_gs_user
    -> INNER JOIN enwiki.revision ON
    ->     rev_user = user_id AND
    ->     rev_timestamp BETWEEN 
    ->         user_registration_approx AND 
    ->         DATE_FORMAT(DATE_ADD(user_registration_approx, INTERVAL 1 DAY), "%Y%m%d%H%i%S")
    -> INNER JOIN enwiki.change_tag ON
    ->     rev_id = ct_rev_id AND
    ->     ct_tag = "gettingstarted edit"
    -> INNER JOIN enwiki.page ON
    ->     page_id = rev_page
    -> WHERE wiki = "enwiki" AND cta_type IS NULL
    -> GROUP BY 1;
+----------------+----------+
| page_namespace | COUNT(*) |
+----------------+----------+
|              0 |      218 |
+----------------+----------+
1 row in set (1.22 sec)

So, it looks like all of the edits went to ns 0. Let's look at non-null cta_types.


> SELECT
    ->     page_namespace,
    ->     COUNT(*)
    -> FROM staging.month_gs_user
    -> INNER JOIN enwiki.revision ON
    ->     rev_user = user_id AND
    ->     rev_timestamp BETWEEN 
    ->         user_registration_approx AND 
    ->         DATE_FORMAT(DATE_ADD(user_registration_approx, INTERVAL 1 DAY), "%Y%m%d%H%i%S")
    -> INNER JOIN enwiki.change_tag ON
    ->     rev_id = ct_rev_id AND
    ->     ct_tag = "gettingstarted edit"
    -> INNER JOIN enwiki.page ON
    ->     page_id = rev_page
    -> WHERE wiki = "enwiki" AND cta_type IS NOT NULL
    -> GROUP BY 1;
+----------------+----------+
| page_namespace | COUNT(*) |
+----------------+----------+
|              0 |    23700 |
|            118 |        1 |
+----------------+----------+
2 rows in set (13 min 28.71 sec)

Just one edit in 118 ("Draft") that was probably the result of a move. So no leads here. Let's look at whether these "No CTA" users had an impression event with no cta_type or no event at all. --Halfak (WMF) (talk) 18:42, 25 March 2014 (UTC)[reply]


> SELECT
    ->     cta_type,
    ->     impression.id IS NULL,
    ->     COUNT(*)
    -> FROM staging.month_gs_user user
    -> INNER JOIN staging.gs_user_day_edits edits USING (wiki, user_id)
    -> LEFT JOIN log.GettingStartedRedirectImpression_7355552 impression ON
    ->     user.wiki = impression.wiki AND
    ->     user_id = event_userId
    -> WHERE day_gs_revisions > 0
    -> GROUP BY 1,2;
+---------------------------+-----------------------+----------+
| cta_type                  | impression.id IS NULL | COUNT(*) |
+---------------------------+-----------------------+----------+
| NULL                      |                     0 |     4602 |
| NULL                      |                     1 |      996 |
| edit current              |                     0 |     5194 |
| edit current or suggested |                     0 |    13433 |
| suggested                 |                     0 |     3545 |
+---------------------------+-----------------------+----------+
5 rows in set (23 min 9.62 sec)

So, it looks like we have a recorded event for about 4/5ths of users who did not get an impression but still make a GS edit. --Halfak (WMF) (talk) 19:20, 25 March 2014 (UTC)[reply]


It would be nice if I could tell the difference between edits that come through the toolbar and edits that are just an "edit current" on the redirected page. Wait... I can. There's Schema:GettingStartedNavbar.

OK. So that one schema accounts for an asston of different actions, but I think that I don't care. If we have even a single action recorded for GettingStartedNavbar, then we know the user ought to have at least seen it. --Halfak (WMF) (talk)


> SELECT
    ->     navbar_impressions > 0,
    ->     COUNT(*)
    -> FROM (
    ->     SELECT
    ->         user.wiki,
    ->         edits.user_id,
    ->         COUNT(DISTINCT navbar.id) AS navbar_impressions
    ->     FROM staging.month_gs_user user
    ->     INNER JOIN staging.gs_user_day_edits edits ON
    ->         edits.wiki = user.wiki AND
    ->         edits.user_id = user.user_id
    ->     LEFT JOIN log.GettingStartedNavbar_5496876 navbar ON
    ->         user.wiki = navbar.wiki AND
    ->         user.user_id = navbar.event_userId
    ->     WHERE day_gs_revisions > 0 AND cta_type IS NULL 
    ->     GROUP BY 1,2
    -> ) AS user_impressions
    -> GROUP BY 1;
+------------------------+----------+
| navbar_impressions > 0 | COUNT(*) |
+------------------------+----------+
|                      0 |     5374 |
+------------------------+----------+
1 row in set (3 min 33.85 sec)

So, no impressions of the navbar. What the heck? How are these guys saving GS edits!?! --Halfak (WMF) (talk) 00:06, 26 March 2014 (UTC)[reply]


This note from Wed. March 26th: The query above is wrong because logging for GettingStartedNavbar stopped in Oct. 2013. --Halfak (WMF) (talk) 16:19, 26 March 2014 (UTC)[reply]