Research talk:Active editor spike 2015/Work log/2015-03-22

From Meta, a Wikimedia project coordination wiki

Sunday, March 22, 2015[edit]

Got new data. I'm currently updating the bits that tell me how a user registered. That will be important for doing the breakdowns, but for now I can look at overall stats.

Query
> select count(*) from editor_month;
+----------+
| count(*) |
+----------+
| 41013773 |
+----------+
1 row in set (0.02 sec)

Let's look at enwiki's January's

Query
> select month, count(*) from editor_month where wiki = "enwiki" and month like "%01" and revisions >= 5 group by 1;
+--------+----------+
| month  | count(*) |
+--------+----------+
| 200101 |        4 |
| 200201 |      137 |
| 200301 |      561 |
| 200401 |     1724 |
| 200501 |     7406 |
| 200601 |    34839 |
| 200701 |    72916 |
| 200801 |    61884 |
| 200901 |    56696 |
| 201001 |    52340 |
| 201101 |    48680 |
| 201201 |    45255 |
| 201301 |    43505 |
| 201401 |    41935 |
| 201501 |    42920 |
+--------+----------+
15 rows in set (16.60 sec)

How about february's?

Query
> select month, count(*) from editor_month where wiki = "enwiki" and month like "%02" and revisions >= 5 group by 1;
+--------+----------+
| month  | count(*) |
+--------+----------+
| 200102 |        8 |
| 200202 |      168 |
| 200302 |      616 |
| 200402 |     2359 |
| 200502 |     7446 |
| 200602 |    36901 |
| 200702 |    73340 |
| 200802 |    61864 |
| 200902 |    54648 |
| 201002 |    50614 |
| 201102 |    47619 |
| 201202 |    44829 |
| 201302 |    40983 |
| 201402 |    39732 |
| 201502 |    41925 |
+--------+----------+
15 rows in set (16.44 sec)

Definitely something showing up in the year-over-year. Let's try all the wikis.

Query big wikis
> SELECT
    ->   wiki,
    ->   last_year.ae AS last_year_ae,
    ->   this_year.ae - last_year.ae AS change_abs,
    ->   (this_year.ae - last_year.ae)/last_year.ae AS change_prop
    -> FROM (
    ->   SELECT
    ->     wiki,
    ->     COUNT(*) as ae
    ->   FROM editor_month
    ->   WHERE month LIKE "2015%" AND (month LIKE "%01" OR month LIKE "%02")
    ->   AND revisions >= 5
    ->   GROUP BY wiki
    -> ) AS this_year
    -> LEFT JOIN (
    ->   SELECT
    ->     wiki,
    ->     COUNT(*) as ae
    ->   FROM editor_month
    ->   WHERE month LIKE "2014%" AND (month LIKE "%01" OR month LIKE "%02")
    ->   AND revisions >= 5
    ->   GROUP BY wiki
    -> ) AS last_year USING (wiki)
    -> WHERE last_year.ae >= 10000
    -> ORDER BY change_prop DESC
    -> LIMIT 50;
+--------------+--------------+------------+-------------+
| wiki         | last_year_ae | change_abs | change_prop |
+--------------+--------------+------------+-------------+
| wikidatawiki |        10282 |       2229 |      0.2168 |
| commonswiki  |        20338 |       1475 |      0.0725 |
| enwiki       |        81667 |       3178 |      0.0389 |
| frwiki       |        11720 |       -367 |     -0.0313 |
| dewiki       |        16672 |      -1139 |     -0.0683 |
+--------------+--------------+------------+-------------+
5 rows in set (18.09 sec)

We see wikidatawiki expressing a high growth rate given it's size at 20 points. Commonswiki and enwiki also had substantial growth, but frwiki and dewiki saw decline.

What about the next step down in wiki-size?

Query middle wikis
> SELECT
    ->   wiki,
    ->   last_year.ae AS last_year_ae,
    ->   this_year.ae - last_year.ae AS change_abs,
    ->   (this_year.ae - last_year.ae)/last_year.ae AS change_prop
    -> FROM (
    ->   SELECT
    ->     wiki,
    ->     COUNT(*) as ae
    ->   FROM editor_month
    ->   WHERE month LIKE "2015%" AND (month LIKE "%01" OR month LIKE "%02")
    ->   AND revisions >= 5
    ->   GROUP BY wiki
    -> ) AS this_year
    -> LEFT JOIN (
    ->   SELECT
    ->     wiki,
    ->     COUNT(*) as ae
    ->   FROM editor_month
    ->   WHERE month LIKE "2014%" AND (month LIKE "%01" OR month LIKE "%02")
    ->   AND revisions >= 5
    ->   GROUP BY wiki
    -> ) AS last_year USING (wiki)
    -> WHERE last_year.ae BETWEEN 1000 and 10000
    -> ORDER BY change_prop DESC
    -> LIMIT 50;
+----------+--------------+------------+-------------+
| wiki     | last_year_ae | change_abs | change_prop |
+----------+--------------+------------+-------------+
| trwiki   |         1489 |        511 |      0.3432 |
| idwiki   |         1098 |        351 |      0.3197 |
| arwiki   |         1992 |        590 |      0.2962 |
| ukwiki   |         1701 |        358 |      0.2105 |
| kowiki   |         1850 |        321 |      0.1735 |
| cawiki   |         1152 |        164 |      0.1424 |
| zhwiki   |         5130 |        664 |      0.1294 |
| eswiki   |         9901 |       1205 |      0.1217 |
| metawiki |         1725 |        171 |      0.0991 |
| ptwiki   |         3893 |        307 |      0.0789 |
| cswiki   |         1585 |         61 |      0.0385 |
| jawiki   |         8528 |        249 |      0.0292 |
| svwiki   |         1959 |          2 |      0.0010 |
| itwiki   |         6636 |       -113 |     -0.0170 |
| hewiki   |         1919 |        -41 |     -0.0214 |
| plwiki   |         3432 |       -137 |     -0.0399 |
| nlwiki   |         3248 |       -147 |     -0.0453 |
| fiwiki   |         1342 |        -86 |     -0.0641 |
| ruwiki   |         9204 |       -627 |     -0.0681 |
| fawiki   |         2255 |       -194 |     -0.0860 |
| huwiki   |         1389 |       -133 |     -0.0958 |
| nowiki   |         1167 |       -196 |     -0.1680 |
+----------+--------------+------------+-------------+
22 rows in set (17.14 sec)

There's a lot more shrinkage happening here, but still the majority of wikis saw growth. Once we go beneath this level we'll have too many wikis to look at individually, so I'd like some summary statistic. I think I'm going to go with the overall absolute change sum for each group.

Query totals
> SELECT
    ->   IF(jan_counts.last_year_ae < 1000, "small",
    ->     IF(jan_counts.last_year_ae BETWEEN 1000 AND 10000, "medium", "large")) AS size,
    ->   SUM(jan_counts.last_year_ae) last_january,
    ->   SUM(jan_counts.change_abs) AS january_change,
    ->   SUM(jan_counts.change_abs)/SUM(jan_counts.last_year_ae) AS january_change_prop,
    ->   SUM(feb_counts.last_year_ae) last_february,
    ->   SUM(feb_counts.change_abs) AS february_change,
    ->   SUM(feb_counts.change_abs)/SUM(feb_counts.last_year_ae) AS february_change_prop
    -> FROM (
    ->   SELECT
    ->     wiki,
    ->     last_year.ae AS last_year_ae,
    ->     this_year.ae - last_year.ae AS change_abs,
    ->     (this_year.ae - last_year.ae)/last_year.ae AS change_prop
    ->   FROM (
    ->     SELECT
    ->       wiki,
    ->       COUNT(*) as ae
    ->     FROM editor_month
    ->     WHERE month LIKE "2015%" AND (month LIKE "%01")
    ->     AND revisions >= 5
    ->     GROUP BY wiki
    ->   ) AS this_year
    ->   LEFT JOIN (
    ->     SELECT
    ->       wiki,
    ->       COUNT(*) as ae
    ->     FROM editor_month
    ->     WHERE month LIKE "2014%" AND (month LIKE "%01")
    ->     AND revisions >= 5
    ->     GROUP BY wiki
    ->   ) AS last_year USING (wiki)
    -> ) AS jan_counts
    -> LEFT JOIN (
    ->   SELECT
    ->     wiki,
    ->     last_year.ae AS last_year_ae,
    ->     this_year.ae - last_year.ae AS change_abs,
    ->     (this_year.ae - last_year.ae)/last_year.ae AS change_prop
    ->   FROM (
    ->     SELECT
    ->       wiki,
    ->       COUNT(*) as ae
    ->     FROM editor_month
    ->     WHERE month LIKE "2015%" AND (month LIKE "%02")
    ->     AND revisions >= 5
    ->     GROUP BY wiki
    ->   ) AS this_year
    ->   LEFT JOIN (
    ->     SELECT
    ->       wiki,
    ->       COUNT(*) as ae
    ->     FROM editor_month
    ->     WHERE month LIKE "2014%" AND (month LIKE "%02")
    ->     AND revisions >= 5
    ->     GROUP BY wiki
    ->   ) AS last_year USING (wiki)
    -> ) AS feb_counts USING (wiki)
    -> GROUP BY 1;
+--------+--------------+----------------+---------------------+---------------+-----------------+----------------------+
| size   | last_january | january_change | january_change_prop | last_february | february_change | february_change_prop |
+--------+--------------+----------------+---------------------+---------------+-----------------+----------------------+
| large  |        52253 |           2761 |              0.0528 |         49752 |            1892 |               0.0380 |
| medium |        50333 |            480 |              0.0095 |         46438 |            2001 |               0.0431 |
| small  |        22602 |           -248 |             -0.0110 |         19827 |            1603 |               0.0808 |
+--------+--------------+----------------+---------------------+---------------+-----------------+----------------------+
3 rows in set (33.47 sec)
size January February
total ae yoy change yoy prop change total ae yoy change yoy prop change
large 52253 2761 0.0528 49752 1892 0.0380
medium 50333 480 0.0095 46438 2001 0.0431
small 22602 -248 -0.0110 19827 1603 0.0808

OK. So it looks like the trend is shared across big and small wikis when examined as groups. Next, I'll be working on figuring out which sub-group is most affected by this trend. --Halfak (WMF) (talk) 17:15, 22 March 2015 (UTC)[reply]