Research talk:Anonymous mobile editing in Italian Wikipedia/Work log/2014-11-15

From Meta, a Wikimedia project coordination wiki

Saturday, November 15, 2014[edit]

Today, I'm just trying to get a sense for how the rate of anonymous mobile editing has changed over time. We should be able to see two phase shifts -- one when the change was deployed on 20141101 and another when caches were cleared on 20141103.


Edits[edit]

http://quarry.wmflabs.org/query/948

SELECT 
	LEFT(rev_timestamp, 8) AS date,
	COUNT(*)
FROM revision
INNER JOIN change_tag ON 
	ct_rev_id = rev_id AND
	ct_tag = "mobile edit"
WHERE 
	rev_timestamp BETWEEN "20141020" AND "20141114" AND
	rev_user = 0
GROUP BY date;

+----------+----------+
| date     | COUNT(*) |
+----------+----------+
| 20141020 |       48 |
| 20141021 |       54 |
| 20141022 |       43 |
| 20141023 |       45 |
| 20141024 |       70 |
| 20141025 |       40 |
| 20141026 |       67 |
| 20141027 |       46 |
| 20141028 |       63 |
| 20141029 |       39 |
| 20141030 |       97 |
| 20141031 |       66 |
| 20141101 |      161 |
| 20141102 |      235 |
| 20141103 |      140 |
| 20141104 |      138 |
| 20141105 |      151 |
| 20141106 |      148 |
| 20141107 |      173 |
| 20141108 |      150 |
| 20141109 |      226 |
| 20141110 |      167 |
| 20141111 |      151 |
| 20141112 |      184 |
| 20141113 |      166 |
+----------+----------+
25 rows in set (0.60 sec)

Well, we can clearly see the step, but it looks like there was a substantial amount of anon mobile edits before. How did this happen? Let's go look at some.

http://quarry.wmflabs.org/query/949

SELECT
	rev_id,
	rev_user,
	rev_user_text
FROM revision
INNER JOIN change_tag ON
	ct_rev_id = rev_id AND
	ct_tag = "mobile edit"
WHERE
	rev_timestamp BETWEEN "20141025" AND "20141026" AND
	rev_user = 0
LIMIT 10

+----------+----------+----------------+
| rev_id   | rev_user | rev_user_text  |
+----------+----------+----------------+
| 68818976 |        0 | 79.37.191.166  |
| 68819692 |        0 | 37.119.239.89  |
| 68819834 |        0 | 37.119.239.89  |
| 68820067 |        0 | 151.47.42.64   |
| 68820386 |        0 | 5.90.73.55     |
| 68820978 |        0 | 31.156.208.77  |
| 68821394 |        0 | 2.36.220.222   |
| 68821622 |        0 | 151.18.23.60   |
| 68821628 |        0 | 89.204.130.194 |
| 68822287 |        0 | 87.18.148.193  |
+----------+----------+----------------+
10 rows in set (4.55 sec)

... OK. So it looks like these are tagged with "mobile edit" and "mobile app edit". So, it looks like I need to filter those app edits out.

SELECT 
	LEFT(rev_timestamp, 8) AS date,
	COUNT(*)
FROM revision
INNER JOIN tag_summary ON 
	ts_rev_id = rev_id AND
	ts_tags LIKE "%mobile edit%" AND 
    ts_tags NOT LIKE "%mobile app edit%"
WHERE 
	rev_timestamp BETWEEN "20141028" AND "20141114" AND
	rev_user = 0
GROUP BY date;
+----------+----------+
| date     | COUNT(*) |
+----------+----------+
| 20141101 |       74 |
| 20141102 |       93 |
| 20141103 |       92 |
| 20141104 |       72 |
| 20141105 |       77 |
| 20141106 |       62 |
| 20141107 |       80 |
| 20141108 |       86 |
| 20141109 |      114 |
| 20141110 |      116 |
| 20141111 |      103 |
| 20141112 |      110 |
| 20141113 |      114 |
+----------+----------+
13 rows in set (0.40 sec)

OK. That looks much better. We don't see any edits at all until day 1. However, if there is a second event, it looks like it might have happened on 11/09, a Sunday.

Next, I wonder how this compares to the overall rate of editing activity.

http://quarry.wmflabs.org/query/951

SELECT
    date,
    mobile_edits.count,
    edits.count,
    mobile_edits.count/edits.count AS proportion
FROM (
    SELECT 
        LEFT(rev_timestamp, 8) AS date,
        COUNT(*) AS count
    FROM revision
    INNER JOIN tag_summary ON 
        ts_rev_id = rev_id AND
        ts_tags LIKE "%mobile edit%" AND 
        ts_tags NOT LIKE "%mobile app edit%"
    WHERE 
        rev_timestamp BETWEEN "20141028" AND "20141114" AND
        rev_user = 0
    GROUP BY date
) AS mobile_edits
INNER JOIN (
    SELECT 
        LEFT(rev_timestamp, 8) AS date,
        COUNTimestamp BETWEEN "20141101" AND "20141114" AND
        rev_user = 0
    GROUP BY date
) AS edits USING (date);(*) AS count
    FROM revision
    WHERE 
        rev_timestamp BETWEEN "20141101" AND "20141114" AND
        rev_user = 0
    GROUP BY date
) AS edits USING (date);

+----------+-------+-------+------------+
| date     | count | count | proportion |
+----------+-------+-------+------------+
| 20141101 |    74 |  3553 |     0.0208 |
| 20141102 |    93 |  3628 |     0.0256 |
| 20141103 |    92 |  3654 |     0.0252 |
| 20141104 |    72 |  3609 |     0.0200 |
| 20141105 |    77 |  3341 |     0.0230 |
| 20141106 |    62 |  3790 |     0.0164 |
| 20141107 |    80 |  3640 |     0.0220 |
| 20141108 |    86 |  3304 |     0.0260 |
| 20141109 |   114 |  3899 |     0.0292 |
| 20141110 |   116 |  3930 |     0.0295 |
| 20141111 |   103 |  3758 |     0.0274 |
| 20141112 |   110 |  3241 |     0.0339 |
| 20141113 |   114 |  3254 |     0.0350 |
+----------+-------+-------+------------+
13 rows in set (0.36 sec)

It looks like anonymous mobile edits make up 2-3% of all anonymous activity.

At least for recent edits, you can just use the tag "mobile web edit".[1] I don't remember the exact day when the mobile tag was "split". --Nemo 09:09, 16 November 2014 (UTC)[reply]

Registrations[edit]

SELECT
    LEFT(timestamp, 8) AS date,
    COUNT(*) AS registrations
FROM ServerSideAccountCreation_5487345
WHERE 
    wiki = "itwiki" AND
    event_isSelfMade = 1 AND
    event_displayMobile = 1 AND
    timestamp BETWEEN "20141028" AND "20141107"
GROUP BY date;

+----------+---------------+
| date     | registrations |
+----------+---------------+
| 20141028 |            77 |
| 20141029 |            80 |
| 20141030 |            73 |
| 20141031 |            51 |
| 20141101 |            64 |
| 20141102 |            80 |
| 20141103 |            66 |
| 20141104 |            66 |
| 20141105 |            63 |
| 20141106 |            70 |
+----------+---------------+
10 rows in set (0.03 sec)

There doesn't seem to be any dramatic dip in the raw number of registrations, but it could be dropping. I'll need to do some timeseries analysis to extract the seasonal component.

OK. Time to make a datafile. --EpochFail (talk) 19:09, 15 November 2014 (UTC)[reply]

Timeseries analysis[edit]

OK. So I did some simple component extraction (7 days) and plotted the trends.

The number of anonymous edits per day is plotted with a weekly seasonal component modeled and the extracted trend overlayed. A vertical line is plotted on Nov. 1st.
Anonymous edits. The number of anonymous edits per day is plotted with a weekly seasonal component modeled and the extracted trend overlayed. A vertical line is plotted on Nov. 1st.
The number of mobile edits per day is plotted with a weekly seasonal component modeled and the extracted trend overlayed. A vertical line is plotted on Nov. 1st.
Mobile edits. The number of mobile edits per day is plotted with a weekly seasonal component modeled and the extracted trend overlayed. A vertical line is plotted on Nov. 1st.
The number of mobile anonymous edits per day is plotted with a weekly seasonal component modeled and the extracted trend overlayed. A vertical line is plotted on Nov. 1st.
Mobile anonymous edits. The number of mobile anonymous edits per day is plotted with a weekly seasonal component modeled and the extracted trend overlayed. A vertical line is plotted on Nov. 1st.
The number of mobile registration per day is plotted with a weekly seasonal component modeled and the extracted trend overlayed. A vertical line is plotted on Nov. 1st.
Mobile registrations. The number of mobile registration per day is plotted with a weekly seasonal component modeled and the extracted trend overlayed. A vertical line is plotted on Nov. 1st.

Nothing surprising here. Regretfully, we have some trendy behavior well before the test started, so it's hard to see what effect anonymous editing is having. However, there's no clear deviation from the trend for mobile registrations -- which makes sense given the strong wording between mobile editors and saving an anonymous edit.

Next time I want to look at RC patroller workload. --EpochFail (talk) 20:30, 15 November 2014 (UTC)[reply]

I wonder if the October drop in edits corresponds to the CentralNotice fundraising campaign which run for mobile and desktop. CentralNotice is known to have a heavy impact on performance, among other things. According to the fundraising CSV data, it run from 2014-09-29 to 2014-11-03 (but it probably had a varying intensity in the period):
medium    campaign    count    usdstd    stop_date    usdmax    sum    avg    start_date
sitenotice    C14_itIT_dsk_FR    77050    10.505104    2014-11-01T10:12:45    630.1 608259.93    7.894353    2014-09-29T08:04:40
sitenotice    C14_itIT_mob_FR    33199    8.075508    2014-11-03T21:50:44    136.08 204223.4    6.151493    2014-09-29T08:04:46 
--Nemo 09:33, 16 November 2014 (UTC)[reply]