Research talk:VisualEditor's effect on newly registered editors/May 2015 study/Work log/2015-05-28

From Meta, a Wikimedia project coordination wiki
Jump to navigation Jump to search

Thursday, May 28, 2015[edit]

Today, I hope to gather some basic stats on block rates and do a basic analysis of time to complete edits and edit completion rates.


Block rates and types[edit]

Since it's morning and I'm still working on my coffee, let's do the easy one. Block rates.

> select user_id, user_name, log_comment, log_params from enwiki.logging INNER JOIN enwiki.user ON REPLACE(log_title, "_", " ") = user_name INNER JOIN staging.ve2_pilot_users USING (user_id) WHERE log_type = "block" and log_action = "block" and log_timestamp BETWEEN "2015052115" AND "2015052215" limit 3;
+----------+----------------------------------------------------------------------------+---------------------------------------------------------------------+---------------------------------------------------------------------------+
| user_id  | user_name                                                                  | log_comment                                                         | log_params                                                                |
+----------+----------------------------------------------------------------------------+---------------------------------------------------------------------+---------------------------------------------------------------------------+
| 25251960 | FastechMedia1                                                              | {{uw-spamublock}}                                                   | a:2:{s:11:"5::duration";s:10:"indefinite";s:8:"6::flags";s:8:"nocreate";} |
| 25252283 | Dumdddum                                                                   | [[WP:Vandalism-only account|Vandalism-only account]]                | a:2:{s:11:"5::duration";s:10:"indefinite";s:8:"6::flags";s:8:"nocreate";} |
| 25252071 | Incijanvijnijvcbi2juyhe2y8c28ybhij byv8ibbvy81bhwvye2f8cjbqjne2cbiuejnb2kb | {{uw-vaublock}} <!-- Username violation, vandalism-only account --> | a:2:{s:11:"5::duration";s:10:"indefinite";s:8:"6::flags";s:8:"nocreate";} |
+----------+----------------------------------------------------------------------------+---------------------------------------------------------------------+---------------------------------------------------------------------------+
3 rows in set (0.00 sec)

It looks like we should be able to extract information about whether the block was for vandalism or not.

> select SUM(log_comment RLIKE "(S|s)pam"), SUM(log_comment RLIKE "(V|v)and"), COUNT(*) from enwiki.logging INNER JOIN enwiki.user ON REPLACE(log_title, "_", " ") = user_name INNER JOIN staging.ve2_pilot_users USING (user_id) WHERE log_type = "block" and log_action = "block" and log_timestamp BETWEEN "2015052115" AND "2015052215" limit 3;
+-----------------------------------+-----------------------------------+----------+
| SUM(log_comment RLIKE "(S|s)pam") | SUM(log_comment RLIKE "(V|v)and") | COUNT(*) |
+-----------------------------------+-----------------------------------+----------+
|                                43 |                                30 |      124 |
+-----------------------------------+-----------------------------------+----------+
1 row in set (1.11 sec)

Hmm If not spam or vandalism, what else?

> select user_name, log_comment from enwiki.logging INNER JOIN enwiki.user ON REPLACE(log_title, "_", " ") = user_name INNER JOIN staging.ve2_pilot_users USING (user_id) WHERE log_type = "block" and log_action = "block" and log_timestamp BETWEEN "2015052115" AND "2015052215" AND log_comment NOT RLIKE "(S|s)pam" AND log_comment NOT RLIKE "(V|v)and" limit 10,10;
+------------------------------------------+----------------------------------------------+
| user_name                                | log_comment                                  |
+------------------------------------------+----------------------------------------------+
| HURDEHGURDEHHURDEHGURDEH                 | {{checkuserblock-account}}                   |
| DUNNADUNNADUNNADUNNADUNNADUNNADUNNADUNNA | {{checkuserblock-account}}                   |
| Hgtresgtedgtredgtre                      | {{checkuserblock-account}}                   |
| Clioriosa                                | [[WP:Disruptive editing|Disruptive editing]] |
| Kittywankfaster                          | {{uw-vaublock}}                              |
| Swangshit                                | {{uw-vaublock}}                              |
| Donkeypunch101                           | {{uw-vaublock}}                              |
| Shittycumsquats                          | {{uw-vaublock}}; see the edit filter log     |
| Butt head 458                            | {{uw-vaublock}}                              |
| Huffgrade                                | [[WP:Disruptive editing|Disruptive editing]] |
+------------------------------------------+----------------------------------------------+
10 rows in set (0.56 sec)

Looks like en:Template:Uw-vaublock is about vandalism and so is Disruptive editing. Let's update our regex.

> select user_name, log_comment from enwiki.logging INNER JOIN enwiki.user ON REPLACE(log_title, "_", " ") = user_name INNER JOIN staging.ve2_pilot_users USING (user_id) WHERE log_type = "block" and log_action = "block" and log_timestamp BETWEEN "2015052115" AND "2015052215" AND log_comment NOT RLIKE "(S|s)pam" AND log_comment NOT RLIKE "((V|v)and|(D|d)isrupt|(U|u)w-vaublock|(A|a)bus(e|ing))" AND log_comment NOT RLIKE "((S|s)ock|(U|u)ser)";
+------------------------+-----------------------------------------------------------------------------------------+
| user_name              | log_comment                                                                             |
+------------------------+-----------------------------------------------------------------------------------------+
| JamaicanStonSale       | {{uw-softerblock}}                                                                      |
| $wagggg666420hailsatan | {{uw-ublock}}                                                                           |
| Hamilton4195           | Deliberately triggering the [[WP:Edit filter|Edit filter]]                              |
| Morlvi471              | [[WP:Blocking policy#Evasion of blocks|Block evasion]]                                  |
| 420blaze itdopeswag    | Clearly [[WP:NOTHERE|not here to contribute to the encyclopedia]]                       |
| Gullswealsslslsl       | [[WP:No personal attacks|Personal attacks]] or [[WP:Harassment|harassment]]             |
| MarkCohen65            | Creating [[WP:Attack page|attack pages]]                                                |
| Daytonowen             | [[WP:Blocking policy#Evasion of blocks|Block evasion]]                                  |
| Anandasyarif           | Deliberately triggering the [[WP:Edit filter|Edit filter]]: also clearly [[WP:NOTHERE]] |
+------------------------+-----------------------------------------------------------------------------------------+
9 rows in set (0.04 sec)
> select user_name, log_comment from enwiki.logging INNER JOIN enwiki.user ON REPLACE(log_title, "_", " ") = user_name INNER JOIN staging.ve2_pilot_users USING (user_id) WHERE log_type = "block" and log_action = "block" and log_timestamp BETWEEN "2015052115" AND "2015052215" AND log_comment NOT RLIKE "(S|s)pam" AND log_comment NOT RLIKE "((V|v)and|(D|d)isrupt|(U|u)w-vaublock|(A|a)bus(e|ing)|(A|a)ttack|(D|d)eliberate|NOTHERE)" AND log_comment NOT RLIKE "((S|s)ock|(C|c)heckuser|(E|e)vasion)" AND log_comment NOT RLIKE "softerblock|soft block" AND log_comment NOT RLIKE "(U|u)w-uhblock|user\.\.\.|(U|u)w-ublock";
Empty set (0.10 sec)

OK. Time to encode this into a field.

> SELECT
    ->   LEFT(user_name, 25),
    ->   LEFT(log_comment, 25),
    ->   IF(log_comment RLIKE "(S|s)pam", "spam",
    ->     IF(log_comment RLIKE "((V|v)and|(D|d)isrupt|(U|u)w-vaublock|(A|a)bus(e|ing)|(A|a)ttack|(D|d)eliberate|NOTHERE)", "vandalism",
    ->     IF(log_comment RLIKE "((S|s)ock|(C|c)heckuser|(E|e)vasion)", "sock",
    ->     IF(log_comment RLIKE "softerblock|soft block", "soft username",
    ->     IF(log_comment RLIKE "(U|u)w-uhblock|user\.\.\.|(U|u)w-ublock", "hard username", 
    ->     "other"
    ->   ))))) AS type
    -> FROM enwiki.logging 
    -> INNER JOIN enwiki.user ON 
    ->   REPLACE(log_title, "_", " ") = user_name 
    -> INNER JOIN staging.ve2_pilot_users USING (user_id)
    -> WHERE 
    ->   log_type = "block" AND 
    ->   log_action = "block" AND 
    ->   log_timestamp BETWEEN "2015052115" and "2015052215"
    -> ORDER BY RAND() 
    -> LIMIT 25;
+---------------------------+---------------------------+---------------+
| LEFT(user_name, 25)       | LEFT(log_comment, 25)     | type          |
+---------------------------+---------------------------+---------------+
| MarkCohen65               | Creating [[WP:Attack page | vandalism     |
| TurtleLimited             | {{uw-spamublock}} <!-- Pr | spam          |
| Morlvi471                 | [[WP:Blocking policy#Evas | sock          |
| Agarwalpackers moverss    | [[WP:Spam|Spam]] / [[WP:N | spam          |
| Theherbaliststop          | {{uw-spamublock}} <!-- Pr | spam          |
| Crave Wrestling           | {{uw-spamublock}} <!-- Pr | spam          |
| Art Space Gallery London  | {{uw-softerblock}} <!-- P | soft username |
| Douglasleben              | spammer/spambot           | spam          |
| Chickenfucker1            | {{uw-uhblock}} <!-- Usern | hard username |
| Johnny D in the Land of C | {{blockedsock|John Daker} | sock          |
| CatalystConsulting        | {{uw-softerblock}} <!-- P | soft username |
| Zeshanshabbirofficial     | {{uw-spamublock}} <!-- Pr | spam          |
| Julanjg                   | [[WP:Vandalism-only accou | vandalism     |
| DUNNADUNNADUNNADUNNADUNNA | {{checkuserblock-account} | sock          |
| JamaicanStonSale          | {{uw-softerblock}}        | soft username |
| CJ Software Logistics     | {{uw-softerblock}} <!-- P | soft username |
| Jayron32 and Nyttend are  | user...                   | hard username |
| Himynamename              | Abusing [[WP:Sock puppetr | vandalism     |
| The Original John Daker   | {{blockedsock|John Daker} | sock          |
| TrollntemptationHD        | [[WP:Vandalism-only accou | vandalism     |
| Roy yuan                  | Using Wikipedia for [[WP: | spam          |
| Hamilton4195              | Deliberately triggering t | vandalism     |
| Cjohnson753               | [[WP:Vandalism-only accou | vandalism     |
| Alsancakuniforma          | {{uw-spamublock}} <!-- Pr | spam          |
| Agarwalpackersandmoversdr | {{uw-softerblock}} <!-- P | soft username |
+---------------------------+---------------------------+---------------+
25 rows in set (1.38 sec)

OK. It looks like that is working. Now to formalize it and extract the data.

SELECT
  user_id,
  IF(log_comment RLIKE "(S|s)pam", "spam",
    IF(log_comment RLIKE "((V|v)and|(D|d)isrupt|(U|u)w-vaublock|(A|a)bus(e|ing)|(A|a)ttack|(D|d)eliberate|NOTHERE)", "vandalism",
    IF(log_comment RLIKE "((S|s)ock|(C|c)heckuser|(E|e)vasion)", "sock",
    IF(log_comment RLIKE "softerblock|soft block", "soft username",
    IF(log_comment RLIKE "(U|u)w-uhblock|user\.\.\.|(U|u)w-ublock", "hard username", 
    "other"
  ))))) AS type
FROM enwiki.logging 
INNER JOIN enwiki.user ON 
  REPLACE(log_title, "_", " ") = user_name 
INNER JOIN staging.ve2_pilot_users USING (user_id)
WHERE 
  log_type = "block" AND 
  log_action = "block" AND 
  log_timestamp BETWEEN "2015052115" and "2015052215";
 head -n3 pilot_user_blocks.tsv;wc pilot_user_blocks.tsv 
user_id	type
25251960	spam
25252283	vandalism
 125  272 2182 pilot_user_blocks.tsv

Cool. Looks like that is ready to go. Now to do the hard work of looking at the logging data. --Halfak (WMF) (talk) 09:07, 28 May 2015 (UTC)

Edit session stats[edit]

OK. Now I'm digging into stats about edit completion rates and timing.

> SELECT
    ->   user_id,
    ->   event_editingSessionId AS session_id,
    ->   event_editor AS editor,
    ->   MIN(IF(event_action IN ('init', 'ready'), timestamp, NULL)) AS session_started,
    ->   MAX(timestamp) AS session_ended,
    ->   MIN(IF(event_action = 'ready', timestamp, NULL)) AS editor_ready,
    ->   MIN(IF(event_action IN ('saveAttempt', 'saveSuccess'), timestamp, NULL)) AS first_attempt,
    ->   IF(SUM(event_action = 'abort') > 0,
    ->     IF(SUM(`event_action.abort.type` = 'nochange') > 0, 'abort_nochange', 'abort'),
    ->     IF(SUM(event_action = 'saveSuccess') > 0, 'success', 'other')
    ->   ) AS outcome
    -> FROM staging.ve2_pilot_users
    -> INNER JOIN log.Edit_11448630 ON
    ->   wiki = 'enwiki' AND
    ->   user_id = `event_user.id`
    -> WHERE timestamp BETWEEN "2015052115" and "2015052215"
    -> GROUP BY user_id, session_id
    -> LIMIT 10;
+----------+----------------------------------+--------------+-----------------+----------------+----------------+----------------+----------------+
| user_id  | session_id                       | editor       | session_started | session_ended  | editor_ready   | first_attempt  | outcome        |
+----------+----------------------------------+--------------+-----------------+----------------+----------------+----------------+----------------+
| <snip>   | 781557474d435416                 | visualeditor | 20150521150126  | 20150521150601 | 20150521150126 | 20150521150500 | abort_nochange |
| <snip>   | 2c7ca865d011c19a6083c3363723f009 | wikitext     | 20150521181116  | 20150521181130 | 20150521181130 | NULL           | other          |
| <snip>   | 027f8c94e370bf558a85873055bfcd12 | wikitext     | 20150521184516  | 20150521184539 | 20150521184517 | NULL           | abort_nochange |
| <snip>   | 0f2960a6aca5d9196d4627dcb35431b8 | wikitext     | 20150521171238  | 20150521171304 | 20150521171239 | 20150521171304 | success        |
| <snip>   | 0f36411d0d26712cbe43f2dc226b3a8a | wikitext     | 20150521154401  | 20150521155150 | 20150521154402 | 20150521155150 | success        |
| <snip>   | 0ffe919e0b3d1d54638dd80085ba9987 | wikitext     | 20150521205239  | 20150521205248 | 20150521205241 | NULL           | abort_nochange |
| <snip>   | 1a3e0b75a38bd8b5bce36acdb87da48d | wikitext     | 20150521182659  | 20150521182934 | 20150521182701 | 20150521182858 | abort          |
| <snip>   | 1bca0ecc5a3e9ac33aedc2ce07bb39ec | wikitext     | 20150521161304  | 20150521161316 | 20150521161305 | NULL           | abort_nochange |
| <snip>   | 1bcb04f925a1d54bef5d5ca3abf16a37 | wikitext     | 20150521184624  | 20150521184641 | 20150521184625 | 20150521184641 | success        |
| <snip>   | 2243957369cc389fec3cebc50c3fc7d6 | wikitext     | 20150521152112  | 20150521152123 | 20150521152114 | NULL           | abort_nochange |
+----------+----------------------------------+--------------+-----------------+----------------+----------------+----------------+----------------+
10 rows in set (2.14 sec)

That looks pretty good, but I still see some "other" outcomes. Gotta remind myself what event_actions are possible to know if the "other" bucket is fine.

> select distinct event_action from Edit_11448630 WHERE wiki = 'enwiki' AND timestamp BETWEEN "2015052115" and "2015052215";
+--------------+
| event_action |
+--------------+
| init         |
| saveFailure  |
| saveAttempt  |
| saveSuccess  |
| ready        |
| abort        |
| saveIntent   |
+--------------+
7 rows in set (12.58 sec)

Well, we should never fail on 'saveIntent', but we could on 'saveFailure', so I'll look at that.

> SELECT
    ->   user_id,
    ->   event_editingSessionId AS session_id,
    ->   event_editor AS editor,
    ->   MIN(IF(event_action IN ('init', 'ready'), timestamp, NULL)) AS session_started,
    ->   MAX(timestamp) AS session_ended,
    ->   MIN(IF(event_action = 'ready', timestamp, NULL)) AS editor_ready,
    ->   MIN(IF(event_action IN ('saveAttempt', 'saveSuccess'), timestamp, NULL)) AS first_attempt,
    ->   IF(SUM(event_action = 'abort') > 0,
    ->     IF(SUM(`event_action.abort.type` = 'nochange') > 0, 
    ->       'abort_nochange', 
    ->       'abort'
    ->     ),
    ->     IF(SUM(event_action = 'saveSuccess') > 0, 
    ->       'success', 
    ->     IF(SUM(event_action = 'saveFailure') > 0, 
    ->       'failure',
    ->       'other'
    ->     ))
    ->   ) AS outcome
    -> FROM staging.ve2_pilot_users
    -> INNER JOIN log.Edit_11448630 ON
    ->   wiki = 'enwiki' AND
    ->   user_id = `event_user.id`
    -> WHERE timestamp BETWEEN "2015052115" and "2015052215"
    -> GROUP BY user_id, session_id
    -> LIMIT 10;
+----------+----------------------------------+--------------+-----------------+----------------+----------------+----------------+----------------+
| user_id  | session_id                       | editor       | session_started | session_ended  | editor_ready   | first_attempt  | outcome        |
+----------+----------------------------------+--------------+-----------------+----------------+----------------+----------------+----------------+
| <snip>   | 781557474d435416                 | visualeditor | 20150521150126  | 20150521150601 | 20150521150126 | 20150521150500 | abort_nochange |
| <snip>   | 2c7ca865d011c19a6083c3363723f009 | wikitext     | 20150521181116  | 20150521181130 | 20150521181130 | NULL           | other          |
| <snip>   | 027f8c94e370bf558a85873055bfcd12 | wikitext     | 20150521184516  | 20150521184539 | 20150521184517 | NULL           | abort_nochange |
| <snip>   | 0f2960a6aca5d9196d4627dcb35431b8 | wikitext     | 20150521171238  | 20150521171304 | 20150521171239 | 20150521171304 | success        |
| <snip>   | 0f36411d0d26712cbe43f2dc226b3a8a | wikitext     | 20150521154401  | 20150521155150 | 20150521154402 | 20150521155150 | success        |
| <snip>   | 0ffe919e0b3d1d54638dd80085ba9987 | wikitext     | 20150521205239  | 20150521205248 | 20150521205241 | NULL           | abort_nochange |
| <snip>   | 1a3e0b75a38bd8b5bce36acdb87da48d | wikitext     | 20150521182659  | 20150521182934 | 20150521182701 | 20150521182858 | abort          |
| <snip>   | 1bca0ecc5a3e9ac33aedc2ce07bb39ec | wikitext     | 20150521161304  | 20150521161316 | 20150521161305 | NULL           | abort_nochange |
| <snip>   | 1bcb04f925a1d54bef5d5ca3abf16a37 | wikitext     | 20150521184624  | 20150521184641 | 20150521184625 | 20150521184641 | success        |
| <snip>   | 2243957369cc389fec3cebc50c3fc7d6 | wikitext     | 20150521152112  | 20150521152123 | 20150521152114 | NULL           | abort_nochange |
+----------+----------------------------------+--------------+-----------------+----------------+----------------+----------------+----------------+
10 rows in set (17.10 sec)

Still seeing an "other" in the first 10. Let's look at that session.

> SELECT timestamp, event_action FROM Edit_11448630 WHERE event_editingSessionId = "2c7ca865d011c19a6083c3363723f009" AND wiki = "enwiki" and timestamp BETWEEN "20150521181116" AND "20150521181130";
+----------------+--------------+
| timestamp      | event_action |
+----------------+--------------+
| 20150521181116 | init         |
| 20150521181130 | ready        |
+----------------+--------------+
2 rows in set (0.01 sec)

OK. So it looks like maybe the user closed their browser/tab too fast for us to record an abort type. I think it is safe to assume that sessions like this represent a "nochange". I'll label them differently though so we can check how often they happen later. So, here is my query:

SELECT
  user_id,
  event_editingSessionId AS session_id,
  event_editor AS editor,
  MAX(`event_page.revid`) AS rev_id,
  MIN(IF(event_action IN ('init', 'ready'), timestamp, NULL)) AS session_started,
  MAX(timestamp) AS session_ended,
  MIN(IF(event_action = 'ready', timestamp, NULL)) AS editor_ready,
  MIN(IF(event_action IN ('saveAttempt', 'saveSuccess'), timestamp, NULL)) AS first_attempt,
  IF(SUM(event_action = 'abort') > 0,
    IF(SUM(`event_action.abort.type` = 'nochange') > 0, 
      'abort_nochange', 
      'abort'
    ),
    IF(SUM(event_action = 'saveSuccess') > 0, 
      'success', 
    IF(SUM(event_action = 'saveFailure') > 0, 
      'failure',
      'other_abort'
    ))
  ) AS outcome
FROM staging.ve2_pilot_users
INNER JOIN log.Edit_11448630 ON
  wiki = 'enwiki' AND
  user_id = `event_user.id`
WHERE timestamp BETWEEN "2015052115" and "2015052215"
GROUP BY user_id, session_id;
$ head -n 3 pilot_user_editing_sessions.tsv; wc pilot_user_editing_sessions.tsv 
user_id	session_id	editor	rev_id	session_started	session_ended	editor_ready	first_attempt	outcome
25251662	781557474d435416	visualeditor	663414161	20150521150126	20150521150601	20150521150126	20150521150500	abort_nochange
25251680	2c7ca865d011c19a6083c3363723f009	wikitext	0	20150521181116	20150521181130	20150521181130	NULL	other_abort
  3377  30393 392381 pilot_user_editing_sessions.tsv

Cool. Now to get into R and do some analysis. --Halfak (WMF) (talk) 10:46, 28 May 2015 (UTC)

Analysis![edit]

So, I want to make sure that nothing looks totally crazy with this data or that we're seeing a big surge in reverts or block rates for experimental users.

Oh! Wait! While I was looking at the data, I realized that I set the time-bounds on the bucketing dates, not the editors' experimental periods. Time to fix that! --Halfak (WMF) (talk) 11:02, 28 May 2015 (UTC)

Fixing the query timebounds.[edit]

OK. I need to redo the queries above so that their time-bounds are based on the editors' registration dates.


Blocks
SELECT
  user_id,
  IF(log_comment RLIKE "(S|s)pam", "spam",
    IF(log_comment RLIKE "((V|v)and|(D|d)isrupt|(U|u)w-vaublock|(A|a)bus(e|ing)|(A|a)ttack|(D|d)eliberate|NOTHERE)", "vandalism",
    IF(log_comment RLIKE "((S|s)ock|(C|c)heckuser|(E|e)vasion)", "sock",
    IF(log_comment RLIKE "softerblock|soft block", "soft username",
    IF(log_comment RLIKE "(U|u)w-uhblock|user\.\.\.|(U|u)w-ublock", "hard username",
    "other"
  ))))) AS type
FROM enwiki.logging
INNER JOIN enwiki.user ON
  REPLACE(log_title, "_", " ") = user_name
INNER JOIN staging.ve2_pilot_users USING (user_id)
WHERE
  log_type = "block" AND
  log_action = "block" AND
  log_timestamp BETWEEN
    registration AND
    DATE_FORMAT(
      DATE_ADD(registration, INTERVAL 7 DAY),
      "%Y%m%d%H%M%S"
    );
Editing sessions
SELECT
  user_id,
  event_editingSessionId AS session_id,
  event_editor AS editor,
  MAX(`event_page.revid`) AS rev_id,
  MIN(IF(event_action IN ('init', 'ready'), timestamp, NULL)) AS session_started,
  MAX(timestamp) AS session_ended,
  MIN(IF(event_action = 'ready', timestamp, NULL)) AS editor_ready,
  MIN(IF(event_action IN ('saveAttempt', 'saveSuccess'), timestamp, NULL)) AS first_attempt,
  IF(SUM(event_action = 'abort') > 0,
    IF(SUM(`event_action.abort.type` = 'nochange') > 0,
      'abort_nochange',
      'abort'
    ),
    IF(SUM(event_action = 'saveSuccess') > 0,
      'success',
    IF(SUM(event_action = 'saveFailure') > 0,
      'failure',
      'other_abort'
    ))
  ) AS outcome
FROM staging.ve2_pilot_users
INNER JOIN log.Edit_11448630 ON
  wiki = 'enwiki' AND
  user_id = `event_user.id`
WHERE
  timestamp BETWEEN
    registration AND
    DATE_FORMAT(
      DATE_ADD(registration, INTERVAL 7 DAY),
      "%Y%m%d%H%M%S"
    )
GROUP BY user_id, session_id;

OK... Done --Halfak (WMF) (talk) 11:39, 28 May 2015 (UTC)

Analysis again![edit]

First, let's look at block rates. I define "blocked for damage" as either having the reason be "spam" or "vandalism". Really, all of the other blocks couldn't have anything to do with VE because they are either for sock puppets or something to do with the user name.

         bucket via_mobile blocked.k  blocked.p reverted.k reverted.p
1: experimental          0        67 0.04118009        177 0.10878918
2:      control          0        77 0.04735547        152 0.09348093
3:      control          1         8 0.01762115         52 0.11453744
4: experimental          1         7 0.01452282         57 0.11825726
   blocked_for_damage.k blocked_for_damage.p    n
1:                   47          0.028887523 1627
2:                   57          0.035055351 1626
3:                    5          0.011013216  454
4:                    4          0.008298755  482

It looks like, if anything, the experimental condition is seeing fewer blocks. Let's do a quick prop_test to be sure.

> prop.test(c(67, 77), c(1627, 1626))

	2-sample test for equality of proportions with continuity correction

data:  c(67, 77) out of c(1627, 1626)
X-squared = 0.5944, df = 1, p-value = 0.4407
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.020925457  0.008574682
sample estimates:
    prop 1     prop 2 
0.04118009 0.04735547 

> prop.test(c(47, 57), c(1627, 1626))

	2-sample test for equality of proportions with continuity correction

data:  c(47, 57) out of c(1627, 1626)
X-squared = 0.8103, df = 1, p-value = 0.368
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.018871933  0.006536278
sample estimates:
    prop 1     prop 2 
0.02888752 0.03505535 

Nope. No significance there. Now, how about the number of editors who submit an edit that gets reverted.

> prop.test(c(177, 152), c(1627, 1626))

	2-sample test for equality of proportions with continuity correction

data:  c(177, 152) out of c(1627, 1626)
X-squared = 1.9314, df = 1, p-value = 0.1646
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.006021796  0.036638291
sample estimates:
    prop 1     prop 2 
0.10878918 0.09348093

Interestingly, it looks like we're getting close to significance here. We might find that editors who use VE are more likely to have at least one edit reverted. I'm curious whether this can be explained by VE enabling more newcomers to edit at all.

         bucket via_mobile reverted.k editing.k main_editing.k
1: experimental          0        177       534            406
2:      control          0        152       544            391
3:      control          1         52       117             99
4: experimental          1         57       115             98

> prop.test(c(177, 152), c(406, 391))

	2-sample test for equality of proportions with continuity correction

data:  c(177, 152) out of c(406, 391)
X-squared = 1.6421, df = 1, p-value = 0.2
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.02356949  0.11799707
sample estimates:
   prop 1    prop 2 
0.4359606 0.3887468

We certainly see less significance here. I'm not sure that we should expect this to be foreboding. Really, a p-value of 0.2 means that we'd expect to see this error in 1/5 trials if the real proportions were truly the same. Next up is edit success metrics. --Halfak (WMF) (talk) 11:58, 28 May 2015 (UTC)

Edit success[edit]

Time to look at edit success rates. First things first, I must resample. The wikitext logging is sampled at a 25% rate, so I must do the same to visualeditor events. If I did this right and the presence of VE did not dramatically change the edit rates, then we should see balanced counts of events for either condition:

> ve_sessions = nrow(edit_sessions[editor == "visualeditor"])
> resampled_edit_sessions = rbind(
+     edit_sessions[editor == "wikitext",],
+     edit_sessions[editor == "visualeditor"][
+         sample(ve_sessions, ve_sessions/4)
+     ]
+ )
> user_edit_sessions = merge(users, resampled_edit_sessions, by="user_id")
> user_edit_sessions[,
+     list(
+         count = length(user_id)
+     ),
+     list(bucket)
+ ]
         bucket count
1: experimental  1467
2:      control  1568

Well, that's close, but it's not even. We're definitely lacking events in the experimental condition. It could be that we have some users dominating the set though.

> user_edit_sessions[bucket=="experimental",
+     list(
+         count = length(bucket)
+     ),
+     list(user_id)
+ ][order(count, decreasing=T)][1:10]
     user_id count
 1: 25252202    35
 2: 25259984    25
 3: 25256834    23
 4: 25260014    23
 5: 25252274    20
 6: 25257424    15
 7: 25257522    15
 8: 25260224    15
 9: 25252452    14
10: 25255896    14
> 
> user_edit_sessions[bucket=="control",
+     list(
+         count = length(bucket)
+     ),
+     list(user_id)
+ ][order(count, decreasing=T)][1:10]
     user_id count
 1: 25255749    74
 2: 25252589    41
 3: 25259069    40
 4: 25259511    25
 5: 25258777    23
 6: 25256155    21
 7: 25254795    20
 8: 25251911    18
 9: 25251689    17
10: 25253889    15

Yup. that's definitely true. It seems like it would be best if I could look only at up-to the first <some small n> recorder edit sessions for each user. Meeting now. back later. --Halfak (WMF) (talk) 13:26, 28 May 2015 (UTC)


OK back. So, I worked out a simple way to filter the dataset down to the first few observations per user.

> first_5_sessions = ddply(
+     user_edit_sessions,
+     .(user_id),
+     function(sub_dt){
+         sub_dt[order(sub_dt$session_started),][1:min(5, nrow(sub_dt)),]
+     }
+ )
> dim(first_5_sessions)
[1] 2373   13
> dim(user_edit_sessions)
[1] 3035   13

We lost about 650 observations. No biggie.

> first_5_sessions[,
+     list(
+         count = length(user_id)
+     ),
+     list(bucket)
+ ]
         bucket count
1: experimental  1185
2:      control  1188

There we go! OK. Now to observe the differences.

         bucket via_mobile users.n ve.k        ve.p attempted.k attempted.p
1: experimental          0     582  404 0.355008787         425   0.3734622
2:      control          0     586    4 0.003502627         523   0.4579685
3: experimental          1      28    9 0.191489362          11   0.2340426
4:      control          1      23    0 0.000000000          13   0.2826087
   successful.k successful.p changed.n    n
1:          389    0.3418278       736 1138
2:          464    0.4063047       726 1142
3:           10    0.2127660        27   47
4:           13    0.2826087        40   46
> prop.test(c(425,523), c(736, 726))

	2-sample test for equality of proportions with continuity correction

data:  c(425, 523) out of c(736, 726)
X-squared = 32.1325, df = 1, p-value = 1.44e-08
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.19267500 -0.09320505
sample estimates:
   prop 1    prop 2 
0.5774457 0.7203857 

> prop.test(c(425,523), c(1138, 1142))

	2-sample test for equality of proportions with continuity correction

data:  c(425, 523) out of c(1138, 1142)
X-squared = 16.4114, df = 1, p-value = 5.098e-05
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.12569305 -0.04331948
sample estimates:
   prop 1    prop 2 
0.3734622 0.4579685

Here, it looks like we are seeing significantly lower success rates for experimental users than control users -- and it doesn't matter if we filter out nochange aborts or not. This is *very* surprising since we see the opposite for the VE dashboard. We also do not see a reduced amount of edits coming from this group (see my analysis from yesterday). I don't think this is a show stopper, but it is something that we'll need to look into.

Oh! I have an idea. I'll add a new outcome type for switching between the editors. That's captured in the aborts as "switchwith" and "switchwithout" (see Schema:Edit). So, now to update the query:

SELECT
  user_id,
  event_editingSessionId AS session_id,
  event_editor AS editor,
  MAX(`event_page.revid`) AS rev_id,
  MIN(IF(event_action IN ('init', 'ready'), timestamp, NULL)) AS session_started,
  MAX(timestamp) AS session_ended,
  MIN(IF(event_action = 'ready', timestamp, NULL)) AS editor_ready,
  MIN(IF(event_action IN ('saveAttempt', 'saveSuccess'), timestamp, NULL)) AS first_attempt,
  IF(SUM(event_action = 'abort') > 0,
    IF(SUM(`event_action.abort.type` = 'nochange') > 0,
      'abort_nochange',
    IF(SUM(`event_action.abort.type` IN ('switchwith', 'switchwithout')) > 0,
      'switch_editors',
      'abort'
    )),
    IF(SUM(event_action = 'saveSuccess') > 0,
      'success',
    IF(SUM(event_action = 'saveFailure') > 0,
      'failure',
      'other_abort'
    ))
  ) AS outcome
FROM staging.ve2_pilot_users
INNER JOIN log.Edit_11448630 ON
  wiki = 'enwiki' AND
  user_id = `event_user.id`
WHERE
  timestamp BETWEEN "2015052115" and "2015052915" AND
  timestamp BETWEEN
    registration AND
    DATE_FORMAT(
      DATE_ADD(registration, INTERVAL 7 DAY),
      "%Y%m%d%H%M%S"
    )
GROUP BY user_id, session_id;
> first_5_sessions[,
+     list(
+         users.n = length(unique(user_id)),
+         ve.k = sum(editor == "visualeditor"),
+         ve.p = sum(editor == "visualeditor")/length(session_id),
+         attempted.k = sum(outcome == "success" | !is.na(first_attempt)),
+         attempted.p = sum(outcome == "success" | !is.na(first_attempt))/length(session_id),
+         successful.k = sum(outcome == "success"),
+         successful.p = sum(outcome == "success")/length(session_id),
+         changed_and_noswitch.n = sum(outcome != "abort_nochange" & outcome != "switch_editors"),
+         changed.n = sum(outcome != "abort_nochange"),
+         n = length(session_id)
+     ),
+     list(bucket, via_mobile)
+ ]
         bucket via_mobile users.n ve.k        ve.p attempted.k attempted.p
1: experimental          0     558  412 0.361720808         413   0.3625988
2:      control          0     585    5 0.004366812         525   0.4585153
3: experimental          1      28    7 0.159090909           9   0.2045455
4:      control          1      23    0 0.000000000          13   0.2826087
   successful.k successful.p changed_and_noswitch.n changed.n    n
1:          378    0.3318701                    668       732 1139
2:          466    0.4069869                    728       728 1145
3:            8    0.1818182                     23        24   44
4:           13    0.2826087                     40        40   46
> prop.test(c(413, 525), c(668, 728))

	2-sample test for equality of proportions with continuity correction

data:  c(413, 525) out of c(668, 728)
X-squared = 16.2657, df = 1, p-value = 5.505e-05
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.1535023 -0.0522784
sample estimates:
   prop 1    prop 2 
0.6182635 0.7211538 

Yup. Definitely a difference here. It looks like newcomers are just simply less likely to save an edit that they start when VE is enabled by default. Gonna have to talk this one out with Abbey. --Halfak (WMF) (talk) 15:20, 28 May 2015 (UTC)


I wanted to check something. Which editor makes up the sessions that have such a low attempt-rate? It turns out that it is the experimental condition sessions with 'wikitext'!

> first_5_sessions[,
+     list(
+         users.n = length(unique(user_id)),
+         attempted.k = sum(outcome == "success" | !is.na(first_attempt)),
+         attempted.p = sum(outcome == "success" | !is.na(first_attempt))/length(session_id),
+         changed_and_noswitch.n = sum(outcome != "abort_nochange" & outcome != "switch_editors"),
+         changed.n = sum(outcome != "abort_nochange"),
+         n = length(session_id)
+     ),
+     list(via_mobile, editor=as.character(editor), bucket)
+ ][order(via_mobile, editor, bucket)]
   via_mobile       editor       bucket users.n attempted.k attempted.p
1:          0 visualeditor      control       5           6   0.6666667
2:          0 visualeditor experimental     241         177   0.4609375
3:          0     wikitext      control     584         520   0.4561404
4:          0     wikitext experimental     421         226   0.3079019
5:          1 visualeditor experimental       5           3   0.4285714
6:          1     wikitext      control      23          13   0.2826087
7:          1     wikitext experimental      25           7   0.1891892
   changed_and_noswitch.n changed.n    n
1:                      7         7    9
2:                    260       287  384
3:                    723       723 1140
4:                    392       421  734
5:                      4         4    7
6:                     40        40   46
7:                     18        19   37

So, the hypothesis that visualeditor makes editors less likely to successfully save an edit seems to not describe what's actually going on. I that what we are seeing here is simply experimentation with the two editors. --Halfak (WMF) (talk) 15:42, 28 May 2015 (UTC)

Editing users per user with logged edit session[edit]

I wanted to confirm that the assumptions I made earlier by checking of the proportion of users who have at least one edit against the proportion of users with at least one edit session.

> merge(
+     users, 
+     merge(metrics, user_session_count, by="user_id", all=T),
+     by="user_id"
+ )[,
+     list(
+         has_editing_sessions.k = sum(!is.na(editing_sessions) | day_revisions),
+         editing.n = sum(day_revisions > 0)
+     ),
+     list(bucket, via_mobile)
+ ]
         bucket via_mobile has_editing_sessions.k editing.n
1: experimental          0                    773       534
2:      control          0                    774       544
3:      control          1                    128       117
4: experimental          1                    133       115

And it looks like we have reasonable numbers here. Nothing to get scared about anyway. --Halfak (WMF) (talk) 20:22, 28 May 2015 (UTC)