Research talk:VisualEditor's effect on newly registered editors/Work log/2015-05-06

From Meta, a Wikimedia project coordination wiki

Wednesday, May 6, 2015[edit]

Checking on the result of some fixes being pushed. I engaged in a set of edits via the wikitext editor to test logging.

select rev_id, rev_timestamp, rev_comment FROM enwiki.revision INNER JOIN enwiki.page ON rev_page = page_id WHERE rev_timestamp > "20150505" AND page_title = "EpochFail/sandbox" AND rev_user_text = "EpochFail";
+-----------+----------------+--------------------+
| rev_id    | rev_timestamp  | rev_comment        |
+-----------+----------------+--------------------+
| 660966602 | 20150505161754 | Logging test 1     |
| 660966618 | 20150505161804 | Logging test 2     |
| 660966632 | 20150505161812 | Logging test 3     |
| 660966643 | 20150505161820 | Logging test 4     |
| 660966661 | 20150505161831 | Logging test 5     |
| 660966679 | 20150505161841 | New logging test 1 |
| 660966704 | 20150505161852 | New logging test 2 |
| 660966720 | 20150505161901 | New logging test 3 |
| 660966734 | 20150505161908 | New logging test 4 |
+-----------+----------------+--------------------+
9 rows in set (0.06 sec)

These edits organized such that each numbered set represents a sequence of edits and back button usage.

I found the following relevant events in event logging.

> SELECT timestamp, event_editingSessionId, event_action, `event_page.revid` FROM Edit_11448630 WHERE timestamp > "20150505" AND `event_page.title` LIKE "%EpochFail%";
+----------------+----------------------------------+--------------+------------------+
| timestamp      | event_editingSessionId           | event_action | event_page.revid |
+----------------+----------------------------------+--------------+------------------+
| 20150505161819 | 2179053670e1d69d                 | saveAttempt  |        660966632 |
| 20150505161819 | 2179053670e1d69d                 | saveSuccess  |        660966643 |
| 20150505161832 | 0b1df51528e878e03248ddfe2b4c1598 | init         |        660966661 |
| 20150505161833 | 0b1df51528e878e03248ddfe2b4c1598 | ready        |                0 |
| 20150505161841 | 0b1df51528e878e03248ddfe2b4c1598 | saveAttempt  |        660966661 |
| 20150505161841 | 0b1df51528e878e03248ddfe2b4c1598 | saveSuccess  |        660966679 |
| 20150505161843 | 24d5953f20fe5c336f1564c231df6c14 | init         |        660966679 |
| 20150505161844 | 24d5953f20fe5c336f1564c231df6c14 | ready        |                0 |
| 20150505161846 | 24d5953f20fe5c336f1564c231df6c14 | abort        |                0 |
| 20150506143829 | 045ecbb7156aece83e91df721debe257 | ready        |                0 |
+----------------+----------------------------------+--------------+------------------+
11 rows in set (29.52 sec)

Note that some sessions only have an attempt. I never clicked save without a save successfully completing. So that might be an issue. I also never loaded the edit interface without clicking save, so the sessions with no attempt may be an issue.

Allow me to sequence these events with the revision they correspond to.

  • | 20150505161819 | 2179053670e1d69d | saveAttempt | 660966632 |
| 20150505161819 | 2179053670e1d69d | saveSuccess | 660966643 |
It seems that these two events correspond to this | 660966643 | 20150505161820 | Logging test 4 |
It is curious that we don't see an init or ready event. But this should be celebrated because it is the first time that I have seen an edit appear that was the result of the [edit]-->[back]-->[edit] pattern.
  • | 20150505161832 | 0b1df51528e878e03248ddfe2b4c1598 | init | 660966661 |
| 20150505161833 | 0b1df51528e878e03248ddfe2b4c1598 | ready | 0 |
| 20150505161841 | 0b1df51528e878e03248ddfe2b4c1598 | saveAttempt | 660966661 |
| 20150505161841 | 0b1df51528e878e03248ddfe2b4c1598 | saveSuccess | 660966679 |
This looks like a good edit session that corresponds to an initial edit, not a [back] edit | 660966679 | 20150505161841 | New logging test 1 |

So, I'm concerned about the presence of no ready/init for the one [back] edit that I can see. So, I'll do some more edits today to see how then get logged. --Halfak (WMF) (talk) 16:45, 6 May 2015 (UTC)[reply]

OK. Here are my edits:

> select rev_id, rev_timestamp, rev_comment FROM enwiki.revision INNER JOIN enwiki.page ON rev_page = page_id WHERE rev_timestamp > "20150506" AND page_title = "EpochFail/sandbox" AND rev_user_text = "EpochFail";
+-----------+----------------+----------------+
| rev_id    | rev_timestamp  | rev_comment    |
+-----------+----------------+----------------+
| 661121390 | 20150506164530 | Logging test 1 |
| 661121409 | 20150506164538 | Logging test 2 |
| 661121429 | 20150506164546 | Logging test 3 |
| 661121447 | 20150506164557 | Logging test 4 |
| 661121465 | 20150506164606 | Logging test 5 |
+-----------+----------------+----------------+
5 rows in set (0.00 sec)

Now to look at the log:

> SELECT timestamp, event_editingSessionId, event_action, `event_page.revid` FROM Edit_11448630 WHERE timestamp > "20150506" AND `event_page.title` LIKE "%EpochFail/sandbox";
+----------------+----------------------------------+--------------+------------------+
| timestamp      | event_editingSessionId           | event_action | event_page.revid |
+----------------+----------------------------------+--------------+------------------+
| 20150506143829 | 045ecbb7156aece83e91df721debe257 | ready        |                0 |
| 20150506143952 | 045ecbb7156aece83e91df721debe257 | abort        |                0 |
| 20150506164528 | 247f744976aa81af192a24e68749d0db | saveAttempt  |        660966734 |
| 20150506164528 | 247f744976aa81af192a24e68749d0db | saveSuccess  |        661121390 |
| 20150506164556 | 2e750c21d4eb3ba1                 | saveSuccess  |        661121447 |
+----------------+----------------------------------+--------------+------------------+
5 rows in set (9.31 sec)

Here, it seems that we have two logged edits. Again, it appears that the edit that was performed via the [back] button was not logged appropriately.

  • | 20150506164556 | 2e750c21d4eb3ba1 | saveSuccess | 661121447 |
Corresponds to | 661121447 | 20150506164557 | Logging test 4 |, but there is no init, ready or saveSuccess

Since that was easy, I want to try it one more time just in case this is an intermittent issue. --Halfak (WMF) (talk) 16:51, 6 May 2015 (UTC)[reply]

> select rev_id, rev_timestamp, rev_comment FROM enwiki.revision INNER JOIN enwiki.page ON rev_page = page_id WHERE rev_timestamp > "20150506164606" AND page_title = "EpochFail/sandbox" AND rev_user_text = "EpochFail";
+-----------+----------------+-----------------------+
| rev_id    | rev_timestamp  | rev_comment           |
+-----------+----------------+-----------------------+
| 661122309 | 20150506165222 | Logging test again 1. |
| 661122336 | 20150506165233 | Logging test again 2. |
| 661122356 | 20150506165242 | Logging test again 3. |
| 661122377 | 20150506165251 | Logging test again 4. |
| 661122401 | 20150506165259 | Logging test again 5. |
+-----------+----------------+-----------------------+
5 rows in set (0.00 sec)
mysql:research@analytics-store.eqiad.wmnet [log]> SELECT timestamp, event_editingSessionId, event_action, `event_page.revid` FROM Edit_11448630 WHERE timestamp > "20150506164556" AND `event_page.title` LIKE "%EpochFail/sandbox";
+----------------+------------------------+--------------+------------------+
| timestamp      | event_editingSessionId | event_action | event_page.revid |
+----------------+------------------------+--------------+------------------+
| 20150506165236 | 1d43f07f4f01ff2f       | ready        |                0 |
+----------------+------------------------+--------------+------------------+
1 row in set (0.07 sec)

.... Well. That's apparently wrong in a new and interesting way. Let me try that again.

> select rev_id, rev_timestamp, rev_comment FROM enwiki.revision INNER JOIN enwiki.page ON rev_page = page_id WHERE rev_timestamp > "20150506165259" AND page_title = "EpochFail/sandbox" AND rev_user_text = "EpochFail";
+-----------+----------------+-------------------------------+
| rev_id    | rev_timestamp  | rev_comment                   |
+-----------+----------------+-------------------------------+
| 661123259 | 20150506165920 | I guess we're testing again 1 |
| 661123276 | 20150506165927 | I guess we're testing again 2 |
| 661123293 | 20150506165934 | I guess we're testing again 3 |
| 661123312 | 20150506165941 | I guess we're testing again 4 |
| 661123333 | 20150506165949 | I guess we're testing again 5 |
+-----------+----------------+-------------------------------+
5 rows in set (0.01 sec)
> SELECT timestamp, event_editingSessionId, event_action, `event_page.revid` FROM Edit_11448630 WHERE timestamp > "20150506165236" AND `event_page.title` LIKE "%EpochFail/sandbox";
+----------------+----------------------------------+--------------+------------------+
| timestamp      | event_editingSessionId           | event_action | event_page.revid |
+----------------+----------------------------------+--------------+------------------+
| 20150506165909 | 1738acbb40fb443637dad35517b303a7 | init         |        661122401 |
| 20150506165911 | 1738acbb40fb443637dad35517b303a7 | ready        |                0 |
| 20150506165927 | 12039224cbbd6f3e                 | saveAttempt  |        661123259 |
| 20150506165927 | 12039224cbbd6f3e                 | saveSuccess  |        661123276 |
| 20150506165948 | 34636e0eec8e57ce                 | saveSuccess  |        661123333 |
+----------------+----------------------------------+--------------+------------------+
5 rows in set (0.05 sec)

So. This seems to be all sorts of broken.

  • | 20150506165909 | 1738acbb40fb443637dad35517b303a7 | init | 661122401 |
| 20150506165911 | 1738acbb40fb443637dad35517b303a7 | ready | 0 |
This looks like the beginning of a proper session that resulted in the first edit in the set | 661123259 | 20150506165920 | I guess we're testing again 1 | but we don't see a saveSuccess event!
  • | 20150506165927 | 12039224cbbd6f3e | saveAttempt | 661123259 |
| 20150506165927 | 12039224cbbd6f3e | saveSuccess | 661123276 |
This looks like a [back] session like we saw earlier where we register an attempt and success, but no init or ready. It seems to correspond to this edit: | 661123276 | 20150506165927 | I guess we're testing again 2 |
  • | 20150506165948 | 34636e0eec8e57ce | saveSuccess | 661123333 |
And this [back] session only has a saveSuccess event! It doesn't even have the saveAttempt.

So, this is messed up pretty severely. Time to go back to the bug. --Halfak (WMF) (talk) 17:06, 6 May 2015 (UTC)[reply]

Updates[edit]

Working with User:Catrope now, I did some follow-up queries. User:Catrope thought that the issues we see could be related to my filtering on page_title, so I'm trying some queries that filter on the sessionId instead.


First, let's look at a session that only has a "saveSuccess" event:

  • | 20150506165948 | 34636e0eec8e57ce | saveSuccess | 661123333 |
> SELECT timestamp, event_editingSessionId, event_action, `event_page.revid` FROM Edit_11448630 WHERE timestamp > "20150506" AND event_editingSessionId = "34636e0eec8e57ce";
+----------------+------------------------+--------------+------------------+
| timestamp      | event_editingSessionId | event_action | event_page.revid |
+----------------+------------------------+--------------+------------------+
| 20150506165948 | 34636e0eec8e57ce       | saveSuccess  |        661123333 |
+----------------+------------------------+--------------+------------------+
1 row in set (12.55 sec)

Indeed, it looks like that's the only relevant event.

Now, let's check out a session that is missing it's saveAttempt/saveSuccess.

  • | 20150506165909 | 1738acbb40fb443637dad35517b303a7 | init | 661122401 |
| 20150506165911 | 1738acbb40fb443637dad35517b303a7 | ready | 0 |
> SELECT timestamp, event_editingSessionId, event_action, `event_page.revid` FROM Edit_11448630 WHERE timestamp > "20150506" AND event_editingSessionId = "1738acbb40fb443637dad35517b303a7";
+----------------+----------------------------------+--------------+------------------+
| timestamp      | event_editingSessionId           | event_action | event_page.revid |
+----------------+----------------------------------+--------------+------------------+
| 20150506165909 | 1738acbb40fb443637dad35517b303a7 | init         |        661122401 |
| 20150506165911 | 1738acbb40fb443637dad35517b303a7 | ready        |                0 |
+----------------+----------------------------------+--------------+------------------+
2 rows in set (8.08 sec)

So, it looks like I get the same results when I filter of sessionId rather than page title --Halfak (WMF) (talk) 21:32, 6 May 2015 (UTC)[reply]

Test with new user account[edit]

I decided to try testing with a fresh user account. I had just created en:User:Wiki labeler for other reasons, so I picked it up for this purpose.

I made these edits:

> select rev_id, rev_timestamp, rev_comment FROM enwiki.revision INNER JOIN enwiki.page ON rev_page = page_id WHERE rev_timestamp > "20150506220001" AND page_title = "Wiki_labeler/sandbox" AND rev_user_text = "Wiki labeler";
+-----------+----------------+----------------+
| rev_id    | rev_timestamp  | rev_comment    |
+-----------+----------------+----------------+
| 661164891 | 20150506220213 | Test logging 1 |
| 661164919 | 20150506220226 | Test logging 2 |
| 661164936 | 20150506220236 | Test logging 3 |
| 661164951 | 20150506220247 | Test logging 4 |
| 661165184 | 20150506220510 | Test logging 5 |
+-----------+----------------+----------------+
5 rows in set (0.00 sec)

And found these logs:

> SELECT timestamp, event_editingSessionId, event_action, `event_page.revid` FROM Edit_11448630 WHERE timestamp > "20150506220000" AND `event_page.title` like "%labeler/sandbox";
+----------------+----------------------------------+--------------+------------------+
| timestamp      | event_editingSessionId           | event_action | event_page.revid |
+----------------+----------------------------------+--------------+------------------+
| 20150506220158 | 033f629f2a6e9567a0200763521361cf | init         |        661164593 |
| 20150506220159 | 033f629f2a6e9567a0200763521361cf | ready        |                0 |
| 20150506220212 | 033f629f2a6e9567a0200763521361cf | saveAttempt  |        661164593 |
| 20150506220212 | 033f629f2a6e9567a0200763521361cf | saveSuccess  |        661164891 |
| 20150506220215 | 222f35cb304619a4                 | ready        |                0 |
| 20150506220225 | 222f35cb304619a4                 | saveSuccess  |        661164919 |
| 20150506220225 | 222f35cb304619a4                 | saveAttempt  |        661164891 |
| 20150506220522 | 39841fdc883fc69c                 | ready        |                0 |
| 20150506220526 | 39841fdc883fc69c                 | saveAttempt  |        661165184 |
| 20150506220526 | 39841fdc883fc69c                 | saveSuccess  |        661165184 |
+----------------+----------------------------------+--------------+------------------+
10 rows in set (0.05 sec)

And that looks perfect. So, now I want to know what proportion of 16 char event_editingSessionsIds with a saveSuccess don't have a saveAttempt and how many are lacking a "ready". --Halfak (WMF) (talk) 22:12, 6 May 2015 (UTC)[reply]


OK. First thing I want to know is how many 16 char editingSessionIds we have.

> SELECT LENGTH(event_editingSessionId), COUNT(*) FROM (SELECT event_editingSessionId, SUM(event_action = "init") AS inits, SUM(event_action = "ready"), SUM(event_action = "abort") as aborts, SUM(event_action = "saveAttempt") as attempts, SUM(event_action = "saveSuccess") AS successes FROM Edit_11448630 WHERE timestamp > "20150506" AND event_editor = "wikitext" GROUP BY 1) AS sessions GROUP BY 1;
+--------------------------------+----------+
| LENGTH(event_editingSessionId) | COUNT(*) |
+--------------------------------+----------+
|                             16 |    10703 |
|                             32 |   669489 |
|                             33 |        2 |
+--------------------------------+----------+
3 rows in set (1 min 15.42 sec)

Hmm.. Somehow we're getting some 33 char ids. But it's such a small amount that I don't care. It looks like 2% or so of our events are coming in as 16 chars. Let's dig in.

> SELECT LENGTH(event_editingSessionId), COUNT(*), SUM(attempts > 0) AS attempts, SUM(successes > 0) AS success, SUM(inits = 0) AS no_init, SUM(readies = 0) AS no_ready, SUM(successes > 0 AND attempts = 0) AS success_no_attempt, SUM(attempts > 0 AND readies = 0) AS attempt_no_ready  FROM (SELECT event_editingSessionId, SUM(event_action = "init") AS inits, SUM(event_action = "ready") AS readies, SUM(event_action = "abort") as aborts, SUM(event_action = "saveAttempt") as attempts, SUM(event_action = "saveSuccess") AS successes FROM Edit_11448630 WHERE timestamp > "20150506" AND event_editor = "wikitext" GROUP BY 1) AS sessions GROUP BY 1;
+--------------------------------+----------+----------+---------+---------+----------+--------------------+------------------+
| LENGTH(event_editingSessionId) | COUNT(*) | attempts | success | no_init | no_ready | success_no_attempt | attempt_no_ready |
+--------------------------------+----------+----------+---------+---------+----------+--------------------+------------------+
|                             16 |    10811 |      818 |     726 |   10811 |     1158 |                 36 |              124 |
|                             32 |   674965 |    59240 |   52610 |   69919 |   373061 |               3119 |            17774 |
|                             33 |        2 |        0 |       0 |       2 |        0 |                  0 |                0 |
+--------------------------------+----------+----------+---------+---------+----------+--------------------+------------------+
3 rows in set (52.34 sec)

So, 124/818=15% back-dance sessions have a save attempt, but no ready. In the direct edit setting, 17774/59240=30% of sessions have an attempt, but no ready. That's a potentially concerningly high proportion. These could be users with no JS or maybe bots. I think I'll need someone to look into this. I'll file a bug. --Halfak (WMF) (talk) 22:52, 6 May 2015 (UTC)[reply]

33 char session ids[edit]

Just out of curiousity, I wanted to know what those 33 char session ids looked like.

> SELECT timestamp, event_editingSessionId, event_action, event_editor, `event_page.title`, `event_user.id`, `event_page.revid` from Edit_11448630 WHERE timestamp > "20150506" AND LENGTH(event_editingSessionId) > 32;
+----------------+-----------------------------------+--------------+--------------+--------------------------+---------------+------------------+
| timestamp      | event_editingSessionId            | event_action | event_editor | event_page.title         | event_user.id | event_page.revid |
+----------------+-----------------------------------+--------------+--------------+--------------------------+---------------+------------------+
| 20150506091155 | 01 c034d999395bd89ff0254e1d491d29 | ready        | wikitext     | Endymion_(Simmons_novel) |             0 |                0 |
| 20150506091201 | 01 c034d999395bd89ff0254e1d491d29 | abort        | wikitext     | Endymion_(Simmons_novel) |             0 |                0 |
| 20150506113152 | 2 c38134192d5ebdfb465839eacdfcbfa | ready        | wikitext     | Baccarat_(card_game)     |             0 |                0 |
| 20150506113154 | 2 c38134192d5ebdfb465839eacdfcbfa | ready        | wikitext     | Baccarat_(card_game)     |             0 |                0 |
+----------------+-----------------------------------+--------------+--------------+--------------------------+---------------+------------------+
4 rows in set (12.20 sec)

--Halfak (WMF) (talk) 23:02, 6 May 2015 (UTC)[reply]