Research talk:VisualEditor's effect on newly registered editors/May 2015 study


Success criteria

Just noting the ongoing related discussion at Talk:VisualEditor#Success_Criteria. --Elitre (WMF) (talk) 14:19, 21 April 2015 (UTC)[reply]

Probably referring to mw:Talk:VisualEditor#Success_Criteria. ;) --Halfak (WMF) (talk) 14:53, 21 April 2015 (UTC)[reply]
Or maybe mw:Talk:VisualEditor/Archive_1#Success_Criteria? Kerry Raymond (talk) 01:29, 19 December 2015 (UTC)[reply]

Why run a pilot study

I figured that I would address this question before it came up. A en:pilot study is a scaled-down version of a larger experiment. The point is to test the conditions of an experiment without having to run the entire thing first. Since this experiment is important and will affect many new editors, it's important that we get it Right(TM) and are able to learn what we need to. Running a pilot study is a good way to make sure that everything is ready for the large-scale test. There's so much code wrapped around tests like this that something is bound to go not-quite-right.

So, we'll be running a full-fledged 50/50 split test on English Wikipedia for only 24 hours. I'll be performing an analysis of the results of this short-term test to look for evidence of serious problems (e.g. vandalism skyrockets, the Reichstag is covered in spidermen, etc.). The analysis of this short term experiment will lack the statistical significance of the full study, but it will give us a window into the effects we expect to see. --Halfak (WMF) (talk) 00:10, 22 April 2015 (UTC)[reply]

More

When you have a minute, would you please post the list of specific things you're planning to measure? (Or whatever's reasonably certain, or the big ones, or whatever. I just want to know more than the little bit that you've posted.  :-) Whatamidoing (WMF) (talk) 20:31, 3 May 2015 (UTC)[reply]

Whatamidoing (WMF) will do. In the meantime, I discuss the measurements I have planned at a high level on the main page for this project Research:VisualEditor's_effect_on_newly_registered_editors. I'll come back soon to take a pass on the methods section here to enumerate the metrics I'll use. I'll ping when that is ready. --Halfak (WMF) (talk) 20:50, 3 May 2015 (UTC)[reply]
Whatamidoing (WMF) ping! It's ready. :D --Halfak (WMF) (talk) 19:10, 8 May 2015 (UTC)[reply]

Did the test go live?

Halfak (WMF), did the test go live? I zoomed in closely on the realtime dashboards and I can't see any change in Visual Editor usage for the pilot day, nor any change for the (so far) three days of test period. Alsee (talk) 08:05, 7 May 2015 (UTC)[reply]

Hello again, Alsee. No, testing hasn't started yet, as per this notice. HTH! --Elitre (WMF) (talk) 09:24, 7 May 2015 (UTC)[reply]
Ah, thanx. Alsee (talk) 09:42, 7 May 2015 (UTC)[reply]
I understand that there were some difficulties with the logging software, and the design research team is still unhappy with the link tool. Nobody quite knows when it will start (because it depends upon the outcome of Design Research's user testing), but the last thing I heard was that next week is possible, and later this month is likely. Whatamidoing (WMF) (talk) 19:25, 7 May 2015 (UTC)[reply]
+1 to what Whatamidoing (WMF) said. I'll go convert those dates to relative timescales right now to reduce confusion. --Halfak (WMF) (talk) 20:07, 7 May 2015 (UTC)[reply]

Pilot study announced!

We'll start on May 21st. See en:Wikipedia:Village pump (miscellaneous)#A/B Testing for VisualEditor to begin May 21st (diff) --Halfak (WMF) (talk) 00:30, 15 May 2015 (UTC)[reply]

Deployed! See my notes checking the deployment here: Research talk:VisualEditor's effect on newly registered editors/Work log/2015-05-21 --Halfak (WMF) (talk) 15:37, 21 May 2015 (UTC)[reply]
Exciting! Can't wait to see how it works out--VE has come very, very far! --Joe Decker (talk) 21:44, 21 May 2015 (UTC)[reply]
Hi Joe Decker, just wanted to make sure that you saw the results. Whatamidoing (WMF) (talk) 21:25, 17 June 2015 (UTC)[reply]

Complexity of the edits

Is there any way to measure the complexity of an edit (bytes added? formatting? templates? etc) and to correlate with the time spent in the edit? Are the simple edits taking less/more time to be completed by users using VE? Is VE making it easier/faster to do some kinds of edits while making it slower to do other kinds of edits? Helder 11:38, 16 June 2015 (UTC)[reply]

That's an excellent question. We don't have any solid metrics for edit complexity, but I can propose a few. E.g. when you take a diff of two revisions (to see what happened in the edit), you get back a set of operations that would be required to convert one revision to the other. You get more operations when an edit touches several separate parts of a document than when it changes just one place. I suspect that the more operations, the more complex the change. This will be wrong in the case of substantial content additions that come in the form of a large block. Removals should probably be worth less complexity than insertions.
I think it would also be interesting to examine the type of markup characters affected in a diff (what you might mean by "formatting").
Bytes added/removed would be another good way to look at this. It will also be easier. I've started a query to gather stats for the revisions these users performed in their first week. Regretfully, I think I'll need to look at the data after my upcoming vacation (see #Scientist on vacation until June 24th). I'll upload a dataset containing all of the bucketed users from the experiment in case anyone wants to beat me to the analysis. --Halfak (WMF) (talk) 15:10, 16 June 2015 (UTC)[reply]
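For illustration, here is a minimal sketch of the operation-count and bytes-changed metrics described above, using Python's standard difflib on two revision texts. The example strings and the word-level tokenization are placeholder choices for the sketch, not how the study's diffs were actually computed.

from difflib import SequenceMatcher

def edit_complexity(old_text, new_text):
    """Return (operations, bytes_added, bytes_removed) for one edit."""
    # Word-level diff; each non-'equal' opcode is one place the edit touched.
    matcher = SequenceMatcher(None, old_text.split(), new_text.split())
    ops = [op for op in matcher.get_opcodes() if op[0] != "equal"]

    old_len = len(old_text.encode("utf-8"))
    new_len = len(new_text.encode("utf-8"))
    bytes_added = max(new_len - old_len, 0)
    bytes_removed = max(old_len - new_len, 0)
    return len(ops), bytes_added, bytes_removed

old = "The cat sat on the mat. It was grey."
new = "The cat sat on the red mat. It was grey and fluffy."
print(edit_complexity(old, new))  # -> (2, 15, 0): two separate changes

Counting the non-'equal' operations roughly captures how many distinct places an edit touched, which is the intuition behind the complexity proxy; bytes added/removed is the simpler fallback mentioned above.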
@He7d3r: great question indeed. Halfak and I have been floating the idea of prototyping an edit type classifier (as an extension of R:Revscoring) which would allow us to replace pretty coarse metrics like bytes added or number of revisions with more meaningful indicators of type of work. The nice thing is that edit typing could then be used at article-level, editor-level or product level. @Halfak (WMF): when you're back we should start scoping out the project --Dario (WMF) (talk) 07:29, 17 June 2015 (UTC)[reply]
Sounds good. I've already got a little bit of the groundwork in place. While this problem has been tackled in the research literature (e.g. [1]), there are a host of reasons that we can't build off of that work. Why do people solve toy problems!? Arg! --Halfak (WMF) (talk) 14:43, 24 June 2015 (UTC)[reply]

┌─────────────────────────────────┘

@He7d3r: & @Dario (WMF):, I just started up the docs for a new Wiki labels campaign specific to this experiment on English Wikipedia. See en:Wikipedia:Labels/VE experiment edits. I figure that we can use the interface to review edits and share hypotheses. If we come up with something that would lend itself to a quantitative analysis, then we can dig into that too. This will help me prioritize follow-ups so that we don't go off measuring ALL THE THINGS and Halfak doesn't get to do any other science again.

Before we get started, we need to figure out what kinds of questions we might like to ask people while they review edits that we can't answer quantitatively (at least easily). These are usually subjective (e.g. "was this edit any good?"). I started a short list on the campaign talk page. I left the "edit type" question freeform since we don't have a nice category system yet. I'll be proposing one soon, but I thought it would be good to do a en:grounded theory approach to discovering edit categories too. Please comment and propose questions you'd like answered there. A good rule of thumb is: if the answer is objective, we can probably measure it quantitatively (e.g., did this edit add a citation?), but if the answer is subjective, it will be good to have volunteers answer while they are looking at edits (e.g., was this edit productive? is the summary useful?). --Halfak (WMF) (talk) 22:07, 26 June 2015 (UTC)[reply]

And I also just started up a page for tracking work towards an edit classifier. See Research:Automated classification of edit types.  :) --Halfak (WMF) (talk) 22:42, 26 June 2015 (UTC)[reply]
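As a rough illustration of what such an edit type classifier might start from, here is a sketch of per-edit features based on counts of markup constructs added or removed. The feature set and the regexes are hypothetical and for illustration only; they are not the revscoring implementation.

import re

MARKUP = {
    "refs": re.compile(r"<ref[ >]"),
    "templates": re.compile(r"\{\{"),
    "wikilinks": re.compile(r"\[\["),
    "external_links": re.compile(r"\[https?://"),
}

def edit_features(old_text, new_text):
    """Count how each markup construct changed between two revisions."""
    features = {"bytes_delta": len(new_text) - len(old_text)}
    for name, pattern in MARKUP.items():
        delta = len(pattern.findall(new_text)) - len(pattern.findall(old_text))
        features[name + "_delta"] = delta
    return features

features = edit_features("Some text.",
                         "Some text.<ref>{{cite web|url=http://example.org}}</ref>")
print(features)  # refs_delta and templates_delta are both 1; bytes_delta is positive

Feature vectors like this (per edit) could then be labeled with the Wiki labels campaign and fed to a classifier, which is the general shape of the project being scoped here.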
Yeah, when I read the results, this question of the relative complexity of the edits came to my mind. I have noticed myself when using the VE that I tend to make more changes to the article before saving than I do in the markup editor (that is, fewer but "larger" edits). Thinking about my own editing behaviour (which may not reflect the editing behaviour of newbies), I think the difference is due to section editing. I often use section editing in the markup editor, so therefore I might have to edit a number of sections to implement my overall change (i.e. more edits). But the VE only opens the whole document, so I don't have to Save as frequently. But I am not sure if newbies use section editing. If I had to pick a hypothesis, I would say that the newbie source editor user is likely to be more nervous about whether what they did is likely to produce the result they want because of the complexity of markup and so may Save more frequently to check what's happening (although they could use Preview for the same purpose if they understand what it does), whereas the VE newbie is probably more confident that their change is doing what they want because of the WYSIWYG interface and so only Saves when their current "unit of work" is done. Again, if my hypothesis is correct, we would see fewer but larger edits occurring with the VE. Kerry Raymond (talk) 13:00, 4 July 2015 (UTC)[reply]

Scientist on vacation until June 24th

Mosquito. The primary predator of the hapless camper.

Hey folks, I want to let you know that I have some vacation coming up right after I complete the May 2015 study report. I expect there will be questions and ideas for follow-up analyses. Please don't hold back. :) But know that I won't respond until I get back on Wednesday, June 24th. In the meantime, I'll ask Dario, Whatamidoing and Elitre to address what comments they can.

In case you're curious, I'll be traveling deep into the Boundary Waters Canoe Area Wilderness, portaging some fur trading routes used by the Voyageurs. --Halfak (WMF) (talk) 13:39, 16 June 2015 (UTC)[reply]

Duration of user activity

If I follow the timeline correctly, you basically looked at totals for metrics for the whole period of the initial 2 weeks of these users' experience, right?

Have you considered looking at those metrics on a timeline, and over a longer period of time for the users who are in those 2 buckets? It's quite common for new UX features to take some time to be "digested" by users and I've found that productivity/engagement can evolve differently over time. In this case both the wikitext editor and VE are new to these users, but it's possible that after the 1-month mark for example, one group would plateau and the other would keep improving because past a certain level of proficiency with the tool one of them becomes superior. Which wouldn't appear in the first 2 weeks of usage.

--GDubuc (WMF) (talk) 07:58, 17 June 2015 (UTC)[reply]

I believe that he was actually checking the first 7 days for each account. It takes two weeks altogether because some started on "Sunday" and others not until "Saturday", but the "Sunday" accounts get only 7 days of data, not 14.
I think that everyone would like to see this repeated at 30 or 90 days. However, Aaron's a rare and sadly finite resource, so I don't know if he'll be able to do it. Whatamidoing (WMF) (talk) 21:19, 17 June 2015 (UTC)[reply]
Na. I can do the follow-up survival/productivity measurements. Those are super easy to run against the database. I'm sort of skeptical that they'll show any difference, but I'm an empiricist, so I like taking observations. :) --Halfak (WMF) (talk) 22:45, 26 June 2015 (UTC)[reply]
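For concreteness, here is a minimal sketch of the kind of follow-up survival measurement mentioned above: the share of users in each bucket who make at least one edit in a later window after registration. The 30-37 day window and the in-memory data layout are illustrative assumptions, not the study's actual definitions or database queries.

from datetime import datetime, timedelta

def surviving(registration, edit_timestamps, start_day=30, end_day=37):
    """True if the user edited at least once in the follow-up window."""
    window_start = registration + timedelta(days=start_day)
    window_end = registration + timedelta(days=end_day)
    return any(window_start <= ts < window_end for ts in edit_timestamps)

def survival_rate(users):
    """users: list of dicts with 'bucket', 'registration', 'edits' keys."""
    by_bucket = {}
    for user in users:
        alive = surviving(user["registration"], user["edits"])
        counts = by_bucket.setdefault(user["bucket"], [0, 0])
        counts[0] += alive
        counts[1] += 1
    return {b: survivors / total for b, (survivors, total) in by_bucket.items()}

# Tiny made-up example:
reg = datetime(2015, 5, 21)
users = [
    {"bucket": "experimental", "registration": reg,
     "edits": [reg + timedelta(days=1), reg + timedelta(days=31)]},
    {"bucket": "control", "registration": reg,
     "edits": [reg + timedelta(days=2)]},
]
print(survival_rate(users))  # {'experimental': 1.0, 'control': 0.0}

The same shape of calculation works for a 30- or 90-day re-run of the productivity metrics: only the window and the metric computed per user change.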

Rolling measurement of these metrics

Related to my point above, I think that measuring these metrics in a more automated manner is critical. This one-off study was on a specific version of VE. VE evolves every week, wikitext editing doesn't. We should be measuring these effects permanently and creating a new control group every time a VE update is deployed. We should also at the very least have non-VE control groups still running even when we roll out VE to "everyone" on a given wiki. Because the digestion period still applies.

For Media Viewer we made the mistake of quickly reaching the point of it being turned on for everyone, and not having control groups from that point on made it very difficult to establish if the UX issues we were attempting to fix were actually getting fixed, and a few months down the line of that process we definitely had no way to compare it to the old way of viewing images.

I imagine that this study required a lot of manual work, but I urge everyone involved in the VE rollout to make the effort to automate it (or a synthesized version of it), to ideally have separate groups for each VE update and to always keep a non-VE control group until the post-launch period is officially over. The worst thing that can happen to VE is being blind in that respect when UX improvements on a freshly released VE are attempted.

--GDubuc (WMF) (talk) 08:05, 17 June 2015 (UTC)[reply]

Great!

Excellent analysis and a joy to read! Big thanks to halfak and everyone involved! --Atlasowa (talk) 10:21, 17 June 2015 (UTC)[reply]

\o/ I'm happy you got something good from it. :) Thanks for reading. --Halfak (WMF) (talk) 14:46, 24 June 2015 (UTC)[reply]

Burden on Wikipedians

I see that that metric is only measured by blocked and reverted edits. Of course that number is not going to be different. Do you have a metric for how many of the edits by both the VE editors and the control group needed a follow-up edit to repair them, how many of those repairs were done by the editors themselves, and how many edits stayed broken until someone else repaired them / are still broken? --Dirk Beetstra T C (en: U, T) 03:28, 23 June 2015 (UTC)[reply]

"Of course that number is not going to be different." -- but it was different. O.o Why would you think that it wouldn't be different? As for looking at "needing a follow-up edit", how do you propose that we detect that? Why do you think it is more common that an edit will need to be cleaned up rather than simply getting reverted? Do you think it is undesirable for newcomers to make somewhat productive edits that experienced editors can clean up? --Halfak (WMF) (talk) 14:31, 24 June 2015 (UTC)[reply]
@Halfak (WMF): Quoting from the mainpage:
TL;DR: Slight decrease in burden on current Wikipedians
Block rates (by reason), plotted with binomial approximate standard error bars for the VisualEditor experiment, by experimental bucket. "for damage" means that the block reason referenced spam or vandalism.
I don't even know how you can call this a 'slight decrease' when there is no statistically significant decrease (control and experimental are the same within their margins of error, see graph). And of course that number is not going to be different. And that is similar to the question 'Why do I think that an edit will need to be cleaned up rather than simply getting reverted?'
Let me put it this way: there is a certain percentage of vandals among the new accounts created. Whether a vandal is editing through source or through the VE does not matter; it is a vandal, and those edits will be reverted. Those edits are bad faith. It may be that some of those we call vandals are actually not vandals but good-faith editors who are just badly breaking stuff, and maybe that is why there is a perceived lower number of vandals active with VE - you see what you break. But someone intending to insert 'poop' will insert that through VE and through source. And whether you are a vandal using source editing or VE does not matter, you're a vandal, your edits end up reverted, and you may end up being blocked. (I now actually wonder, did you only count the edits that got reverted and editors that got blocked, or did you also check whether there were 'missed' edits that actually needed to be reverted, and whether there are vandals who still need to be blocked?)
The rest of the editors are good faith; they will try something good. We do not revert good-faith editors, we mentor them, we repair their mistakes (or leave them to be repaired), we do not warn them, we do not block them. However, being new editors they do not know how wiki source works, and their edits need a follow-up (either by themselves, or by another editor 'unbreaking' their edit while leaving the contribution in place). That is why I think that edits need to be cleaned up - we do that, you know: trying to keep the good editors by helping them.
What I expect, what I know actually, is that new editors editing wiki source will 'screw up' because they do not understand it (single brackets, templating stuff that is not a template, etc. etc.). If you take 100 newly registered accounts, you will find some good-faith wiki source code breaking or 'misformatting' (inserting <br /> instead of two newlines, adding an initial space for list items, broken <ref>s, etc. etc.). You can count that.
Now, that will not happen with VE, but VE in itself may give some problems. And that is what I would like to see 'numbers' for in comparison. Take your 3363 editors, do a diff on their edits, and see if there is any wiki source breaking, and see how often that was repaired by the editors themselves (either through a subsequent VE edit, or through a wiki source edit), how often it was repaired by an RC patroller, and how often it was left 'broken'. Thát is how the burden on Wikipedians is measured; the burden of vandals is not going to change significantly through VE (and the latter is what you showed). --Dirk Beetstra T C (en: U, T) 05:58, 28 June 2015 (UTC)[reply]
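A crude sketch of the kind of automated "broken markup" check being proposed here: scan the text a newcomer added for patterns that often indicate wikitext breakage. The heuristics below are illustrative only and would produce false positives; human review (e.g. via the Wiki labels campaign mentioned in the reply below) would still be needed for real cases.

import re

def markup_problems(added_text):
    """Return a list of suspected wikitext problems in the added text."""
    problems = []
    if added_text.count("[[") != added_text.count("]]"):
        problems.append("unbalanced wikilink brackets")
    if added_text.count("{{") != added_text.count("}}"):
        problems.append("unbalanced template braces")
    if added_text.count("<ref") != added_text.count("</ref") + \
            len(re.findall(r"<ref[^>]*/>", added_text)):
        problems.append("unclosed <ref> tag")
    if re.search(r"<br\s*/?>", added_text):
        problems.append("<br> used instead of a blank line")
    if re.search(r"^ \S", added_text, flags=re.MULTILINE):
        problems.append("leading space (accidental preformatted text)")
    return problems

print(markup_problems("Some text<ref>source\n {{cite web|url=x}"))
# ['unbalanced template braces', 'unclosed <ref> tag',
#  'leading space (accidental preformatted text)']

Running such a check over the first edits of both buckets, and again over the next revision of each page, would give a rough count of how much breakage each group produced and who cleaned it up.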
Hey Dirk Beetstra. I'm sorry that I lost this thread in a group of updates to this page. First, it seems there was confusion about the burden measurements. The block rate was not significantly different, as you suggest. However the revert rate and reverted edit counts were significantly lower for editors in the VE condition.
Now as for looking for *why* a revision was reverted, it sounds like you are spec'ing a whole new research project -- which is fine, but beyond the kind of time and energy I plan to put into this avenue of investigation. I think that human eyes will be a much more useful measurement device for addressing this question. See en:WP:Labels/VE experiment edits. Let me know if you want to help out. --Halfak (WMF) (talk) 14:20, 6 July 2015 (UTC)[reply]
Hi Halfak. There is no significant difference between those numbers, as you can see in the graph, the error bars overlap. You can not say that the number you measured for the control is not actually 0.028, nor can you say that for the experimental group.
So the answer from WMF is: "Look, we made VE, here are some easy statistics that we can find and those show no difference, so enable it. And if you want more data, go figure it out yourself". I find that very typical behaviour for the WMF; it speaks to your customer service. The WMF is the only one who has access to the data (a table of diffs (say, the first 10 edits by each editor) for both the control group and the experimental group).
To me the answer is clear: you know that if you dig deeper, the answer is that VE is significantly worse than the source editor (and that I see from the en.wikipedia poll as well), giving a higher burden on the regulars. --Dirk Beetstra T C (en: U, T) 03:43, 7 July 2015 (UTC)[reply]
I think "burden on existing Wikipedians" has to be understood as an issue independent of the VE. Attracting more new contributors by any means will create a larger burden on existing Wikipedians because sometimes new contributors do the wrong thing. It might a deliberately bad thing (e.g. vandalism) through to some good-faith Manual of Style issue like capitalisation. If we want new contributors, we have to put up with their mistakes. We were all new once. So, if we take it as an axiom that new contributors create some burden for the existing community, then we need to rephrase the question as "how do different mechanisms for attracting new contributors affect the newcomer burden on the community when adjusted for the numbers of new contributors attracted?" (a slight variant might "when adjusted for the extent of new contributions"). For example, I teach Wikipedia edit training sessions, typically 10-15 people per group. These trainee newcomers are unlikely to do really bad things (e.g. I've never known a single one to vandalise) but certainly in one session they won't know the MoS backwards (I don't know the MoS backwards for that matter) so I would think the burden per trainee newcomer is going to be lower than for "normal" newcomers. But the total number of folks I teach is small so my reduction on the burden is very small in absolute terms. The introduction of the VE is likely to impact a lot more new contributors and the way they contribute because of the VE will be different to the source editor. The VE won't stop them vandalising or making capitalisation errros, but it will reduce the likelihood of breaking the markup (OK, there is still some scope in the fields of templates but there's a lot less scope), so yes there should be some improvement in the burden per newcomer as it has largely removed one type of problem to be fixed. I suspect irritation towards newcomers using the VE is partly caused by the tagging of those edits as being from the VE. If we are serious about measuring the burden on the community from newcomers with the VE, we need to turn those tags off so the community's behaviour is not influenced by the knowledge that they are fixing a VE edit. Kerry Raymond (talk) 05:06, 7 July 2015 (UTC)[reply]
Personally I am looking forward to switching from training using the source editor to the VE as soon as I can, which is why I am actively using the VE myself so I can teach it well. Unfortunately my own editing uses chunks of generated/canned markup for citations which cannot be input through the VE so I am forced to use the source editor most of the time; this is a barrier for Australians like me who use the Wikipedia-formatted citations generated by NLA Trove to switch to the VE. And, to be honest, the VE is still a bit unreliable; it has periods when it just won't load any pages and sometimes it won't save them either. These are the things holding me back from training with the VE straightaway. I think training with the VE will be a lot easier than training with the source editor. Kerry Raymond (talk) 05:18, 7 July 2015 (UTC)[reply]
@Kerry Raymond: - With the (old) source editor I regularly come across editors who leave 'plain html' pages behind, or use formatting which would be more consistent if it were done with wiki markup. It generally falls to other editors to clean that up for them - as they are newbies, they do not know how to format it 'properly'. That is a certain burden that falls on regulars.
With VE, you will likely not see 'plain html copies' (I should actually try that) but you will see copy-paste jobs that need to be put into WikiMarkup afterwards (e.g., I know it is easy to make a bullet-list with VE, but I can still do a 'manual' bullet list using VE). Those will also be a burden on Wikipedians.
I am not assuming here that that will be worse, the same, or even better - I'd like to see the statistics. And I disagree with the current statistics, as they do not show anything that can be reliably expected to be different (as I say, whether a vandal edits using source or using VE, it is a vandal and it needs to be reverted - the number of vandals will however be the same).
I do agree that having a VE could be an asset to new editors, but if the results (that can be collected) show that starting with VE significantly increases the burden on Wikipedians, then that needs to be taken into account.
You say ".. to be honest, the VE is still a bit unreliable; it has periods when it just won't load any pages and sometimes it won't save them either" - do we know how many newbies ran into this, and decided to step away if it failed to load or failed to save? We have a 'time to save' but that goes for successful saves. (editors who continue to edit is the same - so there could be a base-number of editors who go through the effort of getting things to work (being it source or VE), and with source you have a significant number that step away because they don't understand the source, and here you have a significant number that step away because VE does not load, and those are accidentally the same, resulting in no net gain/net loss? How many editors had editing problems with VE and gave up, and how many had editing problems but went on - do we have numbers for how many cases the VE did not load vs. how often the source editor did not load and how often a source edit did not get saved vs. how often a VE edit did not get saved?). --Dirk Beetstra T C (en: U, T) 07:57, 7 July 2015 (UTC)[reply]
"do we know how many newbies ran into this" – Probably none, because Kerry's talking about a recent problem with the servers. It actually affects everything, but it seems to affect Javascript-based programs (and therefore VisualEditor) more noticeably than some others. Whatamidoing (WMF) (talk) 05:42, 8 July 2015 (UTC)[reply]
'Probably none' - but you have numbers for that, right .. how many people clicked edit for the source, and how many people clicked edit for the VE, and how many of those in the end also clicked 'save' for each. --Dirk Beetstra T C (en: U, T) 06:37, 8 July 2015 (UTC)[reply]
Correct, I am mentioning a recent problem that worries me about starting training with the VE, and not something that was occurring during the monitoring period for this experiment. FWIW, the source editor is also a bit strange lately. I save changes and sometimes I see the article as it was, unchanged, but if I force it to reload, I do see the changes I expected to see. I don't know if it's a related issue. Kerry Raymond (talk) 07:31, 8 July 2015 (UTC)[reply]
On "you probably have numbers": yes, but they come with piles and piles of disclaimers, can't be used as is with confidence. https://edit-analysis.wmflabs.org/compare/ --Nemo 10:38, 8 July 2015 (UTC)[reply]
Ah .. so the numbers can't be used as is with confidence, but it is probable that the answer is none. If it is that probable, then there is a reason why, right? Someone does have reliable data on that. --Dirk Beetstra T C (en: U, T) 10:55, 8 July 2015 (UTC)[reply]
Hi Dirk Beetstra, I'm not involved with this particular research project and don't have specific knowledge of the data, but I would like to point out that your repeated argument why the difference is not significant is erroneous in two ways:
  1. You conclude that overlapping error bars mean that a difference is not statistically significant ("There is no significant difference between those numbers, as you can see in the graph, the error bars overlap"). That's based on an insufficient understanding of the statistics involved. It's a common fallacy, so you are in good company, but it would still be nice to check assumptions more carefully before launching into such accusations.
  2. You keep talking about the graph for block rates, but Aaron already said that the statistically significant difference he found was for a different measure, namely revert rates.
Regards, Tbayer (WMF) (talk) 20:40, 7 July 2015 (UTC)[reply]
@Tbayer (WMF): Those error bars mean that the real value is, with a certain certainty, within those limits. You can say that with 95% certainty the real value is within certain limits. If a value is with 95% certainty between 20 and 30, then it most likely is 25, but there is a 5% chance that the real value is actually 31. Either what you show there are not the certainty limits, or the values are, with ..% certainty, similar (I hope those limits are the 2-sigma or 3-sigma values). Can you please clarify? Or better, can you give me the mean, and can you give me the sigma? Thanks. --Dirk Beetstra T C (en: U, T) 03:28, 8 July 2015 (UTC)[reply]
It anyway does not take away my real concern - how many edits are left broken after a source-edit (I know that happens) and how many are left over broken after a VE edit, and how many of each are repaired by the editor itself, and how many are repaired by someone else? --Dirk Beetstra T C (en: U, T) 03:30, 8 July 2015 (UTC)[reply]
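To illustrate the statistical point raised above with made-up counts (these are not the study's numbers): two proportions whose 95% error bars overlap can still differ significantly under a two-proportion z-test, using the normal approximation throughout.

from math import sqrt, erf

def ci_95(successes, n):
    """95% confidence interval for a proportion (normal approximation)."""
    p = successes / n
    se = sqrt(p * (1 - p) / n)
    return p - 1.96 * se, p + 1.96 * se

def two_proportion_z(s1, n1, s2, n2):
    """z statistic and two-sided p-value for the difference of two proportions."""
    p1, p2 = s1 / n1, s2 / n2
    pooled = (s1 + s2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

print(ci_95(100, 1000))  # roughly (0.081, 0.119)
print(ci_95(130, 1000))  # roughly (0.109, 0.151) -- the intervals overlap
print(two_proportion_z(100, 1000, 130, 1000))  # z is about -2.1, p is about 0.035

So overlapping error bars on a plot are not, by themselves, evidence that a difference is insignificant; the appropriate comparison is a test on the difference itself.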

Re: critical issues

On «not because of real struggle – or we would have seen [...] critical issues raised during user testing», there are some critical issues reported in Phabricator. If there is a contrast between struggle symptoms and struggle reporting, I tend to believe we have gaps in reporting: we know that our communication lines are heavily interrupted and most editors never manage to report their technical issues and wishes, especially in non-English projects. --Nemo 08:26, 26 June 2015 (UTC)[reply]

Hi there, thanks for your feedback. Aaron is referring to usability issues, which as you noticed are reported on Phabricator and actively being worked on, with the support of Abbey Ripstra’s team (see an example at https://phabricator.wikimedia.org/T101166, which is also listed as a “blocker” for the Q1 quarter). Triaging blockers is still an open, weekly initiative and aimed precisely at identifying issues which may complicate the life of a new editor (as per quarterly goals) :) What Aaron is saying is that if there really were critical usability issues, we'd see it in the productivity measures. As for the struggle to report problems and feature requests, you may be right, although I like to think that the liaisons’ work in that regard has helped at least a little bit in the last few years? We do support several non-English communities, but of course there’s always a lot to be done in that area, and of course nobody can know how many VE bugs are still not reported in Phabricator - that’s why, among other things, we run specifically tailored communications initiatives. If you have related suggestions about better ways to engage the communities, we’re always all ears. Thank you. --Elitre (WMF) (talk) 16:40, 28 June 2015 (UTC)[reply]

Time to save

Regarding "time to save", another possibility is that users are reading through the article in VE and editing as they go. I know I have done that in some cases. If so, it's including some reading time in the metric. Mattflaschen-WMF (talk) 22:56, 26 June 2015 (UTC)[reply]

Hi Mattflaschen-WMF, That's a really good point. If that was happening, I'd expect to find many minor changes strewn throughout the page in a single revision rather than many small revisions. We can look for that in the manual reviewing campaign I'm spooling up. See en:WP:Labels/VE experiment edits. --Halfak (WMF) (talk) 19:17, 27 June 2015 (UTC)[reply]
@Halfak (WMF):, in some cases definitely. However, it's also possible someone finds a typo, clicks 'Edit' (the VE one), fixes the single typo, keeps reading, and doesn't find anything else to fix (or doesn't bother to if they do), then finally saves. In that case, they would probably still not go ahead and edit that article immediately again though. Mattflaschen-WMF (talk) 23:29, 29 June 2015 (UTC)[reply]
Good point indeed. I think I did that in the past too. Helder 19:40, 27 June 2015 (UTC)[reply]
I wonder if that could be measured through an automated system, according to the number of lines in each diff (the diff as it displays). If I add one long sentence, that should produce one line (or three, if you include the lines before and after my new sentence) in the diff. If I correct six typos in six different paragraphs, then that should produce six lines in the diff (or up to 18, if you count unchanged lines of wikitext and none of them are adjacent), even if each typo causes a zero-byte size for the change. Whatamidoing (WMF) (talk) 02:14, 29 June 2015 (UTC)[reply]
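A sketch of the measurement described above: count how many separate regions of a page were touched in a single revision, using a line-level diff. The example text is made up; many small, scattered regions in one revision would be consistent with "read through and fix as you go" editing, whereas one large block would not.

from difflib import SequenceMatcher

def changed_regions(old_text, new_text):
    """Number of contiguous changed regions in a line-level diff."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    opcodes = SequenceMatcher(None, old_lines, new_lines).get_opcodes()
    return sum(1 for tag, *_ in opcodes if tag != "equal")

old = "Para one has a typo.\n\nPara two is fine.\n\nPara three has anothre typo."
new = "Para one has a fix.\n\nPara two is fine.\n\nPara three has another typo."
print(changed_regions(old, new))  # 2 -- two separate paragraphs were touched

Comparing this count against time-to-save per revision, by bucket, would be one way to check whether the long VE save times come from "editing while reading" rather than from struggling with the tool.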