Research talk:Visual editor for anonymous users, 2016


Comments

The May 2015 study found that Visual Editor had almost no effect. There are hopes that VE will be beneficial, as well as concerns that it might be detrimental. Instead of "more likely to register accounts or are more productive after registering", I suggest more neutral language, taking the null hypothesis to be no change. Changes in the positive or negative direction would be equally noteworthy.

I realize that you'll only catch a limited number of new account signups, but to the extent possible I hope you will look at longer-term outcomes (a few months). A single editor who learns what they are doing and goes on to make many experienced edits is vastly more valuable to the community than a large number of people who make a few newbie edits and disappear.

Most people will end up using whatever editor they are defaulted to, but be sure to check the stats for people who end up saving with the opposite editor. Alsee (talk) 13:05, 11 February 2016 (UTC)

@Alsee: I think that's a fair point about using neutral language, so I changed that phrase to say "...whether giving anonymous editors the visual editor affects their likelihood of registering accounts or their productivity after registering".

However, I would add some points that I think are worth noting:

  1. In traditional hypothesis testing, the null hypothesis is pretty much always no change, so no matter how I phrased that sentence or what it revealed about my expectations, I would have used no change as the null hypothesis (ironically, many people have pointed out that the failure to take prior belief into account is one of the great weaknesses of traditional hypothesis testing).
  2. That sentence describes something that I'm not going to do, although I need to update the explanation of why not.
  3. One reason is that I've actually already looked at longer-term (60-day) outcomes from VE in an expanded follow-up to that May 2015 study. There was absolutely no difference in productivity or survival. However, I did find, just like Aaron, that VE users were reverted significantly less often. I still haven't published that follow-up, because I want to better understand what that difference in reverts means, and I haven't had time since I did the work in late December.

Neil P. Quinn-WMF (talk) 23:36, 24 February 2016 (UTC)

Thanks. There's an idea I had for further analysis, but I dropped it because I didn't want to impose on an investigation that appeared to have wrapped up. The goal of the revert metric was to measure the burden on existing editors of having to "clean up" after an edit. A revert is a very blunt tool. Capturing a more accurate "cleanup burden" metric is tricky, but I had an idea for how to do it. For both the control group and the experimental group, select article edits that were followed by an edit from a different user. (We discard edits revising the user's own work, which typically happen in a single session.) Then look at the time interval between the study edit and the following edit. For a no-burden edit, that will be the "usual" randomish interval between edits. An edit needing cleanup will generally attract a prompt fix, either via a recent-edits feed or from people catching it on their watchlist.
  • One hypothesis is that VE makes it less likely for a new user to leave behind problems needing fixing. (Perhaps people avoid mangling the wikitext.)
  • Another hypothesis is that VE makes it more likely for a new user to leave behind problems needing fixing. (Perhaps due to bugs or limitations of VE, or because editors don't see and learn the underlying wikitext.)
If either of those is true, then the difference in short-interval cleanup edits may be visible in a graph, like the time-to-save plots from the last study; a sketch of the interval computation follows below. Alsee (talk) 07:30, 26 February 2016 (UTC)
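A minimal sketch of that interval analysis, assuming a pandas DataFrame of article edits with hypothetical columns page_id, user, timestamp, and group (control versus VE); this is illustrative only, not actual study code:

```python
import pandas as pd

def cleanup_intervals(edits: pd.DataFrame) -> pd.DataFrame:
    """For each edit, find the time until the next edit on the same page
    by a *different* user, discarding self-follow-ups (editors revising
    their own work, typically in a single session)."""
    edits = edits.sort_values(["page_id", "timestamp"]).copy()
    edits["next_user"] = edits.groupby("page_id")["user"].shift(-1)
    edits["next_ts"] = edits.groupby("page_id")["timestamp"].shift(-1)
    followed = edits[edits["next_user"].notna() & (edits["next_user"] != edits["user"])]
    return followed.assign(interval=followed["next_ts"] - followed["timestamp"])

# Compare the two groups on the share of follow-up edits arriving within an
# hour; an excess of short intervals in one group would suggest more prompt
# cleanup of that group's edits.
intervals = cleanup_intervals(edits)
print(intervals.groupby("group")["interval"]
               .apply(lambda s: (s < pd.Timedelta(hours=1)).mean()))
```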
  • @Alsee: Hmm. I definitely agree that a lot of the metrics I'm using (like reverts) are very blunt instruments, and your idea seems like a promising approach to improving that. However, I think we have something that's likely to be even more powerful: ORES scores, which are based on human judgements about the quality of different edits and also differentiate between bad-faith and good-faith-but-damaging edits. Our current idea is for average ORES scores in the different groups to be one of the major outcomes we look at.—Neil P. Quinn-WMF (talk) 23:39, 26 February 2016 (UTC)
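For illustration, a minimal sketch of pulling those scores from the ORES web service; the revision ID is a placeholder, and the response structure shown is assumed from the public v3 endpoint:

```python
import requests

REVID = 123456  # placeholder revision ID

# Ask ORES for the "damaging" and "goodfaith" model scores on enwiki.
resp = requests.get(
    "https://ores.wikimedia.org/v3/scores/enwiki/",
    params={"models": "damaging|goodfaith", "revids": REVID},
)
resp.raise_for_status()
scores = resp.json()["enwiki"]["scores"][str(REVID)]

# Each model returns a probability; averaging P(damaging) within each
# experimental group is the kind of outcome measure described above.
p_damaging = scores["damaging"]["score"]["probability"]["true"]
p_goodfaith = scores["goodfaith"]["score"]["probability"]["true"]
print(p_damaging, p_goodfaith)
```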
I only have a superficial familiarity with ORES, but as I understand it, it's mainly focused on distinguishing between good-faith and bad-faith edits. My general line of thought was that VE would have little effect on the preexisting good or bad intent of someone showing up to edit. My suggestion was more along the lines of trying to identify whether VE is a better or worse tool for editing (and for learning to edit). The original rationale for VE was to bring in more editors, based on a theory that wikitext was an obstacle and that VE would be easier and better. That theory hasn't panned out. A lot of the community negativity about VE is based on the theory that wikitext is better than VE. The edit-interval analysis would catch malicious edits, but mostly I was trying to catch good-faith edits done badly and needing cleanup (because learning via VE is better or worse than via wikitext). Alsee (talk) 13:35, 27 February 2016 (UTC)
To follow up on my above comment, I recently cleaned up a bunch of VE edits. VE has a habit of leaving behind useless nowikis, leaving badly formed links with nowikis in them, or just plain nullifying links by wrapping them completely inside nowikis. VE kills ISBN links in particular. All of the VE edits were good-faith edits, but VE places a burden on other editors to clean up the messes it leaves behind. Almost all edits in the recent-edits feed tagged with both Visual and nowiki-added need cleanup of one sort or another. I've also had to clean up after VE inserting invisible characters... they're linefeeds or something... they don't affect the displayed article, but they screw up the wikitext for other editors. None of those cleanups are reverts; the revert metric is completely blind to them. Many of these problems linger in the articles for a long time, but I was thinking that at least a fraction of them may attract immediate cleanup from other editors, and that might be detectable by an uncommonly low edit interval. Alsee (talk) 13:33, 15 April 2016 (UTC)

Primary Editor for new accounts

There is a bit of a mess at the moment, with the WMF setting the wrong primary editor on EnWiki.[1]

A problem with the IP-edit research is that most metrics are meaningless or ambiguous, because you can't track what effect the editor has over time. How about tracking the effect of a wikitext-primary versus VE-primary default for new accounts? That would give valuable data on whether a VE-primary default increases or decreases the ability to make a first edit, editor retention, medium- and long-term total contributions, learning to use refs/images/templates, and how many people switch and mostly use the opposite editor; a sketch of these metrics follows below.
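A rough sketch of those per-cohort metrics, assuming a hypothetical DataFrame of edits by new accounts with columns user, registration, timestamp, default_editor ("wikitext" or "ve"), and used_ve (e.g. derived from the VE change tag):

```python
import pandas as pd

def cohort_metrics(revs: pd.DataFrame) -> pd.DataFrame:
    revs = revs.assign(day=(revs["timestamp"] - revs["registration"]).dt.days)
    per_user = revs.groupby(["default_editor", "user"]).agg(
        edits=("timestamp", "size"),           # total contributions
        ve_share=("used_ve", "mean"),          # fraction of saves made with VE
        survived_60d=("day", lambda d: (d >= 60).any()),  # medium-term retention
    ).reset_index()
    # A user "switched" if most of their saves used the editor they were
    # NOT defaulted into.
    per_user["switched"] = per_user.apply(
        lambda r: r["ve_share"] < 0.5 if r["default_editor"] == "ve"
        else r["ve_share"] > 0.5,
        axis=1,
    )
    return per_user.groupby("default_editor").agg(
        median_edits=("edits", "median"),
        retention_60d=("survived_60d", "mean"),
        switch_rate=("switched", "mean"),
    )
```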

On one hand, the data might help convince people who are skeptical of VE that it should be the primary default; on the other hand, it could help convince VE advocates that it shouldn't be. Either way, hard data is good. Alsee (talk) 11:17, 18 April 2016 (UTC)

It's over a year later, and I'd like to renew the above suggestion. The research project defined on the main page here doesn't really have much value, particularly in regard to whether or not Visual Editor should be made the default. Trying to use ORES for this purpose is almost certainly invalid. There could be any number of factors that skew ORES results higher or lower, not least the fact that it's trained on skewed data. The vast majority of its training data are wikitext edits, and the population of people making Visual Editor edits is distinctly skewed relative to the general population of editors. Any random feature that ORES latches onto could turn the result into garbage for our purposes here. The primary thing we should all be concerned with is the long-term outcome: how many people become experienced and productive contributors. Does VE help or hinder that? Do people embrace VE as the best tool for the job? Or do they abandon it and adopt wikitext as the best tool for the job?

While it wasn't a controlled experiment, there was a time period, cited above, when Visual Editor was made the default for all new users. Later it was switched back, and the wikitext editor was made the default. We really should be examining that accidental test to see what long-term effect, if any, it had. For new accounts created during the Visual-Editor-default period, compared to the periods before and after, did more or fewer of them go on to become long-term contributors? Have they made more or fewer total contributions? Did they keep using Visual Editor, or did they largely abandon it for wikitext? A sketch of the cohort bucketing follows below.
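A small sketch of that bucketing step, with placeholder dates standing in for the actual VE-default window (the real dates would need to be looked up):

```python
import pandas as pd

# Placeholder boundaries for the period when VE was the default for new users.
VE_DEFAULT_START = pd.Timestamp("2015-01-01")  # placeholder, not the real date
VE_DEFAULT_END = pd.Timestamp("2015-06-01")    # placeholder, not the real date

def cohort(registered: pd.Timestamp) -> str:
    """Label an account by when it registered relative to the VE-default window."""
    if registered < VE_DEFAULT_START:
        return "before"
    if registered < VE_DEFAULT_END:
        return "ve_default"
    return "after"

# Given a DataFrame `accounts` with a registration column, label each account,
# then compare long-term outcomes across the three cohorts, e.g. with the
# cohort_metrics() sketch above:
# accounts["cohort"] = accounts["registration"].map(cohort)
```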

If the uncontrolled data from that time period is considered unacceptable, and you really want a controlled experiment, then you either need to run an experiment defaulting new accounts into different edit modes, or you need to use checkuser data to track the fate of anons after you default them into different editors. This is important. The community is not going to buy into a VE default for anons based on this proposed study. When the WMF tried defaulting new users into VE, the community response was to write a sitewide JavaScript hack to override that default. We didn't deploy that JavaScript, but we would have if the WMF hadn't reversed the default. Alsee (talk) 23:24, 8 May 2017 (UTC)