Research talk:Article feedback/Stage 3/Conversion and newcomer quality

Familiarity loses

That was an interesting test, and 4e with its call to action does seem a clear winner. Are we going to roll that out instead of AFT?

I'm concerned that the control may not be giving us a realistic test here. We have hundreds of millions of readers, many of whom will be very familiar with our current look and feel. The probability is that any change will have a short-term "honeymoon" period before it becomes part of the furniture and is unconsciously disregarded by most editors. So simply replacing the word "Edit" with the calls to action "Improve this" or "Correct this" would probably get a short-term boost to editing, even if after a few weeks the number of new editors per day dropped to or below the baseline numbers of new editors per day. WereSpielChequers (talk) 14:12, 24 July 2012 (UTC)

I'm not sure I'd consider 4e a clear winner given the inherent tradeoffs in the proportion of "initially productive" editors. However, as a scholar, I'm likely to prefer to learn from these sorts of experiments without worrying about which decisions should be made. My primary concern, however, is for those editors who enter via the 4e form not being ready to make edits and therefore being more likely to have their work rejected. I'd like to explore the pattern of edits and quality of work with better metrics before advising toward any particular variant of the form interface.
I think you are exactly right about the honeymoon period. I'm not really worried about the effect of the honeymoon period on the conclusions drawn in the writeup, but I fear that attempting to extrapolate the results across the entire wiki may suffer great error due to the honeymoon effect. --EpochFail (talk) 14:25, 24 July 2012 (UTC)
You have something of a point there. We have a problem with unsourced good-faith edits, as a proportion of patrollers now work on the basis that it is better to be safe than sorry and, despite policy to the contrary, they revert even the most uncontentious unsourced change. However, I wouldn't hold out hope for the AFT being any better there; my suspicion is that unsourced suggested changes may even be less likely to result in an article change than unsourced edits. Remember there will be a bunch of patrollers who won't necessarily revert an unsourced uncontentious change, but they wouldn't make that change on behalf of someone who posts it into AFT unless there was a source. Once people realise that this is less likely to see an article change than doing so themselves then I predict it will lose some of the good-faith users. WereSpielChequers (talk) 16:30, 19 August 2012 (UTC)

reverts

There's another issue that I think needs addressing. In the test those edits that were reverted were ignored, but not the reverts themselves. But for the community, vandalism reversion is an overhead. We value the time people spend on it, but it isn't of itself building the pedia, and if something results in extra vandalism reversions being required we shouldn't measure that as a positive. If anything it is a cost. Ideally we should measure additional reversions as a cost of volunteer time, but we certainly shouldn't, as at present, treat them as a positive. For a more realistic valuation of these two tests we should at least measure them discarding both reverted edits and reverts. That way we have a better chance of measuring extra article improvement. WereSpielChequers (talk) 16:43, 24 July 2012 (UTC)

I'm confused. The last figure shows the raw numbers of reverted vs. "productive" edits. We most certainly consider reverted edits as non-productive. I actually refer to them as anti-productivity for the reasons you specified -- it takes extra work to fix the damage caused. We didn't look for reverting edits from these newcomers; however, I'd argue that any reverting edits (that are themselves not reverted) are productive. Vandalism is a law of nature in Wikipedia. Although I'd agree that reverting vandalism doesn't build the encyclopedia, that doesn't mean it isn't just as essential an activity. --EpochFail (talk) 17:08, 24 July 2012 (UTC)
Reverting vandalism is indeed a useful activity but it is additional work for our volunteers rather than additional improvement to the pedia. What we should measure here is additional productive edits that are triggered by AFT. If AFT provokes someone to vandalise Wikipedia then the reversion of the vandalism is extra work the community has had to do because of AFT. Currently it is being measured as a benefit - "if we can prompt our readers to do x thousand extra vandalisms we can extract another x thousand useful vandalism reverts from our volunteers". We should be including it in the price: "if we do this we can prompt our readers to do y extra useful edits. But they will also do x extra vandalisms and this requires another z vandalism reverts from our volunteers". Depending on the values of x and y this may be a good or bad move. WereSpielChequers (talk) 13:14, 25 July 2012 (UTC)
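The x, y, z accounting described in the comment above can be sketched roughly as follows. This is only an illustration, not anything taken from the write-up: the class name `InterventionOutcome`, the function names, and the weights `value_per_useful_edit` and `cost_per_revert` are all hypothetical, and the numbers in the usage lines are made up.

```python
from dataclasses import dataclass


@dataclass
class InterventionOutcome:
    """Hypothetical tallies for one interface variant."""
    useful_edits: int       # y: extra edits that improved articles
    vandal_edits: int       # x: extra edits that had to be undone
    volunteer_reverts: int  # z: reverts volunteers made to undo them


def counted_as_benefit(outcome: InterventionOutcome) -> int:
    """Accounting the comment objects to: reverts are added to the gains."""
    return outcome.useful_edits + outcome.volunteer_reverts


def net_value(outcome: InterventionOutcome,
              value_per_useful_edit: float = 1.0,
              cost_per_revert: float = 0.25) -> float:
    """Accounting the comment proposes: reverts are part of the price.

    Both weights are assumptions; a bot revert might cost close to zero,
    a manual revert considerably more.
    """
    gain = outcome.useful_edits * value_per_useful_edit
    cost = outcome.volunteer_reverts * cost_per_revert
    return gain - cost


# Made-up figures, purely for illustration.
example = InterventionOutcome(useful_edits=100, vandal_edits=40,
                              volunteer_reverts=30)
print(counted_as_benefit(example))
print(net_value(example))
```

Whether the net figure comes out positive depends entirely on the chosen weights, which is the point of the comment: the same raw counts can look like a gain or a loss depending on which side of the ledger the reverts sit.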
I'm confused about why you think we are considering vandalism performed by newly converted users to be productive. I specifically label such edits as "unproductive" and refer to these edits as "additional work (unproductive edits) incurred by the incoming new editor" in the write-up. The associated blog post (which I recommend for synthesis until the manuscript passes peer review) finishes with:
A decision about whether such an approach is desirable should weigh the value of a larger number of productive new editors and useful feedback with the cost of an increase in unproductive edits. --EpochFail (talk) 13:52, 25 July 2012 (UTC)
When I said "In the test those edits that were reverted were ignored, but not the reverts themselves." I acknowledged that you are screening out the vandalism, as the reverted edits will largely correlate with vandalism. My point is that the reversion of that vandalism is being measured as a positive result. Actually it is extra work that the community has had to do, and part of the cost of this change rather than the benefit. Of course part of this cost is very low - if a bot reverts vandalism then the cost is insignificant. And I'm not disputing that one person adding a useful sentence to an article makes it worthwhile having to clear up several vandalisms. My point is that the cleanup of several vandalisms is part of the price we pay for that extra edit, not part of the benefit. So how could I rephrase "In the test those edits that were reverted were ignored, but not the reverts themselves." to make it clear that I acknowledge that the test has, however imperfectly, screened out vandalism, but has treated the reversion of that vandalism as a positive result from AFT rather than part of the price that the community is paying for AFT? WereSpielChequers (talk) 15:08, 25 July 2012 (UTC)
Vandalism was not ignored. Vandalism was measured as unproductive edits under the assumption that each one represented some amount of additional work for Wikipedians and was, therefore, undesirable. The reversion of such vandalism was *not* treated as a positive result. It's becoming clear to me that I've miscommunicated something in the writeup, but I can't figure out where that miscommunication is. --EpochFail (talk) 15:40, 25 July 2012 (UTC)
The relevant part of the write-up is "We assume that, if a revision was reverted by another editor within 48 hours, it was likely to have been unproductive. On the contrary, we assume that revisions that were not reverted within 48 hours were likely to have contributed some value to the articles they changed." I take that as meaning that reverted edits were presumed to be vandalism and ignored from the stats. But my interpretation of that sentence is that the reversion of vandalism is included, provided that the reversion isn't itself reverted within 48 hours. WereSpielChequers (talk) 16:44, 25 July 2012 (UTC)
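A minimal sketch of the two readings being discussed, assuming a simplified revision record. The `Revision` class, its field names, and the `exclude_reverts` flag are hypothetical and are not the study's actual code; the check that the reverting editor is someone else is omitted for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional

# The 48-hour window comes from the sentence quoted from the write-up.
REVERT_WINDOW = timedelta(hours=48)


@dataclass
class Revision:
    rev_id: int
    timestamp: datetime
    is_revert: bool = False                 # this revision undoes another one
    reverted_at: Optional[datetime] = None  # when it was itself reverted


def was_reverted_in_window(rev: Revision) -> bool:
    """True if the revision was reverted within 48 hours of being saved."""
    if rev.reverted_at is None:
        return False
    return rev.reverted_at - rev.timestamp <= REVERT_WINDOW


def count_productive(revisions: Iterable[Revision],
                     exclude_reverts: bool = False) -> int:
    """Count revisions that survived the 48-hour window.

    With exclude_reverts=False, surviving reverts count as productive.
    With exclude_reverts=True, reverts are discarded as well, which is the
    alternative measure suggested earlier in this thread.
    """
    total = 0
    for rev in revisions:
        if was_reverted_in_window(rev):
            continue
        if exclude_reverts and rev.is_revert:
            continue
        total += 1
    return total


t0 = datetime(2012, 7, 1, 12, 0)
sample = [
    Revision(1, t0),                                       # survives
    Revision(2, t0, reverted_at=t0 + timedelta(hours=2)),  # reverted quickly
    Revision(3, t0, is_revert=True),                       # a surviving revert
    Revision(4, t0, reverted_at=t0 + timedelta(hours=72)), # reverted too late
]
print(count_productive(sample))
print(count_productive(sample, exclude_reverts=True))
```

The two counts differ only by the number of surviving reverts in the sample, so the disagreement matters in practice only if the measured users were reverting at all.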
I'm pretty sure that our new users were not reverting much vandalism in their first week of activity (I hadn't checked), but even if they were, I'd consider that productive work. We weren't measuring *everyone* who edited AFT articles -- just the new users who started editing via the experimental cases. --EpochFail (talk) 17:15, 25 July 2012 (UTC)