Research talk:Newbie reverts and subsequent editing behavior
Good and Bad edits 
A large proportion of reverts will be for vandalism and spam; is there any way to filter these out? Also, is it possible that what your figures show is that one or both of these are becoming more common, or that we are becoming more efficient at deterring vandals and spammers from the project? WereSpielChequers 12:39, 15 June 2011 (UTC)
Whilst we'd like to reform vandals and get them to edit productively, in practice we are quite happy if most vandals simply go away after having their vandalism reverted. But we don't want to lose good-faith newbies. If the theory is correct that vandalism has been increasing and good-faith edits declining, then this research may simply be telling us that we are becoming more efficient at dealing with an increasing number of vandals and spammers. However, the picture may also be more complex: decreasing tolerance for unsourced additions may be causing more good-faith but unsourced edits to be reverted. It would be interesting to know whether "revert unsourced" and edit summaries to that effect have become more common, and how off-putting they are to newbies. WereSpielChequers 10:54, 30 June 2011 (UTC)
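One rough way to check whether "revert unsourced" summaries are becoming more common is to count, per year, the share of revert edit summaries that mention sourcing. The sketch below is a minimal illustration, assuming the revisions have already been extracted into hypothetical `(year, edit_summary)` pairs. The keyword patterns are illustrative guesses, not an established classifier, and the function name is made up for this example.

```python
import re
from collections import defaultdict

# Hypothetical keyword patterns (assumptions, not a validated classifier):
# a summary counts as a revert if it mentions revert*/rv/rvv/undid, and as
# sourcing-related if it mentions unsourced/uncited/citation/source(s).
REVERT_RE = re.compile(r"\b(revert\w*|rv|rvv|undid)\b", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(unsourced|uncited|citation|sources?)\b", re.IGNORECASE)


def unsourced_revert_share(edits):
    """Return {year: fraction of revert summaries that mention sourcing}.

    edits: iterable of (year, edit_summary) pairs (hypothetical input format).
    Years with no revert summaries are omitted from the result.
    """
    reverts = defaultdict(int)
    unsourced = defaultdict(int)
    for year, summary in edits:
        if REVERT_RE.search(summary):
            reverts[year] += 1
            if SOURCE_RE.search(summary):
                unsourced[year] += 1
    return {year: unsourced[year] / reverts[year] for year in sorted(reverts)}


sample = [
    (2007, "rv vandalism"),
    (2007, "Reverted unsourced addition"),
    (2010, "revert unsourced"),
    (2010, "undid revision 123 - no source"),
    (2010, "copyedit"),
]
print(unsourced_revert_share(sample))  # {2007: 0.5, 2010: 1.0}
```

Summaries that never use the word "revert" (or that are left blank) would be missed, so a rising share here would only be suggestive, not proof of changing norms.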
Question regarding reverts and conclusion 
Hi, first I have to say that I find your research question and results very interesting. I'm working on something in the same direction.
It would be nice if you could answer a few questions I have:
1. Regarding the notion of a "revert":
a) You say that "Each editor's first edit was compared with a revision table to determine if it had been reverted." How exactly was that done? I'm interested in reverts myself and wrote a script that uses MD5 hashes and text diffs on the revisions in the dump, but I'm not quite satisfied with the results.
b) Relatedly, I would like to know how you defined a "revert". Is it the complete deletion of all the characters an editor entered in an edit? What about editors who revert others or delete content: do you treat their edits as reverted if the deleted content is reintroduced? Did you take into account the position of words in the text, or did you use a bag-of-words model? I have read some papers and tool documentation that use "reverts"; some describe their method (though many don't), but almost no one seems to state their definition of what a "revert" actually is. Maybe you have some pointers.
2. Regarding the results of your research:
- You say that "The negative effect of a revert has increased over time." Where do you draw this conclusion from? When I check your percentages of retained editors (just quickly and manually, without any statistical method), the differences between editors whose first edit was reverted and those whose first edit was not reverted decrease over the years, for every number of days. But maybe I misunderstood what you meant.
- You also talk about an "effect" of the first edit being reverted, and I like the approach and the idea behind it. But, as often stated, correlation is not causation. It might be, for example, that very bad editors (or, as mentioned in the discussion entry above, vandals) who are not motivated to contribute long-term and who have poor editing skills get reverted on their first and subsequent edits, and therefore drop out because of Wikipedia's natural resistance to low-quality contributions. In other words, an editor's first edit being reverted and that editor leaving could both be caused by the low quality of their edits: such an editor gets reverted more often (or simply loses interest) and stops editing.
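For reference on question 1, a widely used and simple way to detect reverts in a dump is the "identity revert": if a revision's text hash equals that of an earlier revision of the same page, the revisions in between are treated as reverted. The sketch below is a minimal illustration using MD5 as mentioned in 1a, assuming each page's revisions are available in chronological order as hypothetical `(rev_id, text)` pairs. It is not necessarily the method used in this study, and it deliberately ignores partial reverts, which is exactly where the definitional questions in 1b arise.

```python
import hashlib


def md5_hex(text):
    """MD5 digest of a revision's full text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def identity_reverts(revisions):
    """Return the set of rev_ids undone by a later identity revert.

    revisions: list of (rev_id, text) for one page, oldest first
    (hypothetical input format). Revision j reverts revisions i+1..j-1
    when its text hash equals that of an earlier revision i.
    """
    last_seen = {}  # text hash -> index of the most recent revision with that text
    reverted = set()
    for j, (rev_id, text) in enumerate(revisions):
        h = md5_hex(text)
        i = last_seen.get(h)
        if i is not None and i < j - 1:
            # The page returned to an earlier state, so every revision
            # between that state and this one was undone.
            reverted.update(r for r, _ in revisions[i + 1:j])
        last_seen[h] = j
    return reverted


page = [(1, "a"), (2, "a vandal"), (3, "a"), (4, "a b"), (5, "a b c")]
print(identity_reverts(page))  # {2}
```

Under this definition, a newcomer's first edit would count as reverted if its `rev_id` is in the returned set. An edit that is only partly removed, or whose content is removed and later reintroduced by someone else, is invisible to it, which is why token-level (positional or bag-of-words) approaches exist.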
I'm aware this is not a full research paper; I just wanted to point this out, in case it helps :) I would appreciate your feedback, especially on the revert-related questions.
Fabian Flöck 12:17, 18 August 2011 (UTC)