Research:Revert

From Meta, a Wikimedia project coordination wiki

A revert is a type of edit which removes the effects of a previous edit. This action typically results in the article being restored to a version that existed sometime previously. A partial revert involves reversing only part of a prior edit, while retaining other parts of it. In Wikipedia, reverts are commonly used to remove inappropriate changes to articles. This page describes types of reverting actions and revert detection methods.

Types of reverting actions

Identity revert

An identity revert is an edit to an article that creates a new revision that exactly matches a previous revision, removing the changes made by any intervening edits. According to work by Kittur et al. [1] and Flöck et al. [2], this is the most common type of revert.

For example, in the sequence of revisions below (1-3), revision #3 reverts revision #2 by creating an exact copy of revision #1:

  1. "This is an article"
  2. "This is not an article"
  3. "This is an article"

In this case, revision #3 is referred to as the reverting revision, revision #2 is the reverted revision and #1 is the reverted-to revision.
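The roles above can be illustrated with a minimal Python sketch that hashes the three example revisions and labels them. The function name `identity_revert_roles` and the `(rev_id, text)` layout are assumptions made for this illustration only; SHA-1 is used because it is the checksum the MediaWiki API exposes.

```python
import hashlib


def checksum(text):
    """Return the SHA-1 hex digest of a revision's text."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def identity_revert_roles(revisions):
    """Label the roles when the last revision is an identity revert.

    `revisions` is a list of (rev_id, text) pairs, oldest first
    (a hypothetical layout chosen for this sketch).
    Returns None when the last revision matches no earlier revision,
    or only repeats the revision directly before it.
    """
    *earlier, (last_id, last_text) = revisions
    target = checksum(last_text)
    # Search backwards so the most recent matching revision wins.
    for index in range(len(earlier) - 1, -1, -1):
        rev_id, text = earlier[index]
        if checksum(text) == target:
            reverted = [r for r, _ in earlier[index + 1:]]
            if not reverted:
                return None  # same content as the previous revision: a no-op
            return {"reverting": last_id, "reverted": reverted, "reverted_to": rev_id}
    return None


example = [
    (1, "This is an article"),
    (2, "This is not an article"),
    (3, "This is an article"),
]
roles = identity_revert_roles(example)
print(roles)
```

For the example sequence, revision 3 is reported as the reverting revision, revision 2 as the reverted revision and revision 1 as the reverted-to revision.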

Full revert plus changes

In another editing pattern, an editor restores an old revision, as in an identity revert, but makes a further change before saving. The intermediate edit's contributions are fully reverted, but no revision identical to an earlier one is produced.

For example:

  1. "This is an article"
  2. "This is not an article"
  3. "This is an encyclopedia article"

Although revision #3 removes all changes made by revision #2 (by deleting "not"), it also adds the word "encyclopedia". In this case, revision #3 is clearly a reverting revision and revision #2 is a reverted revision, but revision #1 wasn't exactly reverted-to.

Partial revert

A partial revert refers to an edit that removes some part of a change made by another revision, but not the entire change.

For example:

  1. "This is an article"
  2. "This is not an encyclopedia article"
  3. "This is an encyclopedia article"

In this case, revision #2 makes two changes by adding the words "not" and "encyclopedia" to the article. Revision #3 removes only the word "not". In this example, revision #3 is a partially reverting revision, revision #2 was partially reverted, but revision #1 wasn't exactly reverted-to.
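The three types described in this section can be told apart with a rough word-level comparison. The sketch below is a deliberate simplification, not the method of Kittur et al. or Flöck et al.: it treats each revision as a bag of whitespace-separated words and ignores word position. The names `word_changes` and `classify_revert` are hypothetical.

```python
from collections import Counter


def word_changes(old, new):
    """Return (added, removed) word multisets between two texts."""
    old_words, new_words = Counter(old.split()), Counter(new.split())
    return new_words - old_words, old_words - new_words


def classify_revert(before, reverted, reverting):
    """Classify what `reverting` does to the change from `before` to `reverted`."""
    if before == reverted:
        return "no revert"  # the middle edit changed nothing
    if reverting == before:
        return "identity revert"
    # What the middle edit did, and what the last edit did.
    added, removed = word_changes(before, reverted)
    new_added, new_removed = word_changes(reverted, reverting)
    # Words the last edit took back out, plus words it put back in.
    undone = (added & new_removed) + (removed & new_added)
    if not undone:
        return "no revert"
    if undone == added + removed:
        return "full revert plus changes"
    return "partial revert"


print(classify_revert("This is an article",
                      "This is not an article",
                      "This is an article"))
print(classify_revert("This is an article",
                      "This is not an article",
                      "This is an encyclopedia article"))
print(classify_revert("This is an article",
                      "This is not an encyclopedia article",
                      "This is an encyclopedia article"))
```

Because word order is ignored, this sketch would misclassify edits that only move text around; it is meant to make the definitions concrete rather than to serve as a detector.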

Detection

Since the MediaWiki software does not flag reverts or provide any structured information about them, researchers have developed methods for detecting them.

Identity revert via checksum with history

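Research by Kittur et al. suggests that 94% of reverts can be detected by comparing the MD5 checksum of each revision's content against the checksums of earlier revisions [1]. The example below uses the SHA-1 checksums exposed by the MediaWiki API, which serve the same purpose.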

Python code
from collections import deque

# Revisions ordered oldest first, e.g. the "revisions" list returned by
# http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=User:EpochFail&rvprop=ids|sha1&format=jsonfm&rvlimit=500
# Note: the API lists the newest revision first by default, so reverse the
# list (or request rvdir=newer) before running the loop below.
revisions = []


class History:
    '''
    A data structure for efficiently storing and retrieving a
    limited number of historical records
    '''

    def __init__(self, maxlen=15):
        '''maxlen specifies the maximum amount of history to keep'''
        self.maxlen = maxlen
        self.d = {}       # Dictionary to allow fast lookup based on keys
        self.l = deque()  # Queue to preserve insertion order for history

    def add(self, key, value):
        '''Adds a new key-value pair. Returns the discarded value, if any.'''
        self.l.append((key, value))
        self.d.setdefault(key, []).append(value)

        if len(self.l) > self.maxlen:
            # Drop the oldest record and its entry in the lookup dictionary
            okey, ovalue = self.l.popleft()
            self.d[okey].pop(0)
            if len(self.d[okey]) == 0:
                del self.d[okey]
            return ovalue

    def __contains__(self, key):
        '''Checks if the key is contained in the history using the "in" keyword'''
        return key in self.d

    def get(self, key):
        '''Gets the most recently added value for a key'''
        return self.d[key][-1]

    def upTo(self, key):
        '''Yields values newest first, stopping at the most recent entry for key'''
        for okey, ovalue in reversed(self.l):
            if okey == key:
                break
            yield ovalue


history = History(15)  # History capped at 15 revisions (common practice)
for rev in revisions:
    if rev['sha1'] in history:  # Identity revision found in history
        reverted = list(history.upTo(rev['sha1']))
        if len(reverted) > 0:  # Found reverted revisions
            print("reverting: %s, reverted: %s, reverted-to: %s" % (
                rev['revid'],
                [r['revid'] for r in reverted],
                history.get(rev['sha1'])['revid']
            ))
        else:  # noop -- same checksum as last revision
            pass

    history.add(rev['sha1'], rev)

Full and partial revert detection via diffing

Diff-based strategies have been developed for detecting partial reverts and full reverts plus changes [2]. These strategies increase the accuracy of revert detection at the cost of performance, because computing differences between revisions is expensive. While Kittur et al.'s work suggests that only 6% of reverts cannot be identified by identity matching [1], more recent work by Flöck et al. suggests that a diff-based method identifies 12% more reverts ("full reverts plus changes") than the checksum method alone, and can additionally identify partial reverts, which full-revision checksum approaches cannot detect [2].

See [1] for Python code demonstrating such a strategy.
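To give a feel for what a diff-based scan involves, here is a rough sketch built on Python's standard `difflib` module. It is not the algorithm of Flöck et al.: it only tracks words that one revision adds and a later revision removes, it does not track deleted words being restored, and it will report coincidental matches on common words. The function names and the 15-revision window are assumptions for illustration.

```python
import difflib
from collections import Counter


def token_diff(old, new):
    """Return (added, removed) word multisets between two texts."""
    a, b = old.split(), new.split()
    added, removed = Counter(), Counter()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.update(a[i1:i2])
        if tag in ("replace", "insert"):
            added.update(b[j1:j2])
    return added, removed


def scan_for_reverts(texts, window=15):
    """Find revisions that remove words added by a recent earlier revision.

    `texts` holds revision texts, oldest first. Returns a list of
    (reverting_index, reverted_index, share) tuples, where the indexes are
    positions in `texts` and `share` is the fraction of the earlier
    revision's added words that were removed.
    """
    findings = []
    recent = []  # (revision index, words that revision added)
    for index in range(1, len(texts)):
        added, removed = token_diff(texts[index - 1], texts[index])
        for earlier_index, earlier_added in recent[-window:]:
            undone = earlier_added & removed
            if undone:
                share = sum(undone.values()) / sum(earlier_added.values())
                findings.append((index, earlier_index, share))
        recent.append((index, added))
    return findings


partial = [
    "This is an article",
    "This is not an encyclopedia article",
    "This is an encyclopedia article",
]
print(scan_for_reverts(partial))
```

A share of 1.0 corresponds to the earlier revision's additions being fully removed, while a smaller share suggests a partial revert. Every revision must be diffed against its predecessor, which is where the extra cost over checksum matching comes from.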

Time to revert cutoff

Since a revision could in theory be reverted years after it was saved, observations taken at any point in time miss reverts that have yet to occur (that is, the data are right-censored). To minimize this issue and compare editors' contributions fairly, it is necessary to choose a cutoff time and count only reverts that occurred within that period after the original edit. 48 hours is a common cutoff: recent research suggests that, at least for the English Wikipedia, nearly all reverts take place within 48 hours [3].
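Applying such a cutoff might look like the following sketch. The function names and the `observed_until` parameter (the time at which the data was collected) are assumptions for illustration; the second function reflects the censoring concern by excluding edits that have not yet been observable for the full cutoff period.

```python
from datetime import datetime, timedelta

CUTOFF = timedelta(hours=48)  # the common cutoff mentioned above


def reverted_within_cutoff(saved_at, reverted_at, cutoff=CUTOFF):
    """True if the edit was reverted no later than `cutoff` after it was saved.

    `reverted_at` is None when no revert was observed.
    """
    if reverted_at is None:
        return False
    return timedelta(0) <= reverted_at - saved_at <= cutoff


def is_observable(saved_at, observed_until, cutoff=CUTOFF):
    """True if the full cutoff period had elapsed when the data was collected."""
    return saved_at + cutoff <= observed_until


saved = datetime(2013, 1, 1, 12, 0)
print(reverted_within_cutoff(saved, datetime(2013, 1, 2, 12, 0)))  # reverted after 24 hours
print(reverted_within_cutoff(saved, datetime(2013, 1, 4, 12, 0)))  # reverted after 72 hours
print(is_observable(saved, datetime(2013, 1, 2, 12, 0)))           # observed for only 24 hours
```

Counting only edits for which `is_observable` holds keeps recently saved edits, which have had less time to be reverted, from appearing artificially successful.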

References

  1. Aniket Kittur, Bongwon Suh, Bryan A. Pendleton & Ed H. Chi (2007). He says, she says: conflict and coordination in Wikipedia. In Proceedings of CHI '07, 453-462. DOI: 10.1145/1240624.1240698
  2. Fabian Flöck, Denny Vrandecic & Elena Simperl (2012). Reverts Revisited – Accurate Revert Detection in Wikipedia. HT '12, June 25–28, 2012, Milwaukee, Wisconsin, USA.
  3. R. Stuart Geiger & Aaron Halfaker (2013). When the Levee Breaks: Without Bots, What Happens to Wikipedia's Quality Control Processes? WikiSym.
