From Meta, a Wikimedia project coordination wiki

A revert is a type of edit which removes the effects of a previous edit. This action typically results in the article being restored to a version that existed sometime previously. A partial revert involves reversing only part of a prior edit, while retaining other parts of it. In Wikipedia, reverts are commonly used to remove inappropriate changes to articles. This page describes types of reverting actions and revert detection methods.

Types of reverting actions

Identity revert

An identity revert is an edit to an article that creates a new revision exactly matching a previous revision, thereby removing the changes made by any intervening edits. According to work by Kittur et al.[1] and Flöck et al.[2], this is the most common type of revert.

For example, in the sequence of revisions below (1-3), revision #3 reverts revision #2 by creating an exact copy of revision #1:

  1. "This is an article"
  2. "This is not an article"
  3. "This is an article"

In this case, revision #3 is referred to as the reverting revision, revision #2 is the reverted revision and #1 is the reverted-to revision.
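
The three roles in this example can be recovered mechanically by comparing content checksums. The following is a minimal sketch, assuming the revisions are available as plain strings ordered oldest first; the helper name find_identity_revert is hypothetical, and SHA-1 is used only for illustration.

```python
import hashlib

def find_identity_revert(texts):
    """Return (reverted_to, reverted, reverting) as 1-based revision numbers
    for the last revision in `texts`, or None if it is not an identity revert.
    Hypothetical helper; `texts` is assumed to be ordered oldest first."""
    digests = [hashlib.sha1(t.encode("utf-8")).hexdigest() for t in texts]
    last = len(digests) - 1
    # Search backwards for the most recent earlier revision with equal content.
    for i in range(last - 1, -1, -1):
        if digests[i] == digests[last]:
            reverted = list(range(i + 2, last + 1))  # revisions in between
            if not reverted:
                return None  # identical to its predecessor: a no-op, not a revert
            return i + 1, reverted, last + 1
    return None

revisions = [
    "This is an article",
    "This is not an article",
    "This is an article",
]
result = find_identity_revert(revisions)
print(result)
```

For the example above, the returned tuple names revision 1 as reverted-to, revision 2 as reverted, and revision 3 as reverting.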

Full revert plus changes

Another editing pattern occurs when an editor restores an old revision, as in an identity revert, but makes a further change before saving. The intermediate edit's contributions are thus fully reverted, but no identical revision is produced.

For example:

  1. "This is an article"
  2. "This is not an article"
  3. "This is an encyclopedia article"

Although revision #3 removes all changes made by revision #2 (removing "not"), it also adds the word "encyclopedia". In this case, revision #3 is clearly a reverting revision and revision #2 is a reverted revision, but revision #1 was not exactly reverted to.

Partial revert

A partial revert refers to an edit that removes some part of a change made by another revision, but not the entire change.

For example:

  1. "This is an article"
  2. "This is not an encyclopedia article"
  3. "This is an encyclopedia article"

In this case, revision #2 makes two changes by adding the words "not" and "encyclopedia" to the article. Revision #3 removes only the word "not". In this example, revision #3 is a partially reverting revision and revision #2 was partially reverted, but revision #1 was not exactly reverted to.
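
The distinction between the three types can be illustrated with a word-level diff. The sketch below is a simplification under stated assumptions, not the method used in the cited research: it splits text on whitespace, uses Python's difflib to find the tokens each revision adds or removes, and checks how much of revision #2's change is undone by revision #3. The function names token_changes and classify are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher

def token_changes(old, new):
    """Return (added, removed) token multisets between two texts."""
    a, b = old.split(), new.split()
    added, removed = Counter(), Counter()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.update(a[i1:i2])
        if tag in ("replace", "insert"):
            added.update(b[j1:j2])
    return added, removed

def classify(rev1, rev2, rev3):
    """Classify how rev3 treats the changes that rev2 made to rev1."""
    if rev3 == rev1:
        return "identity revert"
    added_by_2, removed_by_2 = token_changes(rev1, rev2)
    added_by_3, removed_by_3 = token_changes(rev2, rev3)
    # Part of rev2's change that rev3 undoes: additions it deletes
    # plus deletions it restores.
    undone = (added_by_2 & removed_by_3) + (removed_by_2 & added_by_3)
    total = added_by_2 + removed_by_2
    if not undone:
        return "no revert"
    if undone == total:
        return "full revert plus changes"
    return "partial revert"

examples = [
    ("This is an article", "This is not an article", "This is an article"),
    ("This is an article", "This is not an article",
     "This is an encyclopedia article"),
    ("This is an article", "This is not an encyclopedia article",
     "This is an encyclopedia article"),
]
labels = [classify(*revs) for revs in examples]
for label in labels:
    print(label)
```

Applied to the three examples on this page, the sketch labels them as an identity revert, a full revert plus changes, and a partial revert, in that order. Whitespace tokenization ignores moved text and markup, so a production detector would need a more careful diff.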


Revert detection methods

Since the MediaWiki software does not flag reverts or provide any structured information about them, researchers have developed methods for detecting them.

Identity revert via checksum with history

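Research by Kittur et al. suggests that 94% of reverts can be detected by matching MD5 checksums of revision content against the checksums of earlier revisions.[1] Any content checksum serves the same purpose; the code below uses the SHA-1 value that the MediaWiki API returns for each revision.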

Python code
revisions = [] #result of ...|sha1&format=jsonfm&rvlimit=500
#Revisions must be ordered oldest first for the loop below to work

class History:
	'''A datastructure for efficiently storing and retrieving a
	limited number of historical records'''
	def __init__(self, maxlen=15):
		'''Maxlen specifies the maximum amount of history to keep'''
		self.maxlen = maxlen
		self.d = {} #Dictionary to allow fast lookup based on keys
		self.l = [] #List to preserve order for history
	def add(self, key, value):
		'''Adds a new key-value pair. Returns any discarded values.'''
		self.l.append((key, value))
		self.d.setdefault(key, []).append(value)
		if len(self.l) > self.maxlen:
			okey, ovalue = self.l.pop(0)
			self.d[okey].pop(0) #Drop the oldest value stored under the discarded key
			if len(self.d[okey]) == 0: del self.d[okey]
			return ovalue
	def __contains__(self, key):
		'''Checks if the key is contained in the history using the "in" keyword'''
		return key in self.d
	def get(self, key):
		'''Gets the most recently added value for a key'''
		return self.d[key][-1]
	def upTo(self, key):
		'''Gets the recently inserted values up to a key, newest first'''
		for okey, ovalue in reversed(self.l):
			if okey == key: break
			else: yield ovalue

history = History(15) #History capped at 15 revisions (common practice)
for rev in revisions:
	if rev['sha1'] in history: #Identity revision found in history
		reverted = list(history.upTo(rev['sha1']))
		if len(reverted) > 0: #Found reverted revisions
			print("reverting: %s, reverted: %s, reverted-to: %s" % (
				rev['revid'],
				[r['revid'] for r in reverted],
				history.get(rev['sha1'])['revid']
			))
		else: #noop -- same checksum as last revision
			pass
	history.add(rev['sha1'], rev)

Full and partial revert detection via diffing

Diffing-based strategies for detecting partial reverts and full reverts plus changes have been developed.[2] These strategies increase the accuracy of revert detection at the cost of performance, owing to the computational complexity of difference detection. While Kittur et al.'s work suggests that only 6% of reverts cannot be identified by identity matching,[1] more recent work by Flöck et al. suggests that the diffing method identifies 12% more reverts ("full reverts plus changes") than the checksum method alone and can additionally identify partial reverts, which full-revision checksum approaches cannot detect.[2]

See [1] for Python code demonstrating such a strategy.

Cutoffs for time to revert and edit radius

Since a revision could in theory be reverted years after it was originally saved, observations taken at any given time miss future reverts (specifically, the data are right-censored). To minimize this issue and compare editors' contributions fairly, it is necessary to choose a cutoff time and count only reverts that occurred within that period after the original edit. 48 hours is a common cutoff, as recent research suggests that, at least for the English Wikipedia, nearly all reverts take place within 48 hours.[3] Furthermore, it is common practice to count only reverts that happen within a certain number of subsequent edits on the same page (in the mwreverts package, this is called the "radius", with a default value of 15, but this example and other analyses based on it have been using a lower value of 5).
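
Both cutoffs can be applied as a simple filter once reverts have been detected. The following is a minimal sketch with hypothetical function and parameter names; it assumes that timestamps are datetime objects and that edit distance is counted so that 1 means the very next edit on the page, which may differ from how a given tool counts.

```python
from datetime import datetime, timedelta

def counts_as_reverted(saved_at, reverted_at, edit_distance,
                       max_delay=timedelta(hours=48), radius=15):
    """Return True if a detected revert falls inside both cutoffs.

    saved_at, reverted_at: datetimes of the reverted and reverting revisions.
    edit_distance: number of edits from the reverted revision to the
        reverting one on the same page (assumed convention: 1 = next edit).
    """
    within_time = reverted_at - saved_at <= max_delay
    within_radius = edit_distance <= radius
    return within_time and within_radius

def is_fully_observed(saved_at, observed_at, max_delay=timedelta(hours=48)):
    """Return True if the whole cutoff window has elapsed at observation
    time, so that a late revert cannot be missed (right censoring)."""
    return observed_at - saved_at >= max_delay

saved = datetime(2020, 1, 1, 12, 0)
print(counts_as_reverted(saved, saved + timedelta(hours=47), edit_distance=3))
print(counts_as_reverted(saved, saved + timedelta(hours=49), edit_distance=3))
print(is_fully_observed(saved, observed_at=saved + timedelta(hours=24)))
```

Edits for which is_fully_observed is false are best excluded from the comparison altogether, since their revert status is not yet known.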

Data sources

More datasets are always in demand for revert identification. wikitech:Analytics/Data Lake/Edits/Mediawiki history has fields such as "revision_is_identity_reverted".

See also


References

  1. Aniket Kittur, Bongwon Suh, Bryan A. Pendleton & Ed H. Chi (2007). He says, she says: conflict and coordination in Wikipedia. In Proceedings of CHI '07, 453-462. DOI: 10.1145/1240624.1240698
  2. Fabian Flöck, Denny Vrandecic & Elena Simperl (2012). Reverts Revisited – Accurate Revert Detection in Wikipedia. HT '12, June 25–28, 2012, Milwaukee, Wisconsin, USA.
  3. R. Stuart Geiger & Aaron Halfaker (2013). When the Levee Breaks: Without Bots, What Happens to Wikipedia's Quality Control Processes? WikiSym.

Further reading