Research:Revert

From Meta, a Wikimedia project coordination wiki

A revert is a type of edit which removes the effects of a previous edit. This typically restores the article to a version that existed at some earlier point. A partial revert reverses only part of a prior edit while retaining the rest of it. In Wikipedia, reverts are commonly used to remove inappropriate changes to articles. This page describes types of reverting actions and methods for detecting reverts.

Types of reverting actions

Identity revert

An identity revert is an edit to an article that creates a new revision exactly matching a previous revision, thereby removing the changes made by all intervening edits. According to work by Kittur et al.[1] and Flöck et al.[2], this is the most common type of revert. Yasseri et al. have used this definition to devise a measure of controversy.[3]

For example, in the sequence of revisions below (1-3), revision #3 reverts revision #2 by creating an exact copy of revision #1:

  1. "This is an article"
  2. "This is not an article"
  3. "This is an article"

In this case, revision #3 is referred to as the reverting revision, revision #2 as the reverted revision, and revision #1 as the reverted-to revision.
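As a minimal sketch, assuming the full text of each revision is available as a Python string, the three roles in this example can be found by comparing SHA-1 checksums of the texts. The function name `identity_reverts` is hypothetical, and unlike practical detectors this sketch looks back through the entire history without any limit:

```python
import hashlib


def identity_reverts(texts):
    """Yield (reverting, reverted, reverted_to) revision numbers, 1-based.

    Hypothetical helper: `texts` holds plain revision texts, oldest first.
    A revision is an identity revert if its checksum matches an earlier,
    non-adjacent revision; every revision in between counts as reverted.
    """
    seen = {}  # checksum -> number of the most recent revision with it
    for number, text in enumerate(texts, start=1):
        checksum = hashlib.sha1(text.encode("utf-8")).hexdigest()
        earlier = seen.get(checksum)
        # Matching the immediately preceding revision is a no-op, not a revert.
        if earlier is not None and earlier < number - 1:
            yield number, list(range(earlier + 1, number)), earlier
        seen[checksum] = number


revisions = ["This is an article", "This is not an article", "This is an article"]
for reverting, reverted, reverted_to in identity_reverts(revisions):
    print(f"reverting: #{reverting}, reverted: {reverted}, reverted-to: #{reverted_to}")
# prints: reverting: #3, reverted: [2], reverted-to: #1
```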

Full revert plus changes

In another editing pattern, an editor restores an old revision, as in an identity revert, but makes an additional change before saving. The intermediate edit's contributions are thus fully reverted, but no revision identical to an earlier one is produced.

For example:

  1. "This is an article"
  2. "This is not an article"
  3. "This is an encyclopedia article"

Although revision #3 removes all changes made by revision #2 (the word "not"), it also adds the word "encyclopedia". Revision #3 is therefore clearly a reverting revision and revision #2 a reverted revision, but revision #1 was not exactly reverted to.

Partial revert

A partial revert refers to an edit that removes some part of a change made by another revision, but not the entire change.

For example:

  1. "This is an article"
  2. "This is not an encyclopedia article"
  3. "This is an encyclopedia article"

In this case, revision #2 makes two changes by adding the words "not" and "encyclopedia" to the article. Revision #3 only removes the word "not". Revision #3 is therefore a partially reverting revision and revision #2 was partially reverted, but revision #1 was not exactly reverted to.

Detection

Edit tags

The MediaWiki software automatically adds edit tags to reverts performed using the "undo" and "rollback" features. In 2020, an additional "manual revert" tag was introduced to mark reverts made without either of these features; see phab:T256001 for details. Like any edit tag, these can also be queried in the change_tag database table (for Wikimedia projects, see also Research:Data regarding database access).
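A minimal sketch of reading these tags from revision data, assuming revisions fetched through the MediaWiki Action API with prop=revisions and rvprop=ids|tags, so that each revision dict carries a "tags" list. The tag identifiers mw-undo, mw-rollback and mw-manual-revert are the ones MediaWiki uses; the function name and the sample revision IDs are made up for illustration:

```python
# Change tags that MediaWiki applies to reverting edits.
REVERT_TAGS = {
    "mw-undo": "undo",
    "mw-rollback": "rollback",
    "mw-manual-revert": "manual revert",
}


def revert_kind(revision):
    """Return how a revision reverted, based on its change tags, or None.

    Assumes `revision` is one entry of a prop=revisions API response
    requested with rvprop=ids|tags (hypothetical helper).
    """
    for tag in revision.get("tags", []):
        if tag in REVERT_TAGS:
            return REVERT_TAGS[tag]
    return None


# Illustrative sample data, not real revisions.
revisions = [
    {"revid": 101, "tags": []},
    {"revid": 102, "tags": ["mw-undo"]},
    {"revid": 103, "tags": ["mobile edit", "mw-manual-revert"]},
]
print([(r["revid"], revert_kind(r)) for r in revisions])
# prints: [(101, None), (102, 'undo'), (103, 'manual revert')]
```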

Researchers have also developed various methods for detecting reverts directly from the revision history:

Identity revert via checksum with history

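Research by Kittur et al. suggests that 94% of reverts can be detected by comparing the MD5 checksum of each revision's content with the checksums of earlier revisions.[1] Yasseri et al. detected some 5 million reverts using this method in their study of edit wars.[3] MediaWiki stores a SHA-1 checksum of each revision's content, which the code below uses instead of MD5.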

Python code (Python 3):

# `revisions` should hold the page's "revisions" list, oldest first, e.g. from
# https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=User:EpochFail&rvprop=ids|sha1&rvdir=newer&rvlimit=500&format=json
revisions = []


class History:
    '''
    A data structure for efficiently storing and retrieving a
    limited number of historical records.
    '''

    def __init__(self, maxlen=15):
        '''maxlen specifies the maximum number of records to keep.'''
        self.maxlen = maxlen
        self.d = {}  # key -> list of values, for fast lookup by key
        self.l = []  # (key, value) pairs, oldest first, to preserve order

    def add(self, key, value):
        '''Adds a new key-value pair. Returns the discarded value, if any.'''
        self.l.append((key, value))
        self.d.setdefault(key, []).append(value)

        if len(self.l) > self.maxlen:
            okey, ovalue = self.l.pop(0)
            self.d[okey].pop(0)  # drop the oldest value stored for okey
            if not self.d[okey]:
                del self.d[okey]
            return ovalue
        return None

    def __contains__(self, key):
        '''Checks whether the key is in the history (supports the "in" keyword).'''
        return key in self.d

    def get(self, key):
        '''Gets the most recently added value for a key.'''
        return self.d[key][-1]

    def upTo(self, key):
        '''Yields the values added after the most recent occurrence of key, newest first.'''
        for okey, ovalue in reversed(self.l):
            if okey == key:
                break
            yield ovalue


history = History(15)  # History capped at 15 revisions (common practice)
for rev in revisions:
    sha1 = rev.get('sha1')
    if sha1 is None:  # e.g. content hidden by revision deletion; cannot be matched
        continue
    if sha1 in history:  # Identical revision found in history
        reverted = list(history.upTo(sha1))
        if reverted:  # Found reverted revisions
            print("reverting: %s, reverted: %s, reverted-to: %s" % (
                rev['revid'],
                [r['revid'] for r in reverted],
                history.get(sha1)['revid'],
            ))
        # else: no-op edit with the same checksum as the previous revision
    history.add(sha1, rev)

Revert patterns

Revert patterns in the edit history can be used to determine whether a revert actually removed inappropriate changes to an article. Research by Kiesel et al.[4] suggests that 6% of all identity reverts are so-called pseudo reverts (reverts to a blank page or to the previous revision). They also analyzed cases in which it is unclear whether inappropriate changes were removed, for example when editors revert their own work (9%) or when reverts are part of an edit war (11%).
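A minimal sketch of flagging some of these patterns for an identity revert that has already been detected. It assumes revision dicts carrying "user" and "size" fields (as returned by prop=revisions with rvprop=ids|sha1|user|size); the function name and labels are hypothetical, and this is not Kiesel et al.'s exact procedure. Edit wars need more context than a single revert, so they are left out:

```python
def revert_flags(reverting, reverted, reverted_to):
    """Return a list of labels for one detected identity revert.

    Hypothetical helper. `reverting` and `reverted_to` are revision dicts;
    `reverted` is the list of revision dicts between them (empty when the
    revert merely matches the immediately preceding revision).
    """
    flags = []
    if not reverted:
        flags.append("pseudo: identical to previous revision")
    if reverted_to.get("size") == 0:
        flags.append("pseudo: restores a blank page")
    # A self-revert undoes only edits made by the reverting user.
    if reverted and all(r["user"] == reverting["user"] for r in reverted):
        flags.append("self-revert")
    return flags


r1 = {"revid": 1, "user": "Alice", "size": 0}
r2 = {"revid": 2, "user": "Bob", "size": 42}
r3 = {"revid": 3, "user": "Bob", "size": 0}
print(revert_flags(r3, [r2], r1))
# prints: ['pseudo: restores a blank page', 'self-revert']
```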

Full and partial revert detection via diffing

Diff-based strategies for detecting partial reverts and full reverts plus changes have been developed.[2] These strategies increase the accuracy of revert detection at the cost of speed, since computing differences between revisions is computationally expensive. While Kittur et al.'s work suggests that only 6% of reverts cannot be identified via identity matching,[1] more recent work by Flöck et al. suggests that their diff-based method identifies 12% more reverts ("full reverts plus changes") than the checksum method alone, and can additionally identify partial reverts, which full-revision checksum approaches cannot detect.[2]

See [2] for Python code demonstrating such a strategy.
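The following is not the code from [2] but a simplified sketch of the idea, assuming full revision texts are available. It diffs each revision against its parent at the level of whitespace-separated words using Python's difflib, then measures what share of one revision's changes a later revision undoes (deleting words it added, or restoring words it removed). Matching words by value rather than by position is a simplification, and the function names are hypothetical:

```python
import difflib
from collections import Counter


def word_changes(old, new):
    """Return (added, removed) word Counters for the edit old -> new."""
    a, b = old.split(), new.split()
    added, removed = Counter(), Counter()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op in ("delete", "replace"):
            removed.update(a[i1:i2])
        if op in ("insert", "replace"):
            added.update(b[j1:j2])
    return added, removed


def undone_share(texts, target, later):
    """Share of revision #target's word changes undone by revision #later.

    Hypothetical helper: `texts` holds revision texts, oldest first, and
    revisions are numbered from 1; each is diffed against its parent.
    """
    t_added, t_removed = word_changes(texts[target - 2], texts[target - 1])
    l_added, l_removed = word_changes(texts[later - 2], texts[later - 1])
    # Undone = words #target added that #later removed, plus the reverse.
    undone = (t_added & l_removed) + (t_removed & l_added)
    total = sum(t_added.values()) + sum(t_removed.values())
    return sum(undone.values()) / total if total else 0.0


full_plus_changes = ["This is an article", "This is not an article",
                     "This is an encyclopedia article"]
partial = ["This is an article", "This is not an encyclopedia article",
           "This is an encyclopedia article"]
print(undone_share(full_plus_changes, 2, 3))  # 1.0: full revert (plus changes)
print(undone_share(partial, 2, 3))            # 0.5: partial revert
```

A share of 1.0 indicates that all of the earlier revision's changes were undone, with or without further changes; a share between 0 and 1 indicates a partial revert.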

Cutoffs for time to revert and edit radius

Since a revision could in principle be reverted years after it was saved, any observation of the data misses reverts that have not happened yet (the observations are right-censored). To minimize this issue and compare editors' contributions fairly, it is necessary to choose a cutoff time and count only reverts that occurred within that period after the original edit. 48 hours is a common cutoff, as research suggests that, at least for the English Wikipedia, nearly all reverts take place within 48 hours.[5] It is also common practice to count only reverts that happen within a certain number of subsequent edits on the same page (the mwreverts package calls this the "radius" and uses a default value of 15, although some analyses have used a lower value of 5).
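A minimal sketch of applying both cutoffs to a detected (reverted, reverting) pair. It assumes each revision dict has a "timestamp" in the Action API's ISO 8601 UTC form and a hypothetical "position" field giving its index in the page history; counting the edits between the reverted and the reverting revision is one simple reading of the radius, not necessarily how mwreverts defines it:

```python
from datetime import datetime, timedelta


def parse_timestamp(ts):
    """Parse an API timestamp such as '2013-05-01T12:00:00Z' (UTC)."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def counts_as_revert(reverted, reverting, max_delay=timedelta(hours=48), radius=15):
    """Return True if the revert falls within both cutoffs.

    'position' is a hypothetical field: the revision's index in the page history.
    """
    delay = parse_timestamp(reverting["timestamp"]) - parse_timestamp(reverted["timestamp"])
    edits_later = reverting["position"] - reverted["position"]
    return delay <= max_delay and 0 < edits_later <= radius


edit = {"timestamp": "2013-05-01T12:00:00Z", "position": 10}
quick = {"timestamp": "2013-05-02T09:30:00Z", "position": 13}
late = {"timestamp": "2013-05-05T12:00:00Z", "position": 11}
print(counts_as_revert(edit, quick))            # True: 21.5 hours, 3 edits later
print(counts_as_revert(edit, late))             # False: 96 hours later
print(counts_as_revert(edit, quick, radius=2))  # False: 3 edits later
```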

Edit tag for reverted edits

In 2020, an edit tag marking reverted edits (as opposed to reverting edits) was introduced in MediaWiki; see phab:T254074 for details.

Data sources

Tracked in Phabricator: task T152434

More datasets for revert identification are always in demand. The wikitech:Analytics/Data Lake/Edits/Mediawiki history dataset has fields such as "revision_is_identity_reverted".

Yasseri et al. provide a dataset of about 5 million identity reverts detected by analysing the text of the articles.

See also

References

  1. Aniket Kittur, Bongwon Suh, Bryan A. Pendleton & Ed H. Chi (2007). He says, she says: conflict and coordination in Wikipedia. In Proceedings of CHI '07, 453-462. DOI=10.1145/1240624.1240698
  2. Fabian Flöck, Denny Vrandecic & Elena Simperl (2012). Reverts Revisited: Accurate Revert Detection in Wikipedia. In Proceedings of HT '12, June 25-28, 2012, Milwaukee, Wisconsin, USA.
  3. T. Yasseri, R. Sumi, A. Rung, A. Kornai & J. Kertész (2012). Dynamics of conflicts in Wikipedia. PLoS ONE. DOI=10.1371/journal.pone.0038869
  4. Johannes Kiesel, Martin Potthast, Matthias Hagen & Benno Stein (2017). Spatio-temporal Analysis of Reverted Wikipedia Edits. In Proceedings of ICWSM '17, 122-131.
  5. R. Stuart Geiger & Aaron Halfaker (2013). When the Levee Breaks: Without Bots, What Happens to Wikipedia's Quality Control Processes? In Proceedings of WikiSym '13.

Further reading