Edit Review Improvements/ReviewStream

From mediawiki.org

ReviewStream is a machine-readable feed designed for use by a variety of edit-review tools. Like wikitech:RCStream, it will broadcast recent changes from MediaWiki wikis. To the information currently in RCStream, ReviewStream adds data designed to improve the edit-review process. Notably, edit-scoring data from ORES will help reviewers identify edits by good-faith new users, a group that research shows has special requirements. ORES data will also make edit review in general more efficient by enabling reviewers to prioritize their work better. Finally, by directly incorporating data that currently has to be looked up in separate processes, ReviewStream will make mobile and other downstream edit-review tools easier to create and faster to run.

General Implementation Strategy

One Feed

Although the team discussed creating multiple special-purpose feeds, it ultimately decided that a single feed containing all the relevant data will serve users’ needs better.

General Implementation Strategy: Public Events Feeds

The working theory is that we will build the new feed within the architecture being created for Public Events Streams. This enables us to take advantage of the infrastructure and features that project will supply.

Distinguishing Data/Characteristics

To make life easier for downstream tool developers and to make tools that consume this feed more performant, the new feed will include various types of data not currently in RCStream.

Include ORES Good-Faith scores and set simplified labels to aid their interpretation

Goal: Empower users who want to provide assistance and encouragement to good-faith users.

  • Include a good-faith score for every edit in the feed.
  • In addition, to make ORES scores easier to understand, define a simplified set of discrete levels within the continuum of scores, and include flags for these levels in the feed.
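
The simplified levels described above can be read as a lookup from a continuous score to a named band. The sketch below is illustrative only: the cut-off values, the label wording, and the function name `good_faith_label` are assumptions made for this example, not thresholds or names defined by the project.

```python
from bisect import bisect_right

# Hypothetical cut-offs on the 0..1 good-faith probability. Real values
# would have to be tuned per wiki and per model.
GOOD_FAITH_CUTOFFS = [0.35, 0.75]
GOOD_FAITH_LABELS = ["likely bad faith", "may be bad faith", "likely good faith"]


def good_faith_label(score: float) -> str:
    """Map a good-faith probability to one of the discrete labels."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be within [0, 1], got {score}")
    # bisect_right counts how many cut-offs are <= score; that count is
    # the index of the band the score falls into.
    return GOOD_FAITH_LABELS[bisect_right(GOOD_FAITH_CUTOFFS, score)]
```

Keeping the cut-offs in a plain list means a wiki could swap in its own values without touching the lookup logic.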

Include ORES Damaging (or reverted) scores and set simplified labels to aid their interpretation

Goal: By emphasizing (good-faith) edits that are likely to be problematic, we create an opportunity for reviewers to find users who need support. We also enable reviewers of all kinds to more effectively target their efforts.

  • Two ORES scores are relevant: damaging and reverted. Since Damaging is more useful and important, we should use Reverted only when the Damaging model isn't available (yet) on a particular wiki.
  • Include scores for all edits in the feed.
  • As above, define and include a simplified system of flags/thresholds that aids interpretation of scores.
  • To avoid prejudicing users, devise and employ more neutral language than “Damaging.” Also find a way to emphasize that these are predictions, not final determinations.
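
The rule of preferring Damaging and falling back to Reverted can be sketched as a small selection function. This is a minimal illustration: the shape of the `scores` mapping, the function names, and the neutral wording in `describe_prediction` are assumptions for this example and do not reflect an actual ORES or feed format.

```python
from typing import Dict, Optional, Tuple


def pick_quality_score(scores: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Return (model_name, score), preferring 'damaging' over 'reverted'."""
    for model in ("damaging", "reverted"):
        if model in scores:
            return model, scores[model]
    return None  # neither model is available on this wiki


def describe_prediction(scores: Dict[str, float]) -> str:
    """Phrase the result as a prediction, using hypothetical neutral wording."""
    picked = pick_quality_score(scores)
    if picked is None:
        return "no prediction available"
    model, score = picked
    return f"predicted chance of problems: {score:.2f} (model: {model})"
```

For example, `pick_quality_score({"reverted": 0.4})` falls back to the Reverted model because no Damaging score is present.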

Set experience-level flags, including “Newcomer”

Goal: Research shows that new editors are particularly vulnerable to rejection. To add prominence to the special needs of new users, standardize a new-user definition that’s useful for reviewing purposes, and enable downstream tools to identify and filter for new users more easily, we’ll calculate and flag new user status explicitly in the feed.

  • Research shows that the first few days of activity are when users are most vulnerable to rejection. We therefore propose the following as a Newcomer definition for reviewing purposes: fewer than 10 edits and fewer than 4 days of activity.
  • To help with other types of reviewing and enable reviewers to more easily exclude Newcomers, we have defined additional experience levels, which should also be included in the feed. For definitions, see T145159.
    • Experienced Users
    • More Experienced Users
  • All three ranges should be easily configurable on a per-wiki basis.
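
A per-wiki configurable classification along these lines might look like the sketch below. It reads the Newcomer definition as fewer than 10 edits and fewer than 4 days of activity. The thresholds for the upper tier are placeholders invented for this example (the real definitions are in T145159), and the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperienceConfig:
    """Per-wiki thresholds for the three experience ranges."""
    newcomer_max_edits: int = 10          # from the Newcomer definition
    newcomer_max_days: int = 4            # from the Newcomer definition
    more_experienced_min_edits: int = 500  # placeholder, not from the source
    more_experienced_min_days: int = 30    # placeholder, not from the source


def experience_level(edit_count: int, days_active: int,
                     cfg: ExperienceConfig = ExperienceConfig()) -> str:
    """Classify a user into one of the three experience ranges."""
    if (edit_count < cfg.newcomer_max_edits
            and days_active < cfg.newcomer_max_days):
        return "newcomer"
    if (edit_count >= cfg.more_experienced_min_edits
            and days_active >= cfg.more_experienced_min_days):
        return "more_experienced"
    return "experienced"
```

A wiki that wants a stricter Newcomer range could pass its own configuration, for example `experience_level(7, 1, ExperienceConfig(newcomer_max_edits=5))`.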

Set a separate flag for “good-faith edits by new users”

Goal: Sheltering “Good-faith edits by new users” from review processes not designed for them is a high-priority goal of the ERI project. Although downstream tools can identify such edits by cross-referencing the relevant data themselves, doing that work for them and explicitly flagging this category will, we hope, make the special needs of good-faith new users more prominent and make it more likely that tool developers will feature and filter by this category.

  • As an imagined use case: an anti-vandalism tool might highlight such edits with a special signifier or offer a setting that would exclude them.
  • The thresholds should be easy to configure on a per-wiki basis.
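
The combined flag and the imagined anti-vandalism use case can be sketched as follows. All names here are hypothetical, including the `good_faith_newcomer` event key, and the good-faith cut-off is an invented value. The Newcomer limits are read as fewer than 10 edits and fewer than 4 days of activity.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class NewcomerGoodFaithConfig:
    """Per-wiki thresholds for the combined flag."""
    min_good_faith: float = 0.75  # hypothetical good-faith cut-off
    max_edits: int = 10           # Newcomer: fewer than this many edits
    max_days: int = 4             # Newcomer: fewer than this many days active


def is_good_faith_newcomer_edit(
        good_faith: float, edit_count: int, days_active: int,
        cfg: NewcomerGoodFaithConfig = NewcomerGoodFaithConfig()) -> bool:
    """True when the editor is a Newcomer and the edit scores as good faith."""
    is_newcomer = edit_count < cfg.max_edits and days_active < cfg.max_days
    return is_newcomer and good_faith >= cfg.min_good_faith


def exclude_flagged(events: Iterable[Dict]) -> List[Dict]:
    """What an anti-vandalism tool might do: drop edits carrying the flag."""
    return [e for e in events if not e.get("good_faith_newcomer", False)]
```

Computing the flag once in the feed means every downstream tool applies the same definition instead of each re-deriving it.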

Other Feed Data and Features

MediaWiki data about edits, pages and users

Basic information about pages, edits and users will be useful in a variety of ways. The lists below show the data the team has determined to be useful. Items without bracketed annotations are currently in RCFeed. Annotated items are not in RCFeed but were judged to be easily available, with one exception (Change tags).

Metadata about edits

  • Whether the user marked their edit minor
  • Whether the user marked their edit as a bot edit
  • Whether it is a page creation
  • Date and time of edit
  • User who made the edit
  • Size of edit (in bytes; can be derived)
  • Patrolled [not enabled on most wikis at present]
  • Edit summary
  • Change tags [not readily available]
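
The note that the size of an edit "can be derived" refers to subtracting the page length before the edit from the length after it. A minimal sketch, assuming the event carries both lengths under a `length` key with `old` and `new` entries (an assumed shape used here for illustration, not a confirmed ReviewStream schema):

```python
from typing import Any, Mapping


def edit_size_bytes(event: Mapping[str, Any]) -> int:
    """Derive the size of an edit from the page length before and after."""
    length = event["length"]
    # Treat a missing or None old length as a page creation (0 bytes before).
    old = length.get("old") or 0
    return length["new"] - old
```

A negative result indicates that the edit removed content.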

Metadata about pages

  • Namespace
  • Title (in two parts, namespace:pagetitle)
  • Length (size of page)
  • Page views [not in feed; could be added]
  • Edit protection level (e.g. full protection, semi-protection/autoconfirmed users only, no protection) [not yet in feed but readily available]
  • Whether the page is a redirect
  • Whether the page is a disambiguation page [not in feed; could be added]

Metadata about users

  • Registered/Anonymous
  • Number of edits [could be added, available in user record]
  • Registration date
  • What user groups the user is in (e.g. sysop/administrator, patroller, researcher, rollbacker, autoconfirmed)

Instrument feeds to monitor usage

Goal: To understand user behavior, we need to know whether people are using the feed. Monitoring should also show whether they access it through tools like Huggle (i.e., not on a wiki page).