Research:Revision scoring as a service
Many of Wikipedia's most powerful tools rely on machine classification of edit quality. Regrettably, few of these tools publish an API for consuming the scores they generate, and those that do cover only English Wikipedia. In this project, we'll construct a public, queryable API of machine-classified scores for revisions. We believe that providing such a service will make it much easier to build powerful new wiki tools and to extend current tools to new wikis.
We're an open project team. There are many ways you can get in contact with us.
- Email: ahalfaker@wikimedia.org
- Mailing list: ai
- IRC: #wikimedia-ai
- Phabricator: phabricator.wikimedia.org/tag/revscoring/
- The talk page: Research talk:Revision scoring as a service
On English Wikipedia, several machine learning classifiers are applied to individual edits for quality control:
- ClueBot NG is a powerful counter-vandalism bot that uses Bayesian language-learning machine classification.
- Huggle triages edits for human review locally, based on extremely simple metadata.
- STiki calculates its own vandalism probabilities using metadata, consumes those of ClueBot NG, and makes both available as "queues" for human review in a GUI tool.
Availability of scores
All of these tools rely on machine-generated revision quality scores, yet obtaining such scores is not trivial in most cases. STiki is the only system that provides a queryable API to its internal scores. ClueBot NG provides an IRC feed of its scores, but not a querying interface. The only one of these tools that runs outside of English Wikipedia is Huggle, but Huggle produces no feed or querying service for its internal scores.
Importance of scores
This lack of a general, accessible revision quality scoring service hinders the development of new tools and the expansion of current tools to non-English wikis. For example, Snuggle takes advantage of STiki's web API to perform its own detection of good-faith newcomers. Porting a system like Snuggle to non-English wikis would require a similar queryable source of revision quality scores.
Scoring as a service
We can do better. In this project, we'll develop and deploy a general, queryable scoring service that provides access to quality scoring algorithms and pre-generated models via a web API. We expect that such a system will allow new tools to be developed and current tools to be ported to new projects more easily.
The system has four scoring models:
- Reverted: This model is automatically trained based on reverted/non-reverted edits.
- Damaging: This model predicts whether an edit is damaging or not. It's trained on user-labelled damaging edits and is more accurate than the reverted model.
- Good faith: This model predicts whether an edit was done in good faith or not.
- wp10: This model rates an article based on the wp10 rating scale.
Objective revision evaluation service (ORES)
The primary way that wiki tool developers will take advantage of this project is via a RESTful web service and scoring system we call ORES. ORES provides a web service that generates scores for revisions on request. For example, https://ores.wikimedia.org/scores/enwiki?revids=34854258&models=reverted asks for the score of the "reverted" model for revision #34854258 in English Wikipedia.
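The request above can be assembled programmatically. The following is a minimal sketch rather than an official client: the function name `build_scores_url` is hypothetical, and joining several revision IDs or model names with `|` is an assumption, since the example above shows only a single value for each parameter.

```python
from urllib.parse import urlencode

# Base path taken from the example URL above.
ORES_BASE = "https://ores.wikimedia.org/scores"


def build_scores_url(wiki, rev_ids, models):
    """Build a score-request URL for one wiki (hypothetical helper).

    Assumption: multiple revision IDs or models are separated by "|".
    The source only shows a request with one of each.
    """
    query = urlencode(
        {
            "revids": "|".join(str(rev_id) for rev_id in rev_ids),
            "models": "|".join(models),
        },
        safe="|",  # keep the separator readable instead of %7C
    )
    return f"{ORES_BASE}/{wiki}?{query}"


# Reproduces the example request from the text.
print(build_scores_url("enwiki", [34854258], ["reverted"]))
```

The resulting string can be passed to any HTTP client to retrieve the score.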
To support ORES, and to serve Python developers who would rather apply revision scoring models directly, we provide a high-quality Python library with some key features that make it easy to construct new, powerful scoring strategies.
- Scorer abstraction allows for manual rule-based scoring and machine learning models
- Feature extraction garden makes it easy to develop new features on top of existing ones
- Pre-trained model files will be made available so that others can use our machine learning models for their own purposes without needing to re-train them.
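To illustrate the first two ideas in the list above, here is a small sketch of what a scorer abstraction and dependency-based feature extraction might look like. It is not the project library's actual API: every name here (`Feature`, `extract`, `Scorer`, `RuleScorer`, `ModelScorer`) is hypothetical, and the "model" is a stand-in function rather than a trained classifier.

```python
from abc import ABC, abstractmethod


class Feature:
    """A named value computed from revision text or from other features."""

    def __init__(self, name, compute, depends_on=()):
        self.name = name
        self.compute = compute
        self.depends_on = tuple(depends_on)


def extract(feature, text, cache=None):
    """Compute a feature, resolving its dependencies first (memoized)."""
    cache = {} if cache is None else cache
    if feature.name not in cache:
        if feature.depends_on:
            args = [extract(dep, text, cache) for dep in feature.depends_on]
            cache[feature.name] = feature.compute(*args)
        else:
            cache[feature.name] = feature.compute(text)
    return cache[feature.name]


# Two base features, and a new feature built from the old ones.
chars = Feature("chars", len)
upper = Feature("uppercase_chars", lambda t: sum(c.isupper() for c in t))
upper_ratio = Feature(
    "uppercase_ratio", lambda u, c: u / max(c, 1), depends_on=(upper, chars)
)


class Scorer(ABC):
    """Common interface for rule-based and model-based scoring."""

    @abstractmethod
    def score(self, text):
        """Return a score between 0.0 and 1.0 for the given text."""


class RuleScorer(Scorer):
    """Manual rule: flag the text when a feature exceeds a threshold."""

    def __init__(self, feature, threshold):
        self.feature = feature
        self.threshold = threshold

    def score(self, text):
        return 1.0 if extract(self.feature, text) > self.threshold else 0.0


class ModelScorer(Scorer):
    """Wraps any callable that maps a list of feature values to a score."""

    def __init__(self, features, predict):
        self.features = features
        self.predict = predict

    def score(self, text):
        cache = {}
        values = [extract(f, text, cache) for f in self.features]
        return self.predict(values)


rule = RuleScorer(upper_ratio, threshold=0.5)
# Stand-in for a trained model: it simply returns the first feature value.
model = ModelScorer([upper_ratio], predict=lambda values: min(1.0, values[0]))

print(rule.score("HELLO world"), rule.score("HELLO WORLD"))
```

Because both scorers share one interface, a tool can swap a hand-written rule for a trained model without changing the calling code.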
Most models will need to be trained on a per-language/wiki basis. If a new wiki-language community wants to have access to scores, we'd ask them to provide us with a random sample of labeled revisions from which we can train/test new models. To make this work easier, we are constructing a human computation interface to make this type of data easy to crowd-source. Since this is a common problem, we're keeping an eye on generalizability of the system to a wide range of hand-coding/labeling problems.
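As a rough sketch of the sampling step described above, the following draws a random sample of revision IDs for labeling and splits it into training and test sets. The function name, the 80/20 split, and the fixed seed are all assumptions made for illustration; the actual sampling procedure is not specified here.

```python
import random


def sample_and_split(rev_ids, sample_size, test_fraction=0.2, seed=0):
    """Draw a random sample of revision IDs and split it into train/test.

    Hypothetical helper: the split ratio and seed are illustrative choices.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    sample = rng.sample(list(rev_ids), sample_size)
    n_test = int(round(sample_size * test_fraction))
    return sample[n_test:], sample[:n_test]


# Placeholder IDs 1..10000 stand in for a wiki's real revision IDs.
train_ids, test_ids = sample_and_split(range(1, 10001), sample_size=500)
print(len(train_ids), len(test_ids))
```

The sampled revisions would then be labeled by community members before any model is trained or evaluated on them.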
Tools that use ORES
- pt:Wikipédia:Scripts/FastButtons (pt:MediaWiki:Gadget-fastbuttons.js)
- mw-gadget-ScoredRevisions (User:He7d3r/Tools/ScoredRevisions.js)
- Raun (for projects that support ORES)
- Real-Time Recent Changes
- en:Wikipedia:WikiProject X via en:User:Reports bot
- en:Wiki Education Foundation – Student quality/productivity measurements
- en:User:SuggestBot uses ORES to predict article quality
- crosswatch – cross-wiki watchlist
- mw:Extension:ORES (gerrit)
- en:User:DataflowBot uses ORES to predict article quality
- WikiEdu recent student activity dashboard
- WikiEdu article finder
- CopyPatrol tool
Other possible uses
- Research:Automated classification of article quality
- Research:Automated classification of edit types
- Revscoring @ Wikitech
- Revscoring @ Phabricator
- Wiki Artificial Intelligence @ Github
- Revscoring IEG grant (Dec. 2014 - May 2015)
- Other subpages of this page