Machine learning models/Proposed/Multilingual revert risk

From Meta, a Wikimedia project coordination wiki
Model card
This page is an on-wiki machine learning model card.
A model card is a document about a machine learning model that seeks to answer basic questions about the model.
Model Information Hub
Model creator(s): Mykola Trokhymovych, Muniza Aslam, Ai-Jou Chou, and Diego Saez-Trumper
Model owner(s): Diego Saez-Trumper
Code: training and inference
Uses PII: No
In production? Yes
This model uses revision content and metadata to predict the risk that a revision will be reverted.

How can we help editors to identify revisions that need to be “patrolled”?

The goal of this model is to detect revisions that might be reverted, regardless of whether they were made in good faith or with the intention of causing damage. Wikipedia has a group of dedicated volunteer editors, known as patrollers, who work to ensure the accuracy and integrity of the information on the site. These patrollers review and edit articles, monitor for vandalism, and enforce community guidelines. Their work is not easy: they have to keep up with the fast pace and language diversity of Wikipedia, where on average around 16 pages are edited per second in 250+ languages [1]. The aim of this model is to help patrollers quickly identify potential problems, prioritize their work, and revert damaging edits when needed.

This model is deployed on LiftWing and is currently available for internal use. It can be used to detect revisions that might need to be reverted.


Knowledge Integrity is one of the strategic programs of Wikimedia Research. Its goals are identifying and addressing threats to content on Wikipedia, increasing the capabilities of patrollers, and providing mechanisms for assessing the reliability of sources[2]. The main goal of the project is to create a new generation of patrolling models that improve on the accuracy, fairness, and maintainability of the previous state of the art, ORES[3].

The current model is able to work on almost any Wikipedia article in any of the 47 chosen languages: ['ka', 'lv', 'ta', 'ur', 'eo', 'lt', 'sl', 'hy', 'hr', 'sk', 'eu', 'et', 'ms', 'az', 'da', 'bg', 'sr', 'ro', 'el', 'th', 'bn', 'no', 'hi', 'ca', 'hu', 'ko', 'fi', 'vi', 'uz', 'sv', 'cs', 'he', 'id', 'tr', 'uk', 'nl', 'pl', 'ar', 'fa', 'it', 'zh', 'ru', 'es', 'ja', 'de', 'fr', 'en']

Users and uses

Use this model for
  • estimating the revert risk of a Wikipedia article revision
Don't use this model for
  • making predictions on language editions of Wikipedia that are not in the listed 47 languages or other Wiki projects (Wiktionary, Wikinews, Wikidata, etc.)
  • making predictions on the revisions that are created by bots
  • making predictions on the revisions that create a new article (the first revision of a page)
  • making predictions on a revision that is the only one for a page
  • using the model as a stand-alone tool (without a human patroller in the loop)
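
The exclusions above can be expressed as a pre-check that runs before a score is requested. The following is a minimal sketch for illustration only: the function name `is_scorable` and its parameters (`project`, `is_bot`, `parent_rev_id`) are hypothetical and not part of the model's API, and treating a missing or zero parent ID as "first revision of a page" is an assumption. The language list is copied from this card.

```python
# Languages listed in this model card (47 Wikipedia language editions).
SUPPORTED_LANGS = frozenset([
    "ka", "lv", "ta", "ur", "eo", "lt", "sl", "hy", "hr", "sk",
    "eu", "et", "ms", "az", "da", "bg", "sr", "ro", "el", "th",
    "bn", "no", "hi", "ca", "hu", "ko", "fi", "vi", "uz", "sv",
    "cs", "he", "id", "tr", "uk", "nl", "pl", "ar", "fa", "it",
    "zh", "ru", "es", "ja", "de", "fr", "en",
])


def is_scorable(lang, project, is_bot, parent_rev_id):
    """Return True if a revision falls inside the intended use of the model.

    All parameter names are hypothetical; adapt them to your own records.
    """
    if project != "wikipedia":       # Wiktionary, Wikinews, Wikidata, ... are out of scope
        return False
    if lang not in SUPPORTED_LANGS:  # language edition not covered by the model
        return False
    if is_bot:                       # bot revisions are excluded
        return False
    if not parent_rev_id:            # None or 0: assumed to mean "no parent revision"
        return False
    return True
```

Even when this check passes, the score is meant to support a human patroller, not to replace one.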
Current uses

Ethical considerations, caveats, and recommendations

  • This model was developed to improve on the performance of its Language Agnostic (RRLA) version. The Multilingual version performs better, especially for IP edits. However, it requires more processing power and might be slower (or time out).


The presented model is based on content features extracted with a fine-tuned language model (mBERT[4]), features based on mwedittypes[5], and user and page metadata. It follows the paradigm of one generalized model for all covered languages, which are currently the 47 most frequently edited language editions of Wikipedia. The system includes the following steps:

1. Text features preparation:

  • Process wikitext and compare with parent revision
  • Extract mwedittypes-based features
  • Extract texts that were added, removed, and changed

2. Masked Language Models (MLMs) features extraction:

  • Pass each of the texts that were added, removed, or changed to the pre-trained classification model
  • Apply mean and max pooling to the list of scores of each signal to extract the final unified feature set

3. Final Classification

  • Combine all extracted features with user and revision metadata
  • Pass the features to the final classifier
System design. Inference
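
Steps 2 and 3 can be illustrated with a minimal sketch. It assumes that each signal (named here `inserts`, `removes`, and `changes`) yields a list of per-text scores from its fine-tuned model. The function names, the feature naming scheme, the metadata keys, and the fallback value for a signal with no texts are all assumptions made for illustration; this is not the production code.

```python
from statistics import mean


def pool_scores(scores, default=0.0):
    """Mean and max pooling over the per-text scores of one signal.

    `default` is a hypothetical fallback for signals with no texts.
    """
    if not scores:
        return {"mean": default, "max": default}
    return {"mean": mean(scores), "max": max(scores)}


def build_features(signal_scores, metadata):
    """Combine pooled signal scores with user and revision metadata."""
    features = dict(metadata)  # copy so the caller's dict is not modified
    for signal, scores in signal_scores.items():
        pooled = pool_scores(scores)
        features[f"{signal}_mean"] = pooled["mean"]
        features[f"{signal}_max"] = pooled["max"]
    return features


# Hypothetical scores and metadata for a single revision.
example = build_features(
    {"inserts": [0.2, 0.8], "removes": [], "changes": [0.5]},
    {"user_is_anonymous": True, "page_revision_count": 42},
)
```

The resulting flat dictionary is the kind of fixed-length feature set that a final classifier can consume, whatever the number of text fragments in the edit.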



The presented model is a multistage solution that includes the fine-tuned masked language model (mBERT) for feature extraction and the final classifier (CatBoost) for getting the probability of being reverted based on the extracted features.

Model architecture

mBERT model tuning (four models: title, changes, inserts, and removes):

  • Learning rate: 2e-5
  • Weight Decay: 0.01
  • Epochs: 5
  • Maximum input length: 512
  • Number of encoder attention layers: 12
  • Number of decoder attention layers: 12
  • Number of attention heads: 12
  • Length of encoder embedding: 768


Final classifier (CatBoost):

  • Iterations: 5000
  • Learning rate: 0.01
  • Loss: Logloss
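
The hyperparameters listed above can be collected into plain configuration dictionaries, as sketched below. The key names are chosen to resemble keyword arguments of Hugging Face `TrainingArguments` and of `catboost.CatBoostClassifier`, which is an assumption about how the values were applied. The sketch also assumes that the last three values (iterations, learning rate, loss) belong to the CatBoost final classifier described earlier in this card.

```python
# Fine-tuning settings shared by the four mBERT models, as listed in this card.
MBERT_SIGNALS = ("title", "changes", "inserts", "removes")
MBERT_FINETUNING = {
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "num_train_epochs": 5,
    "max_length": 512,  # maximum input length in tokens
}

# Settings assumed to belong to the final CatBoost classifier.
CATBOOST_PARAMS = {
    "iterations": 5000,
    "learning_rate": 0.01,
    "loss_function": "Logloss",
}

# One configuration per signal; all four share the same values here.
PER_SIGNAL_CONFIG = {signal: dict(MBERT_FINETUNING) for signal in MBERT_SIGNALS}
```

With the `catboost` package installed, `CatBoostClassifier(**CATBOOST_PARAMS)` would construct a classifier with these settings.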
Output schema
{
  lang: <language code string>,
  rev_id: <revision_id string>,
  score: {
     prediction: <boolean decision result>,
     probability: {
        true: <probability of being reverted>,
        false: <probability of being NOT reverted>
     }
  }
}
Example input and output


GET ......


{
  lang: "ru",
  rev_id: 123855516,
  score: {
     prediction: true,
     probability: {
        true: 0.9392203688621521,
        false: 0.0607796311378479
     }
  }
}
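
A client might consume such a response as sketched below. The sketch assumes the service returns valid JSON with quoted keys in the shape shown above. The function name `parse_revert_risk`, the returned field names, and the 0.5 review threshold are hypothetical and should be tuned to the patrolling workflow.

```python
import json
import math

# The example output above, written as valid JSON (an assumption about the wire format).
RAW_RESPONSE = """
{
  "lang": "ru",
  "rev_id": 123855516,
  "score": {
    "prediction": true,
    "probability": {"true": 0.9392203688621521, "false": 0.0607796311378479}
  }
}
"""


def parse_revert_risk(raw, threshold=0.5):
    """Extract the revert probability and flag the revision for human review."""
    data = json.loads(raw)
    probability = data["score"]["probability"]
    p_revert = probability["true"]
    # Sanity check: the two class probabilities should sum to 1.
    if not math.isclose(p_revert + probability["false"], 1.0, abs_tol=1e-6):
        raise ValueError("class probabilities do not sum to 1")
    return {
        "lang": data["lang"],
        "rev_id": str(data["rev_id"]),
        "p_revert": p_revert,
        "needs_review": p_revert >= threshold,  # hypothetical decision rule
    }


result = parse_revert_risk(RAW_RESPONSE)
```

Using the probability with a workflow-specific threshold, instead of only the boolean `prediction`, lets patrollers trade recall against review workload.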


The model was trained on a dataset collected from two tables in the Wikimedia Data Lake: MediaWiki History and Wikitext History. We used the 2022-07 snapshot, with an observation period from 2022-01-01 to 2022-07-01 (6 months) for training and the following week for testing. We also filtered out revisions related to edit wars and revisions created by bots.
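
The time-based split described above can be sketched as follows. The half-open interval boundaries (2022-07-01 counted as the first test day) and the flag names `is_bot` and `in_edit_war` are assumptions; the card does not state how boundary days or edit wars were determined.

```python
from datetime import date, timedelta

TRAIN_START = date(2022, 1, 1)
TRAIN_END = date(2022, 7, 1)                # exclusive (assumed)
TEST_END = TRAIN_END + timedelta(days=7)    # the following week, exclusive


def assign_split(revision_date, is_bot=False, in_edit_war=False):
    """Return "train", "test", or None for a revision that is filtered out."""
    if is_bot or in_edit_war:
        return None
    if TRAIN_START <= revision_date < TRAIN_END:
        return "train"
    if TRAIN_END <= revision_date < TEST_END:
        return "test"
    return None  # outside the observation window
```

Testing on a later period than the training period mimics deployment, where the model scores revisions made after it was trained.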

Data Pipeline

The data was collected using Wikimedia Data Lake and Wikimedia Analytics cluster.

For each language, we collected revision data. We then merged in the wikitext data and extracted the required features from the content using user-defined functions (UDFs). The data collection pipeline for one language can be found in the data collection script.
Training data
  • Data period: 6 months
  • Number of revisions: 8,586,362
  • IP user edit rate: 0.17
  • Revert rate: 0.08
  • Random sample of up to 300,000 revisions per language
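
The per-language cap can be illustrated with a small in-memory sketch. The real pipeline runs on the Wikimedia Analytics cluster, so this shows only the sampling idea; the function name, the `(lang, rev_id)` input shape, and the fixed seed are assumptions.

```python
import random
from collections import defaultdict


def sample_per_language(revisions, cap=300_000, seed=0):
    """Randomly keep at most `cap` revisions for each language.

    `revisions` is an iterable of (lang, rev_id) pairs (hypothetical shape).
    """
    by_lang = defaultdict(list)
    for lang, rev_id in revisions:
        by_lang[lang].append(rev_id)

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return {
        lang: rng.sample(rev_ids, min(cap, len(rev_ids)))
        for lang, rev_ids in by_lang.items()
    }
```

Capping large language editions keeps them from dominating the training set, while smaller editions contribute all of their revisions.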
Test data
  • Data period: 1 week
  • Number of revisions: 1,079,265
  • IP user edit rate: 0.19
  • Revert rate: 0.07
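
The rates above imply a strong class imbalance. The short calculation below turns them into approximate counts; because the published rates are rounded to two decimals, the counts are estimates only. The helper name `approx_count` is hypothetical.

```python
def approx_count(total, rate):
    """Approximate number of revisions implied by a rounded rate."""
    return round(total * rate)


TRAIN_TOTAL, TEST_TOTAL = 8_586_362, 1_079_265

train_reverted = approx_count(TRAIN_TOTAL, 0.08)  # roughly 0.69 million
train_ip_edits = approx_count(TRAIN_TOTAL, 0.17)  # roughly 1.46 million
test_reverted = approx_count(TEST_TOTAL, 0.07)    # roughly 76 thousand

# Accuracy of a trivial classifier that always predicts "not reverted"
# on the test week, derived from the revert rate alone.
majority_baseline_accuracy = 1 - 0.07
```

Since always predicting "not reverted" would already be about 93% accurate on the test week, accuracy alone says little about model quality on this data.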



Cite this model as: ... to be added soon.


  2. Zia, Leila; Johnson, Isaac; Mansurov, Bahodir; Morgan, Jonathan; Redi, Miriam; Saez-Trumper, Diego; Taraborelli, Dario. 2019. Knowledge Integrity.