Machine learning models/Production/RevertRisk Wikidata

From Meta, a Wikimedia project coordination wiki
Model card
This page is an on-wiki machine learning model card.
A model card is a document about a machine learning model that seeks to answer basic questions about the model.
Model Information Hub
Model creator(s): Mykola Trokhymovych, Kevin Bazira, and Diego Saez-Trumper
Model owner(s): Diego Saez-Trumper
Model interface: Wikimedia API Portal
Publications: ACL 2025, Industry track paper
Code: training and inference
Uses PII: No
In production?: Yes
This model uses Wikidata revision content and metadata to predict the risk of being reverted.


How can we help Wikidata editors to identify revisions that need to be “patrolled”?

The goal of this model is to detect Wikidata revisions that might be reverted, regardless of whether they were made in good faith or with the intention of causing damage. Wikidata has a group of dedicated volunteer editors, known as patrollers, who work to ensure the accuracy and integrity of the information on the site. These patrollers review and edit pages, monitor for vandalism, and enforce community guidelines. Their work is not easy: they must keep up with the fast pace and language diversity of Wikidata, where, on average, around 10 pages are edited per second, with content that may span over 300 languages.[1] The aim of this model is to help patrollers quickly identify potential problems, prioritize their work, and revert damaging edits when needed.

This model is deployed on LiftWing and can be used to detect revisions that might need to be reverted.

Motivation

Knowledge Integrity is one of the strategic programs of Wikimedia Research, aimed at identifying and addressing threats to knowledge across Wikimedia projects, strengthening the capabilities of patrollers, and providing mechanisms to assess the reliability of sources.[2] Wikimedia Foundation’s AI strategy highlights AI-assisted workflows for moderators and patrollers as one of its primary areas of focus. The main goal of this project is to create a new generation of patrolling models for Wikidata, improving accuracy, fairness, and maintainability compared to the previous system in production, ORES.[3]

The current model supports analysis of almost any Wikidata revision, examining changes to both the structured content and the textual elements of the page (e.g., labels, descriptions, etc.) across multiple languages.

Users and uses

Use this model for
  • Estimating the revert risk of a Wikidata item page revision.
Don't use this model for
  • Making predictions on other Wiki projects (Wikipedia, Wiktionary, Wikinews, etc.).
  • Making predictions on the revisions that are created by bots.
  • Making predictions on the revisions that involve an editor undoing their own previous edits (a.k.a., self-reversion).
  • Making predictions on the revisions that create a new Wikidata item (the first revision of a page).
  • Making fully automated decisions or generating training data for other ML models. As with any AI/ML model, we recommend keeping humans in the loop.
Current uses

Ethical considerations, caveats, and recommendations

  • This model relies on Multilingual BERT (mBERT), a large language model that might contain certain biases.
  • Multilingual BERT originally supports approximately 100 languages, selected as the top 100 languages with the largest Wikipedias. While fine-tuning is not limited to these languages, performance on languages not included in the base model may be lower due to limited pretraining data.
  • The data preparation process can be improved by expanding parsing coverage, for example by including changes in qualifiers or rankings. Also, using labels in non-English languages for mapping Wikidata IDs to text may enhance model performance by increasing coverage and diversifying the data (approximately 9% of IDs lack corresponding labels).



Model

Figure: Wikidata vandalism detection system schema
Figure: Text processing schema

The tool DeepDiff is used to parse revision content changes as the initial step in feature preparation. The model relies on three types of features: revision metadata, triple modifications, and textual modifications.

Wikidata triples consist of three components: an entity, a property, and a value. The entity and property are represented by their Wikidata IDs. The value may also be an ID, but it can alternatively be free text, a date, a number, or another data type. To process triple changes alongside textual modifications, triples are converted into text by mapping their IDs to the corresponding English labels. Textual modifications, by contrast, are changes to elements such as entity labels, descriptions, or aliases.

Both textual and triple modifications are processed using a fine-tuned multilingual language model, mBERT.[4] The system follows a paradigm of using a single, generalized multilingual model for all content changes and change types (e.g., insert, remove, change). To enable the model to handle different change types within a unified architecture, a corresponding prefix is prepended to each input sequence, following the “text-to-text” approach used in the T5 model.[5]
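The conversion of a triple change into a prefixed text input can be sketched as follows. This is a minimal illustration, not the production code: the `LABELS` table, the function names, and the exact prefix format (`"insert: ..."`) are assumptions made for the example, and IDs without a known label are simply left as-is.

```python
# Hypothetical label table; the real system maps Wikidata IDs to English labels.
LABELS = {"Q42": "Douglas Adams", "P31": "instance of", "Q5": "human"}

# Change types named in the text above.
CHANGE_TYPES = {"insert", "remove", "change"}


def triple_to_text(entity, prop, value, labels=LABELS):
    """Render a triple as text, replacing known IDs with their labels."""
    parts = (str(entity), str(prop), str(value))
    # Unknown IDs and literal values (dates, numbers, free text) pass through.
    return " ".join(labels.get(part, part) for part in parts)


def build_model_input(change_type, text):
    """Prepend the change type as a prefix (the prefix format is an assumption)."""
    if change_type not in CHANGE_TYPES:
        raise ValueError(f"unknown change type: {change_type}")
    return f"{change_type}: {text}"


if __name__ == "__main__":
    text = triple_to_text("Q42", "P31", "Q5")
    print(build_model_input("insert", text))
```

Because the prefix carries the change type, the same model can score inserted, removed, and changed content without a separate model per type.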

For the final classification step, a CatBoost classifier is used. This model is trained on both the revision metadata and the aggregated outputs of mBERT, and it produces a probability score indicating the likelihood that a revision will be reverted.

In summary, the system includes the following steps:

1. Content features preparation:

  • Process wikitext and compare with parent revision.
  • Extract texts and triples that were added, removed, and changed.
  • Convert triples into text by mapping Wikidata IDs to English labels.

2. Masked language model (MLM) feature extraction:

  • Pass each of the texts that were added, removed, or changed to the fine-tuned classification model.
  • Apply mean and max pooling to the list of scores of each signal to obtain the final unified feature set.

3. Final Classification:

  • Combine all extracted features with revision metadata
  • Pass the features to the final classifier
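Steps 2 and 3 above can be sketched as follows: a revision may produce any number of per-change scores, so mean and max pooling reduce each list to a fixed number of features before they are merged with the metadata. The signal names, metadata fields, and the choice of 0.0 for an empty list are assumptions made for this illustration, not details taken from the production system.

```python
from statistics import mean


def pool_scores(scores):
    """Reduce a variable-length list of scores to mean and max features."""
    if not scores:
        # Assumption: a revision with no changes of this kind gets zeros.
        return {"mean": 0.0, "max": 0.0}
    return {"mean": mean(scores), "max": max(scores)}


def build_feature_row(signal_scores, metadata):
    """Merge pooled language-model scores with revision metadata."""
    row = dict(metadata)
    for signal, scores in signal_scores.items():
        pooled = pool_scores(scores)
        row[f"{signal}_mean"] = pooled["mean"]
        row[f"{signal}_max"] = pooled["max"]
    return row


if __name__ == "__main__":
    # Hypothetical scores for two inserted texts and no removed texts.
    scores = {"insert": [0.2, 0.6], "remove": []}
    metadata = {"user_is_anonymous": True}
    print(build_feature_row(scores, metadata))
```

The resulting row has the same columns for every revision, which is what a tabular classifier such as CatBoost expects as input.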

Performance

Area under the ROC curve (AUC) is used as the primary metric for model evaluation. In addition, the Filter Rate at a given recall level (FR@) is used to measure the proportion of edits that can be removed from the review backlog while still retaining a specified percentage of all vandalism in the remaining revisions. The model is evaluated against the previous production model (ORES) on both the hold-out testing dataset and an expert-labeled dataset. The expert-labeled dataset consists of approximately 1,000 revisions sampled from the hold-out set, drawn from ten score-based bins constructed separately from the outputs of ORES and of the presented model.

System performance on expert-labeled data
Model            AUC    AUC CI          FR@99  FR@90  FR@70
ORES             0.885  [0.879, 0.892]  0.593  0.799  0.881
Presented model  0.932  [0.926, 0.937]  0.698  0.846  0.918
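
The filter rate at a given recall level, as defined above, can be computed from scores and labels roughly as follows. This is one possible reading of the metric, not the evaluation code used for the table: revisions are ranked by score, the threshold is set at the lowest score needed to retain the target share of reverted revisions, and everything below that threshold counts as filtered. The function name and the tie-handling choice are assumptions.

```python
import math


def filter_rate_at_recall(scores, labels, recall):
    """Share of revisions that can be skipped while keeping `recall` of positives.

    scores: model probabilities, higher means more likely to be reverted.
    labels: 1 for reverted revisions, 0 otherwise.
    """
    positives = sum(labels)
    if positives == 0:
        raise ValueError("at least one reverted revision is required")
    # Number of reverted revisions that must stay in the review queue.
    # The small epsilon guards against floating-point noise in the product.
    needed = math.ceil(recall * positives - 1e-9)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    found = 0
    threshold = ranked[-1][0]
    for score, label in ranked:
        found += label
        if found >= needed:
            threshold = score
            break
    # Revisions tied with the threshold score are kept for review.
    kept = sum(1 for score in scores if score >= threshold)
    return 1 - kept / len(scores)


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
    labels = [1, 1, 0, 1, 0, 0, 0, 0]
    print(filter_rate_at_recall(scores, labels, recall=1.0))
```

In this toy example, keeping every reverted revision requires reviewing the four highest-scored revisions, so half of the backlog can be filtered out.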


Implementation

The presented model is a multistage solution that includes the fine-tuned masked language model (mBERT) for feature extraction and the final classifier (CatBoost) for getting the probability of being reverted based on the extracted features.

Model architecture

mBERT model tuning:

  • Learning rate: 2e-5
  • Weight Decay: 0.01
  • Epochs: 5
  • Maximum input length: 512
  • Number of encoder attention layers: 12
  • Number of decoder attention layers: 12
  • Number of attention heads: 12
  • Length of encoder embedding: 768

CatBoost:

  • Iterations: 2500
  • Learning Rate: 0.005
  • Loss: Logloss
Output schema
{
  "model_name": "revertrisk-wikidata",
  "model_version": <model version>,
  "revision_id": <revision_id string>,
  "output": {
    "prediction": <boolean decision result>,
    "probabilities": {
      "true": <probability of being reverted>,
      "false": <probability of being NOT reverted>
    }
  }
}
Example input and output

Input

$ curl https://api.wikimedia.org/service/lw/inference/v1/models/revertrisk-wikidata:predict -X POST -d '{"rev_id": 1945516043}' -H "Content-type: application/json"

Output

{
  "model_name": "revertrisk-wikidata",
  "model_version": "2",
  "revision_id": 1945516043,
  "output": {
    "prediction": false,
    "probabilities": {
      "true": 0.2718899239377954,
      "false": 0.7281100760622046
    }
  }
}
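
The same request can be made from Python using only the standard library. This is a minimal sketch based on the endpoint and schema shown above; the function names are illustrative, error handling is omitted, and the `User-Agent` value is a placeholder that callers should replace with their own identification.

```python
import json
import urllib.request

ENDPOINT = (
    "https://api.wikimedia.org/service/lw/inference/v1/"
    "models/revertrisk-wikidata:predict"
)


def build_request(rev_id):
    """Build the POST request for a single revision ID."""
    body = json.dumps({"rev_id": rev_id}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Placeholder value; replace with a descriptive identifier.
        "User-Agent": "example-client/0.1",
    }
    return urllib.request.Request(ENDPOINT, data=body, headers=headers, method="POST")


def revert_probability(response_text):
    """Extract the probability of being reverted from a response body."""
    payload = json.loads(response_text)
    return payload["output"]["probabilities"]["true"]


def score_revision(rev_id, timeout=10):
    """Call the API and return the revert probability (requires network access)."""
    with urllib.request.urlopen(build_request(rev_id), timeout=timeout) as resp:
        return revert_probability(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(score_revision(1945516043))
```

Keeping the parsing step in its own function makes it possible to test against a saved response, such as the example output above, without calling the live service.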

Data

The model was trained on a dataset collected from multiple tables in the Wikimedia Data Lake: MediaWiki History, Wikitext History, Wikidata item page link, and Wikidata entity. Snapshots dated 2024-04 were used, with the observation period spanning from September 1, 2021, to September 1, 2023 (a 2-year period) for training and testing. The last three months were held out as the test dataset. Revisions related to edit wars, self-reverts, and revisions created by bots were filtered out. To ensure that the revisions are human-created, only revisions tagged with ‘wikidata-ui’ were kept. The full dataset is available on Zenodo.

Data pipeline

The data was collected using the Wikimedia Data Lake and the Wikimedia Analytics cluster. First, we collected the revision data. Then we merged in the wikitext data and extracted the required features from the content using user-defined functions (UDFs). The data collection pipeline can be found in the data collection notebook.

Training data
  • Data period: 21 months
  • Balancing strategy: retaining all reverted revisions and supplementing them with a random sample of unreverted revisions at a ratio of 1:5
  • 4,197,231 revisions
  • 80% used for mBERT fine-tuning, and 20% for the final classifier
  • IP user edit rate: 10.7%
  • Revert rate: 7.9%
  • There are about 200 languages represented with at least 100 revisions.
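
The balancing strategy listed above can be sketched as follows. This is an illustration under assumptions, not the pipeline code: the 1:5 ratio is read here as one reverted revision to five unreverted ones, and the `reverted` field, function name, and fixed seed are hypothetical.

```python
import random


def balance_revisions(revisions, unreverted_per_reverted=5, seed=0):
    """Keep all reverted revisions and a random sample of unreverted ones."""
    reverted = [rev for rev in revisions if rev["reverted"]]
    unreverted = [rev for rev in revisions if not rev["reverted"]]
    # Cap the sample size at the number of unreverted revisions available.
    sample_size = min(len(unreverted), unreverted_per_reverted * len(reverted))
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return reverted + rng.sample(unreverted, sample_size)


if __name__ == "__main__":
    # Toy data: 10 reverted and 100 unreverted revisions.
    revisions = [{"rev_id": i, "reverted": i < 10} for i in range(110)]
    balanced = balance_revisions(revisions)
    print(len(balanced), sum(rev["reverted"] for rev in balanced))
```

Downsampling only the majority class keeps every reverted example available for training while limiting the dataset size.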
Test data
  • Data period: 3 months
  • 645,264 revisions
  • IP user edit rate: 8.3%
  • Revert rate: 6.2%

Licenses

Citation

Cite this model as:

@inproceedings{trokhymovych-etal-2025-graph,
    title = "Graph-Linguistic Fusion: Using Language Models for {W}ikidata Vandalism Detection",
    author = "Trokhymovych, Mykola  and
      Pintscher, Lydia  and
      Baeza-Yates, Ricardo  and
      S{\'a}ez Trumper, Diego",
    booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.acl-industry.21/",
    doi = "10.18653/v1/2025.acl-industry.21",
    pages = "284--294"
}

References

  1. https://stats.wikimedia.org/
  2. Zia, Leila and Johnson, Isaac and Mansurov, Bahodir and Morgan, Jonathan and Redi, Miriam and Saez-Trumper, Diego and Taraborelli, Dario. 2019. Knowledge Integrity. https://doi.org/10.6084/m9.figshare.7704626
  3. https://www.mediawiki.org/wiki/ORES
  4. https://huggingface.co/bert-base-multilingual-cased
  5. Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. CoRR, arXiv:1910.10683.