Machine learning models/Production/Hindi Wikipedia damaging edit

Model card
This page is an on-wiki machine learning model card.
A model card is a document about a machine learning model that seeks to answer basic questions about the model.
Model Information Hub
Model creator(s): Aaron Halfaker (User:EpochFail) and Amir Sarabadani
Model owner(s): WMF Machine Learning Team (ml@wikimediafoundation.org)
Model interface: ORES homepage
Code: ORES GitHub, ORES training data, and ORES model binaries
Uses PII: No
In production? Yes
Which projects? Hindi Wikipedia
This model uses data about a revision to predict the likelihood that the revision is damaging.


Motivation

Some good-faith edits are damaging to an article, and not all damaging edits are made in bad faith. This model (together with the goodfaith model) is intended to differentiate between edits that are intentionally harmful (bad faith/vandalism) and edits that are not intended to be harmful (good edits/good-faith damage).

This model helps to prioritize the review of potentially damaging edits or vandalism. It predicts whether or not a given revision is damaging, and returns a probability for each label as a measure of its confidence.

Users and uses

Use this model for
  • This model should be used for prioritizing the review and potential reversion of vandalism on Hindi Wikipedia.
  • This model should be used for detecting damaging contributions by editors on Hindi Wikipedia.
Don't use this model for
  • This model should not be used as an ultimate arbiter of whether or not an edit ought to be considered damaging.
  • The model should not be used outside of Hindi Wikipedia.
Current uses
  • Hindi Wikipedia uses the model as a service for facilitating efficient vandalism triage, edit reviews, or newcomer support.
  • On an individual basis, anyone can submit a properly-formatted API call to ORES for a given revision and get back the result of this model.
Example API call:
https://ores.wikimedia.org/v3/scores/hiwiki/5794331/damaging
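
A minimal Python sketch of such a call follows. The URL pattern is taken from the example above; the helper names `score_url` and `fetch_score` and the `timeout` default are assumptions made for illustration, not part of ORES.

```python
import json
from urllib.request import urlopen

# Base path taken from the example API call on this page.
ORES_BASE = "https://ores.wikimedia.org/v3/scores"


def score_url(wiki, rev_id, model="damaging"):
    """Build the scoring URL for one revision (hypothetical helper)."""
    return f"{ORES_BASE}/{wiki}/{rev_id}/{model}"


def fetch_score(wiki, rev_id, model="damaging", timeout=10):
    """Request the score and decode the JSON body (hypothetical helper)."""
    with urlopen(score_url(wiki, rev_id, model), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Performs a network request; only run where the service is reachable.
    print(score_url("hiwiki", 5794331))
```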

Ethical considerations, caveats, and recommendations

Hindi Wikipedia decided to use this model. Over time, the model has been validated through use in the community.

This model is known to assign a higher probability of damage to edits made by newer editors.

Internal or external changes that could make this model deprecated or no longer usable are:

  • Data drift makes the model's training data no longer usable.
  • The model no longer meets the desired performance metrics in production.
  • The Hindi Wikipedia community decides to stop using this model.

Model

Performance

Test data confusion matrix:

Label n ~True ~False
True 1178 730 448
False 8673 504 8169

Test data sample rates:

Rate True False
sample 0.12 0.88
population 0.122 0.878

Test data performance:

Statistic True False
match_rate 0.126 0.874
filter_rate 0.874 0.126
recall 0.62 0.942
precision 0.596 0.947
f1 0.608 0.944
accuracy 0.903 0.903
fpr 0.058 0.38
roc_auc 0.929 0.938
pr_auc 0.637 0.988
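
Several of these statistics can be recomputed from the confusion matrix. The sketch below does so for recall, fpr, and accuracy, reading `~True` and `~False` as the predicted labels (an assumption about the table notation). Precision, f1, and match_rate are left out: computed from raw counts they come out slightly different from the table, presumably because the reported values are scaled to the population rates.

```python
# Counts copied from the test data confusion matrix.
# Keys are (actual label, predicted label).
CONFUSION = {
    (True, True): 730,
    (True, False): 448,
    (False, True): 504,
    (False, False): 8169,
}


def recall(label):
    """Share of observations with this actual label that were predicted as it."""
    actual = CONFUSION[(label, True)] + CONFUSION[(label, False)]
    return CONFUSION[(label, label)] / actual


def fpr(label):
    """Share of observations with the other actual label predicted as `label`."""
    other = not label
    actual_other = CONFUSION[(other, True)] + CONFUSION[(other, False)]
    return CONFUSION[(other, label)] / actual_other


def accuracy():
    """Share of all observations whose prediction matches the actual label."""
    correct = CONFUSION[(True, True)] + CONFUSION[(False, False)]
    return correct / sum(CONFUSION.values())


for label in (True, False):
    print(label, round(recall(label), 3), round(fpr(label), 3))
print("accuracy", round(accuracy(), 3))
```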

Implementation

Model architecture
{
    "type": "GradientBoosting",
    "params": {
        "min_impurity_decrease": 0.0,
        "warm_start": false,
        "min_weight_fraction_leaf": 0.0,
        "scale": true,
        "n_iter_no_change": null,
        "n_estimators": 700,
        "presort": "deprecated",
        "max_depth": 7,
        "tol": 0.0001,
        "validation_fraction": 0.1,
        "random_state": null,
        "min_impurity_split": null,
        "multilabel": false,
        "ccp_alpha": 0.0,
        "labels": [
            true,
            false
        ],
        "subsample": 1.0,
        "criterion": "friedman_mse",
        "min_samples_split": 2,
        "max_leaf_nodes": null,
        "loss": "deviance",
        "learning_rate": 0.1,
        "verbose": 0,
        "population_rates": null,
        "min_samples_leaf": 1,
        "max_features": "log2",
        "init": null,
        "center": true
    }
}
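
As a rough illustration, the main hyperparameters above could be passed to scikit-learn's `GradientBoostingClassifier` as sketched below. This is not the ORES training code. It assumes that keys such as `scale`, `center`, `labels`, `multilabel`, and `population_rates` belong to the training wrapper rather than to scikit-learn, and it passes only a subset of the listed keys because newer scikit-learn releases no longer accept some of them (for example `presort`, `min_impurity_split`, and `loss="deviance"`). The names `CARD_PARAMS`, `WRAPPER_KEYS`, `estimator_kwargs`, and `build_estimator` are hypothetical.

```python
# A subset of the hyperparameters listed in the model card.
CARD_PARAMS = {
    "n_estimators": 700,
    "max_depth": 7,
    "learning_rate": 0.1,
    "max_features": "log2",
    "subsample": 1.0,
    "criterion": "friedman_mse",
    "min_samples_split": 2,
    "min_samples_leaf": 1,
    "scale": True,
    "center": True,
    "labels": [True, False],
    "multilabel": False,
    "population_rates": None,
}

# Assumption: these keys are consumed by the training wrapper, not scikit-learn.
WRAPPER_KEYS = {"scale", "center", "labels", "multilabel", "population_rates"}


def estimator_kwargs(params):
    """Drop wrapper-level keys, keeping only estimator hyperparameters."""
    return {k: v for k, v in params.items() if k not in WRAPPER_KEYS}


def build_estimator(params):
    """Instantiate the classifier; scikit-learn is imported only when called."""
    from sklearn.ensemble import GradientBoostingClassifier

    return GradientBoostingClassifier(**estimator_kwargs(params))
```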
Output schema
{
    "properties": {
        "probability": {
            "description": "A mapping of probabilities onto each of the potential output labels",
            "properties": {
                "true": {
                    "type": "number"
                },
                "false": {
                    "type": "number"
                }
            },
            "type": "object"
        },
        "prediction": {
            "description": "The most likely label predicted by the estimator",
            "type": "boolean"
        }
    },
    "type": "object",
    "title": "Scikit learn-based classifier score with probability"
}
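
A consumer might check a score object against this schema before using it. The sketch below is a hand-written check, not a full JSON Schema validator; it treats both properties and both probability keys as required, which is stricter than the schema states. The function names are hypothetical.

```python
def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_valid_score(score):
    """Return True if `score` has the shape described by the output schema."""
    if not isinstance(score, dict):
        return False
    probability = score.get("probability")
    if not isinstance(probability, dict):
        return False
    if not all(_is_number(probability.get(key)) for key in ("true", "false")):
        return False
    return isinstance(score.get("prediction"), bool)
```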
Example input and output
Input:
https://ores.wikimedia.org/v3/scores/hiwiki/5794331/damaging

Output:

{
    "hiwiki": {
        "models": {
            "damaging": {
                "version": "0.5.0"
            }
        },
        "scores": {
            "5794331": {
                "damaging": {
                    "score": {
                        "prediction": false,
                        "probability": {
                            "false": 0.9428529318295721,
                            "true": 0.057147068170427896
                        }
                    }
                }
            }
        }
    }
}
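
The score sits several levels deep in the response. The sketch below pulls out the prediction and the probability of the `true` label from a response shaped like the one above, then applies a review threshold. The helper names and the 0.5 threshold are assumptions for illustration; the card does not prescribe a threshold.

```python
# Response copied from the example output on this page.
EXAMPLE_RESPONSE = {
    "hiwiki": {
        "models": {"damaging": {"version": "0.5.0"}},
        "scores": {
            "5794331": {
                "damaging": {
                    "score": {
                        "prediction": False,
                        "probability": {
                            "false": 0.9428529318295721,
                            "true": 0.057147068170427896,
                        },
                    }
                }
            }
        },
    }
}


def extract_score(response, wiki, rev_id, model="damaging"):
    """Return (prediction, probability of the 'true' label) for one revision."""
    score = response[wiki]["scores"][str(rev_id)][model]["score"]
    return score["prediction"], score["probability"]["true"]


def needs_review(p_damaging, threshold=0.5):
    """Flag a revision for human review; the threshold is a hypothetical value."""
    return p_damaging >= threshold


prediction, p_damaging = extract_score(EXAMPLE_RESPONSE, "hiwiki", 5794331)
print(prediction, p_damaging, needs_review(p_damaging))
```

A flag of this kind only orders the review queue; the final judgment on an edit stays with a human reviewer.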

Data

Data pipeline
Tabular data about edits is collected from the MediaWiki API, preprocessed (via log-transformations, joining with public editor data, etc.), and joined with user-generated goodfaith/damaging labels.
Training data
This model was trained using hand-labeled training data that is several years old.
Test data
The statistics reported here were calculated by selecting a random partition of the training data to hold out from the training process. The model then makes a prediction on that data, which is compared to the underlying ground truth.
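
The holdout procedure can be sketched as follows. This is a generic random partition, not the pipeline actually used for this model; the `test_fraction` and `seed` values are hypothetical, since the card does not state the size of the held-out partition.

```python
import random


def holdout_split(rows, test_fraction=0.2, seed=0):
    """Randomly partition labeled rows into (train, test) lists."""
    shuffled = list(rows)
    # A seeded generator keeps the partition reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = holdout_split(range(100))
print(len(train), len(test))
```

The model is fit on `train` only, and its predictions on `test` are compared with the hand-assigned labels to produce statistics like those in the Performance section.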

Licenses

Citation

Cite this model card as:

@misc{
  Triedman_Bazira_2023_Hindi_Wikipedia_damaging,
  title={ Hindi Wikipedia damaging model card },
  author={ Triedman, Harold and Bazira, Kevin },
  year={ 2023 },
  url={ https://meta.wikimedia.org/wiki/Machine_learning_models/Production/Hindi_Wikipedia_damaging_edit }
}