Objective Revision Evaluation Service/goodfaith

From Meta, a Wikimedia project coordination wiki

Detecting and removing damaging contributions is one of the most critical concerns for Wikimedia's open projects. This model was trained on human judgements[1] of whether an edit was probably made in good faith. It is useful for directing newcomer socialization efforts (e.g. en:User:HostBot) and for detecting vandals and spammers.

The model predicts whether an edit was made in good faith. Note that, due to limitations in the field of natural language processing, sarcasm and other kinds of clever vandalism are likely to fool the model. Keep this in mind when consuming scores.

Contexts (wikis)

English Wikipedia (enwiki)

https://ores.wmflabs.org/v2/scores/enwiki/goodfaith/?model_info
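
Every context on this page links to its model information at a URL of the same shape. A minimal sketch that builds that URL for a given context, assuming only the pattern visible in the links listed here. The helper name `model_info_url` is illustrative and is not part of ORES:

```python
# Base of the links listed on this page; no other endpoints are assumed.
ORES_SCORES_BASE = "https://ores.wmflabs.org/v2/scores"

# Contexts documented on this page.
CONTEXTS = [
    "enwiki", "fawiki", "nlwiki", "plwiki",
    "ptwiki", "trwiki", "wikidatawiki",
]


def model_info_url(context, model="goodfaith"):
    """Return the model_info URL for a context (hypothetical helper)."""
    return f"{ORES_SCORES_BASE}/{context}/{model}/?model_info"


# Print one URL per documented context.
for context in CONTEXTS:
    print(model_info_url(context))
```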

ScikitLearnClassifier
 - type: GradientBoosting
 - params: subsample=1.0, max_features="log2", loss="deviance", learning_rate=0.01, center=true, verbose=0, warm_start=false, presort="auto", max_depth=7, scale=true, min_weight_fraction_leaf=0.0, balanced_sample_weight=true, random_state=null, init=null, n_estimators=700, min_samples_leaf=1, max_leaf_nodes=null, min_samples_split=2, balanced_sample=false
 - version: 0.3.0
 - trained: 2017-01-06T19:35:15.426659

Table (rows: actual label; columns: predicted label, marked ~):
                 ~False    ~True
        -----  --------  -------
        False       428      212
        True       1699    17194

Accuracy: 0.902
Precision:
        -----  -----
        False  0.201
        True   0.988
        -----  -----

Recall:
        -----  -----
        False  0.667
        True   0.91
        -----  -----
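
The accuracy, precision and recall figures above can be approximately recomputed from the enwiki confusion table. The sketch below assumes that rows are actual labels and columns are predicted labels, a reading inferred from the reported numbers rather than stated in the output. Recall for the False label comes out near 0.669 instead of the reported 0.667; the reported statistics may have been aggregated differently, which is also an assumption.

```python
# enwiki confusion table: matrix[actual][predicted] = count.
matrix = {
    False: {False: 428, True: 212},
    True: {False: 1699, True: 17194},
}


def accuracy(m):
    """Share of all edits whose predicted label equals the actual label."""
    correct = m[False][False] + m[True][True]
    total = sum(sum(row.values()) for row in m.values())
    return correct / total


def precision(m, label):
    """Of the edits predicted as `label`, the share that really are `label`."""
    predicted = m[False][label] + m[True][label]
    return m[label][label] / predicted


def recall(m, label):
    """Of the edits that really are `label`, the share predicted as `label`."""
    return m[label][label] / sum(m[label].values())


print("accuracy:", round(accuracy(matrix), 3))
for label in (False, True):
    print(label, "precision:", round(precision(matrix, label), 3),
          "recall:", round(recall(matrix, label), 3))
```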

PR-AUC:
        -----  -----
        False  0.383
        True   0.993
        -----  -----

ROC-AUC:
        -----  -----
        False  0.907
        True   0.905
        -----  -----

Recall @ 0.1 false-positive rate:
        label      threshold    recall    fpr
        -------  -----------  --------  -----
        False          0.475     0.688  0.098
        True           0.88      0.704  0.097
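
A 0.1 false-positive rate can still mean many false alarms when good-faith edits vastly outnumber bad-faith ones. The arithmetic below is a rough sketch that applies the False-label row above to the class sizes in the enwiki confusion table. It assumes both sets of figures describe the same test set, which the output does not state.

```python
# Actual class sizes from the enwiki confusion table.
actual_bad_faith = 428 + 212       # row labelled False
actual_good_faith = 1699 + 17194   # row labelled True

# "Recall @ 0.1 false-positive rate" row for the False label.
recall_false = 0.688
fpr_false = 0.098

# Expected counts at this operating point.
caught = recall_false * actual_bad_faith        # bad-faith edits flagged
false_alarms = fpr_false * actual_good_faith    # good-faith edits flagged

# Share of flagged edits that are actually bad-faith.
implied_precision = caught / (caught + false_alarms)

print("flagged bad-faith edits:", round(caught))
print("flagged good-faith edits:", round(false_alarms))
print("implied precision:", round(implied_precision, 3))
```

Under these assumptions, roughly one flagged edit in five would actually be bad-faith, so a human reviewer should expect mostly good-faith edits in a queue built at this threshold.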

Recall @ 0.98 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False           0.96     0.046        1
        True            0.24     0.977        0.981

Recall @ 0.9 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False          0.957     0.053        0.99
        True           0.038     1            0.968

Recall @ 0.45 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False          0.808     0.364        0.481
        True           0.038     1            0.968

Recall @ 0.15 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False          0.322     0.777        0.155
        True           0.038     1            0.968
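
The "Recall @ precision" tables can be read as a menu of operating points. The sketch below picks the highest-recall point for the False label that still meets a precision floor, using the enwiki rows above. It assumes each threshold is compared against the model's probability for the named label and that an edit is flagged when that probability is at or above the threshold; neither is stated on this page. The names `OperatingPoint`, `pick_point` and `is_flagged` are illustrative.

```python
from typing import NamedTuple, Optional


class OperatingPoint(NamedTuple):
    threshold: float
    recall: float
    precision: float


# False-label rows from the enwiki "Recall @ ... precision" tables.
ENWIKI_FALSE_POINTS = [
    OperatingPoint(0.96, 0.046, 1.0),
    OperatingPoint(0.957, 0.053, 0.99),
    OperatingPoint(0.808, 0.364, 0.481),
    OperatingPoint(0.322, 0.777, 0.155),
]


def pick_point(points, min_precision) -> Optional[OperatingPoint]:
    """Return the highest-recall point whose precision meets the floor."""
    eligible = [p for p in points if p.precision >= min_precision]
    return max(eligible, key=lambda p: p.recall) if eligible else None


def is_flagged(p_false: float, point: OperatingPoint) -> bool:
    """Flag an edit when its False probability reaches the threshold (assumed rule)."""
    return p_false >= point.threshold


point = pick_point(ENWIKI_FALSE_POINTS, min_precision=0.45)
print(point)
print(is_flagged(0.9, point), is_flagged(0.5, point))
```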

Persian Wikipedia (fawiki)

https://ores.wmflabs.org/v2/scores/fawiki/goodfaith/?model_info

ScikitLearnClassifier
 - type: GradientBoosting
 - params: presort="auto", max_features="log2", scale=true, max_leaf_nodes=null, init=null, verbose=0, random_state=null, learning_rate=0.01, balanced_sample_weight=true, n_estimators=700, balanced_sample=false, subsample=1.0, warm_start=false, min_samples_leaf=1, max_depth=7, min_weight_fraction_leaf=0.0, loss="deviance", center=true, min_samples_split=2
 - version: 0.3.0
 - trained: 2017-01-06T20:21:04.924687

Table (rows: actual label; columns: predicted label, marked ~):
                 ~False    ~True
        -----  --------  -------
        False        87       77
        True        472    19168

Accuracy: 0.972
Precision:
        -----  -----
        False  0.158
        True   0.996
        -----  -----

Recall:
        -----  -----
        False  0.532
        True   0.976
        -----  -----

PR-AUC:
        -----  -----
        False  0.211
        True   0.995
        -----  -----

ROC-AUC:
        -----  -----
        False  0.974
        True   0.964
        -----  -----
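
The accuracy reported for fawiki is hard to interpret on its own because one label dominates. A small sketch comparing it with a baseline that always predicts the most common actual label, using the fawiki confusion table above. It assumes rows are actual labels and columns are predicted labels:

```python
# fawiki confusion table: matrix[actual][predicted] = count.
matrix = {
    False: {False: 87, True: 77},
    True: {False: 472, True: 19168},
}

total = sum(sum(row.values()) for row in matrix.values())
model_accuracy = (matrix[False][False] + matrix[True][True]) / total

# Baseline: always predict the most common actual label.
majority_count = max(sum(row.values()) for row in matrix.values())
baseline_accuracy = majority_count / total

print("model accuracy:", round(model_accuracy, 3))
print("majority baseline:", round(baseline_accuracy, 3))
```

On these counts the always-True baseline scores higher on accuracy than the model, which suggests reading the per-label precision, recall and PR-AUC figures rather than accuracy when the False label is the one of interest.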

Recall @ 0.1 false-positive rate:
        label      threshold    recall    fpr
        -------  -----------  --------  -----
        False           0.09     0.939  0.077
        True            0.89     0.922  0.079

Recall @ 0.98 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False          0.953     0.102        1
        True           0.051     1            0.992

Recall @ 0.9 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False          0.953     0.102        1
        True           0.051     1            0.992

Recall @ 0.45 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False          0.936     0.095        0.633
        True           0.051     1            0.992

Recall @ 0.15 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False          0.439     0.652        0.162
        True           0.051     1            0.992

Dutch Wikipedia (nlwiki)

https://ores.wmflabs.org/v2/scores/nlwiki/goodfaith/?model_info

ScikitLearnClassifier
 - type: GradientBoosting
 - params: loss="deviance", max_features="log2", center=true, warm_start=false, subsample=1.0, scale=true, random_state=null, presort="auto", max_depth=5, min_samples_leaf=1, balanced_sample=false, n_estimators=700, min_samples_split=2, learning_rate=0.01, max_leaf_nodes=null, init=null, min_weight_fraction_leaf=0.0, balanced_sample_weight=true, verbose=0
 - version: 0.3.0
 - trained: 2017-01-06T21:54:13.608947

Table (rows: actual label; columns: predicted label, marked ~):
                 ~False    ~True
        -----  --------  -------
        False       601       70
        True       1500    17293

Accuracy: 0.919
Precision:
        -----  -----
        False  0.286
        True   0.996
        -----  -----

Recall:
        -----  -----
        False  0.896
        True   0.92
        -----  -----

PR-AUC:
        -----  -----
        False  0.677
        True   0.995
        -----  -----

ROC-AUC:
        -----  -----
        False  0.971
        True   0.971
        -----  -----

Recall @ 0.1 false-positive rate:
        label      threshold    recall    fpr
        -------  -----------  --------  -----
        False          0.361     0.935  0.094
        True           0.5       0.922  0.094

Recall @ 0.98 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False          0.967     0.198        1
        True           0.072     0.996        0.981

Recall @ 0.9 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False          0.954     0.302        0.92
        True           0.024     1            0.969

Recall @ 0.45 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False          0.803     0.756        0.466
        True           0.024     1            0.969

Recall @ 0.15 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False          0.091     0.975        0.171
        True           0.024     1            0.969

Polish Wikipedia (plwiki)

https://ores.wmflabs.org/v2/scores/plwiki/goodfaith/?model_info

ScikitLearnClassifier
 - type: RF
 - params: center=true, n_estimators=320, max_depth=null, balanced_sample_weight=true, min_samples_split=2, min_samples_leaf=1, verbose=0, min_weight_fraction_leaf=0.0, criterion="entropy", oob_score=false, n_jobs=1, class_weight=null, max_leaf_nodes=null, random_state=null, scale=true, max_features="log2", balanced_sample=false, bootstrap=true, warm_start=false
 - version: 0.3.0
 - trained: 2017-01-06T22:30:20.768873

Table (rows: actual label; columns: predicted label, marked ~):
                 ~False    ~True
        -----  --------  -------
        False       527       67
        True          4    11998

Accuracy: 0.994
Precision:
        -----  -----
        False  0.991
        True   0.994
        -----  -----

Recall:
        -----  -----
        False  0.888
        True   1
        -----  -----

PR-AUC:
        -----  -----
        False  0.953
        True   0.995
        -----  -----

ROC-AUC:
        -----  -----
        False  0.985
        True   0.989
        -----  -----

Recall @ 0.1 false-positive rate:
        label      threshold    recall    fpr
        -------  -----------  --------  -----
        False          0.047     0.974  0.062
        True           0.675     0.995  0.086

Recall @ 0.98 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False          0.4       0.918        0.991
        True           0.293     1            0.989

Recall @ 0.9 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False          0.252     0.923        0.944
        True           0.133     1            0.974

Recall @ 0.45 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False          0.062     0.962        0.595
        True           0.133     1            0.974

Recall @ 0.15 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False          0.013     0.988        0.199
        True           0.133     1            0.974

Portuguese Wikipedia (ptwiki)

https://ores.wmflabs.org/v2/scores/ptwiki/goodfaith/?model_info

ScikitLearnClassifier
 - type: GradientBoosting
 - params: scale=true, balanced_sample_weight=true, learning_rate=0.01, min_weight_fraction_leaf=0.0, max_depth=7, center=true, random_state=null, max_leaf_nodes=null, init=null, presort="auto", warm_start=false, min_samples_leaf=1, subsample=1.0, min_samples_split=2, verbose=0, loss="deviance", balanced_sample=false, n_estimators=700, max_features="log2"
 - version: 0.3.0
 - trained: 2017-01-06T22:48:01.162565

Table (rows: actual label; columns: predicted label, marked ~):
                 ~False    ~True
        -----  --------  -------
        False       935      258
        True       2173    16447

Accuracy: 0.877
Precision:
        -----  -----
        False  0.301
        True   0.985
        -----  -----

Recall:
        -----  -----
        False  0.784
        True   0.883
        -----  -----

PR-AUC:
        -----  -----
        False  0.522
        True   0.992
        -----  -----

ROC-AUC:
        -----  -----
        False  0.937
        True   0.932
        -----  -----

Recall @ 0.1 false-positive rate:
        label      threshold    recall    fpr
        -------  -----------  --------  -----
        False          0.554     0.744  0.099
        True           0.729     0.807  0.096

Recall @ 0.98 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False          0.959     0.053         1
        True           0.396     0.916         0.98

Recall @ 0.9 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False          0.95      0.091        0.969
        True           0.034     1            0.941

Recall @ 0.45 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False          0.749     0.577        0.457
        True           0.034     1            0.941

Recall @ 0.15 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False          0.065     0.976        0.157
        True           0.034     1            0.941

Turkish Wikipedia (trwiki)

https://ores.wmflabs.org/v2/scores/trwiki/goodfaith/?model_info

ScikitLearnClassifier
 - type: GradientBoosting
 - params: n_estimators=700, min_samples_leaf=1, scale=true, center=true, learning_rate=0.01, init=null, subsample=1.0, max_depth=7, min_weight_fraction_leaf=0.0, balanced_sample_weight=true, max_features="log2", warm_start=false, max_leaf_nodes=null, loss="deviance", balanced_sample=false, random_state=null, verbose=0, presort="auto", min_samples_split=2
 - version: 0.3.0
 - trained: 2017-01-06T23:29:38.432498

Table (rows: actual label; columns: predicted label, marked ~):
                 ~False    ~True
        -----  --------  -------
        False       714      191
        True       2678    16148

Accuracy: 0.855
Precision:
        -----  -----
        False  0.21
        True   0.988
        -----  -----

Recall:
        -----  -----
        False  0.787
        True   0.858
        -----  -----

PR-AUC:
        -----  -----
        False  0.292
        True   0.992
        -----  -----

ROC-AUC:
        -----  -----
        False  0.914
        True   0.908
        -----  -----

Recall @ 0.1 false-positive rate:
        label      threshold    recall    fpr
        -------  -----------  --------  -----
        False          0.659     0.656  0.099
        True           0.764     0.794  0.095

Recall @ 0.98 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False          0.923     0.021         1
        True           0.315     0.91          0.98

Recall @ 0.9 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False          0.923     0.021        1
        True           0.08      1            0.955

Recall @ 0.45 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False          0.883     0.111        0.491
        True           0.08      1            0.955

Recall @ 0.15 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False          0.128     0.936        0.156
        True           0.08      1            0.955

Wikidata (wikidatawiki)

https://ores.wmflabs.org/v2/scores/wikidatawiki/goodfaith/?model_info

ScikitLearnClassifier
 - type: GradientBoosting
 - params: balanced_sample=false, center=true, verbose=0, presort="auto", scale=true, init=null, subsample=1.0, random_state=null, min_samples_leaf=1, max_depth=5, loss="deviance", min_weight_fraction_leaf=0.0, max_features="log2", learning_rate=0.1, n_estimators=300, warm_start=false, min_samples_split=2, max_leaf_nodes=null, balanced_sample_weight=true
 - version: 0.3.0
 - trained: 2017-01-07T00:57:21.651623
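
The params lines list the same keys in a different order for each context, which makes them hard to compare by eye. A minimal sketch that parses two of the lines from this page and reports only the keys whose values differ. It assumes the values are JSON literals (numbers, quoted strings, `true`, `false`, `null`), which matches what is shown here but is not documented. `parse_params` and `diff_params` are illustrative names.

```python
import json

# params line from the enwiki section, copied verbatim.
ENWIKI_PARAMS = (
    'subsample=1.0, max_features="log2", loss="deviance", learning_rate=0.01, '
    'center=true, verbose=0, warm_start=false, presort="auto", max_depth=7, '
    'scale=true, min_weight_fraction_leaf=0.0, balanced_sample_weight=true, '
    'random_state=null, init=null, n_estimators=700, min_samples_leaf=1, '
    'max_leaf_nodes=null, min_samples_split=2, balanced_sample=false'
)

# params line from the wikidatawiki section, copied verbatim.
WIKIDATA_PARAMS = (
    'balanced_sample=false, center=true, verbose=0, presort="auto", scale=true, '
    'init=null, subsample=1.0, random_state=null, min_samples_leaf=1, max_depth=5, '
    'loss="deviance", min_weight_fraction_leaf=0.0, max_features="log2", '
    'learning_rate=0.1, n_estimators=300, warm_start=false, min_samples_split=2, '
    'max_leaf_nodes=null, balanced_sample_weight=true'
)


def parse_params(line):
    """Parse 'key=value, key=value' into a dict; values are read as JSON."""
    params = {}
    for item in line.split(", "):
        key, value = item.split("=", 1)
        params[key] = json.loads(value)
    return params


def diff_params(a, b):
    """Return {key: (a_value, b_value)} for keys whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


diff = diff_params(parse_params(ENWIKI_PARAMS), parse_params(WIKIDATA_PARAMS))
for key, (enwiki_value, wikidata_value) in diff.items():
    print(f"{key}: enwiki={enwiki_value} wikidatawiki={wikidata_value}")
```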

Table (rows: actual label; columns: predicted label, marked ~):
                 ~False    ~True
        -----  --------  -------
        False      2091      155
        True       1009    21177

Accuracy: 0.952
Precision:
        -----  -----
        False  0.675
        True   0.993
        -----  -----

Recall:
        -----  -----
        False  0.931
        True   0.955
        -----  -----

PR-AUC:
        -----  -----
        False  0.792
        True   0.994
        -----  -----

ROC-AUC:
        -----  -----
        False  0.987
        True   0.979
        -----  -----

Recall @ 0.1 false-positive rate:
        label      threshold    recall    fpr
        -------  -----------  --------  -----
        False          0.093     0.986  0.096
        True           0.277     0.965  0.096

Recall @ 0.98 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False          0.993     0.034         1
        True           0.077     0.974         0.98

Recall @ 0.9 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False          0.99      0.087        0.934
        True           0.006     1            0.909

Recall @ 0.45 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False          0.054     0.992        0.471
        True           0.006     1            0.909

Recall @ 0.15 precision:
        label      threshold    recall    precision
        -------  -----------  --------  -----------
        False          0.006         1        0.245
        True           0.006         1        0.909

References

  1. See en:Wikipedia:Labels/Edit quality for the English Wikipedia manual labeling campaign.