Objective Revision Evaluation Service/damaging
This model was trained on human judgement[1] of whether an edit is damaging. It is useful for quality control tools (e.g. en:WP:Huggle and en:User:ClueBot NG).
The model predicts whether an edit is damaging, not whether it was made in bad faith: not all damaging edits are bad-faith. Consume scores with this in mind, and see ORES/goodfaith for a model that predicts whether an edit was made in good faith.
Contexts (wikis)
These stats are slightly out of date. For the most up-to-date stats, follow the "?model_info" link given for each model.
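The "?model_info" links on this page all follow one URL pattern, so the link for any listed wiki can be built from its database name. The following is a minimal sketch: the function name `model_info_url` is illustrative and not part of ORES, and the pattern is inferred only from the links listed on this page.

```python
# Base of the links shown on this page (copied from the per-wiki sections).
BASE = "https://ores.wmflabs.org/v2/scores"


def model_info_url(wiki, model="damaging"):
    """Build a ?model_info link following the pattern used on this page.

    `wiki` is a database name such as "enwiki"; `model` defaults to
    "damaging" because that is the model this page documents.
    """
    return f"{BASE}/{wiki}/{model}/?model_info"


# Print the links for a few of the wikis listed on this page.
for wiki in ("enwiki", "fawiki", "nlwiki"):
    print(model_info_url(wiki))
```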
English Wikipedia (enwiki)
https://ores.wmflabs.org/v2/scores/enwiki/damaging/?model_info
ScikitLearnClassifier
- type: GradientBoosting
- params: loss="deviance", balanced_sample=false, subsample=1.0, warm_start=false, learning_rate=0.01, n_estimators=700, max_features="log2", balanced_sample_weight=true, presort="auto", init=null, min_samples_split=2, center=true, min_weight_fraction_leaf=0.0, scale=true, max_leaf_nodes=null, max_depth=7, min_samples_leaf=1, random_state=null, verbose=0
- version: 0.3.0
- trained: 2017-01-06T19:29:12.793824
Table:
~False ~True
----- -------- -------
False 17203 1664
True 211 455
Accuracy: 0.904
Precision:
----- -----
False 0.988
True 0.215
----- -----
Recall:
----- -----
False 0.912
True 0.683
----- -----
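The accuracy, precision and recall figures above can be recomputed from the confusion table. The sketch below assumes that rows are the actual labels and that the columns (~False, ~True) are the model's predictions; the page does not state this, but the reported figures are consistent with that reading. The function names are illustrative.

```python
# enwiki confusion table, copied from above.
# Assumption: outer key = actual label, inner key = predicted label.
table = {
    False: {False: 17203, True: 1664},
    True: {False: 211, True: 455},
}


def accuracy(t):
    """Share of all edits whose prediction matches the actual label."""
    correct = t[False][False] + t[True][True]
    total = sum(sum(row.values()) for row in t.values())
    return correct / total


def precision(t, label):
    """Of the edits predicted as `label`, the share that really are `label`."""
    predicted = t[False][label] + t[True][label]
    return t[label][label] / predicted


def recall(t, label):
    """Of the edits that really are `label`, the share predicted as `label`."""
    return t[label][label] / sum(t[label].values())


print("accuracy:", round(accuracy(table), 3))
for label in (False, True):
    print(label, "precision:", round(precision(table, label), 3),
          "recall:", round(recall(table, label), 3))
```

Under that assumption the computed values round to the figures listed above (accuracy 0.904; precision 0.988 and 0.215; recall 0.912 and 0.683). The low precision for True reflects how rare damaging edits are in the sample: 666 of 19,533 edits.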
PR-AUC:
----- -----
False 0.993
True 0.391
----- -----
ROC-AUC:
----- -----
False 0.918
True 0.919
----- -----
Recall @ 0.1 false-positive rate:
label threshold recall fpr
------- ----------- -------- -----
False 0.862 0.76 0.093
True 0.469 0.713 0.098
Recall @ 0.98 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.259 0.972 0.98
True 0.955 0.068 1
Recall @ 0.9 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.04 1 0.967
True 0.945 0.091 0.98
Recall @ 0.45 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.04 1 0.967
True 0.831 0.326 0.468
Recall @ 0.15 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.04 1 0.967
True 0.281 0.828 0.154
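The threshold tables above can be used to sort edits into review buckets by their damaging probability. The sketch below is one possible use, not something ORES prescribes: the function name `triage`, the bucket names, and the choice of the "0.9 precision" and "0.15 precision" rows are all assumptions, as is the convention that a score at or above a threshold counts as a match.

```python
# Thresholds for the True (damaging) label on enwiki, copied from the
# tables above. Each value is (threshold, recall, precision).
ENWIKI_TRUE_THRESHOLDS = {
    "0.98 precision": (0.955, 0.068, 1.0),
    "0.9 precision": (0.945, 0.091, 0.98),
    "0.45 precision": (0.831, 0.326, 0.468),
    "0.15 precision": (0.281, 0.828, 0.154),
}

# Hypothetical choice: a strict cut-off for confident flags and a loose one
# that catches most damage at the cost of many false positives.
HIGH = ENWIKI_TRUE_THRESHOLDS["0.9 precision"][0]
LOW = ENWIKI_TRUE_THRESHOLDS["0.15 precision"][0]


def triage(p_damaging, high=HIGH, low=LOW):
    """Map the model's probability for the True label to a review bucket."""
    if p_damaging >= high:
        return "likely damaging"
    if p_damaging >= low:
        return "needs review"
    return "probably fine"


for p in (0.96, 0.5, 0.1):
    print(p, triage(p))
```

With these two rows, the strict bucket would be correct about 98% of the time but catch only about 9% of damaging edits, while the loose cut-off would catch about 83% of them at roughly 15% precision, according to the table values.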
Persian Wikipedia (fawiki)
https://ores.wmflabs.org/v2/scores/fawiki/damaging/?model_info
ScikitLearnClassifier
- type: GradientBoosting
- params: min_samples_leaf=1, warm_start=false, verbose=0, min_weight_fraction_leaf=0.0, max_leaf_nodes=null, min_samples_split=2, scale=true, max_depth=7, center=true, max_features="log2", balanced_sample=false, init=null, loss="deviance", n_estimators=700, random_state=null, balanced_sample_weight=true, subsample=1.0, presort="auto", learning_rate=0.01
- version: 0.3.0
- trained: 2017-01-06T20:15:09.126168
Table:
~False ~True
----- -------- -------
False 18876 685
True 98 145
Accuracy: 0.96
Precision:
----- -----
False 0.995
True 0.178
----- -----
Recall:
----- -----
False 0.965
True 0.607
----- -----
PR-AUC:
----- -----
False 0.995
True 0.228
----- -----
ROC-AUC:
----- -----
False 0.959
True 0.968
----- -----
Recall @ 0.1 false-positive rate:
label threshold recall fpr
------- ----------- -------- -----
False 0.866 0.913 0.09
True 0.092 0.927 0.09
Recall @ 0.98 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.042 1 0.988
True 0.963 0.043 1
Recall @ 0.9 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.042 1 0.988
True 0.963 0.043 1
Recall @ 0.45 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.042 1 0.988
True 0.922 0.143 0.592
Recall @ 0.15 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.042 1 0.988
True 0.313 0.797 0.156
Dutch Wikipedia (nlwiki)
https://ores.wmflabs.org/v2/scores/nlwiki/damaging/?model_info
ScikitLearnClassifier
- type: GradientBoosting
- params: warm_start=false, learning_rate=0.01, n_estimators=700, loss="deviance", verbose=0, max_leaf_nodes=null, min_weight_fraction_leaf=0.0, init=null, presort="auto", scale=true, min_samples_split=2, max_depth=5, center=true, balanced_sample=false, random_state=null, max_features="log2", balanced_sample_weight=true, min_samples_leaf=1, subsample=1.0
- version: 0.3.0
- trained: 2017-01-06T21:49:27.818262
Table:
~False ~True
----- -------- -------
False 16763 1714
True 105 882
Accuracy: 0.907
Precision:
----- -----
False 0.994
True 0.339
----- -----
Recall:
----- -----
False 0.907
True 0.895
----- -----
PR-AUC:
----- -----
False 0.994
True 0.646
----- -----
ROC-AUC:
----- -----
False 0.96
True 0.963
----- -----
Recall @ 0.1 false-positive rate:
label threshold recall fpr
------- ----------- -------- -----
False 0.528 0.905 0.096
True 0.446 0.912 0.098
Recall @ 0.98 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.152 0.969 0.98
True 0.966 0.109 1
Recall @ 0.9 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.03 1 0.952
True 0.952 0.211 0.936
Recall @ 0.45 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.03 1 0.952
True 0.8 0.739 0.455
Recall @ 0.15 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.03 1 0.952
True 0.1 0.977 0.161
Polish Wikipedia (plwiki)
https://ores.wmflabs.org/v2/scores/plwiki/damaging/?model_info
ScikitLearnClassifier
- type: GradientBoosting
- params: warm_start=false, max_depth=7, presort="auto", n_estimators=700, random_state=null, min_samples_leaf=1, min_weight_fraction_leaf=0.0, min_samples_split=2, init=null, max_features="log2", learning_rate=0.01, scale=true, verbose=0, max_leaf_nodes=null, loss="deviance", balanced_sample_weight=true, subsample=1.0, center=true, balanced_sample=false
- version: 0.3.0
- trained: 2017-01-06T22:25:42.924606
Table:
~False ~True
----- -------- -------
False 11527 337
True 56 676
Accuracy: 0.969
Precision:
----- -----
False 0.995
True 0.667
----- -----
Recall:
----- -----
False 0.972
True 0.923
----- -----
PR-AUC:
----- -----
False 0.995
True 0.924
----- -----
ROC-AUC:
----- -----
False 0.983
True 0.977
----- -----
Recall @ 0.1 false-positive rate:
label threshold recall fpr
------- ----------- -------- -----
False 0.366 0.987 0.093
True 0.304 0.95 0.086
Recall @ 0.98 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.124 0.999 0.981
True 0.857 0.719 0.99
Recall @ 0.9 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.052 1 0.956
True 0.736 0.859 0.911
Recall @ 0.45 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.052 1 0.956
True 0.371 0.942 0.494
Recall @ 0.15 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.052 1 0.956
True 0.097 0.979 0.198
Portuguese Wikipedia (ptwiki)
https://ores.wmflabs.org/v2/scores/ptwiki/damaging/?model_info
ScikitLearnClassifier
- type: GradientBoosting
- params: center=true, min_samples_split=2, balanced_sample_weight=true, scale=true, min_samples_leaf=1, warm_start=false, n_estimators=700, subsample=1.0, init=null, learning_rate=0.01, random_state=null, verbose=0, max_features="log2", presort="auto", loss="deviance", max_leaf_nodes=null, max_depth=7, min_weight_fraction_leaf=0.0, balanced_sample=false
- version: 0.3.0
- trained: 2017-01-06T22:41:56.417405
Table:
~False ~True
----- -------- -------
False 16008 2439
True 276 1090
Accuracy: 0.863
Precision:
----- -----
False 0.983
True 0.309
----- -----
Recall:
----- -----
False 0.868
True 0.798
----- -----
PR-AUC:
----- -----
False 0.991
True 0.52
----- -----
ROC-AUC:
----- -----
False 0.926
True 0.931
----- -----
Recall @ 0.1 false-positive rate:
label threshold recall fpr
------- ----------- -------- -----
False 0.711 0.79 0.097
True 0.591 0.727 0.098
Recall @ 0.98 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.439 0.891 0.98
True 0.957 0.039 1
Recall @ 0.9 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.033 1 0.932
True 0.952 0.056 0.946
Recall @ 0.45 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.033 1 0.932
True 0.734 0.578 0.453
Recall @ 0.15 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.033 1 0.932
True 0.048 0.991 0.154
Russian Wikipedia (ruwiki)
https://ores.wmflabs.org/v2/scores/ruwiki/damaging/?model_info
ScikitLearnClassifier
- type: GradientBoosting
- params: balanced_sample=false, warm_start=false, subsample=1.0, n_estimators=700, max_depth=5, scale=true, learning_rate=0.01, random_state=null, min_samples_leaf=1, init=null, balanced_sample_weight=true, presort="auto", max_features="log2", loss="deviance", max_leaf_nodes=null, min_weight_fraction_leaf=0.0, center=true, min_samples_split=2, verbose=0
- version: 0.3.0
- trained: 2017-01-06T22:57:34.396636
Table:
~False ~True
----- -------- -------
False 15856 2825
True 115 939
Accuracy: 0.851
Precision:
----- -----
False 0.993
True 0.25
----- -----
Recall:
----- -----
False 0.849
True 0.89
----- -----
PR-AUC:
----- -----
False 0.992
True 0.436
----- -----
ROC-AUC:
----- -----
False 0.933
True 0.94
----- -----
Recall @ 0.1 false-positive rate:
label threshold recall fpr
------- ----------- -------- -----
False 0.542 0.843 0.094
True 0.737 0.728 0.099
Recall @ 0.98 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.232 0.916 0.98
True 0.951 0.031 1
Recall @ 0.9 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.044 1 0.947
True 0.949 0.043 0.985
Recall @ 0.45 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.044 1 0.947
True 0.855 0.406 0.456
Recall @ 0.15 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.044 1 0.947
True 0.088 0.995 0.166
Turkish Wikipedia (trwiki)
https://ores.wmflabs.org/v2/scores/trwiki/damaging/?model_info
ScikitLearnClassifier
- type: GradientBoosting
- params: balanced_sample_weight=true, n_estimators=700, learning_rate=0.01, subsample=1.0, warm_start=false, verbose=0, loss="deviance", max_leaf_nodes=null, max_features="log2", min_weight_fraction_leaf=0.0, min_samples_split=2, presort="auto", init=null, min_samples_leaf=1, center=true, balanced_sample=false, scale=true, max_depth=7, random_state=null
- version: 0.3.0
- trained: 2017-01-06T23:24:25.903527
Table:
~False ~True
----- -------- -------
False 16069 2688
True 216 758
Accuracy: 0.853
Precision:
----- -----
False 0.987
True 0.22
----- -----
Recall:
----- -----
False 0.857
True 0.777
----- -----
PR-AUC:
----- -----
False 0.992
True 0.309
----- -----
ROC-AUC:
----- -----
False 0.909
True 0.914
----- -----
Recall @ 0.1 false-positive rate:
label threshold recall fpr
------- ----------- -------- -----
False 0.758 0.798 0.096
True 0.651 0.664 0.099
Recall @ 0.98 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.337 0.904 0.98
True 0.928 0.021 1
Recall @ 0.9 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.067 1 0.951
True 0.928 0.021 1
Recall @ 0.45 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.067 1 0.951
True 0.872 0.168 0.459
Recall @ 0.15 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.067 1 0.951
True 0.104 0.944 0.153
Wikidata (wikidatawiki)
https://ores.wmflabs.org/v2/scores/wikidatawiki/damaging/?model_info
ScikitLearnClassifier
- type: GradientBoosting
- params: balanced_sample=false, min_samples_split=2, verbose=0, warm_start=false, center=true, random_state=null, presort="auto", scale=true, balanced_sample_weight=true, min_weight_fraction_leaf=0.0, init=null, max_depth=7, max_leaf_nodes=null, min_samples_leaf=1, max_features="log2", n_estimators=700, loss="deviance", learning_rate=0.01, subsample=1.0
- version: 0.3.0
- trained: 2017-01-07T00:52:58.800624
Table:
~False ~True
----- -------- -------
False 21094 690
True 150 2498
Accuracy: 0.966
Precision:
----- -----
False 0.993
True 0.784
----- -----
Recall:
----- -----
False 0.968
True 0.943
----- -----
PR-AUC:
----- -----
False 0.994
True 0.885
----- -----
ROC-AUC:
----- -----
False 0.986
True 0.991
----- -----
Recall @ 0.1 false-positive rate:
label threshold recall fpr
------- ----------- -------- -----
False 0.13 0.984 0.094
True 0.124 0.988 0.092
Recall @ 0.98 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.05 0.986 0.98
True 0.987 0.062 0.998
Recall @ 0.9 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.015 0.999 0.903
True 0.945 0.497 0.923
Recall @ 0.45 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.012 1 0.894
True 0.057 0.996 0.467
Recall @ 0.15 precision:
label threshold recall precision
------- ----------- -------- -----------
False 0.012 1 0.894
True 0.008 1 0.256
References
- ↑ See en:Wikipedia:Labels/Edit quality for the English Wikipedia manual labeling campaign.