Objective Revision Evaluation Service/damaging
This model was trained on human judgements[1] of whether an edit is damaging. It is useful for quality control tools (e.g. en:WP:Huggle and en:User:ClueBot NG).
This model predicts whether an edit is damaging, not whether it was made in bad faith: not all damaging edits are bad-faith. Consume scores with this in mind, and see ORES/goodfaith for a model that predicts whether an edit was made in good faith.
Contexts (wikis)
These stats are slightly out of date. For the most up-to-date stats, follow the "?model_info" link given for each model.
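The "?model_info" links on this page all follow one URL pattern, so the link for a given wiki can be built from its database name. The following is a minimal sketch in Python; `ORES_BASE` and `model_info_url` are names made up for this illustration, and the pattern is copied from the links on this page, so it only holds as long as that endpoint is still served.

```python
from urllib.parse import quote

# Base path copied from the links on this page (assumed to still be served).
ORES_BASE = "https://ores.wmflabs.org/v2/scores"


def model_info_url(wiki, model="damaging"):
    """Build the "?model_info" URL for one wiki, e.g. "enwiki"."""
    return f"{ORES_BASE}/{quote(wiki)}/{quote(model)}/?model_info"


# Database names of the wikis listed on this page.
WIKIS = ("enwiki", "fawiki", "nlwiki", "plwiki",
         "ptwiki", "ruwiki", "trwiki", "wikidatawiki")

for wiki in WIKIS:
    print(model_info_url(wiki))
```

The function only builds the URL; fetching and parsing the response is left out because the response format is not described on this page.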
English Wikipedia (enwiki)
https://ores.wmflabs.org/v2/scores/enwiki/damaging/?model_info
ScikitLearnClassifier
 - type: GradientBoosting
 - params: loss="deviance", balanced_sample=false, subsample=1.0, warm_start=false, learning_rate=0.01, n_estimators=700, max_features="log2", balanced_sample_weight=true, presort="auto", init=null, min_samples_split=2, center=true, min_weight_fraction_leaf=0.0, scale=true, max_leaf_nodes=null, max_depth=7, min_samples_leaf=1, random_state=null, verbose=0
 - version: 0.3.0
 - trained: 2017-01-06T19:29:12.793824

Table:
         ~False    ~True
-----  --------  -------
False     17203     1664
True        211      455

Accuracy: 0.904

Precision:
-----  -----
False  0.988
True   0.215
-----  -----

Recall:
-----  -----
False  0.912
True   0.683
-----  -----

PR-AUC:
-----  -----
False  0.993
True   0.391
-----  -----

ROC-AUC:
-----  -----
False  0.918
True   0.919
-----  -----

Recall @ 0.1 false-positive rate:
label    threshold    recall    fpr
-------  -----------  --------  -----
False    0.862        0.76      0.093
True     0.469        0.713     0.098

Recall @ 0.98 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.259        0.972     0.98
True     0.955        0.068     1

Recall @ 0.9 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.04         1         0.967
True     0.945        0.091     0.98

Recall @ 0.45 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.04         1         0.967
True     0.831        0.326     0.468

Recall @ 0.15 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.04         1         0.967
True     0.281        0.828     0.154
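The headline figures follow from the counts in "Table". The sketch below assumes that rows are the human labels and the "~" columns are the model's predictions; that reading is not stated on this page, but it reproduces the enwiki accuracy and the precision and recall for the True (damaging) label. `confusion_metrics` is a helper name invented for this example.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Accuracy, plus precision and recall for the True (damaging) label."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "precision": tp / (tp + fp),  # share of edits flagged as damaging that are damaging
        "recall": tp / (tp + fn),     # share of damaging edits that get flagged
    }


# enwiki counts, reading rows as actual labels and "~" columns as predictions.
enwiki = confusion_metrics(tn=17203, fp=1664, fn=211, tp=455)
for name, value in enwiki.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.904
# precision: 0.215
# recall: 0.683
```

Accuracy is high mostly because damaging edits are rare in the sample (666 of 19533), which is why the precision and recall for the True label are the more useful numbers for patrolling tools.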
Persian Wikipedia (fawiki)
https://ores.wmflabs.org/v2/scores/fawiki/damaging/?model_info
ScikitLearnClassifier
 - type: GradientBoosting
 - params: min_samples_leaf=1, warm_start=false, verbose=0, min_weight_fraction_leaf=0.0, max_leaf_nodes=null, min_samples_split=2, scale=true, max_depth=7, center=true, max_features="log2", balanced_sample=false, init=null, loss="deviance", n_estimators=700, random_state=null, balanced_sample_weight=true, subsample=1.0, presort="auto", learning_rate=0.01
 - version: 0.3.0
 - trained: 2017-01-06T20:15:09.126168

Table:
         ~False    ~True
-----  --------  -------
False     18876      685
True         98      145

Accuracy: 0.96

Precision:
-----  -----
False  0.995
True   0.178
-----  -----

Recall:
-----  -----
False  0.965
True   0.607
-----  -----

PR-AUC:
-----  -----
False  0.995
True   0.228
-----  -----

ROC-AUC:
-----  -----
False  0.959
True   0.968
-----  -----

Recall @ 0.1 false-positive rate:
label    threshold    recall    fpr
-------  -----------  --------  -----
False    0.866        0.913     0.09
True     0.092        0.927     0.09

Recall @ 0.98 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.042        1         0.988
True     0.963        0.043     1

Recall @ 0.9 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.042        1         0.988
True     0.963        0.043     1

Recall @ 0.45 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.042        1         0.988
True     0.922        0.143     0.592

Recall @ 0.15 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.042        1         0.988
True     0.313        0.797     0.156
Dutch Wikipedia (nlwiki)
https://ores.wmflabs.org/v2/scores/nlwiki/damaging/?model_info
ScikitLearnClassifier
 - type: GradientBoosting
 - params: warm_start=false, learning_rate=0.01, n_estimators=700, loss="deviance", verbose=0, max_leaf_nodes=null, min_weight_fraction_leaf=0.0, init=null, presort="auto", scale=true, min_samples_split=2, max_depth=5, center=true, balanced_sample=false, random_state=null, max_features="log2", balanced_sample_weight=true, min_samples_leaf=1, subsample=1.0
 - version: 0.3.0
 - trained: 2017-01-06T21:49:27.818262

Table:
         ~False    ~True
-----  --------  -------
False     16763     1714
True        105      882

Accuracy: 0.907

Precision:
-----  -----
False  0.994
True   0.339
-----  -----

Recall:
-----  -----
False  0.907
True   0.895
-----  -----

PR-AUC:
-----  -----
False  0.994
True   0.646
-----  -----

ROC-AUC:
-----  -----
False  0.96
True   0.963
-----  -----

Recall @ 0.1 false-positive rate:
label    threshold    recall    fpr
-------  -----------  --------  -----
False    0.528        0.905     0.096
True     0.446        0.912     0.098

Recall @ 0.98 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.152        0.969     0.98
True     0.966        0.109     1

Recall @ 0.9 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.03         1         0.952
True     0.952        0.211     0.936

Recall @ 0.45 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.03         1         0.952
True     0.8          0.739     0.455

Recall @ 0.15 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.03         1         0.952
True     0.1          0.977     0.161
Polish Wikipedia (plwiki)
https://ores.wmflabs.org/v2/scores/plwiki/damaging/?model_info
ScikitLearnClassifier
 - type: GradientBoosting
 - params: warm_start=false, max_depth=7, presort="auto", n_estimators=700, random_state=null, min_samples_leaf=1, min_weight_fraction_leaf=0.0, min_samples_split=2, init=null, max_features="log2", learning_rate=0.01, scale=true, verbose=0, max_leaf_nodes=null, loss="deviance", balanced_sample_weight=true, subsample=1.0, center=true, balanced_sample=false
 - version: 0.3.0
 - trained: 2017-01-06T22:25:42.924606

Table:
         ~False    ~True
-----  --------  -------
False     11527      337
True         56      676

Accuracy: 0.969

Precision:
-----  -----
False  0.995
True   0.667
-----  -----

Recall:
-----  -----
False  0.972
True   0.923
-----  -----

PR-AUC:
-----  -----
False  0.995
True   0.924
-----  -----

ROC-AUC:
-----  -----
False  0.983
True   0.977
-----  -----

Recall @ 0.1 false-positive rate:
label    threshold    recall    fpr
-------  -----------  --------  -----
False    0.366        0.987     0.093
True     0.304        0.95      0.086

Recall @ 0.98 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.124        0.999     0.981
True     0.857        0.719     0.99

Recall @ 0.9 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.052        1         0.956
True     0.736        0.859     0.911

Recall @ 0.45 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.052        1         0.956
True     0.371        0.942     0.494

Recall @ 0.15 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.052        1         0.956
True     0.097        0.979     0.198
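The "Recall @ ... precision" rows are operating points: each pairs a threshold on the True (damaging) probability with the recall and precision measured at that threshold. A tool can pick the row that matches how many false alarms it tolerates. The sketch below uses the plwiki True rows; `pick_threshold` and `is_flagged` are hypothetical helpers, and comparing the probability with `>=` is an assumption about how a consumer applies the threshold.

```python
# (threshold, recall, precision) for the True label, copied from the plwiki stats.
PLWIKI_TRUE = [
    (0.857, 0.719, 0.99),   # Recall @ 0.98 precision
    (0.736, 0.859, 0.911),  # Recall @ 0.9 precision
    (0.371, 0.942, 0.494),  # Recall @ 0.45 precision
    (0.097, 0.979, 0.198),  # Recall @ 0.15 precision
]


def pick_threshold(rows, min_precision):
    """Return the listed row with the lowest threshold whose precision is high enough."""
    candidates = [row for row in rows if row[2] >= min_precision]
    if not candidates:
        raise ValueError("no listed operating point reaches that precision")
    return min(candidates, key=lambda row: row[0])


def is_flagged(probability_true, threshold):
    """Assumed rule: flag an edit when its damaging probability meets the threshold."""
    return probability_true >= threshold


threshold, recall, precision = pick_threshold(PLWIKI_TRUE, min_precision=0.9)
print(threshold, recall, precision)  # 0.736 0.859 0.911
print(is_flagged(0.80, threshold))   # True
print(is_flagged(0.50, threshold))   # False
```

A revert bot would sit near the top row (few false alarms, many missed edits), while a human review queue can afford a lower row, since reviewers catch the false positives.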
Portuguese Wikipedia (ptwiki)
https://ores.wmflabs.org/v2/scores/ptwiki/damaging/?model_info
ScikitLearnClassifier
 - type: GradientBoosting
 - params: center=true, min_samples_split=2, balanced_sample_weight=true, scale=true, min_samples_leaf=1, warm_start=false, n_estimators=700, subsample=1.0, init=null, learning_rate=0.01, random_state=null, verbose=0, max_features="log2", presort="auto", loss="deviance", max_leaf_nodes=null, max_depth=7, min_weight_fraction_leaf=0.0, balanced_sample=false
 - version: 0.3.0
 - trained: 2017-01-06T22:41:56.417405

Table:
         ~False    ~True
-----  --------  -------
False     16008     2439
True        276     1090

Accuracy: 0.863

Precision:
-----  -----
False  0.983
True   0.309
-----  -----

Recall:
-----  -----
False  0.868
True   0.798
-----  -----

PR-AUC:
-----  -----
False  0.991
True   0.52
-----  -----

ROC-AUC:
-----  -----
False  0.926
True   0.931
-----  -----

Recall @ 0.1 false-positive rate:
label    threshold    recall    fpr
-------  -----------  --------  -----
False    0.711        0.79      0.097
True     0.591        0.727     0.098

Recall @ 0.98 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.439        0.891     0.98
True     0.957        0.039     1

Recall @ 0.9 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.033        1         0.932
True     0.952        0.056     0.946

Recall @ 0.45 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.033        1         0.932
True     0.734        0.578     0.453

Recall @ 0.15 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.033        1         0.932
True     0.048        0.991     0.154
Russian Wikipedia (ruwiki)
https://ores.wmflabs.org/v2/scores/ruwiki/damaging/?model_info
ScikitLearnClassifier
 - type: GradientBoosting
 - params: balanced_sample=false, warm_start=false, subsample=1.0, n_estimators=700, max_depth=5, scale=true, learning_rate=0.01, random_state=null, min_samples_leaf=1, init=null, balanced_sample_weight=true, presort="auto", max_features="log2", loss="deviance", max_leaf_nodes=null, min_weight_fraction_leaf=0.0, center=true, min_samples_split=2, verbose=0
 - version: 0.3.0
 - trained: 2017-01-06T22:57:34.396636

Table:
         ~False    ~True
-----  --------  -------
False     15856     2825
True        115      939

Accuracy: 0.851

Precision:
-----  -----
False  0.993
True   0.25
-----  -----

Recall:
-----  -----
False  0.849
True   0.89
-----  -----

PR-AUC:
-----  -----
False  0.992
True   0.436
-----  -----

ROC-AUC:
-----  -----
False  0.933
True   0.94
-----  -----

Recall @ 0.1 false-positive rate:
label    threshold    recall    fpr
-------  -----------  --------  -----
False    0.542        0.843     0.094
True     0.737        0.728     0.099

Recall @ 0.98 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.232        0.916     0.98
True     0.951        0.031     1

Recall @ 0.9 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.044        1         0.947
True     0.949        0.043     0.985

Recall @ 0.45 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.044        1         0.947
True     0.855        0.406     0.456

Recall @ 0.15 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.044        1         0.947
True     0.088        0.995     0.166
Turkish Wikipedia (trwiki)
https://ores.wmflabs.org/v2/scores/trwiki/damaging/?model_info
ScikitLearnClassifier
 - type: GradientBoosting
 - params: balanced_sample_weight=true, n_estimators=700, learning_rate=0.01, subsample=1.0, warm_start=false, verbose=0, loss="deviance", max_leaf_nodes=null, max_features="log2", min_weight_fraction_leaf=0.0, min_samples_split=2, presort="auto", init=null, min_samples_leaf=1, center=true, balanced_sample=false, scale=true, max_depth=7, random_state=null
 - version: 0.3.0
 - trained: 2017-01-06T23:24:25.903527

Table:
         ~False    ~True
-----  --------  -------
False     16069     2688
True        216      758

Accuracy: 0.853

Precision:
-----  -----
False  0.987
True   0.22
-----  -----

Recall:
-----  -----
False  0.857
True   0.777
-----  -----

PR-AUC:
-----  -----
False  0.992
True   0.309
-----  -----

ROC-AUC:
-----  -----
False  0.909
True   0.914
-----  -----

Recall @ 0.1 false-positive rate:
label    threshold    recall    fpr
-------  -----------  --------  -----
False    0.758        0.798     0.096
True     0.651        0.664     0.099

Recall @ 0.98 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.337        0.904     0.98
True     0.928        0.021     1

Recall @ 0.9 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.067        1         0.951
True     0.928        0.021     1

Recall @ 0.45 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.067        1         0.951
True     0.872        0.168     0.459

Recall @ 0.15 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.067        1         0.951
True     0.104        0.944     0.153
Wikidata (wikidatawiki)
https://ores.wmflabs.org/v2/scores/wikidatawiki/damaging/?model_info
ScikitLearnClassifier
 - type: GradientBoosting
 - params: balanced_sample=false, min_samples_split=2, verbose=0, warm_start=false, center=true, random_state=null, presort="auto", scale=true, balanced_sample_weight=true, min_weight_fraction_leaf=0.0, init=null, max_depth=7, max_leaf_nodes=null, min_samples_leaf=1, max_features="log2", n_estimators=700, loss="deviance", learning_rate=0.01, subsample=1.0
 - version: 0.3.0
 - trained: 2017-01-07T00:52:58.800624

Table:
         ~False    ~True
-----  --------  -------
False     21094      690
True        150     2498

Accuracy: 0.966

Precision:
-----  -----
False  0.993
True   0.784
-----  -----

Recall:
-----  -----
False  0.968
True   0.943
-----  -----

PR-AUC:
-----  -----
False  0.994
True   0.885
-----  -----

ROC-AUC:
-----  -----
False  0.986
True   0.991
-----  -----

Recall @ 0.1 false-positive rate:
label    threshold    recall    fpr
-------  -----------  --------  -----
False    0.13         0.984     0.094
True     0.124        0.988     0.092

Recall @ 0.98 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.05         0.986     0.98
True     0.987        0.062     0.998

Recall @ 0.9 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.015        0.999     0.903
True     0.945        0.497     0.923

Recall @ 0.45 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.012        1         0.894
True     0.057        0.996     0.467

Recall @ 0.15 precision:
label    threshold    recall    precision
-------  -----------  --------  -----------
False    0.012        1         0.894
True     0.008        1         0.256
References
1. See en:Wikipedia:Labels/Edit quality for the English Wikipedia manual labeling campaign.