Research talk:Revision scoring as a service/Work log/2016-02-23

Tuesday, February 23, 2016

OK. Today I'm trying to do for Polish Wikipedia what we were doing for Urdu Wikipedia. Here's a list of 500K randomly sampled edits: http://quarry.wmflabs.org/query/7543

Prelabel is running now. Amir (talk) 18:24, 23 February 2016 (UTC)

OK. It's done:

(3.4)ladsgroup@ores-compute:~/editquality/datasets$ wc plwiki.prelabeled_revisions.500k_2015.tsv 
  499736  1933819 13821243 plwiki.prelabeled_revisions.500k_2015.tsv
(3.4)ladsgroup@ores-compute:~/editquality/datasets$ cat plwiki.prelabeled_revisions.500k_2015.tsv | grep "True" | wc
  82484  264812 1720937
(3.4)ladsgroup@ores-compute:~/editquality/datasets$ cat plwiki.prelabeled_revisions.500k_2015.tsv | grep "reverted" | wc
  14861   59444  416108

So 16.5% of edits (82,484 of 499,736) need review. That's good :) And 3% (14,861) are reverted.
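The percentages follow from the wc line counts above. Here is a minimal Python sketch of the arithmetic, assuming each line of the TSV is one revision and each grep match is one revision (the variable names are just illustrative):

```python
# Line counts copied from the wc output above.
total_revisions = 499736   # lines in plwiki.prelabeled_revisions.500k_2015.tsv
needs_review = 82484       # lines matching "True"
reverted = 14861           # lines matching "reverted"

# Share of the sample in each group, as a percentage.
needs_review_pct = 100 * needs_review / total_revisions
reverted_pct = 100 * reverted / total_revisions

print("needs review: {:.1f}%".format(needs_review_pct))
print("reverted: {:.1f}%".format(reverted_pct))
```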

I sampled 5K to load it up to Wikilabels:

# printf expands \t to a tab; bash's builtin echo would print it literally
(
  printf 'rev_id\tneeds_review\treason\n';
  (
    cat datasets/plwiki.prelabeled_revisions.500k_2015.tsv | \
    grep "True" | \
    shuf -n 2500; \
    cat datasets/plwiki.prelabeled_revisions.500k_2015.tsv | \
    grep "False" | \
    shuf -n 2500 \
  ) | \
  shuf \
) > datasets/plwiki.revisions_for_review.5k_2015.tsv
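The same balanced sampling idea can be sketched in Python: draw an equal number of flagged and unflagged rows, then shuffle them together so the labelers don't see them grouped. This is only a sketch, not what was run. The function name, the toy rows and the assumption that needs_review is the second column are all illustrative. Unlike the grep above, it tests that column only, not the whole line.

```python
import random

def balanced_sample(rows, per_class, seed=None):
    """Draw per_class rows flagged "True" and per_class flagged "False", then shuffle."""
    rng = random.Random(seed)
    # Assumes each row is (rev_id, needs_review, reason), as in the header line.
    flagged = [row for row in rows if row[1] == "True"]
    unflagged = [row for row in rows if row[1] == "False"]
    sample = rng.sample(flagged, per_class) + rng.sample(unflagged, per_class)
    rng.shuffle(sample)  # mix the two groups together
    return sample

# Toy stand-in for the prelabeled TSV: every sixth revision needs review.
rows = [(str(i), "True" if i % 6 == 0 else "False", "") for i in range(1, 601)]
sample = balanced_sample(rows, per_class=50, seed=0)
```

With the real file, per_class would be 2500 to match the 5K sample above.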

Using shuf, I extracted 20K revisions to build the reverted model:

cat datasets/plwiki.sampled_revisions.500k_2015.tsv | \
    shuf -n 20000 > datasets/plwiki.sampled_revisions.20k_2015.tsv

Next, add "rev_id" as a header on the first line and check that a stray "rev_id" line did not end up among the sampled revisions (checked). Then run label_reverted:

cat datasets/plwiki.sampled_revisions.20k_2015.tsv | \
    ./utility label_reverted \
        --host https://pl.wikipedia.org \
        --revert-radius 3 \
        --verbose > datasets/plwiki.rev_reverted.20k_2015.tsv

It's labeling them.
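The header step done before labeling can be sketched in Python as below. It assumes the sampled file holds one revision ID per line and that a stray header would appear as a line reading exactly "rev_id". The function name and the toy input are hypothetical.

```python
def with_rev_id_header(lines):
    """Return the lines with one "rev_id" header first and no stray header rows."""
    body = [line.strip() for line in lines if line.strip()]
    # Drop any header row that the random sample may have picked up.
    body = [line for line in body if line != "rev_id"]
    return ["rev_id"] + body

# Toy input: a sample that accidentally includes the old header line.
sampled = ["123", "rev_id", "456"]
cleaned = with_rev_id_header(sampled)
```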

Now I'm extracting features:

cat datasets/plwiki.rev_reverted.20k_2015.tsv | \
        revscoring extract_features \
                editquality.feature_lists.plwiki.reverted \
                --host https://pl.wikipedia.org \
                --include-revid \
                --verbose > \
        datasets/plwiki.features_reverted.20k_2015.tsv

OK. I ran the tuning reports and it turned out RF is the best. Strange. Everything I touch turns into RF :D

Running with best settings:

>         revscoring train_test \
>                 revscoring.scorer_models.RF \
>                 editquality.feature_lists.plwiki.reverted \
>                 --version 0.1.0 \
>                 -p 'max_features="log2"' \
>                 -p 'criterion="entropy"' \
>                 -p 'min_samples_leaf=7' \
>                 -p 'n_estimators=640' \
>                 -s 'pr' -s 'roc' \
>                 -s 'recall_at_fpr(max_fpr=0.10)' \
>                 -s 'filter_rate_at_recall(min_recall=0.90)' \
>                 -s 'filter_rate_at_recall(min_recall=0.75)' \
>                 --balance-sample-weight \
>                 --center --scale \
>                 --label-type=bool > \
>         models/plwiki.reverted.rf.model
2016-02-23 22:21:47,424 INFO:revscoring.utilities.train_test -- Training model...
2016-02-23 22:22:08,411 INFO:revscoring.utilities.train_test -- Testing model...
ScikitLearnClassifier
 - type: RF
 - params: oob_score=false, scale=true, center=true, warm_start=false, criterion="entropy", random_state=null, max_leaf_nodes=null, class_weight=null, n_jobs=1, n_estimators=640, min_samples_leaf=7, min_weight_fraction_leaf=0.0, verbose=0, balanced_sample_weight=true, min_samples_split=2, max_depth=null, max_features="log2", bootstrap=true
 - version: 0.1.0
 - trained: 2016-02-23T22:22:08.411624

         ~False    ~True
-----  --------  -------
False      3723      155
True         50       66

Accuracy: 0.9486730095142714

PR-AUC: 0.327
Filter rate @ 0.9 recall: threshold=0.08, filter_rate=0.736, recall=0.905
Recall @ 0.1 false-positive rate: threshold=0.918, recall=0.017, fpr=0.0
Filter rate @ 0.75 recall: threshold=0.235, filter_rate=0.903, recall=0.75
ROC-AUC: 0.912

Look at "Recall @ 0.1". Wooot! Amir (talk) 22:37, 23 February 2016 (UTC)
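To sanity-check the reported accuracy, the numbers in the confusion matrix can be recombined by hand. This sketch assumes the rows are the actual labels and the "~" columns are the model's predictions. The precision, recall and false-positive rate it computes are for the model's default decision point, not for the thresholds listed in the report.

```python
# Counts from the confusion matrix above, assuming rows are actual labels
# and the "~" columns are predictions.
tn, fp = 3723, 155   # actual False: predicted False / predicted True
fn, tp = 50, 66      # actual True:  predicted False / predicted True

total = tn + fp + fn + tp
accuracy = (tn + tp) / total
precision = tp / (tp + fp)            # share of predicted reverts that were real
recall = tp / (tp + fn)               # share of real reverts that were caught
false_positive_rate = fp / (fp + tn)  # share of good edits flagged by mistake

print("accuracy:", accuracy)
print("precision: {:.3f}".format(precision))
print("recall: {:.3f}".format(recall))
print("false-positive rate: {:.3f}".format(false_positive_rate))
```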

Size of the model:

(3.4)ladsgroup@ores-compute:~/editquality/models$ ls -Ssh | grep plwiki.reverted.rf.model
 17M plwiki.reverted.rf.model

OK. We are good to go! Amir (talk) 22:51, 23 February 2016 (UTC)