Research talk:Automated classification of article quality/Work log/2016-04-08

Friday, April 8, 2016

Quickly pasting some notes on the last run:

$ cat enwiki.observations.first_labelings.20160204.json | grep '"stub"' | wc
3005521 28984420 337131600

$ cat enwiki.observations.first_labelings.20160204.json | grep '"start"' | wc
1398595 13858836 159669311

$ cat enwiki.observations.first_labelings.20160204.json | grep '"c"' | wc
 211116 2086434 23159257

$ cat enwiki.observations.first_labelings.20160204.json | grep '"b"' | wc
 134194 1332090 14731302

$ cat enwiki.observations.first_labelings.20160204.json | grep '"ga"' | wc
  29417  295572 3260669

$ cat enwiki.observations.first_labelings.20160204.json | grep '"fa"' | wc
   6696   68043  747531

$ cat enwiki.observations.first_labelings.20160204.json | grep '"a"' | wc
   4661   46356  512263
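
One caveat on the counts above: grep matches the quoted class string anywhere in a JSON line, not just in the label field, so a stricter tally means actually parsing each observation. A minimal sketch, assuming one JSON object per line with the class under a "wp10" key (the key name is an assumption; adjust it to the real field):

 import json
 from collections import Counter
 
 counts = Counter()
 with open("enwiki.observations.first_labelings.20160204.json") as f:
     for line in f:
         obs = json.loads(line)
         counts[obs["wp10"]] += 1  # "wp10" is assumed; use the dataset's real label field
 
 for label, n in counts.most_common():
     print(label, n)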

--Halfak (WMF) (talk) 19:28, 8 April 2016 (UTC)


Just got a chance to actually build the model with this data. It doesn't look good.

ScikitLearnClassifier
 - type: RF
 - params: warm_start=false, max_features="auto", random_state=null, verbose=0, bootstrap=true, n_estimators=501, min_samples_leaf=8, oob_score=false, balanced_sample=true, max_depth=null, center=true, min_samples_split=2, scale=true, criterion="gini", max_leaf_nodes=null, class_weight=null, n_jobs=1, min_weight_fraction_leaf=0.0, balanced_sample_weight=false
 - version: 0.3.1
 - trained: 2016-04-13T00:13:15.203516
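
For reference, the estimator underneath this config corresponds roughly to the scikit-learn sketch below. Note that balanced_sample, balanced_sample_weight, center, and scale are revscoring-level options (label rebalancing and feature rescaling applied before the data reaches the forest), not RandomForestClassifier arguments:

 from sklearn.ensemble import RandomForestClassifier
 
 # Forest-level settings copied from the params dump above; the
 # balanced_sample / center / scale options are handled upstream
 # by revscoring, not by the estimator itself.
 rf = RandomForestClassifier(
     n_estimators=501,
     criterion="gini",
     max_features="auto",
     max_depth=None,
     min_samples_split=2,
     min_samples_leaf=8,
     min_weight_fraction_leaf=0.0,
     max_leaf_nodes=None,
     bootstrap=True,
     oob_score=False,
     class_weight=None,
     n_jobs=1,
     random_state=None,
     verbose=0,
     warm_start=False,
 )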

Table (rows: true class; columns ~X: predicted as X):
                 ~b    ~c    ~fa    ~ga    ~start    ~stub
        -----  ----  ----  -----  -----  --------  -------
        b       328   246    102    171       133       17
        c       151   504     25    142       179       17
        fa       70    27    689    186        17       17
        ga       68    92    257    535        24        7
        start    86   147      5     23       548      133
        stub      6    12      1      3       151      804

Accuracy: 0.575
ROC-AUC:
        -------  -----
        'b'      0.782
        'c'      0.843
        'fa'     0.912
        'ga'     0.864
        'start'  0.873
        'stub'   0.971
        -------  -----

F1:
        -----  -----
        b      0.385
        c      0.493
        fa     0.661
        ga     0.524
        start  0.55
        stub   0.815
        -----  -----
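
As a sanity check, the accuracy and per-class F1 figures follow directly from the confusion matrix above (rows true, columns predicted), e.g.:

 import numpy as np
 
 labels = ["b", "c", "fa", "ga", "start", "stub"]
 # Confusion matrix from the table above: rows = true, columns = predicted.
 cm = np.array([
     [328, 246, 102, 171, 133,  17],
     [151, 504,  25, 142, 179,  17],
     [ 70,  27, 689, 186,  17,  17],
     [ 68,  92, 257, 535,  24,   7],
     [ 86, 147,   5,  23, 548, 133],
     [  6,  12,   1,   3, 151, 804],
 ])
 
 print("accuracy:", cm.trace() / cm.sum())  # ~0.575
 for i, label in enumerate(labels):
     precision = cm[i, i] / cm[:, i].sum()
     recall = cm[i, i] / cm[i].sum()
     print(label, round(2 * precision * recall / (precision + recall), 3))  # b -> 0.385, ...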

This is still low accuracy. I think we should go all-in on switching to Nettrom's strategy of only accepting the assessment classes that appear on the most recent version of the talk page. It'll take some hacking to get the next run set up, though. --EpochFail (talk) 14:07, 13 April 2016 (UTC)
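
For what it's worth, a rough sketch of that filtering step (the data shapes here are hypothetical, just to pin down the idea): given each page's historical labelings and the set of classes on the latest talk page revision, drop any labeling whose class is no longer current.

 # Hypothetical shapes: `labelings` maps page title -> [(timestamp, class), ...]
 # and `current_classes` maps page title -> set of classes on the most
 # recent talk page revision.
 def filter_to_current(labelings, current_classes):
     """Keep only labelings whose class still appears on the latest
     version of the talk page (Nettrom's strategy)."""
     for page, observations in labelings.items():
         still_current = current_classes.get(page, set())
         for timestamp, cls in observations:
             if cls in still_current:
                 yield page, timestamp, cls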