Research talk:Automated classification of article quality/Work log/2016-04-08
Friday, April 8, 2016
Quick pasting some notes on the last run:
$ cat enwiki.observations.first_labelings.20160204.json | grep '"stub"' | wc
3005521 28984420 337131600
$ cat enwiki.observations.first_labelings.20160204.json | grep '"start"' | wc
1398595 13858836 159669311
$ cat enwiki.observations.first_labelings.20160204.json | grep '"c"' | wc
211116 2086434 23159257
$ cat enwiki.observations.first_labelings.20160204.json | grep '"b"' | wc
134194 1332090 14731302
$ cat enwiki.observations.first_labelings.20160204.json | grep '"ga"' | wc
29417 295572 3260669
$ cat enwiki.observations.first_labelings.20160204.json | grep '"fa"' | wc
6696 68043 747531
$ cat enwiki.observations.first_labelings.20160204.json | grep '"a"' | wc
4661 46356 512263
--Halfak (WMF) (talk) 19:28, 8 April 2016 (UTC)
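To see how skewed these first labelings are, the line counts above can be turned into class shares. This is only a minimal sketch: it treats the first `wc` column as the number of observations per class, which assumes one observation per line and that each grep pattern matches nothing but the label. Neither assumption is confirmed here.

```python
# Line counts copied from the first column of the `wc` output above.
# Assumption: one observation per line, and each grep pattern matches only the label.
line_counts = {
    "stub": 3005521,
    "start": 1398595,
    "c": 211116,
    "b": 134194,
    "ga": 29417,
    "fa": 6696,
    "a": 4661,
}

total = sum(line_counts.values())
shares = {label: count / total for label, count in line_counts.items()}

# Print one row per class: label, matching lines, share of all matching lines.
for label, share in shares.items():
    print(f"{label:>5}  {line_counts[label]:>9}  {share:7.2%}")
```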
Just got a chance to actually build the model with this data. It doesn't look good.
ScikitLearnClassifier
- type: RF
- params: warm_start=false, max_features="auto", random_state=null, verbose=0, bootstrap=true, n_estimators=501, min_samples_leaf=8, oob_score=false, balanced_sample=true, max_depth=null, center=true, min_samples_split=2, scale=true, criterion="gini", max_leaf_nodes=null, class_weight=null, n_jobs=1, min_weight_fraction_leaf=0.0, balanced_sample_weight=false
- version: 0.3.1
- trained: 2016-04-13T00:13:15.203516
Table:
         ~b    ~c    ~fa    ~ga    ~start    ~stub
-----  ----  ----  -----  -----  --------  -------
b       328   246    102    171       133       17
c       151   504     25    142       179       17
fa       70    27    689    186        17       17
ga       68    92    257    535        24        7
start    86   147      5     23       548      133
stub      6    12      1      3       151      804
Accuracy: 0.575
ROC-AUC:
------- -----
'b' 0.782
'c' 0.843
'fa' 0.912
'ga' 0.864
'start' 0.873
'stub' 0.971
------- -----
F1:
----- -----
b 0.385
start 0.55
c 0.493
ga 0.524
stub 0.815
fa 0.661
----- -----
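As a sanity check, the accuracy and F1 figures can be recomputed from the confusion table. This is a minimal sketch in plain Python, not the code that produced the report. It assumes rows are the true class and the `~` columns are the predicted class; the table does not say so explicitly, but neither accuracy nor F1 changes if the orientation is reversed.

```python
# Confusion table copied from the report above.
# Assumption: rows are true classes, columns (~b, ~c, ...) are predictions.
labels = ["b", "c", "fa", "ga", "start", "stub"]
table = {
    "b":     [328, 246, 102, 171, 133, 17],
    "c":     [151, 504, 25, 142, 179, 17],
    "fa":    [70, 27, 689, 186, 17, 17],
    "ga":    [68, 92, 257, 535, 24, 7],
    "start": [86, 147, 5, 23, 548, 133],
    "stub":  [6, 12, 1, 3, 151, 804],
}

# Accuracy is the diagonal (correct predictions) over all observations.
total = sum(sum(row) for row in table.values())
correct = sum(table[label][i] for i, label in enumerate(labels))
accuracy = correct / total

# Per-class F1 = 2*TP / (row total + column total),
# which equals 2*precision*recall / (precision + recall).
f1 = {}
for i, label in enumerate(labels):
    tp = table[label][i]
    row_sum = sum(table[label])
    col_sum = sum(table[other][i] for other in labels)
    f1[label] = 2 * tp / (row_sum + col_sum)

print(f"Accuracy: {accuracy:.3f}")
for label in labels:
    print(f"{label:>5}  {f1[label]:.3f}")
```

Rounded to three places, these values agree with the accuracy and F1 numbers reported above, so the table and the summary statistics are consistent with each other.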
This accuracy is still low. I think we should switch fully to Nettrom's strategy of accepting only the assessment classes that appear on the most recent version of the talk page. It'll take some hacking before the next run. --EpochFail (talk) 14:07, 13 April 2016 (UTC)
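One way to picture that filter, as a rough sketch only: the record layout, the function name `filter_by_current_talk_page`, and the `current_talk_classes` lookup are all hypothetical. They are neither the actual extraction code nor Nettrom's implementation. The idea is to keep a historical labeling only if its class still appears on the latest talk page revision.

```python
def filter_by_current_talk_page(observations, current_talk_classes):
    """Keep labelings whose class still appears on the page's latest talk page.

    observations: iterable of dicts with hypothetical keys "page" and "label".
    current_talk_classes: hypothetical dict mapping a page title to the set of
        assessment classes found on the most recent talk page revision.
    """
    kept = []
    for obs in observations:
        # Pages missing from the lookup have no current classes, so they are dropped.
        current = current_talk_classes.get(obs["page"], set())
        if obs["label"] in current:
            kept.append(obs)
    return kept


# Made-up example data, for illustration only.
observations = [
    {"page": "Example A", "label": "stub"},
    {"page": "Example B", "label": "start"},
    {"page": "Example C", "label": "ga"},
]
current_talk_classes = {
    "Example A": {"stub"},
    "Example B": {"c"},
}

kept = filter_by_current_talk_page(observations, current_talk_classes)
print(kept)
```

In this toy input, only the "Example A" labeling survives: "Example B" has since been reassessed, and "Example C" has no current talk page classes in the lookup.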