Research talk:Automated classification of edit types/Work log/2016-10-26
Wednesday, October 26, 2016
Today, I'm loading up a new dataset for review. This dataset contains a sample of 200 edits for each of the 11 classes of our taxonomy, automatically sampled based on regex matches to edit comments. That means 2200 total observations (minus 4 that were overlapping).
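The arithmetic above (11 classes of 200 edits each, less 4 overlapping) can be sketched in a few lines. This is only an illustration on toy data: the class names, the rev_ids, and the `merge_class_samples` helper are hypothetical and are not part of the actual sampling code.

```python
def merge_class_samples(samples_by_class):
    """Merge per-class rev_id samples, counting edits sampled more than once.

    Returns a tuple of (set of unique rev_ids, number of overlapping picks).
    """
    seen = set()
    overlapping = 0
    for rev_ids in samples_by_class.values():
        for rev_id in rev_ids:
            if rev_id in seen:
                # This edit already matched the regex for another class
                overlapping += 1
            else:
                seen.add(rev_id)
    return seen, overlapping


# Toy data (hypothetical rev_ids): 11 classes x 200 edits each
samples = {
    "class_%d" % c: list(range(c * 1000, c * 1000 + 200)) for c in range(11)
}
# Make 4 edits of one class overlap with another class
samples["class_1"][:4] = samples["class_0"][:4]

unique, overlapping = merge_class_samples(samples)
print(len(unique), overlapping)  # prints "2196 4"
```

With 4 overlapping picks, the merged sample holds 2196 distinct revisions rather than the nominal 2200.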
u_wikilabels=> select * from campaign where wiki = 'enwiki' and active;
 id |              name              |  wiki  |          form          |       view       |          created           | labels_per_task | tasks_per_assignment | active
----+--------------------------------+--------+------------------------+------------------+----------------------------+-----------------+----------------------+--------
 32 | Edit type (5k revisions)       | enwiki | edit_type              | DiffToPrevious   | 2016-04-13 15:42:46.049303 |               4 |                   10 | t
 31 | Edit type IRR (200 revisions)  | enwiki | edit_type              | DiffToPrevious   | 2016-04-13 15:33:23.967297 |              20 |                   10 | t
 37 | Article topic (5k pages)       | enwiki | article_topic          | PageAsOfRevision | 2016-06-30 16:19:24.756233 |               2 |                   10 | t
 41 | Edit quality (20k 2016 sample) | enwiki | damaging_and_goodfaith | DiffToPrevious   | 2016-08-23 20:54:18.531889 |               1 |                   50 | t
First, let's deactivate the two older edit type campaigns (31 and 32), which are still active.
u_wikilabels=> update campaign set active = False where id in (31, 32);
UPDATE 2
OK, now we're ready to load.
$ cat datasets/enwiki.revision_sample.edit_types.2200.tsv | head | tsv2json int
{"rev_id": 742941747}
{"rev_id": 742980776}
{"rev_id": 744071105}
{"rev_id": 745479813}
{"rev_id": 744497175}
{"rev_id": 744984185}
{"rev_id": 743867045}
{"rev_id": 741837444}
{"rev_id": 745157257}
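For readers without `tsv2json` at hand, the conversion step above can be approximated in Python. This is a rough sketch, not the tool's actual implementation: it assumes the TSV has a header row naming the columns (consistent with `head` passing 10 lines and 9 records coming out) and that each positional argument such as `int` is a cast applied to the matching column. The function name `tsv_to_json_lines` is hypothetical.

```python
import io
import json


def tsv_to_json_lines(tsv_file, casts):
    """Yield one JSON document per TSV row.

    Assumes the first line is a header of column names and that `casts`
    holds one callable per column (e.g. int), applied to the raw strings.
    """
    header = tsv_file.readline().rstrip("\n").split("\t")
    for line in tsv_file:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        values = line.split("\t")
        row = {name: cast(value)
               for name, cast, value in zip(header, casts, values)}
        yield json.dumps(row)


# Small in-memory example using two rev_ids from the output above
example = io.StringIO("rev_id\n742941747\n742980776\n")
lines = list(tsv_to_json_lines(example, [int]))
for json_line in lines:
    print(json_line)
# prints:
# {"rev_id": 742941747}
# {"rev_id": 742980776}
```

Casting to `int` matters here because the rev_ids come out as JSON numbers rather than quoted strings, which matches the output shown above.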
OK. That looks good.
$ sudo -u www-data /srv/wikilabels/venv/bin/wikilabels new_campaign enwiki "Edit type (2.2k revisions)" edit_type DiffToPrevious 2 10
{'view': 'DiffToPrevious', 'tasks_per_assignment': 10, 'name': 'Edit type (2.2k revisions)', 'created': datetime.datetime(2016, 10, 26, 21, 7, 5, 764611), 'labels_per_task': 2, 'id': 43, 'wiki': 'enwiki', 'active': True, 'form': 'edit_type'}
$ cat ~/datasets/enwiki.revision_sample.edit_types.2200.tsv | tsv2json int | sudo -u www-data /srv/wikilabels/venv/bin/wikilabels task_inserts 43
Done! --EpochFail (talk) 21:10, 26 October 2016 (UTC)