Research talk:Automated classification of edit types/Work log/2016-10-26

Wednesday, October 26, 2016

Today, I'm loading up a new dataset for review. This dataset contains a sample of 200 edits for each of the 11 classes of our taxonomy, automatically sampled based on regex matches against edit comments. That means 2200 total observations (minus 4 that were overlapping).
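
For reference, here is a minimal sketch of how this kind of per-class sampling could work. It is not the script that produced the dataset: the function name `sample_by_comment`, the class names, and the regexes in the usage note are all hypothetical. It only illustrates why overlapping matches shrink the total: a revision whose comment matches two classes is counted once. If each of the 4 overlaps removes exactly one duplicate, that leaves 11 * 200 - 4 = 2196 unique revisions.

```python
import random
import re


def sample_by_comment(edits, patterns, per_class=200, seed=0):
    """Sample up to `per_class` revisions per class by matching edit comments.

    edits:    list of (rev_id, comment) pairs
    patterns: dict mapping a class name to a regex (both hypothetical here)
    Returns the set of sampled rev_ids; a revision sampled for two
    classes appears only once in the result.
    """
    rng = random.Random(seed)
    sampled = set()
    for class_name, pattern in patterns.items():
        regex = re.compile(pattern, re.IGNORECASE)
        matches = [rev_id for rev_id, comment in edits if regex.search(comment)]
        # Never ask for more revisions than actually matched this class.
        k = min(per_class, len(matches))
        sampled.update(rng.sample(matches, k))
    return sampled
```

With made-up data, `sample_by_comment([(1, "fix typo"), (2, "copyedit and wikify"), (3, "wikify links")], {"copyedit": r"typo|copyedit", "wikification": r"wikif"}, per_class=2)` samples two revisions per class, but revision 2 matches both, so only three unique revisions come back.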

u_wikilabels=> select * from campaign where wiki = 'enwiki' and active;
 id |              name              |  wiki  |          form          |       view       |          created           | labels_per_task | tasks_per_assignment | active 
----+--------------------------------+--------+------------------------+------------------+----------------------------+-----------------+----------------------+--------
 32 | Edit type (5k revisions)       | enwiki | edit_type              | DiffToPrevious   | 2016-04-13 15:42:46.049303 |               4 |                   10 | t
 31 | Edit type IRR (200 revisions)  | enwiki | edit_type              | DiffToPrevious   | 2016-04-13 15:33:23.967297 |              20 |                   10 | t
 37 | Article topic (5k pages)       | enwiki | article_topic          | PageAsOfRevision | 2016-06-30 16:19:24.756233 |               2 |                   10 | t
 41 | Edit quality (20k 2016 sample) | enwiki | damaging_and_goodfaith | DiffToPrevious   | 2016-08-23 20:54:18.531889 |               1 |                   50 | t

Let's first deactivate the two edit type campaigns that are currently active (31 and 32).

u_wikilabels=> update campaign set active = False where id in (31, 32);
UPDATE 2

OK, now we're ready to load.

$ cat datasets/enwiki.revision_sample.edit_types.2200.tsv | head | tsv2json int
{"rev_id": 742941747}
{"rev_id": 742980776}
{"rev_id": 744071105}
{"rev_id": 745479813}
{"rev_id": 744497175}
{"rev_id": 744984185}
{"rev_id": 743867045}
{"rev_id": 741837444}
{"rev_id": 745157257}

OK. That looks good.
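
To make the conversion step concrete, here is a rough Python sketch of what a `tsv2json int` style conversion appears to do, judging only from the output above. It is not the actual implementation of `tsv2json`, and the function name `tsv_to_json_lines` is hypothetical. It assumes the first line of the TSV is a header row naming the columns, which would also explain why `head` (10 lines) yields only 9 JSON objects.

```python
import io
import json


def tsv_to_json_lines(tsv_file, casts):
    """Yield one JSON object per data row of a TSV file.

    tsv_file: a text file object whose first line is assumed to be a header
    casts:    one callable per column (e.g. [int]) applied to each value
    """
    header = tsv_file.readline().rstrip("\n").split("\t")
    for line in tsv_file:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at EOF
        values = line.split("\t")
        row = {name: cast(value)
               for name, cast, value in zip(header, casts, values)}
        yield json.dumps(row)


# Tiny usage example with two of the revision IDs shown above.
sample = io.StringIO("rev_id\n742941747\n742980776\n")
json_lines = list(tsv_to_json_lines(sample, [int]))
```

Each resulting line is a standalone JSON document, which is the shape that gets piped into `task_inserts`.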

$ sudo -u www-data /srv/wikilabels/venv/bin/wikilabels new_campaign enwiki "Edit type (2.2k revisions)" edit_type DiffToPrevious 2 10
{'view': 'DiffToPrevious', 'tasks_per_assignment': 10, 'name': 'Edit type (2.2k revisions)', 'created': datetime.datetime(2016, 10, 26, 21, 7, 5, 764611), 'labels_per_task': 2, 'id': 43, 'wiki': 'enwiki', 'active': True, 'form': 'edit_type'}
$ cat ~/datasets/enwiki.revision_sample.edit_types.2200.tsv | tsv2json int | sudo -u www-data /srv/wikilabels/venv/bin/wikilabels task_inserts 43

Done! --EpochFail (talk) 21:10, 26 October 2016 (UTC)