Talk:Wiki labels/2015

From Meta, a Wikimedia project coordination wiki

Coder server behavior

I spent some time fleshing out the coder server behavior. Generally, I'm thinking about this data model hierarchically. A wiki has campaigns; campaigns have worksets; and worksets contain revisions.

  • [wiki] > [campaign] > [workset] > [revision]

Users can request to be assigned a workset. A workset represents a small random sample drawn from a campaign's larger sample. While a workset is active, its revisions are *owned* by the user who claimed them until that user abandons the workset or it expires.
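
To make the ownership rule concrete, here is a minimal Python sketch of a workset as a time-limited claim. The class name, the field names, and the one-day lifetime are assumptions for illustration; none of them are part of the proposed service.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Workset:
    """A claim by one user on a handful of revisions (all names hypothetical)."""
    id: int
    user_id: int
    rev_ids: List[int]
    expires: datetime
    abandoned: bool = False

    def owner_at(self, now: datetime) -> Optional[int]:
        """Return the owning user, or None once the claim has lapsed."""
        if self.abandoned or now >= self.expires:
            return None
        return self.user_id


# Assumed: a workset lives for one day after it is claimed.
claimed = datetime(2015, 2, 21, 13, 45, 56, tzinfo=timezone.utc)
workset = Workset(id=345, user_id=467890,
                  rev_ids=[3456780, 3456781, 3456782],
                  expires=claimed + timedelta(days=1))

owner_while_active = workset.owner_at(claimed + timedelta(hours=2))
owner_after_expiry = workset.owner_at(claimed + timedelta(days=2))
print(owner_while_active, owner_after_expiry)
```

The revisions stay with user 467890 while the claim is active and have no owner once it expires or is abandoned.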

See my notes on the REST interface below.

coder/
Lists out wikis with campaigns.
example response
{
    "wikis": ["enwiki", "ptwiki"]
}
coder/enwiki
Lists out active campaigns for English Wikipedia
example response
{
    "campaigns": ["Quality -- 10k sample 2014", "Edit type -- 10k sample 2014"]
}


coder/enwiki/?expand=true
Lists out active campaigns for English Wikipedia with metadata expanded.
example response
{
    "campaigns": {
        "Quality -- 10k sample 2014": {
            "form": "damaging_and_good-faith",
            "view": "diff_to_previous",
            "progress": {
                "completed": 150,
                "assigned": 275,
                "available": 750
            }
        },
        "Edit type -- 10k sample 2014": {
            "form": "damaging_and_good-faith",
            "view": "diff_to_previous",
            "progress": {
                "completed": 73,
                "assigned": 253,
                "available": 575
            }
        }
    }
}


coder/enwiki/Quality_--_10k_sample_2014
Gathers metadata for a particular campaign
example response
{
    "Quality -- 10k sample 2014": {
        "form": "damaging_and_good-faith",
        "view": "diff_to_previous",
        "progress": {
            "completed": 150,
            "assigned": 275,
            "available": 750
        }
    }
}
coder/enwiki/Quality_--_10k_sample_2014?assign=workset
Requests that a new workset be assigned to the current user
coder/enwiki/Quality_--_10k_sample_2014/345
Gathers metadata for a particular workset
example response
{
    "workset": {
        "id": 345,
        "assignee": {
            "global_id": 467890,
            "username": "EpochFail"
        },
        "expiration": "2015-02-22T13:45:56Z",
        "revisions": [
            {"rev_id": 3456780},
            {"rev_id": 3456781},
            {"rev_id": 3456782},
            {"rev_id": 3456783},
            {"rev_id": 3456784},
            {"rev_id": 3456785},
            ...
        ]
    }
}
coder/enwiki/Quality_--_10k_sample_2014/345?submit=label&rev_id=3456780&label={...}
Submits a label (JSON blob) for a rev_id within a workset.
example response
{
    "success": true
}
coder/enwiki/Quality_--_10k_sample_2014/345?abandon=workset
Deletes a workset and frees the revisions to be labeled by others
example response
{
    "success": true
}
coder/login
Will forward to meta.wikimedia.org to attempt an OAuth handshake.
coder/logout
Logs the user out of the coding system
example response
{
    "success": true
}
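
For anyone who wants to poke at these paths from a script, here is a small Python sketch that builds the request URLs listed above. The helper names are hypothetical, the notes do not say which HTTP method each request uses, and the host is not specified, so this only assembles relative paths and query strings.

```python
import json
from typing import Optional
from urllib.parse import parse_qs, quote, urlencode

BASE = "coder"  # relative root from the notes above; the real host is unknown


def campaign_slug(name: str) -> str:
    """Path form of a campaign name: spaces become underscores."""
    return quote(name.replace(" ", "_"), safe="-_")


def coder_url(wiki: Optional[str] = None, campaign: Optional[str] = None,
              workset_id: Optional[int] = None, **params) -> str:
    """Build one of the request paths listed above (hypothetical helper)."""
    parts = [BASE]
    if wiki is not None:
        parts.append(wiki)
    if campaign is not None:
        parts.append(campaign_slug(campaign))
    if workset_id is not None:
        parts.append(str(workset_id))
    url = "/".join(parts)
    if wiki is None:
        url += "/"  # the wiki listing is written as "coder/"
    if params:
        url += "?" + urlencode(params)
    return url


name = "Quality -- 10k sample 2014"
assign_url = coder_url("enwiki", name, assign="workset")

# The label is an arbitrary JSON blob, URL-encoded into the query string.
label = json.dumps({"damaging": False, "good-faith": True})
submit_url = coder_url("enwiki", name, 345,
                       submit="label", rev_id=3456780, label=label)

# Decoding the query string recovers the original label.
decoded = json.loads(parse_qs(submit_url.split("?", 1)[1])["label"][0])
print(assign_url)
print(decoded)
```

Encoding the label with urlencode keeps braces and quotes out of the raw URL; the server would decode it before parsing the JSON.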

That's all I've got for now. --Halfak (WMF) (talk) 22:31, 18 February 2015 (UTC)

Shouldn't "coder/enwiki" and "coder/enwiki/?expand=true" return something in common? E.g. the simplified version is returning the name "quality_2014" which doesn't appear anywhere in the expanded version. Maybe both should return an object, and the expanded one should be an "extension" of the simplified one? Helder 13:27, 19 February 2015 (UTC)
Yup. I've fixed it. --Halfak (WMF) (talk) 15:18, 19 February 2015 (UTC)
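
One way to keep the two responses in step is to derive both from the same mapping, so the simple listing is always the set of keys of the expanded one. This is only a sketch in Python; the function name and the in-memory dictionary are assumptions, and the metadata is copied from the example responses above.

```python
from typing import Dict, List, Union

# Metadata copied from the example responses above.
CAMPAIGNS: Dict[str, dict] = {
    "Quality -- 10k sample 2014": {
        "form": "damaging_and_good-faith",
        "view": "diff_to_previous",
        "progress": {"completed": 150, "assigned": 275, "available": 750},
    },
    "Edit type -- 10k sample 2014": {
        "form": "damaging_and_good-faith",
        "view": "diff_to_previous",
        "progress": {"completed": 73, "assigned": 253, "available": 575},
    },
}


def list_campaigns(expand: bool = False) -> Dict[str, Union[List[str], Dict[str, dict]]]:
    """Both response shapes come from the same mapping (hypothetical helper)."""
    if expand:
        return {"campaigns": dict(CAMPAIGNS)}
    return {"campaigns": list(CAMPAIGNS)}


simple = list_campaigns()
expanded = list_campaigns(expand=True)

# Every name in the simple listing is a key of the expanded listing.
names_match = set(simple["campaigns"]) == set(expanded["campaigns"])
print(names_match)
```
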
When you say "Deletes a workset and frees the revisions to be labeled by others", do you mean that the workset will still exist but that other users will be allowed to claim it? If so, this doesn't look like a "deletion" to me. It is just an un-assignment... Helder 13:31, 19 February 2015 (UTC)
Ahh. I see a workset as a "claim" on some tasks. When that claim is deleted, the tasks are freed to be gathered into a new workset. Now that I think about it, this deviates from the strict hierarchy that I suggested above -- a campaign wouldn't contain worksets so much as worksets would be generated from within a campaign on demand. Regardless, it seems that we're imagining the same thing. --Halfak (WMF) (talk) 15:18, 19 February 2015 (UTC)
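
The "claim" reading can be sketched in a few lines of Python. Everything here is an assumption made for illustration (the class, the method names, the fixed random seed), and the sketch ignores labels entirely: it only shows worksets being generated on demand from unclaimed tasks and the tasks returning to the pool when a claim is deleted.

```python
import random
from typing import Dict, List, Set


class Campaign:
    """Tracks which tasks are currently claimed by a workset (illustrative only)."""

    def __init__(self, task_ids: List[int], seed: int = 0) -> None:
        self.task_ids = list(task_ids)
        self.claims: Dict[int, Set[int]] = {}  # workset id -> claimed task ids
        self._next_id = 1
        self._rng = random.Random(seed)

    def available(self) -> List[int]:
        """Tasks that no active workset has claimed."""
        claimed = set().union(*self.claims.values())
        return [t for t in self.task_ids if t not in claimed]

    def assign_workset(self, size: int) -> int:
        """Generate a workset on demand from whatever is unclaimed right now."""
        pool = self.available()
        workset_id = self._next_id
        self._next_id += 1
        self.claims[workset_id] = set(self._rng.sample(pool, min(size, len(pool))))
        return workset_id

    def abandon(self, workset_id: int) -> None:
        """Delete the claim; its tasks return to the pool."""
        del self.claims[workset_id]


campaign = Campaign(task_ids=list(range(10)))
first = campaign.assign_workset(4)
second = campaign.assign_workset(4)
overlap = campaign.claims[first] & campaign.claims[second]
left_before = len(campaign.available())
campaign.abandon(first)
left_after = len(campaign.available())
print(overlap, left_before, left_after)  # set() 2 6
```

Two worksets never share a task, and abandoning one makes its four tasks available to the next request.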

Proposal: Integration of revision coder service

I've been thinking about where the "revcoder home" should actually live. I realized that in this mock, the revcoder stuff would push down the edits in Special:Contributions -- which would be frustrating and might convince editors to disable the gadget.

Instead, I thought that we might simply create a page in project space (e.g. en:Wikipedia:Revision_coder) and load up the list of active campaigns there. It also occurred to me that we could load up the revcoder in the same space as a single-page application. Here are a couple of mockups that represent what I have in mind.

Integrated revcoder mock (before gadget install). A mock-up of the revcoder gadget (home and form) is presented on top of an en:Wikipedia:Revision scoring page -- when no gadget is installed, a button is displayed to take the user to a set of instructions.
Integrated revcoder mock. A mock-up of the revcoder gadget (home and form) is presented on top of an en:Wikipedia:Revision scoring page.

Having the revcoder operate as a single-page application might require us to do a bit more work, but it will dramatically reduce the number of requests that the revcoder gadget (and therefore the user's browser) will need to make to the revcoder service. It will also allow us to pre-load the next revision/diff to improve performance. --Halfak (WMF) (talk) 15:35, 19 February 2015 (UTC)
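
The pre-loading idea can be illustrated with a short Python sketch. The real gadget would run in the browser, so this is only a model of the technique: the class name is hypothetical, and fake_fetch stands in for whatever request the gadget would make to get a diff.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List


class DiffPrefetcher:
    """Serves diffs in order while fetching the following one in the background."""

    def __init__(self, rev_ids: List[int], fetch: Callable[[int], str]) -> None:
        self._rev_ids = rev_ids
        self._fetch = fetch
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending: Dict[int, Future] = {}
        self._index = 0

    def _request(self, position: int) -> None:
        """Start a fetch for the given position unless one is already pending."""
        if position < len(self._rev_ids) and position not in self._pending:
            self._pending[position] = self._pool.submit(
                self._fetch, self._rev_ids[position])

    def next_diff(self) -> str:
        """Return the current diff and start loading the one after it."""
        if self._index >= len(self._rev_ids):
            raise IndexError("no revisions left in this workset")
        self._request(self._index)
        self._request(self._index + 1)
        diff = self._pending.pop(self._index).result()
        self._index += 1
        return diff

    def close(self) -> None:
        self._pool.shutdown()


calls: List[int] = []


def fake_fetch(rev_id: int) -> str:
    """Stand-in for a real request; records which revisions were fetched."""
    calls.append(rev_id)
    return f"diff for {rev_id}"


prefetcher = DiffPrefetcher([3456780, 3456781, 3456782], fake_fetch)
diffs = [prefetcher.next_diff() for _ in range(3)]
prefetcher.close()
print(diffs)
```

Each revision is fetched exactly once, and from the second revision onward the diff has already been requested by the time the user asks for it.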

I'd like to add my two cents on the matter. After Halfak's explanation yesterday, I find his approach quite sound. I think there is great benefit in having a gadget with several campaigns, each divided into tasks that expire if people sit on them. I think this approach suits the crowd-sourcing culture of Wikimedia projects better. After all, crowd-sourcing is itself a divide-and-conquer strategy. -- とある白い猫 chi? 18:20, 21 February 2015 (UTC)

Defining where the gadget will be executed

I think there are (at least) these two options:

  1. When a page is loaded, the loader part of the gadget checks whether the page is associated with a given Wikidata item (to be created once we translate the page to pt or another language) and, if it is, loads the rest of the gadget code.
  2. When the page is loaded, the loader checks its HTML for an element with id="foo-bar" and loads the rest of the gadget if such an element is present.

Helder 22:43, 10 March 2015 (UTC)

I implemented option 2. Helder 20:10, 15 March 2015 (UTC)
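
For readers following along, the check in option 2 amounts to something like the Python sketch below. This is not the gadget's code: the gadget runs in the browser, the function and class names here are made up, and "foo-bar" is just the placeholder id used above.

```python
from html.parser import HTMLParser


class _IdFinder(HTMLParser):
    """Records whether any start tag carries the target id attribute."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == self.target_id:
            self.found = True


def should_load_gadget(page_html: str, marker_id: str = "foo-bar") -> bool:
    """True when the page contains an element with the marker id."""
    finder = _IdFinder(marker_id)
    finder.feed(page_html)
    finder.close()
    return finder.found


with_marker = should_load_gadget('<div id="foo-bar"></div><p>Campaigns</p>')
without_marker = should_load_gadget('<div id="content"><p>Some article</p></div>')
print(with_marker, without_marker)
```

The appeal of this option is that any page can opt in simply by including the marker element, with no per-wiki configuration.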

Schema proposal

Hey folks,

I worked up a schema file to propose the structure of the revision coder database. Note that both "task.meta" and "label.data" are schemaless JSON fields. This will enable us to describe arbitrary task types (not just review of a revision) and arbitrary label data (not just two boolean fields). I've also included some sample queries that will be used within the API.

  1 CREATE TABLE user (
  2   id INT,
  3   created TIMESTAMP,
  4   touched TIMESTAMP,
  5   PRIMARY KEY(id)
  6 );
  7 /*
  8 INSERT INTO user (id, created, touched) VALUES (608542, NOW(), NOW());
  9 */
 10 
 11 CREATE TABLE campaign (
 12   id SERIAL,
 13   name VARCHAR(255),
 14   form VARCHAR(255),
 15   view VARCHAR(255),
 16   created TIMESTAMP,
 17   PRIMARY KEY(id)
 18 );
 19 /*
 20 -- Inserts a new campaign
 21 INSERT INTO campaign (name, form, view, created)
 22 VALUES ('Edit quality -- 2015 sample', 'damaging_and_good-faith', 'diff_to_previous', NOW());
 23 
 24 -- Gathers summary statistics and metadata for a campaign
 25 SELECT
 26   campaign.name AS campaign_name,
 27   campaign.created AS campaign_created,
 28   COUNT(DISTINCT task.id) AS tasks,
 29   COUNT(label.task_id) AS labels
 30 FROM campaign
 31 LEFT JOIN task ON task.campaign_id = campaign.id
 32 LEFT JOIN label ON label.task_id = task.id
 33 WHERE campaign.id = 345 GROUP BY campaign.id;
 34 */
 35 
 36 
 37 CREATE TABLE task (
 38   id SERIAL,
 39   campaign_id INT,
 40   created TIMESTAMP,
 41   meta JSONB,
 42   PRIMARY KEY(id),
 43   KEY(campaign_id)
 44 );
 45 /*
 46 -- Inserts a new task
 47 INSERT INTO task (campaign_id, created, meta)
 48 VALUES (345, NOW(), '{"rev_id": 506725001}');
 49 
 50 -- Gets all tasks and labels for a particular campaign
 51 SELECT
 52   task.id AS task_id,
 53   campaign_id,
 54   label.user_id AS label_user,
 55   label.timestamp AS label_timestamp,
 56   task.meta AS task_meta,
 57   label.data AS label_data
 58 FROM task
 59 LEFT JOIN label ON label.task_id = task.id
 60 WHERE task.campaign_id = 345;
 61 */
 62 
 63 CREATE TABLE label (
 64   task_id INT,
 65   user_id INT,
 66   timestamp TIMESTAMP,
 67   data JSONB,
 68   PRIMARY KEY(task_id, user_id),
 69   KEY(user_id)
 70 );
 71 /*
 72 -- Inserts a new label
 73 INSERT INTO label (task_id, user_id, timestamp, data)
 74 VALUES (12, 608542, NOW(), '{"damaging": false, "good-faith": true}');
 75 
 76 -- Gathers the labels for a particular task
 77 SELECT
 78   label.task_id,
 79   label.user_id,
 80   label.timestamp,
 81   label.data
 82 FROM labelBridget Sundell
 83 WHERE task_id = 12;
 84 */
 85 
 86 CREATE TABLE workset (
 87   id SERIAL,
 88   user_id INT,
 89   created TIMESTAMP,
 90   expires TIMESTAMP,
 91   PRIMARY KEY(id),
 92   KEY(user_id)
 93 );
 94 /*
 95 -- Inserts a new workset (but doesn't assign tasks yet)
 96 INSERT INTO workset (user_id, created, expires)
 97 VALUES (608542, NOW(), NOW() + INTERVAL '1 DAY');
 98 
 99 -- Gathers the task and label data for a workset
100 SELECT
101   workset.id AS workset_id,
102   task.id AS task_id,
103   task.meta AS task_meta,
104   label.data AS label_data
105 FROM workset
106 LEFT JOIN workset_task ON workset_task.workset_id = workset.id
107 LEFT JOIN task ON workset_task.task_id = task.id
108 LEFT JOIN label ON label.task_id = task.id
109 WHERE workset.id = 345;
110 */
111 
112 CREATE TABLE workset_task (
113   workset_id INT,
114   task_id INT,
115   KEY(workset_id, task_id),
116   KEY(task_id)
117 );
118 /*
119 -- Assigns a task to a workset
120 INSERT INTO workset_task (workset_id, task_id)
121 VALUES (345, 12);
122 */
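
To try the workset query without waiting on a database, the relevant tables can be approximated in an in-memory SQLite database from Python. This is a sketch under stated assumptions: SQLite stands in for the intended server, the JSON columns are stored as TEXT, indexes are omitted, and the second task (rev_id 506725002) is made up so that the result shows one labeled and one unlabeled task.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, SQLite-flavoured version of the proposed tables (JSON kept as TEXT).
conn.executescript("""
CREATE TABLE task (id INTEGER PRIMARY KEY, campaign_id INT,
                   created TIMESTAMP, meta TEXT);
CREATE TABLE label (task_id INT, user_id INT, timestamp TIMESTAMP, data TEXT,
                    PRIMARY KEY (task_id, user_id));
CREATE TABLE workset (id INTEGER PRIMARY KEY, user_id INT,
                      created TIMESTAMP, expires TIMESTAMP);
CREATE TABLE workset_task (workset_id INT, task_id INT);
""")

# Two tasks in campaign 345; the second rev_id is invented for this example.
for task_id, rev_id in [(12, 506725001), (13, 506725002)]:
    conn.execute(
        "INSERT INTO task (id, campaign_id, created, meta) "
        "VALUES (?, 345, CURRENT_TIMESTAMP, ?)",
        (task_id, json.dumps({"rev_id": rev_id})))

# A workset that expires one day after creation, holding both tasks.
conn.execute(
    "INSERT INTO workset (id, user_id, created, expires) "
    "VALUES (345, 608542, CURRENT_TIMESTAMP, datetime('now', '+1 day'))")
conn.executemany(
    "INSERT INTO workset_task (workset_id, task_id) VALUES (?, ?)",
    [(345, 12), (345, 13)])

# Only task 12 has been labeled so far.
conn.execute(
    "INSERT INTO label (task_id, user_id, timestamp, data) "
    "VALUES (12, 608542, CURRENT_TIMESTAMP, ?)",
    (json.dumps({"damaging": False, "good-faith": True}),))

# The "task and label data for a workset" query from the schema above.
cursor = conn.execute("""
SELECT task.id, task.meta, label.data
FROM workset
LEFT JOIN workset_task ON workset_task.workset_id = workset.id
LEFT JOIN task ON workset_task.task_id = task.id
LEFT JOIN label ON label.task_id = task.id
WHERE workset.id = 345
ORDER BY task.id
""")
rows = [(task_id, json.loads(meta), json.loads(data) if data else None)
        for task_id, meta, data in cursor]
print(rows)
```

The LEFT JOIN on label is what lets the unlabeled task come back with an empty label, which is how a client could tell what is still left to do in the workset.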

--Halfak (WMF) (talk) 16:38, 21 March 2015 (UTC)

@Halfak (WMF): is there a typo on line 82?
Looks good otherwise (as far as I can understand SQL). Helder 21:11, 24 March 2015 (UTC)
Indeed that is a typo! Apparently I was in the middle of googling my sister (http://www.sundelleye.com/doctors/) when I was writing that code and lost track of where my cursor was! I'm experimenting with implementing this schema today. Still no word on access to shared postgres instance, but we can always set up a local instance in our VM if we need to. --Halfak (WMF) (talk) 21:22, 24 March 2015 (UTC)