Research talk:Autoconfirmed article creation trial/Work log/2017-09-15


Friday, September 15, 2017

Today I'll start out by wrapping up yesterday's discussion of quality predictions, then move on to writing a dataset spec, then continue gathering data.

Deleted revisions

We use the Data Lake to get information on article creations. From there we use the revision and archive tables to determine whether the page a revision belongs to has been deleted. We use ORES to get draft quality and wp10 predictions for all live revisions, and use the API to get the content of all available archived revisions. Note that we say all available archived revisions because a revision that was deleted due to copyright infringement cannot be retrieved through the API. This means that we cannot get predictions for every created article after the fact, and we will most likely do some filtering on speedy deletion criteria in order to understand more about content quality.
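For reference, a rough sketch of how a batch of revisions might be scored is shown below. This assumes the ORES v3 REST endpoint with the English Wikipedia "wp10" and "draftquality" models; the revision IDs in the usage example are placeholders, and the actual pipeline may batch and handle errors differently.

```python
# Minimal sketch: fetch wp10 and draftquality predictions from ORES
# for a batch of revision IDs (assumed endpoint/model names).
import requests

ORES_URL = "https://ores.wikimedia.org/v3/scores/enwiki"

def get_quality_predictions(rev_ids):
    """Return {rev_id: {model: predicted_class}} for the given revisions."""
    params = {
        "models": "wp10|draftquality",
        "revids": "|".join(str(r) for r in rev_ids),
    }
    response = requests.get(ORES_URL, params=params)
    response.raise_for_status()
    scores = response.json()["enwiki"]["scores"]

    predictions = {}
    for rev_id, models in scores.items():
        # Revisions that could not be scored (e.g. deleted content)
        # come back with an "error" key instead of a "score".
        predictions[rev_id] = {
            model: result.get("score", {}).get("prediction")
            for model, result in models.items()
        }
    return predictions

# Usage example with placeholder revision IDs:
# print(get_quality_predictions([757539757, 757539758]))
```

Revisions whose content has been suppressed (e.g. copyright deletions) would show up with errors rather than predictions, which is why the filtering on speedy deletion criteria mentioned above is needed.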