Research talk:Autoconfirmed article creation trial/Work log/2017-08-04

Friday, August 4, 2017

Continuing to work on instrumentation and measurements today.

H13: The rate of new submissions at AfC will increase.

Submissions to AfC are archived using the category system, starting at Category:AfC submissions by date. However, drafts that have been declined or were never submitted to AfC, and that go unedited for more than six months, are eligible for deletion per CSD G13, so the categories alone will undercount. I'll look into whether the historical information in the data lake can be used to identify when articles were submitted.
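
As a first rough measurement, here is a minimal sketch of counting entries in the daily archive categories through the MediaWiki API. It assumes the daily categories follow the naming scheme "Category:AfC submissions by date/04 August 2017", and the counts will miss whatever G13 has already removed.

```python
# Sketch: count AfC submissions per day via the daily archive categories.
# Assumes the naming scheme "Category:AfC submissions by date/DD Month YYYY".
import datetime
import requests

API_URL = "https://en.wikipedia.org/w/api.php"

def afc_submissions_on(day: datetime.date) -> int:
    """Return the number of pages in that day's AfC archive category."""
    title = "Category:AfC submissions by date/" + day.strftime("%d %B %Y")
    params = {
        "action": "query",
        "prop": "categoryinfo",
        "titles": title,
        "format": "json",
        "formatversion": 2,
    }
    pages = requests.get(API_URL, params=params).json()["query"]["pages"]
    # Drafts already deleted per G13 have left the category, so this undercounts.
    return pages[0].get("categoryinfo", {}).get("pages", 0)

print(afc_submissions_on(datetime.date(2017, 8, 4)))
```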

H14: The backlog of articles in the AfC queue will increase faster than expected.

When drafts are submitted to AfC, they are added to Category:Pending AfC submissions. As with H13, we could use entries into and exits from this category to understand the AfC queue.
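
A sketch of the entry half of that, assuming we can lean on the categorylinks timestamp (exposed through list=categorymembers), which records when each page was added to the category and thus approximates submission time. Exits are not recorded, so the accept/decline side would need repeated snapshots or the history data.

```python
# Sketch: list the current members of Category:Pending AfC submissions along with
# the time each page was added to the category (the categorylinks timestamp).
import requests

API_URL = "https://en.wikipedia.org/w/api.php"

def pending_afc_members():
    """Yield {title, timestamp} dicts for every page currently in the queue."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": "Category:Pending AfC submissions",
        "cmprop": "title|timestamp",
        "cmsort": "timestamp",
        "cmlimit": "500",
        "format": "json",
        "formatversion": 2,
    }
    while True:
        data = requests.get(API_URL, params=params).json()
        yield from data["query"]["categorymembers"]
        if "continue" not in data:
            break
        params.update(data["continue"])  # follow API pagination

for member in pending_afc_members():
    print(member["timestamp"], member["title"])
```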

H15: The reasons for deleting articles will remain stable.

See npp_report.rb in MusikAnimal's GitHub repository, which gathers data on the reasons why articles get deleted.
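
For reference, here is a sketch of the same idea against the deletion log directly, assuming CSD deletions cite their criterion (e.g. "A7", "G11") in the log comment. The regex is a rough heuristic, not npp_report.rb's actual parsing.

```python
# Sketch: tally CSD criteria cited in deletion-log comments for a namespace.
import collections
import re
import requests

API_URL = "https://en.wikipedia.org/w/api.php"
CSD_RE = re.compile(r"\b([AGRU]\d{1,2})\b")  # e.g. A7, G11, U5; a rough heuristic

def deletion_reasons(start, end, namespace=0):
    """Count criteria cited in deletions between two timestamps (start is the newer one,
    since the log API enumerates from newer to older by default)."""
    counts = collections.Counter()
    params = {
        "action": "query",
        "list": "logevents",
        "letype": "delete",
        "leaction": "delete/delete",
        "lenamespace": namespace,
        "lestart": start,  # e.g. "2017-08-31T23:59:59Z"
        "leend": end,      # e.g. "2017-08-01T00:00:00Z"
        "lelimit": "500",
        "format": "json",
        "formatversion": 2,
    }
    while True:
        data = requests.get(API_URL, params=params).json()
        for event in data["query"]["logevents"]:
            match = CSD_RE.search(event.get("comment", ""))
            counts[match.group(1) if match else "other"] += 1
        if "continue" not in data:
            break
        params.update(data["continue"])
    return counts
```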

H16: The reasons for deleting non-article pages will change towards those previously used for deletion of articles created by non-autoconfirmed users.

MusikAnimal's code gives us the statistics for article pages. We should be able to adapt it to also check other types of pages. We are mainly interested in the User namespace, because that is where user drafts live, and in the Draft namespace.
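
If we go the API route sketched under H15, the adaptation is just the namespace filter (User is namespace 2 and Draft is 118 on the English Wikipedia):

```python
# Reusing the deletion_reasons() sketch from H15 for other namespaces.
user_counts = deletion_reasons("2017-08-31T23:59:59Z", "2017-08-01T00:00:00Z", namespace=2)    # User
draft_counts = deletion_reasons("2017-08-31T23:59:59Z", "2017-08-01T00:00:00Z", namespace=118)  # Draft
```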

H17: The quality of articles entering the NPP queue will increase.

There are at least two ways of measuring this, and they differ depending on how we define "quality". One way is to use ORES' wp10 model, meaning that we compare newly created articles against the WP 1.0 assessment criteria. In that case we can either use the predicted classes (e.g. Stub, Start) or calculate a completion score similar to what's used in the Education Dashboard and in the analysis of the Keilana Effect.
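
A minimal sketch of the completion-score variant, assuming a linear weighting of the wp10 class probabilities (Stub=0 through FA=5, scaled to [0, 1]); the exact weights used by the Education Dashboard and the Keilana Effect analysis may differ.

```python
# Sketch: a "completion score" as a weighted sum of wp10 class probabilities,
# using the ORES v3 scoring API. The weights are an assumption for illustration.
import requests

ORES_URL = "https://ores.wikimedia.org/v3/scores/enwiki/"
WEIGHTS = {"Stub": 0, "Start": 1, "C": 2, "B": 3, "GA": 4, "FA": 5}

def completion_score(rev_id: int) -> float:
    """Score a revision between 0.0 (confidently Stub) and 1.0 (confidently FA)."""
    params = {"models": "wp10", "revids": rev_id}
    response = requests.get(ORES_URL, params=params).json()
    probs = response["enwiki"]["scores"][str(rev_id)]["wp10"]["score"]["probability"]
    return sum(WEIGHTS[cls] * p for cls, p in probs.items()) / max(WEIGHTS.values())
```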

We could also define quality in terms of New Page Patrol. Perhaps it's possible to combine ORES' wp10 and draftquality models with some of the metrics from the PageTriage extension to label articles as "definite accept", "definite reject", or "other". That also ties in with the workload of NPPers, so it might be more useful as a hypothesis about NPP.
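
As a strawman, the labeling could look something like the sketch below. The decision rule and thresholds are placeholders for illustration, not validated cut-offs, and the PageTriage metrics are left out.

```python
# Sketch: a coarse triage label on top of ORES, scoring both models in one request.
import requests

ORES_URL = "https://ores.wikimedia.org/v3/scores/enwiki/"

def triage_label(rev_id: int) -> str:
    params = {"models": "wp10|draftquality", "revids": rev_id}
    scores = requests.get(ORES_URL, params=params).json()["enwiki"]["scores"][str(rev_id)]
    draft_pred = scores["draftquality"]["score"]["prediction"]  # OK/spam/vandalism/attack
    wp10_probs = scores["wp10"]["score"]["probability"]
    if draft_pred in ("spam", "vandalism", "attack"):
        return "definite reject"
    if wp10_probs.get("Stub", 0) + wp10_probs.get("Start", 0) < 0.5:
        return "definite accept"  # placeholder rule: probably C-class or better
    return "other"
```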

H18: The quality of newly created articles after 30 days will be unchanged.

Here we are interested in the quality of articles after 30 days. We can use the revision table to identify each new article's most recent revision as of 30 days after creation, then either grab the revision content and run ORES offline, or use the revision ID to get a score from ORES' API. We can then use either the predicted class or a completion score, as discussed for H17.
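
A sketch of the API variant (rather than a direct revision-table query), using prop=revisions with rvdir=older to grab the most recent revision at or before the 30-day cutoff; completion_score() is the helper sketched under H17.

```python
# Sketch: find an article's latest revision as of 30 days after creation.
import datetime
import requests

API_URL = "https://en.wikipedia.org/w/api.php"

def revision_at(title: str, cutoff: datetime.datetime) -> int:
    """Return the ID of the most recent revision at or before the cutoff."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvlimit": 1,
        "rvdir": "older",                                  # newest first
        "rvstart": cutoff.strftime("%Y-%m-%dT%H:%M:%SZ"),  # start at the cutoff
        "rvprop": "ids|timestamp",
        "format": "json",
        "formatversion": 2,
    }
    page = requests.get(API_URL, params=params).json()["query"]["pages"][0]
    return page["revisions"][0]["revid"]

created = datetime.datetime(2017, 8, 4)
rev_id = revision_at("Example article", created + datetime.timedelta(days=30))
# ...then score rev_id with completion_score() from the H17 sketch.
```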

H19: The quality of articles entering the AfC queue will be unchanged.

Little is known about the quality of articles entering AfC. We propose to again use ORES, predicting the quality of each article at the time it entered AfC. However, ORES might not fit well in this case, since drafts can look significantly different from typical Wikipedia articles. Given AfC's long history, we might be able to collect training data and see whether it's feasible to train classifiers for common review notes (e.g. "this draft does not assert notability").
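
If we get that far, a baseline for the classifier idea could be as simple as bag-of-words. The (draft text, review note) training pairs are the big assumption here; presumably they would be mined from the decline templates reviewers leave on drafts.

```python
# Sketch: a TF-IDF + logistic regression baseline for predicting the decline
# reason from draft wikitext. The labeled training data is hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline

def train_decline_classifier(texts: list, labels: list) -> Pipeline:
    """Fit a simple text classifier on (draft wikitext, decline reason) pairs."""
    model = make_pipeline(
        TfidfVectorizer(min_df=2, max_features=10_000),
        LogisticRegression(),
    )
    model.fit(texts, labels)
    return model

# Usage, once real pairs have been mined from decline templates:
#   model = train_decline_classifier(draft_texts, decline_reasons)
#   model.predict(["'''New draft''' about a local band ..."])
```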