Grants:IdeaLab/Automatic article topic detection

From Meta, a Wikimedia project coordination wiki
Automatic article topic detection
Build a classifier that associates articles with WikiProjects. This could be used to route new article drafts to people with the right subject matter expertise for review. English Wikipedia, for example, applies different notability guidelines depending on topic; see en:Wikipedia:Notability (academics).
Idea creator and contact: User:EpochFail
Created on: 18:35, Monday, March 7, 2016 (UTC)

Project idea

About the idea creator

I'm a Senior Research Scientist at the Wikimedia Foundation. I've personally performed research studies around the retention of newcomers, the quality of new article creations, and the design of machine learning tools for supporting curation practices.

What is the problem you're trying to solve?

Right now, patrollers all read from the same new article queue when reviewing. This requires each patroller to maintain broad expertise in subject matter details and subject-specific notability guidelines. Further, many articles are not claimed by any WikiProject, or are claimed by only some of the WikiProjects that might claim them.

What is your solution?

Develop an automated, machine-learned classifier for general topic spaces (which roughly correspond to WikiProjects). We can train the model on existing articles, using whether or not each article has been claimed by a given WikiProject as the label. We can deploy the model using ORES to make it available to patrollers and other tool developers. This would allow us to (1) route new article creations to those with relevant expertise, (2) auto-tag new article creations for relevance to a particular WikiProject, and (3) help WikiProjects discover old articles that have yet to be associated with them.
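As a rough illustration of the training setup described above, the sketch below fits one binary "claimed / not claimed" model per WikiProject on a handful of made-up article texts. It is only a toy under stated assumptions: the naive Bayes model, the bag-of-words features, the example articles, and every name in the code (`ProjectClassifier`, `train_topic_models`, and so on) are hypothetical choices for illustration, not the model proposed here and not part of ORES.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class ProjectClassifier:
    """Binary multinomial naive Bayes for one (hypothetical) WikiProject."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}
        self.vocabulary = set()

    def train(self, examples):
        """examples: iterable of (article_text, claimed_by_project) pairs."""
        for text, claimed in examples:
            tokens = tokenize(text)
            self.word_counts[claimed].update(tokens)
            self.doc_counts[claimed] += 1
            self.vocabulary.update(tokens)

    def probability(self, text):
        """Return P(claimed | text), using add-one smoothing throughout."""
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in (True, False):
            # Smoothed class prior, so a class with no examples never gives log(0).
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            denom = sum(self.word_counts[label].values()) + len(self.vocabulary)
            for token in tokenize(text):
                if token in self.vocabulary:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][token] + 1) / denom)
            log_scores[label] = score
        # Turn the two log scores into a normalised probability.
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        return weights[True] / (weights[True] + weights[False])


def train_topic_models(articles):
    """articles: list of (text, set_of_claiming_projects). Returns {project: model}."""
    projects = set().union(*(claimed for _, claimed in articles))
    models = {}
    for project in projects:
        model = ProjectClassifier()
        # Articles not claimed by this project serve as negative examples.
        model.train((text, project in claimed) for text, claimed in articles)
        models[project] = model
    return models


# Made-up training data: article text plus the WikiProjects that claimed it.
ARTICLES = [
    ("The physicist published papers on quantum mechanics and particle theory",
     {"Physics", "Biography"}),
    ("The river flows through the valley into the lake", {"Geography"}),
    ("Quantum field theory describes particle interactions", {"Physics"}),
    ("The mountain range borders the valley and the river basin", {"Geography"}),
]

models = train_topic_models(ARTICLES)
draft = "A new theory of quantum particle physics"
scores = {project: model.probability(draft) for project, model in models.items()}
for project, score in sorted(scores.items(), key=lambda pair: pair[1], reverse=True):
    print(f"{project}: {score:.2f}")
```

Training one model per project keeps the task multi-label, so a single article can score highly for several WikiProjects at once. A real model would need far more data and richer features than this sketch uses.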

Such a model could have wide-ranging applications beyond the target of this proposal, e.g. routing good-faith newcomers who tend to work in a certain topic area to relevant WikiProjects, and helping new WikiProjects bootstrap by letting them quickly identify the articles most closely related to their topic of interest. If the model is highly accurate, it could also be used to identify new statements for Wikidata.

Project goals

  1. Build a high-fitness article topic classification model
  2. Deploy model via ORES
  3. Re-engineer new article review to let patrollers filter by "predicted topic area"
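The third goal, filtering the review queue by predicted topic area, might look something like the sketch below. It assumes the per-draft topic scores have already been fetched and stored in a plain dictionary; the draft titles, the scores, the 0.5 threshold, and the function name `filter_queue` are all hypothetical, and the actual format of an ORES response is not modelled here.

```python
def filter_queue(queue, predictions, topic, threshold=0.5):
    """Return the drafts predicted to belong to `topic`, highest score first.

    queue:       list of draft titles in their original review order
    predictions: {draft_title: {topic_name: score between 0 and 1}}
    """
    scored = [(predictions.get(title, {}).get(topic, 0.0), title) for title in queue]
    relevant = [pair for pair in scored if pair[0] >= threshold]
    # sorted() is stable, so equally scored drafts keep their queue order.
    relevant = sorted(relevant, key=lambda pair: pair[0], reverse=True)
    return [title for _, title in relevant]


# Made-up queue and made-up model scores, for illustration only.
QUEUE = ["Draft:Jane Doe (chemist)", "Draft:Lake Example", "Draft:Quark mixing"]
PREDICTIONS = {
    "Draft:Jane Doe (chemist)": {"Biography": 0.88, "Chemistry": 0.74, "Physics": 0.55},
    "Draft:Lake Example": {"Geography": 0.93},
    "Draft:Quark mixing": {"Physics": 0.91},
}

physics_queue = filter_queue(QUEUE, PREDICTIONS, "Physics")
print(physics_queue)  # ['Draft:Quark mixing', 'Draft:Jane Doe (chemist)']
```

A patroller with physics expertise would then see only the two physics-related drafts, while the unfiltered queue remains available to everyone else.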

Get involved


  • The trick to getting this working in practice is to find a route which can obtain a consensus of editors. I suggest that one such route would be a tool to suggest a set of WikiProjects (and perhaps other tags such as {{Disambiguation}}) for a new article and allow new page patrollers to add them with a single click. Stuartyeates (talk) 02:11, 18 March 2016 (UTC)


Smallbones (talk) 20:20, 22 March 2016 (UTC)

  • I'm against general topic spaces next to WikiProjects - imo all sufficiently notable/large/general/... topic spaces should become WikiProjects. This automated topic detector could be used as a suggestor for a) article categories and b) WikiProjects. Instead of adding either straight away it adds these suggestions so that human editors can check them and add the categories & WikiProjects if appropriate. I think this is a great idea and one of many to implement if one would like to utilize the tremendous potential the WikiProjects (aka general topic spaces) have in fostering more and better edits (saying more [active] newcomers & experts; better choice of articles to edit by all editors). WikiProject X might also be interested in this. Fixuture (talk) 22:54, 29 March 2016 (UTC)
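The suggestion workflow raised in the comments above, where the tool proposes WikiProjects and a human editor confirms them, could be sketched roughly as follows. The threshold, the cap on the number of suggestions, the example scores, and the function name `suggest_projects` are assumptions made for illustration; nothing here is tagged automatically.

```python
def suggest_projects(scores, already_tagged=(), threshold=0.6, limit=3):
    """Return up to `limit` WikiProjects to propose to a human reviewer.

    scores:         {project_name: predicted score between 0 and 1}
    already_tagged: projects that have already claimed the article
    """
    candidates = [
        (project, score)
        for project, score in scores.items()
        if score >= threshold and project not in already_tagged
    ]
    # Strongest suggestions first; the reviewer decides which ones to accept.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [project for project, _ in candidates[:limit]]


# Made-up scores for a single new article.
article_scores = {"Physics": 0.92, "Biography": 0.71, "Geography": 0.05, "Chemistry": 0.64}
suggestions = suggest_projects(article_scores, already_tagged={"Biography"})
print(suggestions)  # ['Physics', 'Chemistry']
```

Keeping the output as a short list of suggestions, rather than applying tags directly, leaves the final decision with the patroller.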
