Research:Automated classification of article importance/Importance API draft

From Meta, a Wikimedia project coordination wiki

Use cases

  • Members of a WikiProject want a list of articles whose importance rating may need updating.
  • A member of a WikiProject has tagged a group of articles and would like suggested importance ratings for them.
  • A contributor has a list of articles they want to work on and would like to rank them by importance to know where to begin.
  • A patroller who is going through recent changes or their watchlist would like to identify changes to high-importance pages.

Key features

Side chain

Some WikiProjects (e.g. WikiProject Medicine and WikiProject National Football League) assign a fixed importance rating to certain categories of articles. For example, an article about an individual is always Low-importance in WikiProject Medicine, and articles about National Football League seasons are High-importance in WikiProject National Football League. We therefore need a way to define these categories and the rating each one receives.

We encode these relationships as a quadruple of (project name, predicate, object, rating). Project name is the name of the WikiProject. Predicate is the Wikidata property that relates the article's Wikidata item to the object. Object is a Wikidata entity. Rating is the importance rating given to an article when that relationship holds for it.
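A minimal Python sketch of the quadruple follows. It assumes an article's Wikidata statements have already been fetched into a set of (predicate, object) pairs; the `Rule` type and the `rating_for` helper are hypothetical names for illustration, not part of sidechain.py.

```python
from typing import NamedTuple, Optional, Set, Tuple


class Rule(NamedTuple):
    """One side-chain rule: (project name, predicate, object, rating)."""
    project: str    # WikiProject name, e.g. "Medicine"
    predicate: str  # Wikidata property, e.g. "wdt:P31" (instance of)
    obj: str        # Wikidata entity, e.g. "wd:Q5" (human)
    rating: str     # importance rating assigned when the relationship holds


def rating_for(rule: Rule, statements: Set[Tuple[str, str]]) -> Optional[str]:
    """Return the rule's rating if the article's Wikidata item has the
    rule's (predicate, object) statement, otherwise None."""
    if (rule.predicate, rule.obj) in statements:
        return rule.rating
    return None
```

Representing statements as a set of pairs keeps each rule check a single membership test, independent of how many statements the item has.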

For example, WikiProject Medicine rates all articles about individuals as Low-importance. This relationship can be expressed as the rule ("Medicine", "wdt:P31", "wd:Q5", "Low"). In plain English, this rule reads: an article whose Wikidata item is an instance of ("wdt:P31") human ("wd:Q5") should have a "Low" importance rating.
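Because the predicate and object are written with the `wdt:` and `wd:` prefixes used by the Wikidata Query Service, one possible way to check a rule against a single item is a SPARQL ASK query. The sketch below only builds the query string. The `ask_query` helper is hypothetical, and the original text does not say that rules are evaluated via SPARQL.

```python
from typing import Tuple


def ask_query(item: str, rule: Tuple[str, str, str, str]) -> str:
    """Build a SPARQL ASK query checking whether `item` satisfies the
    (predicate, object) part of a side-chain rule."""
    _project, predicate, obj, _rating = rule
    # Doubled braces in the f-string produce literal { and } characters.
    return f"ASK {{ {item} {predicate} {obj} . }}"


medicine_people = ("Medicine", "wdt:P31", "wd:Q5", "Low")
print(ask_query("wd:Q42", medicine_people))
# prints: ASK { wd:Q42 wdt:P31 wd:Q5 . }
```

On the Wikidata Query Service, where the `wd:` and `wdt:` prefixes are predefined, this query would be expected to return true, since Q42 (Douglas Adams) is an instance of human.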

These rules are straightforward to write in YAML and can be loaded into a dictionary-based data structure for fast lookups (see WikiProject Medicine's ruleset and the sidechain.py library on GitHub).
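The following is a minimal sketch of a dictionary-based index. It assumes a ruleset has already been parsed from YAML (for example with PyYAML's `yaml.safe_load`) into plain Python dicts and lists. The field names (`project`, `rules`, `predicate`, `object`, `rating`) and the helpers `build_index` and `lookup` are hypothetical and do not reflect the actual format used by sidechain.py.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical parsed ruleset, in the shape yaml.safe_load might return for
# a YAML file with one mapping per rule. Field names are assumptions.
MEDICINE_RULES = {
    "project": "Medicine",
    "rules": [
        {"predicate": "wdt:P31", "object": "wd:Q5", "rating": "Low"},
    ],
}

# project name -> {(predicate, object): rating}
Index = Dict[str, Dict[Tuple[str, str], str]]


def build_index(rulesets: Iterable[dict]) -> Index:
    """Index every rule by project, then by its (predicate, object) pair."""
    index: Dict[str, Dict[Tuple[str, str], str]] = defaultdict(dict)
    for ruleset in rulesets:
        project = ruleset["project"]
        for rule in ruleset["rules"]:
            # A later rule with the same (predicate, object) overwrites an earlier one.
            index[project][(rule["predicate"], rule["object"])] = rule["rating"]
    return dict(index)


def lookup(index: Index, project: str,
           statements: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the ratings of all rules in `project` matched by the
    article's (predicate, object) statements."""
    rules = index.get(project, {})
    return [rules[s] for s in statements if s in rules]
```

Each statement costs one dictionary lookup. `lookup` returns every matching rating rather than a single value, which leaves it to the caller to decide how to resolve an article that matches several rules with different ratings.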