Research:Automated classification of article importance/Importance API draft
Use cases
- Members of a WikiProject want a list of articles that are candidates for updating their importance rating.
- A member of a WikiProject has tagged a group of articles and would like suggestions for their importance rating.
- A contributor has a list of articles they want to work on and would like to rank them by importance to know where to begin.
- A patroller who is going through recent changes or their watchlist would like to identify changes to high-importance pages.
Key features
Side chain
Some WikiProjects (e.g. WikiProject Medicine and WikiProject National Football League) assign a fixed importance rating to certain categories of articles. For example, an article about an individual is always Low-importance in WikiProject Medicine, and articles about National Football League seasons are High-importance in WikiProject National Football League. We therefore need a way to define these categories and the rating each one receives.
We encode these relationships as a quadruple (project name, predicate, object, rating). Project name is the name of the WikiProject. Object is a Wikidata entity. Predicate is the Wikidata predicate that links the Wikidata entity an article maps to with that object. Rating is the importance rating given to an article when this relationship holds for it.
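The quadruple can be modelled directly as a small immutable record. The sketch below is illustrative only: the `Rule` type, its field names, the hypothetical `apply_rule` helper, and the representation of an article's Wikidata statements as a set of (predicate, object) pairs are assumptions, not the structures used by sidechain.py.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    """A side-chain rule: (project name, predicate, object, rating).

    Hypothetical type; field names are illustrative assumptions.
    """
    project: str    # WikiProject name, e.g. "Medicine"
    predicate: str  # Wikidata predicate, e.g. "wdt:P31" (instance of)
    obj: str        # Wikidata entity, e.g. "wd:Q5" (human); named obj to avoid shadowing built-in object
    rating: str     # importance rating assigned when the relationship holds


def apply_rule(rule: Rule, project: str,
               statements: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Hypothetical helper: return rule.rating if the article, rated within
    `project`, has the rule's (predicate, object) statement; otherwise None."""
    if project != rule.project:
        return None
    return rule.rating if (rule.predicate, rule.obj) in set(statements) else None


medicine_people = Rule("Medicine", "wdt:P31", "wd:Q5", "Low")
print(apply_rule(medicine_people, "Medicine", [("wdt:P31", "wd:Q5")]))  # Low
```

A rule only fires inside its own WikiProject, which is why the project name is part of the quadruple rather than implied by the file it lives in.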
One example is that all articles about individuals are rated Low-importance in WikiProject Medicine. This relationship can be expressed as the rule ("Medicine", "wdt:P31", "wd:Q5", "Low"). In plain English, the rule reads: an article whose Wikidata entity is an instance of ("wdt:P31") human ("wd:Q5") should have a "Low" importance rating.
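The predicate and object use the `wdt:` and `wd:` prefixes of the Wikidata Query Service, so whether a rule holds for a given article's Wikidata item can be checked with a SPARQL ASK query. A minimal sketch, assuming the article has already been mapped to its item ID (Q42, Douglas Adams, is used purely as an example); `build_ask_query` and `parse_ask_result` are hypothetical helpers, and this is not necessarily how sidechain.py evaluates rules. Sending the query to the service (https://query.wikidata.org/sparql) is left out.

```python
import json


def build_ask_query(item_id: str, predicate: str, obj: str) -> str:
    """Hypothetical helper: a SPARQL ASK query testing whether wd:<item_id>
    has the statement `predicate obj`. `item_id` is a bare ID such as "Q42";
    the wd: and wdt: prefixes are predefined on the Wikidata Query Service."""
    return f"ASK {{ wd:{item_id} {predicate} {obj} . }}"


def parse_ask_result(body: str) -> bool:
    """Hypothetical helper: ASK results in the SPARQL JSON results format
    carry the answer in a top-level "boolean" key."""
    return bool(json.loads(body)["boolean"])


# Rule ("Medicine", "wdt:P31", "wd:Q5", "Low"): is Q42 an instance of human?
query = build_ask_query("Q42", "wdt:P31", "wd:Q5")
print(query)  # ASK { wd:Q42 wdt:P31 wd:Q5 . }
print(parse_ask_result('{"head": {}, "boolean": true}'))  # True
```

If the answer is true, the article receives the rule's rating ("Low") in WikiProject Medicine.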
These rules are easy to write in YAML and can be loaded into a dictionary-based data structure for fast lookups (see WikiProject Medicine's ruleset and the sidechain.py library on GitHub).
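One way to get fast lookups is to index each project's rules by their (predicate, object) pair, so checking an article costs one dictionary probe per statement. The sketch below assumes a hypothetical YAML layout (shown in a comment) and starts from the Python structure that a YAML loader such as PyYAML's `yaml.safe_load` would produce for it, so it runs without third-party packages. The layout and the `build_index` and `lookup` helpers are illustrative assumptions, not the actual format of WikiProject Medicine's ruleset or the sidechain.py API.

```python
from collections import defaultdict

# Hypothetical ruleset, as a YAML loader would return it for a file like:
#   Medicine:
#     - {predicate: "wdt:P31", object: "wd:Q5", rating: Low}
RULESET = {
    "Medicine": [
        {"predicate": "wdt:P31", "object": "wd:Q5", "rating": "Low"},
    ],
}


def build_index(ruleset):
    """Index rules as project -> {(predicate, object): rating}."""
    index = defaultdict(dict)
    for project, rules in ruleset.items():
        for rule in rules:
            index[project][(rule["predicate"], rule["object"])] = rule["rating"]
    return index


def lookup(index, project, statements):
    """Return the ratings of all rules in `project` matched by an article.

    `statements` is an iterable of (predicate, object) pairs for the
    article's Wikidata item; results follow the order of `statements`.
    """
    rules = index.get(project, {})  # .get does not insert missing keys
    return [rules[s] for s in statements if s in rules]


index = build_index(RULESET)
article = [("wdt:P31", "wd:Q5"), ("wdt:P21", "wd:Q6581097")]  # human; sex or gender: male
print(lookup(index, "Medicine", article))  # ['Low']
```

Returning every matching rating, rather than the first one, leaves the policy for articles that match several rules to the caller.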