Research:Knowledge Gaps Index/API

From Meta, a Wikimedia project coordination wiki

The knowledge gaps endpoint is an addition to the existing V1 Analytics Query Service (AQS). It is intended to give researchers access to knowledge gaps data aggregated monthly by Wikimedia as part of our data equity efforts.

Information on the full scope of the Knowledge Gaps project can be found here [[1]].


The knowledge gaps data is aggregated into eight "gaps" of interest (referred to below as content gaps):

  • gender: Man, Woman, Non-binary, Transgender, etc
  • time: 14th century, 20th century, antiquity, etc
  • geography_wmf_region
  • geography_cultural_region
  • multimedia_illustration
  • sexual_orientation: Heterosexual, Homosexual, Bisexual, etc
  • geography_continent: Africa, Europe, Asia, etc
  • geography_country: FR, GB, US, etc
This is a tentative list and could be changed or expanded as the project grows.

Each content gap has "categories", and each category has the following metrics of interest:

  • standard_quality_count
  • pageviews_mean
  • article_count
  • article_created
  • revision_count
  • standard_quality
  • pageviews_sum
  • quality_score

These metrics reflect the level of interaction each of these categories gets on Wikimedia.

Querying the endpoint

The general format for querying the AQS endpoint for knowledge_gap metric data is:

https://wikimedia.org/api/rest_v1/metrics/knowledge-gap/per-category/{project}/{content_gap}/{category}/{start}/{end}

where:

  • project: The Wikimedia project that the category was derived from, e.g. en.wikipedia, fr.wikipedia, etc.
  • content_gap: One of the eight gaps of interest by which the knowledge gaps dataset is classified.
  • category: The entity with the measurable metrics. It depends on the content gap and could be, for example:
      • a country ('geography_country' content gap)
      • a transgender person ('gender' content gap)
      • a time period ('time' content gap)
  • start & end: The range of time for which metric data should be calculated and returned. The two values can be the same if you only want the metrics for a single month.
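
The URL template above can be filled in programmatically. The following is a minimal sketch, not an official client: the helper name build_knowledge_gap_url is hypothetical, and percent-encoding each path segment (for example for a category that contains a space) is an assumption about what the endpoint accepts.

```python
from urllib.parse import quote

# Base path taken from the URL template documented above.
BASE_URL = "https://wikimedia.org/api/rest_v1/metrics/knowledge-gap/per-category"


def build_knowledge_gap_url(project, content_gap, category, start, end):
    """Assemble a per-category query URL (hypothetical helper).

    Each path segment is percent-encoded; whether the service expects
    this for categories with spaces is an assumption.
    """
    parts = [project, content_gap, category, start, end]
    return BASE_URL + "/" + "/".join(quote(str(p), safe="") for p in parts)


print(build_knowledge_gap_url(
    "en.wikipedia", "geography_country", "NG", "20221101", "20221201"
))
```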

An Example Query

 https://wikimedia.org/api/rest_v1/metrics/knowledge-gap/per-category/en.wikipedia/geography_country/NG/20221101/20221201

Important Note

In the start and end dates, e.g. 20221101 and 20221201, the first six digits represent the year (2022) and the month (11 and 12). All dates end with a '01' suffix to conform to the internal wiki time range interface.

Totals
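
As an illustration of the date format, the start and end values can be derived from a year and a month. This is a small sketch; the function name month_param is hypothetical and not part of the API.

```python
from datetime import date


def month_param(year, month):
    """Format a year and month as YYYYMM01 (hypothetical helper).

    The day is always fixed to 01, as described above. An invalid
    month (e.g. 13) raises ValueError from datetime.date.
    """
    return date(year, month, 1).strftime("%Y%m%d")


start = month_param(2022, 11)
end = month_param(2022, 12)
print(start, end)  # 20221101 20221201
```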

Within the dataset, there is a monthly aggregation of all metrics across all categories for a given content gap. This aggregation of metrics is referred to as the "totals" and can be queried using the same endpoint, as it is treated as a category itself.

An example query of this is:

https://wikimedia.org/api/rest_v1/metrics/knowledge-gap/per-category/en.wikipedia/gender/all-categories/20221101/20221201

You will observe that the argument supplied to the {category} parameter is "all-categories". This query will return the total aggregation of metrics across all the categories within a content gap over a given range of months.
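
A totals query could be scripted along the following lines. This is a sketch under stated assumptions: the helper names totals_url and fetch_json are hypothetical, the User-Agent string is a placeholder to replace with your own contact details, and the response is returned as parsed JSON without assuming anything about its structure.

```python
import json
from urllib.request import Request, urlopen

# Base path taken from the example query above.
BASE_URL = "https://wikimedia.org/api/rest_v1/metrics/knowledge-gap/per-category"


def totals_url(project, content_gap, start, end):
    """Build the totals URL by passing "all-categories" as the category."""
    return f"{BASE_URL}/{project}/{content_gap}/all-categories/{start}/{end}"


def fetch_json(url):
    """Fetch a URL and parse the body as JSON (hypothetical helper).

    The User-Agent value is a placeholder, not a required string.
    """
    request = Request(url, headers={"User-Agent": "knowledge-gaps-example/0.1 (you@example.org)"})
    with urlopen(request, timeout=30) as response:
        return json.load(response)


url = totals_url("en.wikipedia", "gender", "20221101", "20221201")
print(url)
# To run the query against the live service:
# data = fetch_json(url)
```

The network call is left commented out so that the sketch can be run offline; uncomment it to retrieve the totals.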