Research and Decision Science/Data glossary

From Meta, a Wikimedia project coordination wiki

This page collects definitions of essential and core metrics that teams in the Wikimedia Foundation use to guide tactical and strategic decisions. The sibling Data Dictionary page documents various data sources.

Definitions for core metrics and essential metrics

Core metrics

Core metrics (also called "Core Annual Plan metrics") are a small, well-defined set of measurements that we use to guide strategic decisions. These are the metrics the Foundation highlights in its annual plan. For FY23-24, the four core metric areas are "Content", "Contributors", "Effectiveness", and "Relevance". An overview of the core metrics as they relate to the Foundation's FY23-24 Annual Plan is available on a linked page. Core metrics are a special subset of essential metrics.

Essential metrics

Essential metrics are metrics maintained by the Wikimedia Foundation in production which meet the following requirements:

  1. The metric and its measurement are trustworthy;
  2. The metric is accessible to decision makers;
  3. The metric and its measurement are transparent to those affected by the decisions that the metric and measurement inform;
  4. Frequent measurement of the metric is required for decisions of significant importance. This includes:
    • decision making in situations where erroneous decisions can have department, Foundation, or Movement-level impact;
    • decision making about WMF's strategic direction;
    • monitoring of important operations.
  5. The definition, measurement, storage, use, and potential publication of the metric must meet relevant WMF policies and guidelines, specifically WMF's Privacy Policy, Data Retention Guidelines, Data Publication Guidelines, and Human Rights Policy.

Please refer to the FAQ and more details to learn more about essential metrics.

Content metrics

(Note: This section pertains to content metrics that are part of monthly key metrics reporting. These are different from the core metrics content metric.)

The code that calculates these metrics can be found in the movement-metrics repository on GitHub.

Content
Metric | Definition | Remarks
Total content pages | The total number of existing (non-deleted) pages in content namespaces across all wikis. | Produced by summing the net new content pages metric provided by the Analytics Query Service over all time.
—Wikipedia articles | The total number of existing (non-deleted) pages in content namespaces across all Wikipedias. | A subset of total content pages. Produced by summing the net new content pages metric for all Wikipedias.
—Media files | The total number of existing (non-deleted) pages in content namespaces on Wikimedia Commons. The large majority of these are file pages, but a small number are gallery pages. | A subset of total content pages. Produced by summing the net new content pages metric for Wikimedia Commons.
—Wikidata entities | The number of existing (non-deleted) Wikidata entities. | A subset of total content pages. Produced by summing the net new content pages metric for Wikidata.
Net new content | The number of content pages added since the previous month, excluding deleted pages and redirects. | Produced by subtracting last month's total content pages metric from this month's.
—Wikipedia articles | The number of Wikipedia articles added since the previous month, excluding deleted articles and redirects. | A subset of net new content. Produced by subtracting last month's total Wikipedia articles metric from this month's.
—Commons files | The number of Wikimedia Commons content pages added since the previous month, excluding deleted pages and redirects. | A subset of net new content. Produced by subtracting last month's total media files metric from this month's.
—Wikidata entities | The number of Wikidata entities added since the previous month, excluding deleted ones and redirects. | A subset of net new content. Produced by subtracting last month's total Wikidata entities metric from this month's.
Revert rate | Number of (bot and non-bot) edits which were reverted at any point before the snapshot divided by non-bot edits made during the given month. | Reverted edits are identified using the revision_is_identity_reverted field in the mediawiki_history dataset.
Total edits | The total number of edits made across all wikis during the given month. | Edits that have been reverted or deleted are included among total edits. Total edits has a "missing" fifth component in addition to the four components included below: the number of other bot edits. Unlike Wikistats, this metric includes edits to pages which were later deleted.
—Mobile edits | The number of edits tagged as having been made using the mobile website (which can be used by desktop computers) or the mobile apps. |
—Data edits | The number of edits to Wikidata, including bot edits. |
—File uploads | The number of edits, including any bot edits, which uploaded a new file on any project. |
—Other non-bot edits | The number of non-bot edits which were not on Wikidata and did not upload a new file, minus the number of mobile edits. | The current calculation assumes that bot edits made using the mobile site, mobile Wikidata edits, and mobile uploads are negligible.
—Anonymous edits | The number of edits made by anonymous users. |
Content Gap | A metric that quantifies knowledge gaps in content by estimating the distribution of pieces of content (e.g., Wikipedia articles, Wikidata items) across different categories (e.g., gender, geographic distribution, cultural background). | Knowledge gaps are major differences in participation or coverage of a specific group of readers, contributors, or content. A complete list of mappings for content is available on a linked page.
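
The arithmetic behind the total, net new, and revert rate rows can be sketched as follows. This is a minimal illustration only, not the code in the movement-metrics repository: the function names and the sample numbers are hypothetical, and the inputs are assumed to be simple monthly counts.

```python
from itertools import accumulate


def total_from_net_new(net_new_by_month):
    """Running total of content pages: cumulative sum of monthly net new pages."""
    return list(accumulate(net_new_by_month))


def net_new_from_totals(totals_by_month):
    """Net new content: this month's total minus last month's total."""
    return [curr - prev for prev, curr in zip(totals_by_month, totals_by_month[1:])]


def revert_rate(reverted_edits, non_bot_edits):
    """Reverted edits divided by non-bot edits made in the same month."""
    if non_bot_edits == 0:
        raise ValueError("non_bot_edits must be greater than zero")
    return reverted_edits / non_bot_edits


# Hypothetical monthly net new page counts (a negative month means
# more pages were deleted than created).
net_new = [120, 80, -15, 40]
totals = total_from_net_new(net_new)       # [120, 200, 185, 225]
recovered = net_new_from_totals(totals)    # [80, -15, 40]
```

Note that differencing the totals recovers every month except the first, because the first month has no earlier total to subtract.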

Contributor metrics

The code that calculates these metrics can be found in the movement-metrics repository on GitHub.

Editors
Metric | Definition | Remarks
Active editors | The number of registered users who made at least 5 content edits across all projects in the given month. See meta:Research:Active editor for the full definition. | Unlike Wikistats, this metric includes edits to pages which were later deleted.
—New | Active editors who registered during the given month. |
—Returning | Active editors who registered before the given month. |
New editor retention | Out of the users who registered in the month before the previous and made at least one edit in their first 30 days, the proportion who also edited during their second 30 days. | This includes all edits (including edits to content, talk, and other namespaces) whether or not the edits have been reverted. Unlike Wikistats, this metric includes edits to pages which were later deleted.
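
As a rough illustration of the editor definitions above, the sketch below splits a month's active editors into new and returning, and computes a retention proportion for a registration cohort. It is not the movement-metrics implementation: the function names, the input shapes, and the exact day boundaries (days 0 to 29 as the first 30 days, days 30 to 59 as the second) are assumptions made for this example.

```python
from collections import Counter
from datetime import date


def active_editors(content_edits, registrations, month_start, next_month_start, threshold=5):
    """Return (new, returning) sets of active editors for one month.

    content_edits: iterable of (user, edit_date) pairs for content edits
                   by registered users across all projects.
    registrations: dict mapping user -> registration date.
    """
    counts = Counter(
        user for user, day in content_edits if month_start <= day < next_month_start
    )
    active = {user for user, n in counts.items() if n >= threshold}
    new = {u for u in active if month_start <= registrations[u] < next_month_start}
    return new, active - new


def new_editor_retention(cohort_registrations, edit_dates_by_user):
    """Among cohort users who edited in their first 30 days, the share who
    also edited in their second 30 days. Returns None for an empty base."""
    activated = retained = 0
    for user, registered in cohort_registrations.items():
        offsets = [(d - registered).days for d in edit_dates_by_user.get(user, [])]
        if any(0 <= o < 30 for o in offsets):
            activated += 1
            if any(30 <= o < 60 for o in offsets):
                retained += 1
    return retained / activated if activated else None
```

For retention, the cohort passed in would be the users who registered in the month before the previous one, so that each of them has had a full 60 days in which to edit.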

Reader metrics

The code that calculates these metrics can be found in the movement-metrics repository on GitHub.

Readers
Metric | Definition | Remarks
Content Interactions | Pageviews (all platforms) + desktop previews (see definitions below). |
—Pageviews | Full definition: m:R:Page view. Monthly pageviews based on calendar month. Corrected for spurious IE views from some countries. | Data source: pageview_hourly. The current calculation includes user agents and automated agents. We use the User Agent to define whether we assign a mobile or desktop experience; User Agents we consider to be mobile are captured in this code on GitHub. Most tablets are classified as mobile.
——Desktop | Same, limited to the desktop domains (e.g. en.wikipedia.org). | (ditto)
——Mobile Web | Same, limited to the mobile web domains (e.g. en.m.wikipedia.org; does not include apps). | (ditto)
—Desktop previews | Seen page previews, defined as preview popups that remain visible for at least one second. | Data source: virtualpageview_hourly
Unique Devices | Full definition: m:R:Unique Devices. Monthly unique devices for all Wikipedias (includes desktop and mobile web), based on calendar month. | Data source: unique_devices_per_project_family_monthly
Android Uniques | Monthly unique users of the Android Wikipedia app. | See Android core metrics.
Android Installs | Installs per day of the Android Wikipedia app, monthly average. | See Android core metrics.
iOS Downloads | First-time downloads of the iOS Wikipedia app from the app store. Does not include updates or installs on an additional device using the same account. May contain Volume Purchase Program downloads and possibly some artifacts on rare occasions. | Data source: App Annie

Diversity

A set of contributors, content, or readers metrics (defined above), restricted to:

Platform, structured data, and content use metrics

The code that calculates these metrics can be found in the Platform-metrics repository on GitHub.

Metric | Definition | Remarks
Non-text Content Used across Wikis | The number of non-text content items (e.g. Commons files) that are reused on content pages across Wikimedia projects by the end of the current month. | Non-text content comprises images, audio, video, documents (PDFs), and data (e.g. JSON). It is stored on Commons. Redirects are excluded from content pages.
Structured Data Used across Wikis | The number of content pages across Wikimedia projects that reuse Wikidata by the end of the current month. | Redirects are excluded from content pages.
Wikidata Items Reused on Other Wikimedia Projects | The percentage of Wikidata items that are reused on content pages on other Wikimedia projects by the end of the current month. | Redirects are excluded from content pages.
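
The two Wikidata reuse metrics can be illustrated with a small sketch. It is not taken from the Platform-metrics repository: the function names and the page records with their "is_redirect" and "wikidata_items" keys are hypothetical, and the pages are assumed to be content pages from projects other than Wikidata.

```python
def pages_reusing_wikidata(pages):
    """Count content pages that reuse at least one Wikidata item, skipping redirects."""
    return sum(1 for p in pages if not p["is_redirect"] and p["wikidata_items"])


def percent_items_reused(all_items, pages):
    """Percentage of Wikidata items reused on at least one non-redirect content page."""
    if not all_items:
        raise ValueError("all_items must not be empty")
    reused = set()
    for p in pages:
        if not p["is_redirect"]:
            reused |= p["wikidata_items"]
    return 100 * len(reused & set(all_items)) / len(all_items)


# Hypothetical content pages; the redirect is ignored by both metrics.
pages = [
    {"is_redirect": False, "wikidata_items": {"Q1", "Q2"}},
    {"is_redirect": True, "wikidata_items": {"Q3"}},
    {"is_redirect": False, "wikidata_items": set()},
    {"is_redirect": False, "wikidata_items": {"Q2"}},
]
items = {"Q1", "Q2", "Q3", "Q4"}
page_count = pages_reusing_wikidata(pages)         # 2
reuse_percent = percent_items_reused(items, pages)  # 50.0
```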

Financial Metric: Programmatic Ratio

The percentage of the budget spent on program expenses, called the Programmatic Ratio, is an important financial metric that we track. Each year, we set a target for our programmatic expense ratio to align with nonprofit sector best practices.

Independent charity assessment organizations like Charity Navigator help establish these best practices. Charity Navigator sets its benchmark for the highest-scoring nonprofits at more than 70% programmatic expenses. Charity Navigator provides the following explanation and formula to calculate this ratio:

"Charities exist to provide programs and services. They fulfill the expectations of givers when they allocate a good portion of their budgets toward their stated missions. While administration expenses are necessary for efficient charity operations, organizations that grossly underspend on their programs and services will most likely not have as strong an impact on their charitable missions. We calculate the nonprofit's average program expense percentage over its three most recent fiscal years and then assign a numeric score based on an established scale.

Average Program Expense Percentage = Average Program Expenses ÷ Average Total Expenses (When Calculating Using Form 990) = Average of Part IX line 25B ÷ Average of Part IX line 25A"[1]
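
A minimal sketch of the quoted formula follows. The function name and the expense figures are hypothetical and chosen only to make the arithmetic easy to follow. The formula divides the average program expenses by the average total expenses, which is not the same as averaging the three yearly percentages.

```python
def average_program_expense_percentage(program_expenses, total_expenses):
    """Average program expenses divided by average total expenses, as a percentage.

    Both arguments list one figure per fiscal year, in the same order
    (the quoted text uses the three most recent fiscal years).
    """
    if not program_expenses or len(program_expenses) != len(total_expenses):
        raise ValueError("expected the same non-zero number of yearly figures")
    avg_program = sum(program_expenses) / len(program_expenses)
    avg_total = sum(total_expenses) / len(total_expenses)
    return 100 * avg_program / avg_total


# Hypothetical figures (in millions) for three fiscal years.
program = [63, 75, 87]
total = [90, 100, 110]
ratio = average_program_expense_percentage(program, total)  # 75.0
above_benchmark = ratio > 70                                 # True
```

With these figures the yearly percentages are 70%, 75%, and roughly 79.1%, whose plain mean is slightly below 75%, so the order of averaging and dividing matters.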