Movement Insights/Movement metrics process

From Meta, a Wikimedia project coordination wiki

The instructions for the monthly movement metrics report are mostly in the README of the code repository.

Data dependencies

Normally, you only have to wait for these dependencies to arrive. However, a failure will sometimes leave you blocked waiting for one of them. In that case, contact the people responsible for that dataset and ask them to fix the problem.

Most of these dependencies are produced by Airflow jobs. To check their status, follow the instructions at wikitech:Data Engineering/Systems/Airflow/Instances#Access.

Datasets owned by other teams

| Dataset | Expected arrival (day of the month) | Airflow job | Notes |
|---|---|---|---|
| mediawiki_history | day 3-5 | main:mediawiki_history_denormalize | We receive an email alert when it is done (T357472) |
| editors_daily | | main:editors_daily_monthly | |
| pageview_hourly | | main:pageview_hourly | |
| virtualpageview_hourly | | main:virtualpageview_hourly | |
| net new pages API | day 5-10 | | Done if contributor and content data for the new month is available on Wikistats |
| wmf_readership.unique_devices_per_project_family_monthly | | main:unique_devices_per_project_family_monthly | |
| research.article_features, research.article_quality_scores | updated daily | research:article_features (code) | Used to generate content gaps data. Depends on mediawiki_content_history_v1 |
| content_gap_metrics.metric_features, content_gap_metrics.by_category | day 11-13 | research:knowledge_gaps (code) | The notebooks can be safely re-run to incorporate these without affecting previously generated metrics |
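
As a rough aid for deciding when to start chasing a late dependency, the arrival windows listed above can be encoded in a small helper. This is only a sketch: the function name `arrival_status` and the status labels are made up for illustration, only datasets with a stated day range are included, and the helper compares calendar days only. It does not check whether a dataset has actually arrived.

```python
from datetime import date

# Expected arrival windows (first day, last day of the month), copied from the
# table above. Datasets without a stated day range are left out.
EXPECTED_ARRIVAL = {
    "mediawiki_history": (3, 5),
    "net new pages API": (5, 10),
    "content_gap_metrics": (11, 13),
}


def arrival_status(today: date) -> dict[str, str]:
    """Classify each dataset by where `today` falls relative to its window.

    Hypothetical helper: the labels describe the calendar only, not the
    real state of the dataset.
    """
    status = {}
    for name, (first_day, last_day) in EXPECTED_ARRIVAL.items():
        if today.day < first_day:
            status[name] = "not yet expected"
        elif today.day <= last_day:
            status[name] = "expected now"
        else:
            status[name] = "window passed"
    return status


if __name__ == "__main__":
    for name, label in arrival_status(date.today()).items():
        print(f"{name}: {label}")
```

A dataset labelled "window passed" that is still missing is a candidate for checking the Airflow job status and, if needed, contacting its owners.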

Datasets owned by Movement Insights

Our movement_metrics job, which is scheduled to run on day 7 of the month, generates the following intermediate datasets.

  • wmf_product.active_editors
  • wmf_product.content_interactions
  • wmf_product.global_markets_pageviews
  • wmf_product.editor_month
  • wmf_product.new_editors
  • wmf_product.pageviews_corrected
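
Because the job is scheduled for day 7, some upstream data with later arrival windows may not be available yet when it runs. The sketch below lists those datasets. It assumes the latest arrival days stated for the datasets owned by other teams; the names `LATEST_ARRIVAL_DAY` and `may_arrive_after_job` are hypothetical, and whether the job itself reads any of these datasets is not stated here.

```python
# Day of the month on which movement_metrics is scheduled (from the text above).
JOB_DAY = 7

# Latest expected arrival day for upstream datasets that state a day range.
LATEST_ARRIVAL_DAY = {
    "mediawiki_history": 5,
    "net new pages API": 10,
    "content_gap_metrics": 13,
}


def may_arrive_after_job(job_day: int = JOB_DAY) -> list[str]:
    """Return upstream datasets whose arrival window ends after `job_day`."""
    return sorted(
        name for name, last_day in LATEST_ARRIVAL_DAY.items() if last_day > job_day
    )


if __name__ == "__main__":
    print(may_arrive_after_job())
```

Anything this returns is data that the report may have to pick up in a later step rather than on the scheduled day.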