
Movement Insights/Movement metrics process

From Meta, a Wikimedia project coordination wiki

This is how to produce the monthly movement metrics report.

Wait for dependencies


Normally, you only have to wait for these dependencies to arrive. Occasionally, though, a job fails and you find yourself blocked waiting for one of them. In that case, contact the people responsible and ask them to fix the problem.

Most of these dependencies are produced by Airflow jobs. To check their status, follow the instructions at wikitech:Data Engineering/Systems/Airflow/Instances#Access.

Datasets owned by other teams

Dataset | Expected arrival (day of the month) | Airflow job | Notes
mediawiki_history | day 3-5 | main:mediawiki_history_denormalize | We receive an email alert when it is done (T357472).
editors_daily | | main:editors_daily_monthly |
pageview_hourly | | main:pageview_hourly |
virtualpageview_hourly | | main:virtualpageview_hourly |
net new pages API | day 5-10 | | Ready when contributor and content data for the new month are available on Wikistats.
wmf_readership.unique_devices_per_project_family_monthly | | main:unique_devices_per_project_family_monthly |
research.article_features, research.article_quality_scores | updated daily | research:article_features (code) | Used to generate content gaps data. Depends on mediawiki_content_history_v1.
content_gap_metrics.metric_features, content_gap_metrics.by_category | day 11-13 | research:knowledge_gaps (code) | The notebooks can be safely re-run to incorporate these without affecting previously generated metrics.

Datasets owned by Movement Insights


Our movement_metrics job, which is scheduled to run on day 7 of the month, generates the following intermediate datasets.

  • wmf_product.active_editors
  • wmf_product.content_interactions
  • wmf_product.global_markets_pageviews
  • wmf_product.editor_month
  • wmf_product.new_editors
  • wmf_product.pageviews_corrected
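The expected arrival days in the tables above can double as a simple checklist. As a minimal sketch, assuming bash, the hypothetical function below (not an existing tool) prints the dependencies whose latest expected day has already passed, so you know which ones to confirm in Airflow and chase if they are missing. It only covers entries that list a day: the upper end of each range for the datasets owned by other teams, plus day 7 for our movement_metrics job. It does not check whether a job actually succeeded.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an existing tool): given a day of the month,
# print the dependencies whose latest expected arrival day has already passed.
# It only compares day numbers; it does not query Airflow or Hive.

overdue_dependencies() {
  local today=$1
  if [[ ! $today =~ ^[0-9]+$ ]]; then
    echo "expected a day of the month, got '$today'" >&2
    return 1
  fi
  # "name:day" pairs, using the upper end of each range in the tables above
  local entries=(
    "mediawiki_history:5"
    "movement_metrics job:7"
    "net new pages API:10"
    "content_gap_metrics:13"
  )
  local entry name day
  for entry in "${entries[@]}"; do
    name=${entry%:*}   # text before the last colon
    day=${entry##*:}   # text after the last colon
    # 10# forces base 10 so inputs like "08" are not read as octal
    if (( 10#$today > day )); then
      echo "$name"
    fi
  done
}

# Example: list what should already be available on day 11
overdue_dependencies 11
```

On day 11 this lists mediawiki_history, the movement_metrics job and the net new pages API, but not content_gap_metrics, which is only expected by day 13.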
Calculate the metrics
  • Run the movement-metrics notebooks using the instructions in the README.
  • Assess the metrics, investigate noteworthy trends, and draft key takeaways in the Quarto notebook using the prescribed format.
  • Share the generated HTML report on Slack so that Omari and other team members can review it.
  • If the report needs no changes, push the updates to GitLab. Our metrics spreadsheet will then automatically pick up the newly calculated values.

To push the updates to GitLab, change your working directory to the cloned 'movement-metrics' folder and execute the following git commands:

git add .
git commit -m "Calculate [Month] [Year] metrics"
git push

To authenticate your push to the repository, you will need to supply your GitLab username and the GitLab access token you generated.
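The commit message follows the pattern "Calculate [Month] [Year] metrics". As a minimal bash sketch, the hypothetical helpers below build that message from a YYYY-MM string and run the same three git commands. They are not part of the movement-metrics repository; push_metrics assumes you run it from inside your clone, and git push still needs your GitLab username and access token as described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build "Calculate [Month] [Year] metrics" from YYYY-MM.

commit_message() {
  local ym=$1
  local months=(January February March April May June
                July August September October November December)
  # Accept exactly YYYY-MM with a month from 01 to 12
  local re='^([0-9]{4})-(0[1-9]|1[0-2])$'
  if [[ ! $ym =~ $re ]]; then
    echo "expected YYYY-MM, got '$ym'" >&2
    return 1
  fi
  local year=${BASH_REMATCH[1]}
  # 10# avoids "08" and "09" being parsed as invalid octal numbers
  local month=$((10#${BASH_REMATCH[2]}))
  echo "Calculate ${months[month - 1]} $year metrics"
}

# Hypothetical wrapper: stage, commit and push from the current directory,
# which should be your clone of the movement-metrics repository.
push_metrics() {
  local msg
  msg=$(commit_message "$1") || return 1
  git add . && git commit -m "$msg" && git push
}

# Example: prints "Calculate March 2024 metrics" (does not touch git)
commit_message 2024-03
```

For example, running push_metrics 2024-03 inside the clone would commit with the message "Calculate March 2024 metrics" and then push.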

Distribute the report

  1. Upload the report, following the instructions at https://wikitech.wikimedia.org/wiki/Data_Platform/Web_publication:
    • Download the current_monthly_report.html file to your local machine.
    • Open a terminal and run scp local_filepath remote_username@remote_host_or_ip:remote_directory, replacing the file path, username and host. For example: scp home/current_monthly_report.html xyz@stat10:/srv/published/reports/movement-metrics
  2. SSH into the stat host where you uploaded the file and navigate to /srv/published/reports/movement-metrics.
  3. Copy the file to the archive folder by running cp current_monthly_report.html /srv/published/reports/movement-metrics/archive/YYYY-MM.html (make sure to replace "YYYY-MM" with the current month).
  4. Share a few highlights and a link to the monthly metrics report in the #insights-and-data channel on Slack.
  5. More places to publish:
    1. Add a link to the previous report on the Research and Decision Science/Movement Metrics page.
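Steps 1 to 3 can also be scripted. The bash sketch below is one possible way, assuming you can already reach the stat host with scp and ssh from your machine. The function names, the DRY_RUN switch, and the use of a single ssh call that runs cp (instead of an interactive session) are our own illustrative choices, not part of the documented process; xyz and stat10 are the placeholders from the example in step 1.

```shell
#!/usr/bin/env bash
# Hypothetical helper combining steps 1-3 above (upload, then archive copy).
# Usage: publish_report LOCAL_FILE USER HOST YYYY-MM
# Set DRY_RUN=1 to print the commands instead of running them.

REMOTE_DIR=/srv/published/reports/movement-metrics

# Run a command, or only print it when DRY_RUN is set
run() {
  if [[ -n ${DRY_RUN:-} ]]; then
    echo "$*"
  else
    "$@"
  fi
}

publish_report() {
  local local_file=$1 user=$2 host=$3 ym=$4
  local re='^[0-9]{4}-(0[1-9]|1[0-2])$'
  if [[ ! $ym =~ $re ]]; then
    echo "expected YYYY-MM, got '$ym'" >&2
    return 1
  fi
  # scp keeps the local file name, so the remote copy has the same basename
  local name
  name=$(basename "$local_file")
  # Upload the report into the published directory on the stat host
  run scp "$local_file" "$user@$host:$REMOTE_DIR" || return 1
  # Copy the uploaded file into the archive folder as YYYY-MM.html
  run ssh "$user@$host" \
    "cp $REMOTE_DIR/$name $REMOTE_DIR/archive/$ym.html"
}

# Dry run with the placeholder user and host from step 1
DRY_RUN=1 publish_report home/current_monthly_report.html xyz stat10 2024-03
```

Running it first with DRY_RUN=1 lets you check the exact scp and ssh commands, including the archive file name, before anything is copied.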