Movement Insights/Movement metrics process
This is how to produce the monthly movement metrics report.
Wait for dependencies
Normally, you only have to wait for these dependencies to arrive. However, failures sometimes happen and leave you blocked waiting for one of them. In that case, contact the people responsible and ask them to fix the problem.
Most of these dependencies are produced by Airflow jobs. To check their status, follow the instructions at wikitech:Data Engineering/Systems/Airflow/Instances#Access.
Datasets owned by other teams
| dataset | expected arrival (day of the month) | Airflow job | notes |
|---|---|---|---|
| mediawiki_history | day 3-5 | main:mediawiki_history_denormalize | We receive an email alert when it is done (T357472) |
| editors_daily | | main:editors_daily_monthly | |
| pageview_hourly | | main:pageview_hourly | |
| virtualpageview_hourly | | main:virtualpageview_hourly | |
| net new pages API | day 5-10 | done if contributor and content data for the new month is available on Wikistats | |
| wmf_ | | main:unique_devices_per_project_family_monthly | |
| research.article_features, research.article_quality_scores | updated daily | research:article_features (code) | Used to generate content gaps data. Depends on mediawiki_content_history_v1 |
| content_ | day 11-13 | research:knowledge_gaps (code) | The notebooks can be safely re-run to incorporate these without affecting previously generated metrics |
Datasets owned by Movement Insights
Our movement_metrics job, which is scheduled to run on day 7 of the month, generates the following intermediate datasets.
- wmf_product.active_editors
- wmf_product.content_interactions
- wmf_product.global_markets_pageviews
- wmf_product.editor_month
- wmf_product.new_editors
- wmf_product.pageviews_corrected
Run the notebooks, analyze the trends, and prepare the report
- Run the movement-metrics notebooks using the instructions in the readme.
- Assess the metrics, investigate noteworthy trends, and draft key takeaways in the Quarto notebook in the prescribed format.
- Share the generated HTML report via Slack so that Omari and other team members can review it.
- If no changes are required in the report, push the updates to GitLab. Our metric spreadsheet will automatically pick up the newly calculated values.
To push the updates to GitLab, change your working directory to the cloned 'movement-metrics' folder and execute the following git commands:
git add .
git commit -m "Calculate [Month] [Year] metrics"
git push
To authenticate your push to the repository, you will need to supply your GitLab username and your generated GitLab access token.
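The commit step above can be wrapped in a small script so that the message always follows the "Calculate [Month] [Year] metrics" pattern. This is only a sketch: the function names `commit_message` and `push_metrics` are hypothetical and not part of the movement-metrics repository, and it assumes you pass the reporting month as YYYY-MM and run it from inside the cloned folder.

```shell
# Sketch only: commit_message and push_metrics are hypothetical helper names.

# Build the commit message for a reporting month given as YYYY-MM.
commit_message() {
  year="${1%-*}"          # text before the last "-", e.g. 2024
  case "${1#*-}" in       # text after the first "-", e.g. 05
    01) month="January" ;;
    02) month="February" ;;
    03) month="March" ;;
    04) month="April" ;;
    05) month="May" ;;
    06) month="June" ;;
    07) month="July" ;;
    08) month="August" ;;
    09) month="September" ;;
    10) month="October" ;;
    11) month="November" ;;
    12) month="December" ;;
    *) echo "expected YYYY-MM, got: $1" >&2; return 1 ;;
  esac
  echo "Calculate $month $year metrics"
}

# Stage, commit, and push; stops at the first step that fails.
# Assumes the current directory is the cloned movement-metrics folder.
push_metrics() {
  message="$(commit_message "$1")" || return 1
  git add . &&
    git commit -m "$message" &&
    git push
}
```

For example, `push_metrics 2024-05` would commit with the message "Calculate May 2024 metrics". Git will still prompt for your GitLab username and access token on push.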
Distribute the report
- Based on the instructions at https://wikitech.wikimedia.org/wiki/Data_Platform/Web_publication:
  - Download the current_monthly_report.html file to your local machine.
  - Open a terminal and run scp local_filepath remote_username@remote_host_or_ip:remote_directory, replacing the file path, username, and host. For example: scp home/current_monthly_report.html xyz@stat10:/srv/published/reports/movement-metrics
- ssh to the stat host where you uploaded the file and navigate to /srv/published/reports/movement-metrics
- Copy the file to the archive folder by running cp current_monthly_report.html /srv/published/reports/movement-metrics/archive/YYYY-MM.html (make sure to replace "YYYY-MM" with the current month).
- Share a few highlights and link the monthly metrics in the #insights-and-data channel on Slack.
- More places to publish:
  - Add a link to the previous report on Research and Decision Science/Movement Metrics.
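Because the archive file name is typed by hand, it is easy to mistype. The archive step above could be sketched as a small helper that validates the month before copying. The function names `archive_path` and `archive_report` are hypothetical. The paths are the ones given in the steps above, and the script is assumed to run on the stat host after the upload.

```shell
# Sketch only: archive_path and archive_report are hypothetical helper names.
REPORT_DIR="/srv/published/reports/movement-metrics"

# Print the archive destination for a month given as YYYY-MM.
# Only the shape (four digits, dash, two digits) is checked, not the calendar.
archive_path() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]) echo "$REPORT_DIR/archive/$1.html" ;;
    *) echo "expected YYYY-MM, got: $1" >&2; return 1 ;;
  esac
}

# Copy the uploaded report into the archive folder under its month name.
archive_report() {
  dest="$(archive_path "$1")" || return 1
  cp "$REPORT_DIR/current_monthly_report.html" "$dest"
}
```

For example, `archive_report 2024-05` copies the current report to the archive folder as 2024-05.html, while a malformed argument such as `May-2024` is rejected before anything is copied.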