Movement Insights/Movement metrics process

From Meta, a Wikimedia project coordination wiki

This is how to produce the monthly movement metrics report.

Wait for dependencies

Normally, you only have to wait for these dependencies to arrive. Occasionally, though, an upstream failure leaves you blocked on one of them. In that case, contact the people responsible and ask them to fix the problem.

Most of these dependencies are produced by Airflow jobs. To check their status, follow the instructions at wikitech:Data Engineering/Systems/Airflow/Instances#Access.

External datasets

dataset | expected arrival (day of the month) | Airflow instance | Airflow job name | notes
mediawiki_history | day 3-5 | analytics | mediawiki_history_denormalize | We receive an email alert when it is done (T357472)
editors_daily | | analytics | editors_daily_monthly |
pageview_hourly | | analytics | pageview_hourly |
virtualpageview_hourly | | analytics | virtualpageview_hourly |
net new pages API | day 5-10 (to check, see if contributor and content data for the new month is available on Wikistats) | | | The update process is currently manual, and often gets delayed or forgotten. The Data Products team is working on automating it (T355536)
wmf_readership.unique_devices_per_project_family_monthly | | analytics | unique_devices_per_project_family_monthly |
mediawiki_wikitext_history | day 10-12 | analytics | mediawiki_wikitext_history | Used to generate research.article_features and research.article_quality_scores
research.article_features, research.article_quality_scores | day 11-13 | research | research_article_quality | Used to generate content gaps data
content_gap_metrics.by_category_all_wikis | day 11-13 | research | knowledge_gaps | The notebooks can be safely re-run to incorporate these without affecting previously generated metrics
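As a rough aid for deciding when to start chasing a late dependency, the arrival windows in the table above can be encoded and compared against the current day of the month. This is only a sketch: the `Dependency` class, the `overdue` helper, and the idea of tracking arrivals in a plain set are hypothetical, and datasets with no arrival window in the table are left out rather than guessed at.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    name: str
    latest_day: int  # last day of the expected arrival window from the table


# Only datasets with an explicit arrival window in the table are listed.
DEPENDENCIES = [
    Dependency("mediawiki_history", 5),
    Dependency("net new pages API", 10),
    Dependency("mediawiki_wikitext_history", 12),
    Dependency("research.article_features, research.article_quality_scores", 13),
    Dependency("content_gap_metrics.by_category_all_wikis", 13),
]


def overdue(day_of_month, arrived):
    """Return names of dependencies whose window has passed without arrival.

    `arrived` is a set of dependency names you have already confirmed
    (for example by checking the Airflow job or Wikistats by hand).
    """
    return [
        dep.name
        for dep in DEPENDENCIES
        if dep.name not in arrived and day_of_month > dep.latest_day
    ]


if __name__ == "__main__":
    # On day 11, with only mediawiki_history confirmed, the net new pages
    # API is past its day 5-10 window.
    print(overdue(11, {"mediawiki_history"}))
```

Anything this returns is a candidate for contacting the responsible team. It does not check Airflow itself.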

movement_metrics job

Our movement_metrics job, which is scheduled to run on day 7 of the month, generates the following intermediate datasets.

  • wmf_product.active_editors
  • wmf_product.content_interactions
  • wmf_product.global_markets_pageviews
  • wmf_product.editor_month
  • wmf_product.new_editors
  • wmf_product.pageviews_corrected

Prepare Google Sheets API access

To run the notebooks, you need a service account with access to the Google Sheets where we upload copies of the metrics.

  • Go to the Google Cloud Console.
  • Sign in with your WMF email account.
  • Click on the project dropdown near the top-left of the page and select "New Project". Enter any project name and an organization name, then click "Create".
  • Make sure your new project is selected and navigate to "APIs & Services".
  • Click "+ ENABLE APIS AND SERVICES", search for "Google Sheets API" in the API library, select it, and click "Enable" to enable the API for your project.
  • Similarly, search for and enable the "Google Drive API".
  • Now create the service account, associated with your WMF Gmail account, that will access the Google Sheets. In the left-hand sidebar, select "API > Credentials" > "+ Create Credentials" > "Service Account".
  • Enter a service account name; a service account ID is generated from it. Click "Create and Continue", then choose "Editor" under the "basic" section and press "CONTINUE". Leave the third step blank and click "Done".
  • You should now see the email address of the service account you just created on the page. Click on it and select "Keys" > "Add key" > "Create new key" > "JSON". The JSON key file is downloaded automatically to your computer.
  • For each of the sheets (readers and editors and content), select "Share" in the top-right corner and enter your full service account email with "editor" permissions.
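The final sharing step needs the exact service account email, which is also stored in the downloaded key file. The sketch below reads it from there and runs a basic sanity check on the file. It relies on the standard `type`, `client_email`, and `private_key` fields of a Google service account JSON key. The function name and the example file path are hypothetical and not part of the movement-metrics notebooks.

```python
import json
from pathlib import Path

# Fields that a Google service account JSON key file is expected to contain.
REQUIRED_FIELDS = ("type", "client_email", "private_key")


def service_account_email(key_path):
    """Return the service account email stored in a downloaded JSON key file.

    Raises ValueError if the file does not look like a service account key.
    """
    key = json.loads(Path(key_path).read_text(encoding="utf-8"))
    missing = [field for field in REQUIRED_FIELDS if field not in key]
    if missing:
        raise ValueError(f"key file is missing fields: {', '.join(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"unexpected key type: {key['type']!r}")
    return key["client_email"]


if __name__ == "__main__":
    # Hypothetical location; use wherever your browser saved the key.
    print(service_account_email("service-account-key.json"))
```

The printed address is the one to enter in each sheet's sharing dialog. Treat the key file as a secret, since it contains the account's private key.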

Run the notebooks

  1. Run the movement-metrics notebooks using the instructions in the readme.

Analyze the trends and prepare the report

  1. Assess the reports and investigate noteworthy trends.
  2. Copy the key graphs to slides in the Prep - Movement Metrics deck.
  3. Draft the summary message in the Summary draft - Movement Metrics doc.
  4. Have Omari and other team members review it.

Distribute the report

  1. Move the finished slides to the Movement Metrics deck.
  2. Share the update in the #insights-and-data channel on Slack.
  3. Publish the slides.
    1. Upload to Commons. Source: your own work. Author: Wikimedia Foundation Movement Insights Team. Copyright template: {{WMF-staff-upload}}
    2. Replace the previous report on Research and Decision Science with the new one.
    3. Add the previous report to Research and Decision Science/Movement Metrics.