Web2Cit/Docs/Monitor

From Meta, a Wikimedia project coordination wiki

The Web2Cit monitor is the part of the Web2Cit ecosystem that uses the collaboratively defined translation tests to run regular checks of the Web2Cit translation system, and writes these results on-wiki for easy and prompt identification of domains which may need fixing.

How to use

Overview page

At meta:Web2Cit/monitor.

A list of all domains configured in Web2Cit, with a summary of the last check for each, transcluded from domain log pages (see below).

See draft example here.

Domain test result page

At meta:Web2Cit/monitor/com/example/www/results.

Saving this to meta:Web2Cit/monitor/com/example/www was considered, but the page of a sub-domain named "log" could then conflict with its parent's checks log page (see below).
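The naming scheme and the conflict it avoids can be illustrated with a minimal sketch. It assumes that page titles are built by reversing the labels of the domain name, as the www.example.com example suggests; the helper `monitor_page` is hypothetical and is not taken from the monitor's source code.

```python
def monitor_page(domain: str, suffix: str = "") -> str:
    """Build a monitor page title from a domain name (hypothetical helper)."""
    # Reverse the labels: "www.example.com" -> "com/example/www"
    reversed_labels = "/".join(reversed(domain.lower().split(".")))
    title = f"Web2Cit/monitor/{reversed_labels}"
    return f"{title}/{suffix}" if suffix else title


# Results and log pages for www.example.com
print(monitor_page("www.example.com", "results"))
print(monitor_page("www.example.com", "log"))

# Without the "results" suffix, the results page of the sub-domain
# log.www.example.com would share a title with its parent's log page.
print(monitor_page("log.www.example.com") == monitor_page("www.example.com", "log"))
```

The last line prints True, which is the collision that the separate results sub-page is meant to prevent.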

These pages are meant to be updated only when the test results for a domain change; that is, they are not necessarily updated every time a check is run. Because of this, users who want to be notified when the test results for a given domain change can watch (subscribe to) these pages.[Notes 1]

See draft example here.

These result pages may be categorized with either Web2Cit passing tests or Web2Cit failing tests categories.

Domain checks log page

At meta:Web2Cit/monitor/com/example/www/log.

A list of checks run for a given domain. Every time a check is run, a new row is added at the top.

See draft example here.

Information concerning the version of the target paths (such as checksums of their corresponding HTML and Citoid responses) may be useful. However, we would need one row per target path for this (i.e., multiple rows per check). We cannot save this to the results page, because that page should only change if test results change. We may consider this when saving to a custom database (instead of Meta; see below).

Issues

Please report any issues to this page's discussion page, or to Phabricator using the w2c-monitor project tag.

Development

This section covers where the source code is, how to set up a local development environment, how to run tests, and how to deploy to Toolforge.

Installation

You need pip installed on your system.

pip install -r requirements.txt

How it works

The Web2Cit monitor is implemented in Python. It reads, interprets, and writes the results of the tests run against the configurations that Web2Cit needs in order to work as a complement to Citoid.

Installation process

Installing

To install Web2Cit-monitor, it is necessary to clone the repository https://gitlab.wikimedia.org/superzerocool/w2c-monitor.

To make sure you are using the same Python version as the one used on Toolforge, you can use pyenv to select the version indicated in ./.python-version.

Install the dependencies through the command:

pip install -r requirements.txt

It supports the use of virtual environments and can be run as indicated in the Toolforge Python task configuration.

Note that, for now, results can only be written to local logs and to Meta; there is no option to obtain the parsed JSON from the endpoint responses. Such an option could be provided as an interpretation gateway by web2cit-server.

Database

To create the database that stores the work queue, you only need to run a copy command, because a small database with the model already in place is distributed with the repository:

cp ./db/monitor.sqlite.dist ./db/monitor.sqlite

With this, the work queue can now be generated.

Write Credentials in Meta

To generate write credentials in Meta, you must create a user account and then request an OAuth v2 token in order to connect to the wiki with a write user.

The permissions that must be requested are:

  • Perform high-volume activity
  • Interact with pages
  • Perform administrative actions (this would be used to revert changes not made by the bot)

Once you have the authentication tokens, store them in the user-config.py file as follows:

mylang = 'meta'
family = 'meta'
usernames['meta']['meta'] = 'MY BOT NAME'

authenticate['meta.wikimedia.org'] = ('consumer_token', 'consumer_secret', 'access_token', 'access_secret')

More information on this process can be found at https://www.mediawiki.org/wiki/Manual:Pywikibot/OAuth.

If running on Toolforge, see below for instructions to store credentials as environment variables instead.

Reading from the Web2Cit API

The reading process queries the Web2Cit server, either for specific domains or for all the domains published on Meta-Wiki under meta:Web2Cit/data, from which the patterns.json, templates.json, and tests.json files can be obtained.

The configuration files are read based on the prefix under which the configuration and test JSON files exist. During this process, a domain is considered to be configured for Web2Cit only if its templates.json or tests.json file exists.
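The filtering rule above can be sketched as follows. This is an illustration rather than the monitor's actual code: it assumes that configuration pages are titled with reversed domain labels (for example, Web2Cit/data/com/example/www/tests.json), and the function `configured_domains` and the sample titles are hypothetical.

```python
from collections import defaultdict

DATA_PREFIX = "Web2Cit/data/"                 # storage root named in the text
REQUIRED_ANY = {"templates.json", "tests.json"}


def configured_domains(titles):
    """Return the domains that have a templates.json or tests.json page."""
    files_by_domain = defaultdict(set)
    for title in titles:
        if not title.startswith(DATA_PREFIX):
            continue
        path, _, filename = title[len(DATA_PREFIX):].rpartition("/")
        if not path:
            continue
        # Assumed layout: reversed labels, e.g. "com/example/www"
        domain = ".".join(reversed(path.split("/")))
        files_by_domain[domain].add(filename)
    # Keep only domains with at least one of the required files
    return sorted(d for d, files in files_by_domain.items() if files & REQUIRED_ANY)


titles = [
    "Web2Cit/data/com/example/www/templates.json",
    "Web2Cit/data/com/example/www/tests.json",
    "Web2Cit/data/org/example/patterns.json",  # patterns only: skipped
]
print(configured_domains(titles))
```

In this sample, only www.example.com is returned, because example.org has neither a templates.json nor a tests.json page.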

The API checks are performed for each domain. The server returns a JSON response with the evaluation of each path listed in the tests or templates files, just as the server web interface does.

Meta Write

The results are written directly to Meta-Wiki, using pywikibot. The evaluations of all checked domains are stored under a single prefix, on three kinds of pages:

  • A general results page, which compiles all the domains that have been checked at least once by the monitor,
  • A results page per domain, summarizing the evaluation of each path and the components of the paths, along with the score obtained by each evaluation, and
  • A page that summarizes the results of each evaluation (score per domain) and links to the evaluation made by the monitor at the time of the check (or to a history of the evaluation).

This writing process to Meta can be replaced with a recording in local logs, which are used in a local demonstration or debug mode that avoids writing to Meta. See Manual execution below.

Change Monitor

Checking for changes and for new domains is done periodically by a process (monitor.py) that is called every 20 minutes on Toolforge. This process identifies:

  • new domains that can be checked ("first run" trigger);
  • the domains whose configuration file was modified in this period of time ("changed configuration" trigger); and
  • domains that have not been checked in a period of time ("programmed" trigger).
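The three triggers can be expressed as a small decision function. This is a minimal sketch, not the logic in monitor.py: the function `pick_trigger` is hypothetical, and the one-hour and 30-day thresholds are the defaults this page gives for the monitor command, treated here as assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional

CHANGE_WINDOW = timedelta(hours=1)   # assumed look-back for configuration edits
MAX_AGE = timedelta(days=30)         # assumed maximum time between checks


def pick_trigger(
    now: datetime,
    last_check: Optional[datetime],
    last_config_edit: Optional[datetime],
) -> Optional[str]:
    """Return the trigger name for a domain, or None if no check is needed."""
    if last_check is None:
        return "first run"                      # never checked before
    if last_config_edit is not None and now - last_config_edit <= CHANGE_WINDOW:
        return "changed configuration"          # edited within the window
    if now - last_check >= MAX_AGE:
        return "programmed"                     # last check is too old
    return None


now = datetime(2024, 1, 31, 12, 0)
print(pick_trigger(now, None, None))
print(pick_trigger(now, now - timedelta(days=2), now - timedelta(minutes=10)))
print(pick_trigger(now, now - timedelta(days=31), now - timedelta(days=40)))
```

The order of the tests is a design choice of this sketch: a domain that qualifies for several triggers is reported under the first one that matches.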

This process adds the domains to a work queue. A separate process (runner.py) reads the work queue and executes the checks requested in that period; on Toolforge it is scheduled to run every hour.

The work queue is managed with a simple SQLite database in order to have a single place and file that concentrates all the information about the execution of the domains and keeps track of pending executions.
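The idea of a SQLite-backed work queue can be sketched as follows. The table and column names are hypothetical and do not reflect the schema shipped in monitor.sqlite.dist; the sketch uses an in-memory database so that it can be run without touching any file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the real queue is a file on disk
conn.execute(
    """CREATE TABLE queue (
           id INTEGER PRIMARY KEY,
           domain TEXT NOT NULL,
           trigger_name TEXT NOT NULL,
           status TEXT NOT NULL DEFAULT 'pending'
       )"""
)


def enqueue(domain, trigger_name):
    """Add a job for a domain unless it already has a pending one."""
    pending = conn.execute(
        "SELECT 1 FROM queue WHERE domain = ? AND status = 'pending'", (domain,)
    ).fetchone()
    if pending is None:
        conn.execute(
            "INSERT INTO queue (domain, trigger_name) VALUES (?, ?)",
            (domain, trigger_name),
        )


def run_pending(check):
    """Run every pending job once and mark it as done."""
    rows = conn.execute(
        "SELECT id, domain, trigger_name FROM queue "
        "WHERE status = 'pending' ORDER BY id"
    ).fetchall()
    for job_id, domain, trigger_name in rows:
        check(domain, trigger_name)
        conn.execute("UPDATE queue SET status = 'done' WHERE id = ?", (job_id,))
    return len(rows)


enqueue("www.example.com", "first run")
enqueue("www.example.com", "changed configuration")  # ignored: already pending
processed = run_pending(lambda domain, trigger_name: None)
print(processed)
```

Marking a job as done after it runs is what keeps a later call from executing it again.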

Solution architecture

The recurring check problem is divided into several sub-packages within the repository, which are connected to one another through classes.

  • web2citwrapper: contains the logic for consuming the web2cit-server API, which allows querying and importing results by domain or directly by path.
  • monitor: contains the logic for finding and evaluating the files and domains to check, using the MediaWiki API.
  • writer: contains the logic for writing results to Meta, using Mako templates to simplify writing wikitext.

Functional commands

Monitor

To run the check process (the monitor), invoke the following command:

./bin/python3 monitor.py

By default, it looks for domains whose configuration files changed within the last hour, domains that have not been checked in the last 30 consecutive days, and domains that have never been checked.

This command only checks for the existence of these changes and generates the work queue in SQLite so that the runner command can run the check.

Runner

To execute the writing process from the work queue, the following command must be invoked:

./bin/python3 runner.py

This command looks up all the pending checks in the work queue whose scheduled execution time has passed and runs them. Once a job has been executed, its internal state is updated so that it is not queued again.

Manual execution

(not recommended, expert only)

If you want to run a check manually without waiting for the runner, you can execute the following command:

./bin/python3 main.py --domain <domain> --trigger <trigger>

This allows the executions to be carried out manually, indicating as a trigger the reason why the command is executed manually.

To write locally and avoid writing to Meta-Wiki, add the --log flag to the command above. Results will be written to ./logs/.
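The command-line interface described above could be parsed along these lines. This is a sketch of the three options named on this page, not the actual contents of main.py; the help texts and the `manual` trigger value are placeholders.

```python
import argparse


def build_parser():
    """Parser for the options documented for a manual run (hypothetical)."""
    parser = argparse.ArgumentParser(description="Run a single Web2Cit monitor check")
    parser.add_argument("--domain", required=True, help="domain to check")
    parser.add_argument("--trigger", required=True, help="reason for the manual run")
    parser.add_argument(
        "--log",
        action="store_true",
        help="write results to ./logs/ instead of Meta-Wiki",
    )
    return parser


# Parse an explicit argument list instead of sys.argv, for demonstration
args = build_parser().parse_args(
    ["--domain", "www.example.com", "--trigger", "manual", "--log"]
)
print(args.domain, args.trigger, args.log)
```

Because --log is a boolean flag, leaving it out would make `args.log` False, which corresponds to the default of writing to Meta-Wiki.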

How to use in the Wikimedia wikis

Summary

The account used to run the bot on Toolforge is w2cmon.

A bootstrap script is run once to create a Python virtual environment at ./pyvenv to run the jobs (see wikitech:Help:Toolforge/Python#Jobs).

Two Toolforge scheduled jobs (or cronjobs) are running, loaded from a custom jobs YAML file.

~/jobs.yaml
- name: runner
  command: cd $PWD/w2c-monitor && ../pyvenv/bin/python3 runner.py
  image: tf-python39
  schedule: "0 * * * *"
- name: monitor
  command: cd $PWD/w2c-monitor && ../pyvenv/bin/python3 monitor.py
  image: tf-python39
  no-filelog: true
  schedule: "*/20 * * * *"

Logs for the runner job are saved to the ~/runner.out and ~/runner.err files.
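To read the two schedules above, it can help to expand the minute field of each cron expression. The helper below is a minimal sketch that handles only the three forms used or implied here ("*", "*/n", and a single number); it is not a general cron parser and is not part of the monitor.

```python
def minutes_matched(field: str):
    """Expand the minute field of a cron expression ("*", "*/n" or "n" only)."""
    if field == "*":
        return list(range(60))
    if field.startswith("*/"):
        return list(range(0, 60, int(field[2:])))
    return [int(field)]


for name, schedule in [("runner", "0 * * * *"), ("monitor", "*/20 * * * *")]:
    minute_field = schedule.split()[0]   # first of the five cron fields
    print(name, minutes_matched(minute_field))
```

Since the remaining four fields of both schedules are "*", the runner starts at minute 0 of every hour and the monitor at minutes 0, 20, and 40.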

The bot account configured to make changes is Web2cit-monitor-bot. The necessary credentials are saved as environment variables using toolforge envvars create (see wikitech:Help:Toolforge/Running Pywikibot scripts#Setup).

Following changes

To follow changes, you could use the monitor page to see new domains, or use this list to see the latest 15 changes.

Notes

  1. This is currently not working because email notifications are disabled for all bot-made edits. See T329573.