The Web2Cit server is the part of the Web2Cit ecosystem that makes the functionalities of the Web2Cit core available via a web service for consumption from other parts of the ecosystem, such as the Web2Cit user script and the Web2Cit monitor, as well as from projects relying on Zotero translators, such as Zotero browser connectors or ZoteroBib.
It returns translation results for one or more target URLs using the corresponding domain configurations defined by Web2Cit collaborators, as available from the Web2Cit storage repository on Meta-Wiki.
How to use
The simplest way to use the translation server is to go to its home page (https://web2cit.toolforge.org/) and enter the target URL you would like to get translation results for. Alternatively, go directly to https://web2cit.toolforge.org/<YourTargetURL>. This is an alias of https://web2cit.toolforge.org/translate?tests=true&url=<YourTargetURL> (see URL query string parameters below).
If you have the Web2Cit user script installed, the translation summary for a target webpage may also be opened from Wikipedia, by clicking the "Web2Cit" link that appears beneath the citation results on the "Add a citation" dialog.
Note that the translation summary returned also includes the translation results as embedded metadata. Therefore, you can use this URL in Wikipedia's unmodified automatic citation generator (i.e., without the Web2Cit user script installed), or with other tools relying on Zotero translators, such as Zotero browser connectors and ZoteroBib.
Configuration files may be saved to a personal sandbox storage to experiment with them without affecting all Web2Cit users (see the Editing documentation to learn how to do this using the JSON editor).
To instruct the Web2Cit server to use configuration files from your personal sandbox, enter your Wikimedia username in the field next to "Switch to sandbox configuration:" on the translation summary page and click "Switch". This will use the configuration files under User:<YourUserName>/Web2Cit/data/ on Meta-Wiki.
Alternatively, you may go to https://web2cit.toolforge.org/sandbox/<YourUserName>/<YourTargetURL>.[Note 1] This is an alias of https://web2cit.toolforge.org/translate?tests=true&sandbox=<YourUserName>&url=<YourTargetURL> (see the URL query string parameters section below).
Additional debugging information may be included for each translation target, which may help you understand why translation did not work as expected.
To do so, click "Enable debugging" at the bottom of the translation summary page.
Alternatively, you may go to https://web2cit.toolforge.org/debug/<YourTargetURL>.[Note 1] This is an alias of https://web2cit.toolforge.org/translate?tests=true&debug=true&url=<YourTargetURL> (see the URL query string parameters section below).
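The three shortcut forms described so far (plain, sandbox and debugging) each correspond to a fixed combination of /translate parameters. The following TypeScript sketch spells out that correspondence. The function name expandShortcut and the Shortcut type are hypothetical, and percent-encoding the query values is an assumption rather than something taken from the server's source code.

```typescript
// Base URL of the production server, as given in the text.
const BASE = "https://web2cit.toolforge.org";

// The three shortcut forms described above. Combining sandbox and debugging
// in a single shortcut is deliberately left out of this sketch.
type Shortcut =
  | { kind: "plain" }
  | { kind: "sandbox"; user: string }
  | { kind: "debug" };

// Returns the shortcut URL and the /translate URL it is an alias of.
// Assumption: query values are percent-encoded in the /translate form.
function expandShortcut(
  target: string,
  shortcut: Shortcut
): { alias: string; translate: string } {
  const query: string[] = ["tests=true"];
  let prefix = "";
  if (shortcut.kind === "sandbox") {
    prefix = "sandbox/" + shortcut.user + "/";
    query.push("sandbox=" + encodeURIComponent(shortcut.user));
  } else if (shortcut.kind === "debug") {
    prefix = "debug/";
    query.push("debug=true");
  }
  query.push("url=" + encodeURIComponent(target));
  return {
    alias: BASE + "/" + prefix + target,
    translate: BASE + "/translate?" + query.join("&"),
  };
}
```

For instance, expandShortcut("https://example.com/page", { kind: "debug" }) yields the /debug/ shortcut together with a /translate URL carrying tests=true and debug=true.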
The debugging information includes:
For each template:
For each template field:
For each translation procedure:
For each selection or transformation step:
Note that, when parsing configuration files, Web2Cit ignores invalid definitions. These ignored elements will not show up in the debugging information. For example:
URL query string parameters
The Web2Cit server provides a /translate endpoint, which takes a series of URL query string parameters. The URLs indicated in the previous sections are just shortcuts or aliases to this endpoint with specific combinations of parameter values.
Either the url or the domain parameter is mandatory. The others are optional or have a default value:
url (string): the target URL for which a translation result is required.
domain (string): the domain for which translation results must be returned. This is ignored if the url parameter was provided.
path (string | undefined): the path in the given domain for which a translation result must be returned. If undefined, it will return translation results for all paths for which there is a template or a test case configured (as used by the Web2Cit monitor). This parameter is ignored if the url parameter was provided.
false): prepends the citation returned by Citoid to the array of citations returned for each target URL (see T307393 for support with HTML or JSON response formats).
debug (default: false): includes debugging information for each translation target. Note that this is not supported with the MediaWiki response format.
format (html | json | mediawiki; default: html): the format in which translation results must be returned (see Response formats section below).
sandbox (string): an optional parameter indicating the username whose sandbox storage will be used to fetch configurations from (see Sandbox configurations section).
tests (boolean): whether translation tests should be used.
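The parameter rules above can be checked on the client side before a request is sent. The following is a minimal TypeScript sketch, not the server's own validation code: the TranslateParams interface and the buildTranslateQuery function are hypothetical names, only parameters named in the text are covered, and percent-encoding of values is an assumption.

```typescript
// Parameters of the /translate endpoint that are named in the text.
interface TranslateParams {
  url?: string;
  domain?: string;
  path?: string;
  debug?: boolean;
  format?: "html" | "json" | "mediawiki";
  sandbox?: string;
  tests?: boolean;
}

// Builds the path and query string of a /translate request.
function buildTranslateQuery(params: TranslateParams): string {
  // Either url or domain is mandatory.
  if (params.url === undefined && params.domain === undefined) {
    throw new Error("Either url or domain must be provided");
  }
  // Debug and test modes are not supported with the MediaWiki format.
  if (params.format === "mediawiki" && (params.debug || params.tests)) {
    throw new Error("debug and tests are not supported with format=mediawiki");
  }
  const pairs: string[] = [];
  const add = (key: string, value: string | boolean | undefined): void => {
    if (value !== undefined) {
      pairs.push(key + "=" + encodeURIComponent(String(value)));
    }
  };
  if (params.url !== undefined) {
    // domain and path are ignored by the server when url is given,
    // so this sketch does not send them at all.
    add("url", params.url);
  } else {
    add("domain", params.domain);
    add("path", params.path);
  }
  add("format", params.format);
  add("sandbox", params.sandbox);
  add("debug", params.debug);
  add("tests", params.tests);
  return "/translate?" + pairs.join("&");
}
```

Rejecting unsupported combinations early mirrors the errors the server is documented to report, but the server remains the authority on what is valid.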
Translation results may be returned in one of three available formats, as requested via the format URL query string parameter:
The HTML format is the default. It returns a translation summary web page including translation results, grouped by translation target and URL path pattern group.
Translation results list a series of translation fields, each including the translation output (as returned from the applicable translation template), the expected output (as defined in the translation test), and the test score.
A dash (-) indicates an empty output, and n/a in the expected output or score columns indicates that an expected output has not been defined.
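As an illustration of the dash and n/a conventions, the following TypeScript sketch turns one translation field into the four cells of a table row. It is not the server's rendering code: FieldRow, showValues and renderRow are hypothetical names, and the "; " separator between multiple values is an assumption.

```typescript
// One row of the results table. The property names are chosen for this
// sketch; an absent test or score means no expected output was defined.
interface FieldRow {
  name: string;
  output: string[];
  test?: string[];
  score?: number;
}

// Joins a list of values for display; an empty list becomes a dash.
// The "; " separator is a hypothetical choice.
function showValues(values: string[]): string {
  return values.length > 0 ? values.join("; ") : "-";
}

// Returns the cells [field, output, expected output, score] for one row.
function renderRow(row: FieldRow): string[] {
  const expected = row.test === undefined ? "n/a" : showValues(row.test);
  const score = row.score === undefined ? "n/a" : String(row.score);
  return [row.name, showValues(row.output), expected, score];
}
```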
In addition, the page includes embedded metadata using property names from the http://www.zotero.org/namespaces/export# vocabulary. These can be directly interpreted by Zotero's Embedded Metadata translator, and hence can be used with Wikipedia's unmodified automatic citation generator (i.e., without the Web2Cit user script installed), or with other tools relying on Zotero translators, such as Zotero browser connectors and ZoteroBib.
The JSON format is the most complete return format, and is used by the Web2Cit monitor. It returns an object with the following properties:
- apiVersion: a string indicating the version number of Web2Cit server.
- config?: an object with information about the configuration files used (undefined in case of fetching error), including:
  - path: path to the patterns.json configuration file on the MediaWiki instance used by the Web2Cit storage. For example,
  - revid: revision ID of the patterns.json configuration file used; undefined if file does not exist or is corrupt.
- data?: an object with translation data (undefined in case of overall translation error), including:
  - targets: an array of objects, one per target path requested, each including:
    - path: the path requested for translation.
    - href?: the full URL to the target webpage; will be undefined if the target path is invalid.
    - pattern?: the URL path pattern group to which the target path belongs; will be undefined if the target path is invalid.
    - results: an array of translation results, one per translation template. Normally, only one translation result would be returned,[Note 2] corresponding to the first applicable template, or none if no applicable template has been found (see T317448, though). Each translation result includes:
      - template: an object with information about the translation template used, including:
        - path?: the path to the webpage on which the template is based; undefined if it is the fallback template.
        - label?: the (optional) fancy name given to the template.
      - fields: an array of translation field objects, each including:
        - name: the name of the translation field.
        - output: an array of strings representing the template field output; if a template field has not been defined for this translation field, an empty array will be returned;
        - test: an array of strings representing the expected output (as defined in the translation test); will be undefined if a test field has not been defined for this translation field.
        - score: the test score resulting from the comparison between the translation and the expected outputs; undefined if an expected output is not available.
      - score: the average test score across all translation fields; undefined if no translation score is defined for any field.
    - score: the average test score across all translation results (from the same target); undefined if no translation score defined for any result.
    - error?: if an error is thrown during target translation, it will be included here.
    - debug?: if the server has been called with the debug option (see URL query string parameters section above), this will include an object with detailed information for debugging. Check the DebugJson type on the ./src/types.ts file of the w2c-server repository to find out more about it.
  - score: the average translation score across translation targets; undefined if no translation score is defined for any target.
- error?: if a general error affecting translation of all targets occurred, the Error object thrown will be included here.
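For consumers of the JSON format, the property list above can be transcribed into TypeScript types. The sketch below is such a transcription, not a copy of the server's own type definitions: the interface names are hypothetical, scores are assumed to be numbers, and config, error and debug are left loosely typed because their exact shape is not spelled out here. The averageScore helper illustrates the "average, or undefined if nothing is defined" rule, under the assumption that undefined scores are skipped rather than counted as zero.

```typescript
interface FieldResult {
  name: string;
  output: string[];
  test?: string[];
  score?: number; // assumed numeric
}

interface TranslationResult {
  template: { path?: string; label?: string };
  fields: FieldResult[];
  score?: number;
}

interface TargetResult {
  path: string;
  href?: string;
  pattern?: string;
  results: TranslationResult[];
  score?: number;
  error?: unknown;
  debug?: unknown;
}

interface TranslateResponse {
  apiVersion: string;
  config?: object;
  data?: { targets: TargetResult[]; score?: number };
  error?: unknown;
}

// Averages the defined scores; returns undefined if none is defined.
// Assumption: undefined scores are skipped, not treated as zero.
function averageScore(scores: Array<number | undefined>): number | undefined {
  const defined = scores.filter((s): s is number => s !== undefined);
  if (defined.length === 0) {
    return undefined;
  }
  let sum = 0;
  for (const s of defined) {
    sum += s;
  }
  return sum / defined.length;
}

// Recomputes a target's score from the scores of its results.
function targetScore(target: TargetResult): number | undefined {
  return averageScore(target.results.map((r) => r.score));
}
```

A helper like targetScore can be used to cross-check the score values reported by the server against the per-result scores in the same response.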
General errors will be included in a root error property of the JSON or MediaWiki response formats, or returned as plain text in the HTML response format:
- Invalid query: the URL query string is misformatted. For example: https://web2cit.toolforge.org/translate?abc.
- Unsupported debug or test modes: the MediaWiki format was requested and the format-unsupported debug or tests parameters were set to true. For example: https://web2cit.toolforge.org/translate?domain=www.example.com&format=mediawiki&debug=true.
- No target: one of the URL shortcuts was used and the target URL was omitted. For example: https://web2cit.toolforge.org/debug/.
- Invalid target: a URL target has been provided, but the URL is invalid. For example: https://web2cit.toolforge.org/translate?format=json&url=abc.
- Invalid domain: a domain target has been provided, but it is invalid. For example: https://web2cit.toolforge.org/translate?domain=abc.
Target-specific errors will be included in an error property under the corresponding targets array object of the JSON response format, or as HTML text under the corresponding target section of the HTML response format:
- Invalid path error: the target's path is not a valid path. For example: https://web2cit.toolforge.org/translate?domain=www.example.com&path=abc.
- NoApplicableTemplateError: no applicable template has been found for the translation target. This would happen when not even the fallback template is applicable; see for example T313236.
- Any target translation error included in one of the target outputs returned by the translate method of Web2Cit core's Domain object (see the Core documentation).
These errors are not shown in the MediaWiki response format, which comprises an array of citations; targets that throw an error during translation do not return a citation.
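A client reading the JSON response format may want to gather general and target-specific errors in one place. The following TypeScript sketch does so under stated assumptions: MinimalResponse, describeError and listErrors are hypothetical names, and because the serialized form of an error is not specified here, describeError accepts a string, an object with name and message properties, or anything else.

```typescript
// Only the parts of the JSON response that matter for error reporting.
interface MinimalTarget {
  path: string;
  error?: unknown;
}

interface MinimalResponse {
  data?: { targets: MinimalTarget[] };
  error?: unknown;
}

// Turns an error value of unknown shape into a readable string.
function describeError(error: unknown): string {
  if (typeof error === "string") {
    return error;
  }
  if (typeof error === "object" && error !== null) {
    const e = error as { name?: unknown; message?: unknown };
    if (typeof e.message === "string") {
      return typeof e.name === "string" ? e.name + ": " + e.message : e.message;
    }
  }
  return JSON.stringify(error);
}

// Lists the root error first (if any), then one entry per failing target.
function listErrors(response: MinimalResponse): string[] {
  const errors: string[] = [];
  if (response.error !== undefined) {
    errors.push("general: " + describeError(response.error));
  }
  const targets = response.data ? response.data.targets : [];
  for (const target of targets) {
    if (target.error !== undefined) {
      errors.push(target.path + ": " + describeError(target.error));
    }
  }
  return errors;
}
```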
Web2Cit server currently responds with HTTP response status code 200, except:
- status code 400 (bad request):
  - on Invalid query error (see Errors section),
  - on unsupported debug or test mode for MediaWiki format error (see Errors section),
  - on No target error (see Errors section),
  - on Invalid target error (see Errors section),
  - on Invalid domain error (see Errors section);
- status code 404 (not found):
  - if no citation has been returned, either because no target paths have been specified, or because no applicable template has been found for any target.
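The status code rules above reduce to a small decision function. The TypeScript sketch below restates them for readers who prefer code. It is not taken from the server's source: the GeneralError labels and the expectedStatus function are hypothetical, and checking general errors before the citation count is an assumption about precedence.

```typescript
// Hypothetical labels for the general errors listed in the Errors section.
type GeneralError =
  | "InvalidQuery"
  | "UnsupportedMode"
  | "NoTarget"
  | "InvalidTarget"
  | "InvalidDomain";

// Returns the HTTP status code the text describes for a given outcome.
function expectedStatus(
  generalError: GeneralError | undefined,
  citationCount: number
): number {
  if (generalError !== undefined) {
    return 400; // bad request
  }
  if (citationCount === 0) {
    return 404; // no citation returned for any target
  }
  return 200;
}
```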
Web2Cit server's Home and HTML-format translation summary pages are translated collaboratively on translatewiki.net.
If you need help or would like to report a bug, suggest a feature, etc., you can leave a comment on this page's discussion page, or create a task in Phabricator, with the web2cit-server project tag.
There is an additional server instance running at https://w2c-beta.toolforge.org/. New versions of the Web2Cit server may be available here for public testing, until they are ready for deployment to the production server.
The Web2Cit server's code is available under a GNU GPL v3 license on a Wikimedia GitLab repository: https://gitlab.wikimedia.org/diegodlh/w2c-server.
The code is written in TypeScript and built with tsc. Package management is done with npm.
We use the Express framework to set up the server.
It runs from a Toolforge tool account and is available at https://web2cit.toolforge.org/.
You will need git, node, npm and nvm.
- Clone the w2c-server git repository, or your fork of it: git clone https://gitlab.wikimedia.org/diegodlh/w2c-server.git.
- Change to the repository's directory:
- To make sure you are using the same version of Node as the one running on Toolforge, run nvm use. This will install and switch to the Node version indicated in the file
- Install the required dependencies by running npm install. The repository is configured (via the ./.npmrc) to enforce the minimum Node and npm versions indicated in the package.json file. If you are having trouble with the default version of npm installed with nvm, install a more recent version with nvm install --latest-npm and run
- To start the development server, run npm run dev. This should build the project and serve the app from the ./dist/ directory at http://localhost:3000/.
Web2Cit core usage
As mentioned in the introduction, the Web2Cit server exposes the functionalities of the Web2Cit core via a web service. For this reason, the Web2Cit core (npm package web2cit) is one of its dependencies.
To better understand how Web2Cit server makes use of this library, you may check the brief explanation included in the Core documentation to showcase Web2Cit core capabilities.
Using local Web2Cit core
If you want the Web2Cit server to use your local build of Web2Cit core, follow these steps:
- In the Web2Cit core directory:
  - npm link. This will create a global symlink for the Web2Cit core dependency.
  - npm run build. Make sure you run this again after any changes made to the Web2Cit core source code.
- In the Web2Cit server directory:
  - npm link web2cit. This will make the Web2Cit server use the global symlink for Web2Cit core instead of the package downloaded from npm.
  - npm install. You may need to run this again after some changes made to the Web2Cit core source code.
Debug with Visual Studio Code
The repository includes a .vscode/launch.json debug configuration file for the Visual Studio Code editor, with custom settings to ensure breakpoints can be set on source files from the web2cit module (configuration/outFiles), even if a local build is being used (--preserve-symlinks as runtime argument).
To start debugging:
- Build Web2Cit server using npm run build as explained above.
- In Visual Studio Code, open the Run and Debug pane and start debugging using the Launch Program custom configuration. This will run node dist/app.js and attach the debugger to it. You should see a "server is listening on 3000!" message on the debug console.
- Set breakpoints where you want the program to stop for debugging purposes.
Automatic tests of the Web2Cit server have not been implemented yet. See T305564.
However, in the meantime, it may be worth considering the Web2Cit monitor (which uses translation tests defined by Web2Cit collaborators) as a way to semi-automatically check that changes made to the server's source code do not result in unexpected side effects:
- Download and install the Web2Cit monitor locally.
- Run the monitor with
--log arguments, to run checks for all configured domains, and (importantly!) to write results locally, respectively.
- Move the result files to a separate directory, to avoid overwriting them below.
- Change the monitor's source code to use the server build that you want to test, and run it again with the same arguments.
- Finally, use a diff tool to compare both sets of result files and identify unexpected differences between them.
This test procedure is a temporary workaround and not a replacement for proper automatic tests. For example, tests would be limited to the set of server functions used by the monitor, and to the features relied upon by the collaboratively defined translation tests. In addition, it involves fetching data from third-party web servers, whose responses may also change upon repeated requests.
Use the changelog to document changes, as described here. Keep changes under the "Unreleased" section, until a new version is ready to be deployed (see below).
To deploy the server to production, repeat steps 1-3 above on your host. Read below for the special case of Toolforge.
Running from Toolforge
This section describes how Web2Cit server is set up to run from the Toolforge servers. If you want to run Web2Cit server locally or on a private host, you won't need this.
The Web2Cit server is running from the web2cit Toolforge account. The following steps were followed to set up and run the web server:
- Login to Toolforge:
ssh login.toolforge.org. Note that you must have Toolforge access to do this. Follow the steps here if you don't.
- Become the web2cit tool account: become web2cit. This account was created by following the steps here. Note that there is an alternative w2c-beta account for tests.
- Clone the git repository.
- Edit the service.template file so that webservice commands below use the following predefined arguments:
- Open a webservice shell to make sure you are using the right version of node and npm to install dependencies and build:
- Change to the w2c-server directory and run npm run build. This will compile the source code to the
exit to quit from the webservice shell.
- Change to the
webservice start. By convention, this will run
w2c-server/dist) to start the web server.
Note that the web2cit account also hosts the (pre-alpha) Web2Cit integrated editor. Its files are statically served from
www/static (symlinked to
https://tools-static.wmflabs.org/web2cit/. See the integrated editor documentation for the details.
To check the logs of the Kubernetes container initialized by
kubectl get pods to find the name of the container's parent pod.
kubectl logs <pod_name> to see the logs of the container's current instantiation. Because containers will be restarted automatically upon failure, to see the logs of the container's previous instantiation (e.g. after a crash) add --previous at the end of the command.
Consider creating a new version before deployment. To do so:
- Move changes from the "Unreleased" section at the top of the changelog file to the new version's section.
npm version --no-git-tag-version with the corresponding version increment argument (e.g.,
--no-git-tag-version to skip the automatic commit.
- Commit as "Bump vX.Y.Z" and tag as "vX.Y.Z".
- Push commit and tag.
Home and result pages
The home and HTML-format result pages are created using React, and server-side rendered using
Home and HTML-format result pages are internationalized using i18next.
Translated messages are located under
See T317044 for discussion around collaboratively translating this via Translatewiki.
In addition to returning Web2Cit translation results at the root (
/translate endpoints, the Web2Cit server currently serves the JSON configuration file editor statically at
As described in the JSON editor documentation, this JSON editor uses some URL query parameters, including a link to a JSON schema file, to render a JSON editing wizard.
The source code for this JSON editor is currently part of the Web2Cit server repository, but it is prepared to be split into a separate general-purpose MediaWiki JSON editor project if desired (see T306837).
Linting and formatting
We use ESLint as linter, with TypeScript support via
Prettier is used as formatter, and
eslint-config-prettier is used to disable ESLint rules that may conflict with it.
lint-staged runs ESLint and Prettier via a Husky pre-commit hook.
JSDOM is used to create the window object that is needed to create Web2Cit core's root Domain object. This is currently used to provide XPath functionality.
- Sandbox and debugging aliases can be combined as
- More than one translation result may be supported if the server ever supports returning results for all applicable templates, a function supported by Web2Cit core via the allTemplates option of the translate method. Also, see T307393 for another case where more than one translation result could be returned.