InternetArchiveBot/Documentation/Configuring citation maps

From Meta, a Wikimedia project coordination wiki

The Management Interface uses a unique syntax to describe citation templates, i.e., templates used to cite sources on Wikimedia projects. This syntax maps each relevant template parameter, named in that template's language, to the data field that InternetArchiveBot understands.

Basics

Here is an example of a citation template map:

url={url}|acessadoem={accesstimestamp:automatic}|arquivourl={archiveurl}|arquivodata={archivetimestamp:automatic}|urlmorta={deadvalues:sim:não::yes}

Let’s break this down.

Not every template parameter needs to appear in a map: only the parameters relevant to the URL and its access are needed. The available data fields are:

  • url
  • title
  • titlelink
  • accesstimestamp: requires exactly 1 input (strftime string or “automatic”)
  • archiveurl
  • archivetimestamp: requires exactly 1 input (strftime string or “automatic”)
  • deadvalues: requires exactly 4 inputs; the 3rd input can be left blank
  • paywall: requires exactly 4 inputs; up to 3 of them can be left blank
  • doi
  • isbn
  • page
  • permadead: requires exactly 2 inputs
  • timestamp: requires exactly 1 input (strftime string or “automatic”)

There are also four additional variables that automatically treat the parameter as an archivetimestamp:

  • epoch
  • epochbase62
  • microepoch
  • microepochbase62

Each map consists of parameter definitions separated by the vertical pipe character, |. The basic syntax for a definition is localname={datafield}, where localname is the parameter name used by that template in the relevant language and datafield is one of the fields above. For example, the Portuguese citation template parameter arquivourl is mapped to the archiveurl data field like this:

arquivourl={archiveurl}

Instead of a data field, you can also use a static value, which the bot prints the same way in every case.
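The basic localname={datafield} syntax can be illustrated with a small parser. This is a minimal Python sketch, not InternetArchiveBot's own code. It handles only simple maps such as the Portuguese example at the top of this page. It assumes there are no aliases, no provider blocks and no static values, and that local names never contain "=". The function name parse_simple_map is hypothetical.

```python
import re

# One definition: localname={datafield} or localname={datafield:inputs}.
# Assumes local names contain no "=" and specs contain no nested braces.
DEFINITION = re.compile(r"^(?P<local>[^=]+)=\{(?P<spec>[^{}]*)\}$")


def parse_simple_map(template_map: str) -> dict[str, str]:
    """Return {local parameter name: data-field spec} for a simple map.

    Sketch only: aliases, provider blocks and static values are rejected.
    """
    result = {}
    for segment in template_map.split("|"):
        match = DEFINITION.match(segment.strip())
        if match is None:
            raise ValueError(f"not a simple definition: {segment!r}")
        result[match["local"]] = match["spec"]
    return result


example = (
    "url={url}|acessadoem={accesstimestamp:automatic}|arquivourl={archiveurl}"
    "|arquivodata={archivetimestamp:automatic}|urlmorta={deadvalues:sim:não::yes}"
)
for local, spec in parse_simple_map(example).items():
    print(f"{local} -> {spec}")
```

Splitting on every | only works because simple maps never nest a pipe inside braces. Maps with aliases or provider blocks need a more careful split.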

Aliases

You can specify more than one name for a parameter by separating the aliases with the vertical pipe character, |. If a parameter has multiple aliases and none of them appears in a given citation, the bot uses the primary (first-listed) name. The syntax is:

localname1|alias1|alias2|alias3={variable}|localname2|alias4|alias5|alias6={variable2}

Extending the arquivourl example to accept both the English and Portuguese parameter names:

archive-url|arquivourl={archiveurl}

If neither archive-url nor arquivourl is found in the citation, the bot adds archive-url, because it is listed first.
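The alias rule can be sketched in Python as follows. Segments without "=" are collected as aliases of the next definition, and the first name in each group is treated as the primary name. When writing, the sketch reuses whichever alias the citation already contains and otherwise falls back to the primary name. This is an illustration under stated assumptions, not the bot's implementation. It assumes no provider blocks, and the function names parse_aliases and name_to_write are hypothetical.

```python
def parse_aliases(template_map: str) -> list[tuple[list[str], str]]:
    """Group alias names with the definition that follows them.

    Segments without "=" are aliases of the next definition; the first
    name in each group is the primary name. Assumes no provider blocks.
    """
    groups: list[tuple[list[str], str]] = []
    pending: list[str] = []
    for segment in template_map.split("|"):
        if "=" in segment:
            name, spec = segment.split("=", 1)
            groups.append((pending + [name], spec))
            pending = []
        else:
            pending.append(segment)
    if pending:
        raise ValueError(f"aliases without a definition: {pending}")
    return groups


def name_to_write(names: list[str], citation_params: dict[str, str]) -> str:
    """Reuse an alias already present in the citation, else the primary name."""
    for name in names:
        if name in citation_params:
            return name
    return names[0]


[(names, spec)] = parse_aliases("archive-url|arquivourl={archiveurl}")
print(names, spec)  # ['archive-url', 'arquivourl'] {archiveurl}
print(name_to_write(names, {"arquivourl": "https://example.org/snapshot"}))  # arquivourl
print(name_to_write(names, {}))  # archive-url
```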

Inputs

Most data fields stand on their own. Some, however, take inputs, such as a timestamp format or the localized values a parameter can hold. Such fields use this syntax:

localname={datafield:parameter}

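Note the colon character : separating the data field from its input. A field that takes several inputs separates each one with a further colon, as in the deadvalues example at the top of this page.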
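Splitting a spec into its field and inputs can be sketched in Python. The input counts come from the data-field list above.

The sketch has one design choice of its own. For fields ending in "timestamp", it keeps everything after the first colon as a single input, because a strftime format such as %H:%M:%S contains colons itself. This is an assumption about how such specs should be read, not documented bot behaviour. The helper names split_inputs and check_spec are hypothetical.

```python
# Number of inputs each data field takes, from the data-field list above;
# fields not listed here take no inputs.
INPUT_COUNTS = {
    "accesstimestamp": 1,
    "archivetimestamp": 1,
    "timestamp": 1,
    "deadvalues": 4,
    "paywall": 4,
    "permadead": 2,
}


def split_inputs(spec: str) -> tuple[str, list[str]]:
    """Split "datafield:input1:input2..." into the field and its inputs."""
    field, colon, rest = spec.partition(":")
    if not colon:
        return field, []
    if field.endswith("timestamp"):
        # ASSUMPTION: a strftime format may contain colons (e.g. %H:%M:%S),
        # so everything after the first colon is treated as one input.
        return field, [rest]
    return field, rest.split(":")


def check_spec(spec: str) -> tuple[str, list[str]]:
    """Split a spec and verify it has the expected number of inputs."""
    field, inputs = split_inputs(spec)
    expected = INPUT_COUNTS.get(field, 0)
    if len(inputs) != expected:
        raise ValueError(f"{field} takes {expected} input(s), got {len(inputs)}")
    return field, inputs


print(check_spec("deadvalues:sim:não::yes"))  # ('deadvalues', ['sim', 'não', '', 'yes'])
print(check_spec("timestamp:%Y-%m-%d %H:%M:%S"))  # ('timestamp', ['%Y-%m-%d %H:%M:%S'])
```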

Archival provider

InternetArchiveBot is designed to support many different providers of web archiving services in addition to the Internet Archive. The providers are as follows:

  • @wayback: The Internet Archive’s Wayback Machine
  • @europarchive
  • @archiveis
  • @memento
  • @webcite
  • @archiveit
  • @arquivo
  • @loc
  • @warbharvest
  • @bibalex
  • @collectionscanada
  • @veebiarhiiv
  • @vefsafn
  • @proni
  • @spletni
  • @stanford
  • @nationalarchives
  • @parliamentuk
  • @was
  • @permacc
  • @ukwebarchive
  • @wikiwix
  • @catalonianarchive
  • @ghostarchive

Use these provider names to restrict some or all of a citation template's parameters to a specific archival provider. For example, a template designed to handle only one archival provider, such as Template:Wayback for the Wayback Machine, would use this syntax.

Restricting parameter-value sets to a provider is done as follows:

{@providername|param=value/variable|param2|alias2=value/variable…}

To restrict the entire template to a given provider, wrap {@providername|<templatesyntax>} around your template syntax. See example below:

{@wayback|URL|url={url}|title={title}|date={archivetimestamp:%Y%m%d%H%M%S}}

This example restricts the template so that the bot uses it only when it needs to supplement a reference with an archive URL from the Wayback Machine.

To restrict a specific subset of parameters to a provider, treat {@providername|<templatesyntax>} as a key–value pair. See example below:

url={url}|{@wayback|wayback={archivetimestamp:%Y%m%d%H%M%S}}|{@webcite|webciteID={microepochbase62}}|{@archiveis|archive-is={archivetimestamp:%Y%m%d%H%M%S}}|{@default|archiv-url={archiveurl}|archiv-datum={archivetimestamp:automatic}}|text={title}|archiv-bot={timestamp:%Y-%m-%d %H:%M:%S} InternetArchiveBot

As the above example shows, a template map can contain multiple provider constraints. Parameter-value sets without a constraint apply to all archival providers. To keep unconstrained parameter-value sets from appearing alongside constrained ones, wrap them in the @default provider. InternetArchiveBot then uses those sets only when the conditions for the constrained sets are not met.
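The provider rules can be sketched in Python. The sketch has three parts:

- It splits the map only on pipes that are not nested inside braces.
- It collects each {@provider|...} block.
- For a given provider, it returns the unconstrained definitions plus either that provider's block or, if there is none, the @default block.

This is an illustration, not the bot's implementation. Provider names are passed without the leading @. The function names are hypothetical. The sketch also assumes that an empty result means the template cannot be used for that provider.

```python
def split_top_level(text: str) -> list[str]:
    """Split on "|" characters that are not nested inside curly braces."""
    parts: list[str] = []
    current: list[str] = []
    depth = 0
    for char in text:
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
        if char == "|" and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(char)
    parts.append("".join(current))
    return parts


def select_for_provider(template_map: str, provider: str) -> list[str]:
    """Return the definitions that apply for one archival provider.

    Unconstrained definitions always apply; a {@provider|...} block applies
    when its name matches; a {@default|...} block applies only when no block
    matches. ASSUMPTION: an empty result means the template is unusable.
    """
    common: list[str] = []
    blocks: dict[str, list[str]] = {}
    for segment in split_top_level(template_map):
        if segment.startswith("{@") and segment.endswith("}"):
            name, _, body = segment[2:-1].partition("|")
            blocks[name] = split_top_level(body)
        else:
            common.append(segment)
    return common + blocks.get(provider, blocks.get("default", []))


example = (
    "url={url}|{@wayback|wayback={archivetimestamp:%Y%m%d%H%M%S}}"
    "|{@webcite|webciteID={microepochbase62}}"
    "|{@archiveis|archive-is={archivetimestamp:%Y%m%d%H%M%S}}"
    "|{@default|archiv-url={archiveurl}|archiv-datum={archivetimestamp:automatic}}"
    "|text={title}|archiv-bot={timestamp:%Y-%m-%d %H:%M:%S} InternetArchiveBot"
)
print(select_for_provider(example, "wayback"))
print(select_for_provider(example, "memento"))  # falls back to the @default block
print(select_for_provider("{@wayback|URL|url={url}|title={title}}", "webcite"))  # []
```

The returned list groups unconstrained definitions first. It does not preserve their original position in the map, which is a simplification.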

Parameter details

Timestamp variables

Any variable that ends in ‘timestamp’ is treated as a timestamp variable and requires exactly one input. Within the curly braces, specify either the timestamp format (as a PHP strftime format string) or automatic, which uses the default format defined in the local wiki configuration.

Currently, there are three timestamp variables.

  • accesstimestamp: represents the last known time the URL was accessed by a reader or editor.
  • archivetimestamp: represents the time at which the archive service captured the snapshot at the archive URL.
    • In place of archivetimestamp, these standalone variables also work:
      • epoch: The Unix epoch of the archive snapshot time
      • epochbase62: The Unix epoch of the archive snapshot time encoded in base 62
      • microepoch: The Unix epoch in microseconds of the archive snapshot time
      • microepochbase62: The Unix epoch in microseconds of the archive snapshot time encoded in base 62. This is commonly used by webcitation.org
  • timestamp: used only to present the current timestamp. If the parameter has no value, InternetArchiveBot fills it in with the current time.
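The four epoch variants can be computed from a single snapshot time, as in the Python sketch below.

The page does not specify the base-62 digit alphabet. The sketch assumes the order 0-9, A-Z, a-z, so its base-62 strings may not match what the bot or webcitation.org produce. The function names to_base62 and epoch_variants are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# ASSUMPTION: digit order 0-9, A-Z, a-z; the page does not specify the
# alphabet used by the bot or by webcitation.org.
BASE62_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_base62(number: int) -> str:
    """Encode a non-negative integer in base 62."""
    if number < 0:
        raise ValueError("negative numbers are not supported")
    if number == 0:
        return BASE62_ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(BASE62_ALPHABET[remainder])
    return "".join(reversed(digits))


def epoch_variants(snapshot: datetime) -> dict[str, str]:
    """Values of the four epoch variables for a timezone-aware snapshot time."""
    # Exact integer microseconds since the Unix epoch (no float rounding).
    microepoch = (snapshot - UNIX_EPOCH) // timedelta(microseconds=1)
    epoch = microepoch // 1_000_000
    return {
        "epoch": str(epoch),
        "epochbase62": to_base62(epoch),
        "microepoch": str(microepoch),
        "microepochbase62": to_base62(microepoch),
    }


print(epoch_variants(datetime(2009, 2, 13, 23, 31, 30, tzinfo=timezone.utc)))
```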

Examples:

acessadoem={accesstimestamp:automatic}

This maps acessadoem to the access date, using the default timestamp format.

date={timestamp:%B %Y}

If date is not already set, this fills it with the current month and year.
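The two examples above can be mimicked in Python. A strftime format is applied directly, and "automatic" falls back to a wiki default. A {timestamp:...} parameter is filled in only when it is empty.

WIKI_DEFAULT_FORMAT is a hypothetical stand-in for the default in the local wiki configuration. The conversion specifiers used here (%Y, %m, %d, %H, %M, %S, %B) mean the same in Python's strftime as in PHP's. Month names from %B follow the process locale. The function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional

# HYPOTHETICAL stand-in for the default format set in the local wiki
# configuration, which "automatic" refers to.
WIKI_DEFAULT_FORMAT = "%Y-%m-%d"


def render_timestamp(fmt: str, when: datetime) -> str:
    """Format a time with a strftime string, or the wiki default for "automatic"."""
    return when.strftime(WIKI_DEFAULT_FORMAT if fmt == "automatic" else fmt)


def fill_if_unset(citation: dict[str, str], local_name: str, fmt: str,
                  now: Optional[datetime] = None) -> dict[str, str]:
    """Mimic {timestamp:...}: set the parameter to the current time only if empty."""
    if not citation.get(local_name, "").strip():
        citation[local_name] = render_timestamp(fmt, now or datetime.now(timezone.utc))
    return citation


snapshot = datetime(2009, 2, 13, 23, 31, 30, tzinfo=timezone.utc)
print(render_timestamp("%Y%m%d%H%M%S", snapshot))  # 20090213233130
print(render_timestamp("automatic", snapshot))  # 2009-02-13
print(fill_if_unset({}, "date", "%B %Y"))  # month name follows the process locale
```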

url

This is a standalone variable. It allows the bot to recognize which parameters in a cite template are the URL parameters.

title

This is a standalone variable. It allows the bot to recognize which parameters in a cite template are used for page titles.

titlelink

This is a standalone variable. It allows the bot to recognize which parameters in a cite template are used for linking the page title to an internal page. This prevents link conflicts when the bot adds URLs to citations whose page title already links elsewhere.

archiveurl

This is a standalone variable. It allows the bot to recognize which parameters in a cite template are the archive URL parameters.

doi

This is a standalone variable. It allows the bot to recognize which parameters in a cite template have Digital Object Identifiers.

isbn

This is a standalone variable. It allows the bot to recognize which parameters in a cite template have International Standard Book Numbers.

page

This is a standalone variable. It allows the bot to recognize which parameters in a cite template contain one or more page numbers in the source material.
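Once a map has been parsed, the standalone variables above amount to a lookup from local parameter names to data fields. The Python sketch below collects a citation's values by data field.

The local names in FIELD_BY_LOCAL and the example citation are hypothetical. The rule in url_would_conflict simply treats a filled titlelink as a reason not to add a URL. That rule is an assumption drawn from the titlelink description, not the bot's actual logic.

```python
# HYPOTHETICAL local parameter names, mapped to the standalone data fields
# as a parsed template map might provide them.
FIELD_BY_LOCAL = {
    "url": "url",
    "title": "title",
    "title-link": "titlelink",
    "archive-url": "archiveurl",
    "doi": "doi",
    "isbn": "isbn",
    "page": "page",
    "pages": "page",
}


def extract_fields(citation: dict[str, str]) -> dict[str, str]:
    """Collect non-empty citation values by data field; the first match wins."""
    found: dict[str, str] = {}
    for local, value in citation.items():
        field = FIELD_BY_LOCAL.get(local)
        if field is not None and value.strip() and field not in found:
            found[field] = value.strip()
    return found


def url_would_conflict(found: dict[str, str]) -> bool:
    """ASSUMPTION: a title that already links elsewhere blocks adding a URL."""
    return "titlelink" in found


citation = {"title": "Example report", "title-link": "Example article",
            "pages": "12-14", "publisher": "Example Press"}
print(extract_fields(citation))  # {'title': 'Example report', 'titlelink': 'Example article', 'page': '12-14'}
print(url_would_conflict(extract_fields(citation)))  # True
```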

deadvalues (“URL is dead?”)

One template parameter records whether the cited URL is dead. Its value is “yes” or “no,” localized in the language of the template and wiki. A URL tagged as dead may, in some circumstances, cause the bot to scan the URL to verify its status and/or replace the link.

Let’s start with the basic syntax:

urlmorta={deadvalues}

This declares that urlmorta is the “dead values” parameter. However, there is additional syntax for mapping the different possible values:

[value if dead/yes]:[value if alive/no]:[value if unknown status]:[default value]

The third value is optional. The fourth value is the default and should be either “yes” or “no” (in English). It determines whether the bot considers the URL dead when the citation has an archive URL but no deadvalues parameter. The example for Portuguese:

sim:não::yes

Note that the third value is left blank because there is no value for an unknown status. Combining these, you get:

urlmorta={deadvalues:sim:não::yes}

Optionally, you can separate alternative values for the same option with two semicolons (;;). For example, to accept both Portuguese and English values, you would use:

urlmorta={deadvalues:sim;;yes:não;;no::yes}
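The four deadvalues inputs can be interpreted as shown in the Python sketch below. A citation's value maps to dead, alive or unknown. When the parameter is missing, the fourth input decides, but only if an archive URL is present.

The sketch makes two assumptions of its own: values match exactly (case-sensitive), and an empty parameter counts as a missing one. The function names are hypothetical.

```python
from typing import Optional


def parse_deadvalues(inputs: str) -> tuple[set[str], set[str], set[str], str]:
    """Split "dead:alive:unknown:default"; ";;" separates alternative values."""
    slots = inputs.split(":")
    if len(slots) != 4:
        raise ValueError("deadvalues takes exactly 4 inputs")
    dead, alive, unknown = ({v for v in slot.split(";;") if v} for slot in slots[:3])
    default = slots[3]
    if default not in ("yes", "no"):
        raise ValueError('the default must be "yes" or "no"')
    return dead, alive, unknown, default


def url_is_dead(inputs: str, value: Optional[str], has_archive_url: bool) -> Optional[bool]:
    """True if dead, False if alive, None if unknown or unrecognized.

    ASSUMPTIONS: exact (case-sensitive) matching; an empty parameter is
    treated like a missing one.
    """
    dead, alive, _unknown, default = parse_deadvalues(inputs)
    if value is None or not value.strip():
        # No dead-URL value: the default applies only when an archive URL exists.
        return default == "yes" if has_archive_url else None
    value = value.strip()
    if value in dead:
        return True
    if value in alive:
        return False
    return None


PT = "sim;;yes:não;;no::yes"
print(url_is_dead(PT, "não", has_archive_url=True))  # False
print(url_is_dead(PT, None, has_archive_url=True))  # True
```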

paywall

The paywall variable allows the bot to identify websites that restrict access to logged-in accounts or require subscriptions (paid or free). It also allows the bot to set the appropriate parameter when it adds a link to an access-restricted page. It requires exactly 4 inputs, but only one of them needs a value; the other 3 can be left blank. The four values are as follows:

[value if paid subscription]:[value if free registration]:[value if limited preview]:[value if freely accessible]

The “value if paid subscription” is the value that should be used if the citation is to a resource that requires a paid subscription. On English Wikipedia this value is “subscription”.

The “value if free registration” is the value that should be used if the citation is to a resource that requires the user to register a free account. On English Wikipedia this value is “registration”.

The “value if limited preview” is the value that should be used if the citation is to a resource that only makes a limited preview available to unregistered or unsubscribed users. On English Wikipedia this value is “limited”.

The “value if freely accessible” is the value that should be used if the citation is to a freely available resource with no access restriction. On English Wikipedia this value is “free”.

The example for English Wikipedia:

subscription:registration:limited:free

Optionally, you can separate alternative values for the same option with two semicolons (;;). For example, to accept both Portuguese and English values, you would use:

accesso-url={paywall:subscrição;;subscription:registo;;registro;;registration:limitada;;limited:free}
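Reading a paywall value back into one of the four access levels can be sketched in Python. The level labels in ACCESS_LEVELS and the function names are hypothetical. The check that at least one of the four inputs is non-empty follows the requirement stated above.

```python
from typing import Optional

# HYPOTHETICAL labels for the four input positions, in order.
ACCESS_LEVELS = ("paid subscription", "free registration",
                 "limited preview", "freely accessible")


def parse_paywall(inputs: str) -> dict[str, list[str]]:
    """Map each access level to its accepted values (";;" separates alternatives)."""
    slots = inputs.split(":")
    if len(slots) != 4:
        raise ValueError("paywall takes exactly 4 inputs")
    if not any(slots):
        raise ValueError("at least one of the 4 inputs needs a value")
    return {level: [v for v in slot.split(";;") if v]
            for level, slot in zip(ACCESS_LEVELS, slots)}


def access_level(inputs: str, value: str) -> Optional[str]:
    """Return the access level a citation's value denotes, or None if unrecognized."""
    for level, values in parse_paywall(inputs).items():
        if value.strip() in values:
            return level
    return None


PT = "subscrição;;subscription:registo;;registro;;registration:limitada;;limited:free"
print(access_level(PT, "registro"))  # free registration
print(access_level(PT, "free"))  # freely accessible
```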

permadead

The permadead variable allows the bot to flag a template identifying a dead link with a permanently dead status, meaning the bot identified a dead link but was unable to find an archive URL for it. It also lets the bot recognize, from these flags, whether a URL should be rescued at all. The variable requires exactly two inputs, as shown:

[value if yes]:[value if no]

The example for Portuguese:

sim:não

Optionally, you can separate alternative values for the same option with two semicolons (;;). For example, to accept both Portuguese and English values, you would use:

permadead={permadead:sim;;yes:não;;no}
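The permadead inputs work in both directions, as the Python sketch below shows. Reading recognizes any listed alternative. Writing, when the bot flags a link, emits one value.

The sketch writes the first alternative in each slot. That is an assumption, since the page does not say which alternative the bot writes. The function names are hypothetical.

```python
from typing import Optional


def permadead_slots(inputs: str) -> tuple[list[str], list[str]]:
    """Split "yes-values:no-values"; ";;" separates alternative values."""
    yes_slot, no_slot = inputs.split(":")  # ValueError unless exactly 2 inputs
    return yes_slot.split(";;"), no_slot.split(";;")


def permadead_flag(inputs: str, value: str) -> Optional[bool]:
    """Read a template value: True if permanently dead, False if not, None if unknown."""
    yes_values, no_values = permadead_slots(inputs)
    value = value.strip()
    if not value:
        return None
    if value in yes_values:
        return True
    if value in no_values:
        return False
    return None


def permadead_value(inputs: str, is_permadead: bool) -> str:
    """ASSUMPTION: write the first alternative listed for the chosen slot."""
    yes_values, no_values = permadead_slots(inputs)
    return (yes_values if is_permadead else no_values)[0]


PT = "sim;;yes:não;;no"
print(f"permadead={permadead_value(PT, True)}")  # permadead=sim
print(permadead_flag(PT, "yes"))  # True
```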