InternetArchiveBot/Documentation/Configuring archive templates

From Meta, a Wikimedia project coordination wiki

About

Archive templates supplement dead links with an archive URL and appropriate metadata. They are appended to the end of the dead URL. Without them, a bare URL (that is, a URL not inside a citation template) that needed fixing would be replaced outright with an archive URL. The bot needs to know which archive templates exist, and a map of how they are formatted on wiki allows the bot to use each template to its fullest extent.

Adding/editing an archive template

Figure 1: Under Meta, click Adjust system configuration
Figure 2: Select Configure global bot definitions

  1. To add an archive template, navigate to "Adjust system configuration" under the Meta menu, as seen in Figure 1.
  2. Select the pencil button for "Configure global bot definitions", as seen in Figure 2.
  3. Select the pencil button for "Define archive templates", as seen in Figure 3.

You will be presented with the archive template definitions editor. You can click the + button to add a new template definition, or the pencil icons of the respective existing definitions to edit the template definition.

Figure 3: Select Define archive templates

At least one template must be defined for the bot to work, even if it isn't intended to be used.


Once you have a correctly defined archive template, it will show up under Configure bot behavior as a usable template. To activate it, list the name of the template as if it were being transcluded without parameters on wiki. You can add more than one; separate each instance with a comma. For example: {{Webarchive}},{{Webarchiv}},{{WhateverElseTemplatesYouWantToAdd}}

Editor items

In the editor there are three different settings to be defined for the map.

  1. Template name: This is the name of the map definition. It is not the name that is used on wiki. This allows for quick and easy identification of which definition you are using when configuring the bot.
  2. Template behavior: This defines how the template should be applied when the bot is using it.
    1. Append template to URL: Adds the template after the URL being rescued.
    2. Replace original URL with template: If the template is designed to replace the original URL because the template renders the original URL and the archive, use this option. The bot will then replace the link with this template.
  3. Template syntax: The syntax of the map to be defined or edited. See below for how to edit the syntax.

Basics

Here is an example of an archive template map:

url|1={archiveurl}|date|2={archivetimestamp:automatic}|title={title}

Let’s break this down.

First, note that only some template parameters are needed for a template map: those relevant to the archive URL. If the definitions editor cannot find an archive URL definition in the map, the submission will be rejected. The available data fields include:

  • url
  • title
  • archiveurl
  • archivetimestamp: requires exactly 1 input (strftime string or “automatic”)
  • permadead: requires exactly 2 inputs
  • timestamp: requires exactly 1 input (strftime string or “automatic”)

Four additional variables automatically treat the parameter as an archivetimestamp:

  • epoch
  • epochbase62
  • microepoch
  • microepochbase62

Each map consists of parameter definitions that are segmented by the vertical pipe character, |. The basic syntax for a map is localname={datafield}, where localname is the parameter name used for that template in the relevant language and datafield is one of the fields above. You can also use static values, where the bot prints the same thing for all cases. For example, for the Portuguese citation template parameter arquivourl, we would map it to the archiveurl data field in this way:

arquivourl={archiveurl}

Aliases

You can specify more than one name for a parameter by separating each alias with the vertical pipe character, |. If a parameter has multiple aliases, and none of them appear in the given citation, then the primary parameter name will be used by default. Note the syntax below:

localname1|alias1|alias2|alias3={variable}|localname2|alias4|alias5|alias6={variable2}

Using the above example, to accept template parameters with both Portuguese and English names:

archive-url|arquivourl={archiveurl}

If neither archive-url nor arquivourl is found in the citation, archive-url will be added to the citation, as it is the first in the sequence.
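
The alias rule can be sketched in a few lines of Python. This is only an illustration of the syntax described above, not InternetArchiveBot's own parser: the function names split_top_level, parse_map, and choose_name are hypothetical, and the behavior when an alias is already present in the citation (reuse that alias) is an assumption inferred from the text.

```python
def split_top_level(text, sep="|"):
    """Split text on sep, ignoring separators nested inside curly braces."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_map(template_map):
    """Return (names, value) pairs; names[0] is the primary parameter name."""
    definitions, names = [], []
    for piece in split_top_level(template_map):
        if "=" in piece:
            # The piece holding '=' closes the current group of aliases.
            name, value = piece.split("=", 1)
            names.append(name)
            definitions.append((names, value))
            names = []
        else:
            names.append(piece)
    return definitions


def choose_name(names, citation_params):
    """Assumed rule: reuse an alias already in the citation, else the primary."""
    for name in names:
        if name in citation_params:
            return name
    return names[0]


names, value = parse_map("archive-url|arquivourl={archiveurl}")[0]
print(choose_name(names, {"url": "http://example.org"}))  # archive-url
print(choose_name(names, {"arquivourl": "..."}))          # arquivourl
```

Splitting only on pipes outside curly braces keeps the sketch usable on maps that contain nested braces.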

Inputs

Most parameters stand on their own. Some parameters require inputs, which are defined values passed to the data field. Inputs are defined with this syntax:

localname={datafield:parameter}

Note the colon character : separating the data field and the parameter value.
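
As a rough illustration, the value side of a definition can be separated into its data field and raw input with a regular expression. The name parse_value and the pattern are hypothetical and not taken from the bot's source. The input is returned unsplit on purpose: strftime inputs such as %H:%M:%S contain colons themselves, so any further splitting has to depend on the data field.

```python
import re

# Matches '{datafield}' or '{datafield:input}'; the input may contain colons.
FIELD_PATTERN = re.compile(r"\{([a-z0-9]+)(?::(.*))?\}", re.DOTALL)


def parse_value(value):
    """Return (datafield, raw_input); static values give (None, value)."""
    match = FIELD_PATTERN.fullmatch(value)
    if match is None:
        return None, value  # a static value printed as-is
    return match.group(1), match.group(2)


print(parse_value("{archiveurl}"))                    # ('archiveurl', None)
print(parse_value("{archivetimestamp:automatic}"))    # ('archivetimestamp', 'automatic')
print(parse_value("{timestamp:%Y-%m-%d %H:%M:%S}"))   # ('timestamp', '%Y-%m-%d %H:%M:%S')
```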

Archival provider

InternetArchiveBot is designed to support many different providers of web archiving services in addition to the Internet Archive. The providers are as follows:

  • @wayback: The Internet Archive’s Wayback Machine
  • @europarchive
  • @archiveis
  • @memento
  • @webcite
  • @archiveit
  • @arquivo
  • @loc
  • @warbharvest
  • @bibalex
  • @collectionscanada
  • @veebiarhiiv
  • @vefsafn
  • @proni
  • @spletni
  • @stanford
  • @nationalarchives
  • @parliamentuk
  • @was
  • @permacc
  • @ukwebarchive
  • @wikiwix
  • @catalonianarchive
  • @ghostarchive

Use these provider names to restrict certain or all parameters on a citation template to only using a specific archival provider. For example, templates that were designed to only handle one archival provider, such as Template:Wayback for the Wayback Machine, would use this syntax.

Restricting parameter-value sets to a provider is done as follows:

{@providername|param=value/variable|param2|alias2=value/variable…}

To restrict the entire template to a given provider, wrap {@providername|<templatesyntax>} around your template syntax. See example below:

{@wayback|URL|url={url}|title={title}|date={archivetimestamp:%Y%m%d%H%M%S}}

The above example constrains the bot to use this template only when it needs to supplement a reference with an archive URL from the Wayback Machine.

To restrict a specific subset of parameters to a provider, treat {@providername|<templatesyntax>} as a key–value pair. See example below:

url={url}|{@wayback|wayback={archivetimestamp:%Y%m%d%H%M%S}}|{@webcite|webciteID={microepochbase62}}|{@archiveis|archive-is={archivetimestamp:%Y%m%d%H%M%S}}|{@default|archiv-url={archiveurl}|archiv-datum={archivetimestamp:automatic}}|text={title}|archiv-bot={timestamp:%Y-%m-%d %H:%M:%S} InternetArchiveBot

As seen in the above example, you can add multiple provider constraints to the same template map. If there are no constraints for a parameter-value set, then they apply to all archival providers. To prevent unconstrained parameter-value sets from appearing with constrained values, you can wrap those param/value sets with the @default provider. This instructs InternetArchiveBot to only use those parameter-value sets if the conditions for using the constrained param/value sets were not satisfied.
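
The selection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the bot's implementation: split_top_level, group_by_provider, and select_for are hypothetical names, and the map used is an abbreviated version of the example above.

```python
def split_top_level(text, sep="|"):
    """Split text on sep, ignoring separators nested inside curly braces."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def group_by_provider(template_map):
    """Return (provider, body) pairs; provider is None when unconstrained."""
    groups = []
    for piece in split_top_level(template_map):
        if piece.startswith("{@") and piece.endswith("}"):
            provider, _, body = piece[1:-1].partition("|")
            groups.append((provider, body))
        else:
            groups.append((None, piece))
    return groups


def select_for(template_map, provider):
    """Keep unconstrained sets, sets for this provider, and @default as fallback."""
    groups = group_by_provider(template_map)
    matched = any(name == provider for name, _ in groups)
    selected = [
        body
        for name, body in groups
        if name is None
        or name == provider
        or (name == "@default" and not matched)
    ]
    return "|".join(selected)


EXAMPLE = (
    "url={url}"
    "|{@wayback|wayback={archivetimestamp:%Y%m%d%H%M%S}}"
    "|{@default|archiv-url={archiveurl}}"
    "|text={title}"
)
print(select_for(EXAMPLE, "@wayback"))
# url={url}|wayback={archivetimestamp:%Y%m%d%H%M%S}|text={title}
print(select_for(EXAMPLE, "@permacc"))
# url={url}|archiv-url={archiveurl}|text={title}
```

In this sketch the @default set is used only when no set constrained to the requested provider exists, which mirrors the fallback behavior described in the paragraph above.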

Parameter details

Timestamp variables

Any variable that ends in ‘timestamp’ is treated as a timestamp variable. These parameters require exactly one input. For parameters that are timestamps, within the curly braces you need to either specify the format of the timestamp (in strftime PHP timestamp format) or automatic to use the default (as defined in the local wiki configuration).

Currently, there are three timestamp variables.

  • accesstimestamp: represents the last known time the URL was accessed by a reader or editor.
  • archivetimestamp: represents the time the snapshot URL was taken by the archive service in the archive URL.
    • In place of archivetimestamp, these standalone variables also work:
      • epoch: The Unix epoch of the archive snapshot time
      • epochbase62: The Unix epoch of the archive snapshot time encoded in base 62
      • microepoch: The Unix epoch in microseconds of the archive snapshot time
      • microepochbase62: The Unix epoch in microseconds of the archive snapshot time encoded in base 62. This is commonly used by webcitation.org
  • timestamp: only used to present the current timestamp. If the value is not defined, InternetArchiveBot will define the value with the current time.

Example:

acessadoem={accesstimestamp:automatic}

This recognizes acessadoem as an access date parameter that uses the default timestamp format.

date={timestamp:%B %Y}

This sets the date, if not already set, to the current month and year.
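
To make the relationship between these variables concrete, here is a small Python sketch that derives each value from one snapshot time. It rests on several assumptions: Python's strftime stands in for the PHP strftime format the bot uses (the format codes shown here are common to both), the names to_base62 and timestamp_fields are hypothetical, and the base 62 alphabet is a guess that may not match the one webcitation.org uses.

```python
from datetime import datetime, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Assumed alphabet; the ordering a given archive uses may differ.
BASE62_ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base62(number):
    """Encode a non-negative integer in base 62 using the assumed alphabet."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return BASE62_ALPHABET[0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, 62)
        digits.append(BASE62_ALPHABET[remainder])
    return "".join(reversed(digits))


def timestamp_fields(snapshot, fmt):
    """Derive the archive timestamp variables from a timezone-aware datetime."""
    delta = snapshot - UNIX_EPOCH
    epoch = delta.days * 86400 + delta.seconds          # whole seconds
    microepoch = epoch * 1_000_000 + delta.microseconds  # microseconds
    return {
        "archivetimestamp": snapshot.strftime(fmt),
        "epoch": epoch,
        "epochbase62": to_base62(epoch),
        "microepoch": microepoch,
        "microepochbase62": to_base62(microepoch),
    }


snapshot = datetime(2020, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
fields = timestamp_fields(snapshot, "%Y%m%d%H%M%S")
print(fields["archivetimestamp"])  # 20200102030405
print(fields["epoch"])             # 1577934245
```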

url

This is a standalone variable. It allows the bot to recognize which parameters in a cite template are the URL parameters.

title

This is a standalone variable. It allows the bot to recognize which parameters in a cite template are used for page titles.

titlelink

This is a standalone variable. It allows the bot to recognize which parameters in a cite template are used for linking the page titles to internal pages. This is used to prevent link conflicts when the bot adds URLs to citations where the page title is already linking elsewhere.

archiveurl

This is a standalone variable. It allows the bot to recognize which parameters in a cite template are the archive URL parameters.

doi

This is a standalone variable. It allows the bot to recognize which parameters in a cite template have Digital Object Identifiers.

isbn

This is a standalone variable. It allows the bot to recognize which parameters in a cite template have International Standard Book Numbers.

page

This is a standalone variable. It allows the bot to recognize which parameters in a cite template hold one or more page numbers in the source material.

deadvalues (“URL is dead?”)

One template parameter concerns the question of whether the URL being cited is dead. The URL being tagged as dead may, in some circumstances, cause the bot to scan the URL to verify the status and/or replace the link. The answer to this question is “yes” or “no,” localized in the language of the template and wiki.

Let’s start with the basic syntax:

urlmorta={deadvalues}

This declares that urlmorta is the “dead values” parameter. However, there is additional syntax for mapping the different possible values:

[value if dead/yes]:[value if alive/no]:[value if unknown status]:[default value]

The third value is optional. The fourth value is the default value used and should be either “yes” or “no” (in English). The fourth value determines whether the bot considers the URL dead, due to the presence of an archive URL, in the absence of the deadvalues parameter. The example for Portuguese:

sim:não::yes

Note that the third parameter is left blank as there is no value. Combining these together you get:

urlmorta={deadvalues:sim:não::yes}

Optionally, you can separate values with two semicolons to accept more than one value for the same option. For example, to accept both Portuguese and English values, you would use:

urlmorta={deadvalues:sim;;yes:não;;no::yes}
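
The input string after deadvalues: can be read as four colon-separated positions, each holding alternatives separated by two semicolons. The sketch below is a hypothetical reading of that syntax, not the bot's parser; the names parse_deadvalues and is_dead are made up, and returning None for an unrecognized value is an assumption.

```python
def parse_deadvalues(spec):
    """Parse 'dead:alive:unknown:default' where ';;' separates alternatives."""
    parts = spec.split(":")
    if len(parts) != 4:
        raise ValueError("expected four colon-separated positions")
    dead, alive, unknown, default = parts

    def alternatives(part):
        # An empty position (such as the optional third one) yields no values.
        return [value for value in part.split(";;") if value]

    return {
        "dead": alternatives(dead),
        "alive": alternatives(alive),
        "unknown": alternatives(unknown),
        "default_dead": default == "yes",
    }


def is_dead(rules, value):
    """Return True/False for a known value, the default when absent, else None."""
    if value is None:
        return rules["default_dead"]  # parameter missing from the citation
    if value in rules["dead"]:
        return True
    if value in rules["alive"]:
        return False
    return None  # unknown or unrecognized value (assumed behavior)


rules = parse_deadvalues("sim;;yes:não;;no::yes")
print(rules["dead"])          # ['sim', 'yes']
print(is_dead(rules, "não"))  # False
print(is_dead(rules, None))   # True
```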

paywall

The paywall variable allows the bot to identify websites that restrict access to logged-in accounts or require subscriptions (paid or free). It also allows the bot to set the appropriate parameter if a link to an access-restricted page is added. It takes exactly four input positions, but only one of them needs a value; the other three can be left blank. The four values are as follows:

[value if paid subscription]:[value if free registration]:[value if limited preview]:[value if freely accessible]

The “value if paid subscription” is the value that should be used if the citation is to a resource that requires a paid subscription. On English Wikipedia this value is “subscription”.

The “value if free registration” is the value that should be used if the citation is to a resource that requires the user to register a free account. On English Wikipedia this value is “registration”.

The “value if limited preview” is the value that should be used if the citation is to a resource that only makes a limited preview available to unregistered or unsubscribed users. On English Wikipedia this value is “limited”.

The “value if freely accessible” is the value that should be used if the citation is to a freely available resource with no access restriction. On English Wikipedia this value is “free”.

An example:

subscription:registration:limited:free

Optionally, you can separate values with two semicolons to accept more than one value for the same option. For example, to accept both Portuguese and English values, you would use:

accesso-url={paywall:subscrição;;subscription:registo;;registro;;registration:limitada;;limited:free}
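
One way to picture the paywall input is as a lookup from each localized value to one of the four access levels. The Python sketch below is illustrative only: parse_paywall and the level labels are hypothetical, and treating the first alternative in each position as the value the bot would write is an assumption, not documented behavior.

```python
# Labels for the four positions, in the order given above (names are made up).
ACCESS_LEVELS = ("subscription", "registration", "limited", "free")


def parse_paywall(spec):
    """Return (lookup, primary) for a four-position paywall input string."""
    parts = spec.split(":")
    if len(parts) != 4:
        raise ValueError("expected four colon-separated positions")
    lookup, primary = {}, {}
    for level, part in zip(ACCESS_LEVELS, parts):
        values = [value for value in part.split(";;") if value]
        if values:
            primary[level] = values[0]  # assumed: first alternative is written
        for value in values:
            lookup[value] = level       # every alternative is recognized
    if not lookup:
        raise ValueError("at least one position needs a value")
    return lookup, primary


lookup, primary = parse_paywall(
    "subscrição;;subscription:registo;;registro;;registration:limitada;;limited:free"
)
print(lookup["registro"])       # registration
print(primary["subscription"])  # subscrição
```

Positions left blank simply produce no entries, which matches the rule that only one of the four needs a value.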

permadead

The permadead variable allows the bot to flag a template identifying a dead link with the permanent dead status. This means the bot identified a dead link but was unable to find an archive URL for it. It’s also useful for the bot to recognize whether or not a URL should even be rescued when identified with these flags. The variable requires exactly two inputs, as shown:

[value if yes]:[value if no]

The example for Portuguese:

sim:não

Optionally, you can separate values with two semicolons to accept more than one value for the same option. For example, to accept both Portuguese and English values, you would use:

permadead={permadead:sim;;yes:não;;no}