InternetArchiveBot/Documentation/Configuring bot behavior

From Meta, a Wikimedia project coordination wiki
  • Links to scan:
    • All external links on an article: If this setting is selected, InternetArchiveBot will check every link on a given article to see if it resolves or not.
    • Only links within references: If this setting is selected, InternetArchiveBot will only check links for availability if they are included within <ref> tags.
  • Scanned links to modify:
    • All links: If this setting is selected, InternetArchiveBot will add archive URLs to all links, regardless of whether they are considered dead or alive.
    • Only tagged dead links: If this setting is selected, InternetArchiveBot will add archive URLs to all links that have been marked as dead with a template. This option overrides the “How to handle tagged links” setting below.
    • All dead links: If this setting is selected, InternetArchiveBot will add archive URLs to all links that it considers to be dead.
  • How to handle tagged links:
    • Remove all tags if links are whitelisted or alive: If this setting is selected, InternetArchiveBot will actively remove dead link templates and markers if any URL it encounters is assumed to be alive (whitelisted), or is tested to be alive.
    • Treat tagged links as dead: If this setting is selected, InternetArchiveBot will treat any tagged link as dead regardless of its own internal assessment of the URL.
  • Tagging dead citation templates:
    • Tag with a dead link template: InternetArchiveBot will append a dead link template to a citation template, instead of setting the URL’s live state on the cite template.
    • Use the dead link parameter on the cite template: InternetArchiveBot will set the URL’s live state from within the cite template.
  • Pages to scan:
    • Scan all mainspace pages: This will scan all content pages on the wiki for dead links.
    • Scan only pages containing dead link tags: This will only scan those pages that contain the “dead link” template, ignoring the other pages.
  • Archive versions:
    • Use newest archives (applies to newly searched archives): When adding an archival link, or replacing a link with an archival version, use the most recent archive of that resource available.
    • Use archives closest to the access date (applies to newly searched archives): When adding an archival link, or replacing a link with an archival version, use the archive that is closest to the originally cited access date for that resource. When the access date is not written down, the bot infers the access date from the timestamp of the edit.
  • Modify existing archives: Should InternetArchiveBot use its own preferred archives over what already exists there?
    • No (URLs in place of archive URLs that are not valid archives will still be replaced)
    • Yes
  • Leave talk page messages: Should InternetArchiveBot leave a message on the article’s talk page after editing that article?
    • No
    • Yes
  • Only leave talk page messages: One option is for InternetArchiveBot to not make any edits to live content, but instead make recommendations on the talk page. If you select “yes,” the bot will not make any edits to content and only post talk page messages, regardless of what is selected for “Leave talk page messages”.
    • No
    • Yes (This overrides the option above)
    • Do not edit links outside of references and leave a talk page message instead. (This overrides the option above)
  • Leave archiving errors on talk pages: The bot may encounter errors when submitting URLs to the Wayback Machine for archiving. If this option is enabled, the bot will leave messages on talk pages whenever it fails to submit a URL to the Wayback Machine.
    • No
    • Yes
  • Leave verbose talk messages: If this option is selected, the bot will leave more detailed talk page messages, if it is configured to leave talk page messages.
    • No
    • Yes
  • Talk message section header: A wiki markup formatted string of text that will serve as the section header for regular talk page messages.
  • Talk message: A wiki markup formatted block of text that will serve as the body of the regular talk page message.
  • Talk message only section header: A wiki markup formatted string of text that will serve as the section header for talk page recommendations.
  • Talk only message: A wiki markup formatted block of text that will serve as the body of the message to leave recommendations.
  • Talk message error section header: A wiki markup formatted string of text that will serve as the section header of the error message being left behind.
  • Talk error message: A wiki markup formatted block of text that will serve as the body of the message containing details about archive errors that the Wayback Machine returned.
  • Default used date formats: Formatted like a switch-case statement, this is a newline-separated list of acceptable formatting strings that the bot should recognize. The last string is the default and begins with the term “@default: ”. Refer to the PHP manual entry for strftime for how the strings should be formatted.
  • Opt-out tags: If one or more templates are specified here, the bot will not edit any references that include them.
  • Talk-only tags: If one or more templates are specified here, the bot will not edit pages that contain them and will leave recommendations on the talk page instead.
  • No-talk tags: If one or more templates are specified here, the bot will not leave recommendations on the talk page of any page that contains them.
  • Paywall tags: If one or more templates are specified here, references that contain them will be identified as potentially paywalled resources. Certain dead URL responses will then be treated as the result of the resource being behind a paywall.
  • Reference tags: Reference blocks that are opened and closed using templates are specified here.
  • Archive tags: This section is dynamically generated based on the predefined archive template maps created with the archive template editor. See “Create and edit archive maps” for more information on creating those maps. Select the archive tags that are applicable to the wiki and specify a template, or a comma-separated list of templates. The bot will use those templates to add archive URLs to their original counterparts in the desired formatting.
  • Dead link tags: Produces a rendered example of how the tag will be used on the respective wiki to identify dead URLs. If one or more templates are specified here, the bot will use them to recognize human-marked dead URLs and to mark dead URLs that have no archive as dead.
  • Template behavior: This defines the behavior of how the dead link tag is added to dead URLs.
    • Append template to URL - Add the template after the dead URL.
    • Replace original URL with template - Remove the original URL and replace it with the entire template. Useful if the template renders the original URL.
  • Dead link template syntax: Uses the same syntax as defined on the Citation Template Configurator. This lets you define the "dead link" template used on your wiki, for example https://en.wikipedia.org/wiki/Template:Dead_link
  • Notify only on domains:
  • Scan for dead links:
    • No
    • Yes
  • Submit live links to the Wayback Machine: If this option is selected, the bot will submit each live link it finds to the Wayback Machine for archival. For Wikipedia and other Wikimedia wikis this is disabled, as the Internet Archive is already crawling these links automatically.
    • No
    • Yes
  • Convert archives to long-form format: If this option is selected, the bot will convert archive URLs on the page to longer format URLs that include more metadata.
    • No
    • Yes
  • Normalize archive URL encoding: If this option is selected, the bot will encode non-ASCII characters in URLs.
    • No
    • Yes
  • Convert plain links to cite templates: If this option is selected, the bot will convert bare references that consist only of a link to full citations with proper citation templates.
    • No
    • Yes
  • Edit rate limit: Limits how quickly the bot may edit. This can be set to something like “4 per minute” or “400 per day”; set it to 0 to disable the rate limit. When a rate limit is enabled, bot job queuing will be disabled.
  • Added archive talk-only: Part of the {modifiedlinks} magic word, this is used to describe the recommended addition of an archive to a URL. This is used when the main article hasn't been edited. Supports the following magic words:
    • {link}: The original URL.
    • {newarchive}: The new archive of the original URL.
  • Dead link talk-only: Part of the {modifiedlinks} magic word, this is used to describe that the original URL has been found to be dead and should be tagged. This is used when the main article hasn't been edited. Supports the following magic words:
    • {link}: The original URL.
  • No dead link talk-only: Part of the {modifiedlinks} magic word, this is used to describe that the original URL has been tagged as dead, but found to be alive and recommends the removal of the tag. This is used when the main article hasn't been edited. Supports the following magic words:
    • {link}: The original URL.
  • Added archive message item: Part of the {modifiedlinks} magic word, this is used to describe the addition of an archive to a URL. Supports the following magic words:
    • {link}: The original URL.
    • {newarchive}: The new archive of the original URL.
  • Modified archive message item: Part of the {modifiedlinks} magic word, this is used to describe the modification of an archive URL for the original URL. Supports the following magic words:
    • {link}: The original URL.
    • {oldarchive}: The old archive of the original URL.
    • {newarchive}: The new archive of the original URL.
  • Fixed source message item: Part of the {modifiedlinks} magic word, this is used to describe the formatting changes and/or corrections made to a URL. Supports the following magic words:
    • {link}: The original URL.
  • Dead link message item: Part of the {modifiedlinks} magic word, this is used to describe that the original URL has been tagged as dead. Supports the following magic words:
    • {link}: The original URL.
  • No dead message item: Part of the {modifiedlinks} magic word, this is used to describe that the original URL has been untagged as dead. Supports the following magic words:
    • {link}: The original URL.
  • Default message item: Part of the {modifiedlinks} magic word, this is used as the default text in the event of an internal error when generating the {modifiedlinks} magic word. Supports the following magic words:
    • {link}: The original URL.
  • Archive error item: Part of the {problematiclinks} magic word, this is used to describe the problem the Wayback Machine encountered during archiving. Supports the following magic words:
    • {problem}: The problem URL.
    • {error}: The error that was encountered for the URL during the archiving process.
  • Edit summary: This sets the edit summary the bot will use when editing the main article. See the “Magic word globals” subsection for usable magic words. (Items 11, 12, and 13 are not supported.)
  • Error message summary: This sets the edit summary the bot will use when posting the error message on the article's talk page.
  • Message summary: This sets the edit summary the bot will use when posting the analysis information on the article's talk page. See the “Magic word globals” subsection for usable magic words.
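
The two “Archive versions” strategies described above can be illustrated with a short Python sketch. This is not InternetArchiveBot's own code; the function name `pick_archive` and its parameters are hypothetical, and the snapshot list is assumed to have been fetched already. It only shows the selection rule: either take the newest snapshot, or take the one closest to the cited access date, falling back to the timestamp of the edit when no access date is recorded.

```python
from datetime import datetime
from typing import Optional, Sequence


def pick_archive(
    snapshots: Sequence[datetime],
    access_date: Optional[datetime],
    edit_date: datetime,
    use_newest: bool,
) -> datetime:
    """Return the snapshot timestamp to use (hypothetical helper, for illustration)."""
    if not snapshots:
        raise ValueError("no snapshots available for this URL")
    if use_newest:
        # "Use newest archives": the most recent snapshot wins.
        return max(snapshots)
    # "Use archives closest to the access date": when no access date was
    # written down, infer it from the timestamp of the edit.
    target = access_date if access_date is not None else edit_date
    return min(snapshots, key=lambda snapshot: abs(snapshot - target))


# Illustrative data only.
snapshots = [datetime(2015, 1, 1), datetime(2018, 6, 1), datetime(2021, 3, 1)]
edit_date = datetime(2020, 12, 1)

newest = pick_archive(snapshots, None, edit_date, use_newest=True)
closest = pick_archive(snapshots, datetime(2018, 1, 1), edit_date, use_newest=False)
inferred = pick_archive(snapshots, None, edit_date, use_newest=False)
print(newest.date(), closest.date(), inferred.date())
```

With this sample data, the access-date strategy picks the 2018 snapshot for a source cited in January 2018, while the newest-archive strategy always picks the 2021 snapshot.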

Magic word globals

These magic words are available when mentioned in the respective configuration options above.

  1. {namespacepage}: The page name of the main article that was analyzed.
  2. {linksmodified}: The number of links that were either tagged or rescued on the main article.
  3. {linksrescued}: The number of links that were rescued on the main article.
  4. {linksnotrescued}: The number of links that were unable to be rescued on the main article.
  5. {linkstagged}: The number of links that were tagged dead on the main article.
  6. {linksarchived}: The number of links that were archived into the Wayback Machine on the main article.
  7. {linksanalyzed}: The total number of links that were analyzed on the main article.
  8. {pageid}: The page ID of the main article that was analyzed.
  9. {title}: The URL encoded variant of the name of the main article that was analyzed.
  10. {logstatus}: Returns "fixed" when the bot is set to edit the main article. Returns "posted" when the bot is set to only leave a message on the talk page.
  11. {revid}: The revision ID of the edit to the main article. Empty if there is no edit to the main article.
  12. {diff}: The URL of the revision comparison page of the edit to the main article. Empty if there is no edit to the main article.
  13. {modifiedlinks}: A generated bulleted list of actions performed, or to be performed, on the main article, built from the custom text defined in the message item options above.
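
As a rough illustration of how these magic words combine with the message item options, the Python sketch below expands placeholders in a template string and assembles a {modifiedlinks}-style bulleted list. It is an assumption-laden sketch, not the bot's implementation: the function names, the action keys (`added_archive`, `tagged_dead`), the item wording, and the URLs are all hypothetical.

```python
import re
from typing import Iterable, Mapping, Tuple

# Matches a magic word such as {link} or {linksrescued}.
_MAGIC_WORD = re.compile(r"\{(\w+)\}")


def expand_magic_words(template: str, values: Mapping[str, object]) -> str:
    """Replace known magic words in a single pass; unknown ones are left as written."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)

    return _MAGIC_WORD.sub(substitute, template)


def build_modified_links(
    actions: Iterable[Tuple[str, Mapping[str, object]]],
    item_templates: Mapping[str, str],
    default_item: str,
) -> str:
    """Render one wiki bullet per action, using the default item for unknown kinds."""
    lines = []
    for kind, words in actions:
        item = item_templates.get(kind, default_item)
        lines.append("* " + expand_magic_words(item, words))
    return "\n".join(lines)


# Hypothetical item texts, standing in for the configured message items.
ITEM_TEMPLATES = {
    "added_archive": "Added archive {newarchive} to {link}",
    "tagged_dead": "Tagged {link} as dead",
}
DEFAULT_ITEM = "Modified {link}"

actions = [
    ("added_archive", {
        "link": "http://example.com/a",
        "newarchive": "https://web.archive.org/web/2020/http://example.com/a",
    }),
    ("tagged_dead", {"link": "http://example.com/b"}),
]

modified_links = build_modified_links(actions, ITEM_TEMPLATES, DEFAULT_ITEM)
summary = expand_magic_words(
    "Rescuing {linksrescued} sources and tagging {linkstagged} as dead.",
    {"linksrescued": 1, "linkstagged": 1},
)
print(summary)
print(modified_links)
```

The single-pass regular expression substitution is a deliberate choice here: values that themselves contain braces, such as a URL inserted for {link}, are not scanned again for further magic words.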