InternetArchiveBot/API
The IABot Management Interface comes with a simple, fast, and easy-to-use API. It allows tools and gadgets to integrate with the interface, contribute to improving IABot's core, and rely on the vast resources IABot has. Other Wikipedia bots can access the API as well, and approved bots on Wikipedia are automatically given bot rights on the API. To access the API, go to https://iabot.wmcloud.org/api.php.
Authorizing the tool/bot/gadget to use the API
Unlike the primary interface, the API allows limited access to information without authorization. To use the API fully, however, an OAuth authorization header must be passed to the API in every request. Conventional OAuth requires user input and therefore does not work for bots, so for bots the API uses a header relay system: it relays your header to MediaWiki (MW) to obtain your bot's account details. The API then passes back the payload, provided the supplied header was valid. Gadgets and on-wiki JS scripts can use the conventional authentication method, which directs the user to approve the tool's access to the user's account.
Differences between authorization methods
The conventional method of authorization typically involves users clicking the login button on the tool interface, which directs them to an authorization window where they authorize the tool's access, after which they are directed back to the original location. This allows the tool to fully access the user's account, within the grants allowed, and make edits on behalf of the user.
API access for a bot account has to work without popup dialogs, since those are JS based and a bot cannot answer them. The API therefore accepts the bot's OAuth header and passes it to the MW software to identify the bot. The header is generated by the bot, and the payload from the response is passed back to the bot. The API will not always use the header and as such will not always pass back a payload. Bots are encouraged to validate the payload at least once every run and whenever a payload is passed back. To force a payload response, set the "returnpayload" parameter, via either POST or GET. This slows down the response, however, because the API has to retrieve a payload. This method of authorization only lets the tool identify which account just connected to the API; the tool cannot make edits on behalf of the bot. As such, some API features are disabled.
Authorizing a JS gadget or another tool using OAuth
If the tools using the API are on-wiki JS gadgets or other OAuth-supported tools, users can simply call the URL https://iabot.wmcloud.org/oauthcallback.php?action=login&returnto={returnurl}, where {returnurl} is the URL to direct the user back to on successful login, to authorize the user for the API.
To log the users back out, call the URL https://iabot.wmcloud.org/oauthcallback.php?action=logout&returnto={returnurl}.
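As a minimal sketch, the login and logout URLs above could be built like this. The helper name `oauth_url` is hypothetical, and percent-encoding the return URL is an assumption; the page does not state how {returnurl} should be encoded.

```python
from urllib.parse import urlencode

# Endpoint taken from the text above.
OAUTH_CALLBACK = "https://iabot.wmcloud.org/oauthcallback.php"


def oauth_url(action: str, returnto: str) -> str:
    """Build the login or logout URL for a given return location.

    Hypothetical helper; percent-encoding `returnto` is an assumption.
    """
    if action not in ("login", "logout"):
        raise ValueError("action must be 'login' or 'logout'")
    query = urlencode({"action": action, "returnto": returnto})
    return f"{OAUTH_CALLBACK}?{query}"


if __name__ == "__main__":
    print(oauth_url("login", "https://example.org/tool"))
```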
Authorizing a bot with an owner-only consumer
This method is more complicated, and limits some API functions that require direct access to the account. Because authentication works by relaying the OAuth header to MW, the destination URL encoded into the OAuth signature is not the tool API URL but MW's OAuth /identify URL. Encoding the wrong URL into the signature results in an invalid signature response. In addition, the identify URL must belong to the correct wiki. Since this is a multi-wiki interface, the tool needs to know which wiki you are working on. The default wiki is enwiki. To change it, set the wiki parameter to the appropriate wiki; you can do so in the same request as the authorization. For example, for the Swedish Wikipedia, your request needs to have wiki=svwiki in either the POST or GET fields. When signing the header, the URL being encoded must match the format https://{domain}/w/index.php?title=Special:OAuth/identify, where {domain} is the domain of the wiki being worked on.
If you are planning to run on the English Wikipedia, you would pass wiki=enwiki in the GET or POST fields, and encode the URL https://en.wikipedia.org/w/index.php?title=Special:OAuth/identify into your OAuth signature. You would then pass the header to the tool API to authenticate.
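The pairing of the wiki parameter with the identify URL can be sketched as follows. The function names are hypothetical, and the caller is assumed to supply the wiki's domain; no wiki-code-to-domain lookup is implied by the API.

```python
def identify_url(domain: str) -> str:
    """Return the URL that must be encoded into the OAuth signature."""
    return f"https://{domain}/w/index.php?title=Special:OAuth/identify"


def bot_auth_target(wiki: str, domain: str) -> tuple[dict, str]:
    """Pair the request parameters with the URL to sign (hypothetical helper).

    `wiki` is the code sent to the tool API (for example "svwiki");
    `domain` is that wiki's domain, supplied by the caller.
    """
    return {"wiki": wiki}, identify_url(domain)


if __name__ == "__main__":
    params, url_to_sign = bot_auth_target("svwiki", "sv.wikipedia.org")
    print(params, url_to_sign)
```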
If you are not familiar with OAuth, you may use the two provided scripts, which can be called from the shell and are described in the section below.
If the OAuth header passed to the tool is valid, the tool can identify the bot connecting to the application and passes back an encrypted payload. It is important to note that the header is only used to identify the bot. To prevent unexpected session failures, it is recommended to always pass a header and to validate the payload whenever one is returned, and at least once every run. To force a payload in the response, you can set the returnpayload parameter. This slows down the request, however, as the API then has to perform the identify request. More details about the parameters are explained below.
Helper scripts to make OAuth easier
OAuth can be difficult to implement for some users. There are 2 scripts meant to be executed externally that can do the hard work for you.
The first is the MWOAuthGenerateHeader:
- Download: MWOAuthGenerateHeader.php
- Execute the script in the command line interface or via some form of exec function as follows
php MWOAuthGenerateHeader.php <consumerkey> <consumersecret> <accesstoken> <accesssecret> <identifyurl>
- Replace the bracketed placeholders with the appropriate values
- On success, the output of the script will be a properly formatted OAuth header to be passed to the tool via the headers. If something went wrong, it will output "FAIL" instead.
The second is the MWOAuthDecodePayload:
- Download: MWOAuthDecodePayload.php
- When the tool passes back the payload, execute this script as follows
php MWOAuthDecodePayload.php <consumersecret> <payload>
- Replace the bracketed placeholders with the appropriate values
- On success, the output of the script will be a JSON object with your account details. If any part of the payload fails to pass validation, it will output an error message instead. If you know the script was properly executed, it could suggest some form of interception or attack took place during the request, and execution should be aborted.
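For a bot written in Python, the first script could be wrapped roughly as below. This is a sketch under assumptions: `php` is on the PATH, the script sits in the working directory, and the function names are hypothetical. Only the argument order and the "FAIL" output come from the description above.

```python
import subprocess


def header_command(consumer_key, consumer_secret, access_token,
                   access_secret, identify_url,
                   script="MWOAuthGenerateHeader.php"):
    """Build the argument list in the order the script expects."""
    return ["php", script, consumer_key, consumer_secret,
            access_token, access_secret, identify_url]


def parse_header_output(output: str) -> str:
    """Return the header, or raise if the script printed "FAIL"."""
    header = output.strip()
    if not header or header == "FAIL":
        raise RuntimeError("MWOAuthGenerateHeader.php did not return a header")
    return header


def generate_header(consumer_key, consumer_secret, access_token,
                    access_secret, identify_url):
    """Run the helper script and return the OAuth header it prints."""
    cmd = header_command(consumer_key, consumer_secret, access_token,
                         access_secret, identify_url)
    # Passing a list (no shell) avoids quoting problems with secrets.
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_header_output(done.stdout)
```

MWOAuthDecodePayload.php could be wrapped the same way, passing the consumer secret and the payload as the two arguments.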
API functions and usage and limitations
GET or POST?
The API accepts either, and the two can be used interchangeably. When the same parameter is supplied via both GET and POST, the POST value takes priority. It is recommended to POST all requests.
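The precedence rule can be modelled in a few lines. This is only an illustration of the described behaviour from the client's point of view, not the API's own code; `effective_params` is a hypothetical name.

```python
def effective_params(get_params: dict, post_params: dict) -> dict:
    """Merge GET and POST parameters, letting POST values win."""
    merged = dict(get_params)
    merged.update(post_params)  # POST overrides GET on a name clash
    return merged


if __name__ == "__main__":
    print(effective_params({"wiki": "enwiki", "action": "runpages"},
                           {"wiki": "svwiki"}))
```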
Global parameters
The API allows some information to be obtained without authorization, such as URL information and the pages the URLs are found on, but the remaining functions require authorization.
There are global parameters that, when passed, affect all requests and how the API operates.
Parameter name | Possible values | Description |
---|---|---|
action | See the section below | This is the primary parameter to direct the API. This tells the API what action to carry out. |
returnpayload | anything | Forces the API to pass the header to the wiki's OAuth and try to get a payload. If successful, a payload will be returned with the request. A failure will result in an error getting returned with the response. The bot will not be logged out on a request failure however. |
token | The token echoed from the previous response | This is a required parameter for all write requests making some form of a change. This is the CSRF token. Missing this will return a 400 error. |
checksum | The checksum token echoed from the previous response | This is a required parameter for all write requests making some form of change. This is the checksum token which is used to validate the request as valid. A bad token will result in a 409 error and a missing token will result in a 400 error. |
wiki | 'enwiki', 'svwiki', or any other supported wiki | This tells the tool which wiki to switch to. Bots should generally use their home wiki. Changing this also means changing the identify URL mentioned in the above section for owner-only consumers. |
offset | string | Used for requests requiring pagination. When omitted, only the first 1000 entries are returned. To move through the result set, repeat the same request with this parameter set to the value of continue from the previous response, whenever continue is defined. |
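Pagination with offset and continue might be driven by a loop like the one below. It is a sketch: `fetch` is a hypothetical caller-supplied function that sends the parameters to the API and returns the decoded JSON response, so no particular HTTP library is assumed.

```python
from typing import Callable, Iterator


def iter_pages(fetch: Callable[[dict], dict], params: dict) -> Iterator[dict]:
    """Yield each response of a paginated request until continue is absent."""
    request = dict(params)
    while True:
        response = fetch(request)
        yield response
        token = response.get("continue")
        if not token:
            return
        # Repeat the same request, adding the continue value as offset.
        request = {**params, "offset": token}
```

A caller would typically loop over `iter_pages(fetch, {"action": "searchurldata", "hasarchive": "1"})` and collect the entries from each response.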
Global return values
All API responses are in JSON.
In every response there are values that get returned when completing a request.
Value name | Type | Description |
---|---|---|
loggedon | boolean | Indicates whether or not the client is logged onto the API. |
username | string | Only defined if the client is logged in. Contains the identified user connected to the API. |
csrf | string | Only defined if the client is logged in. Contains the CSRF token required for all write requests. |
checksum | string | Only defined if the client is logged in. Contains the checksum token required to execute a write request. |
servetime | float | The length of time in seconds the webserver took to service the request. |
result | string | Defined when a write request is being made. Successful requests have the value "success"; all other requests have the value "fail". |
continue | string | Defined when a request uses pagination and has more results. This is the value for the next set of entries which can be up to 1000 per set. Pass to the offset parameter to go to the next set of entries when repeating the request. |
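Because write requests need the tokens echoed by the previous response, a client might carry them forward as sketched here. The helper name is hypothetical; the field names csrf and checksum and the parameter names token and checksum are the ones listed in the tables above.

```python
def with_write_tokens(params: dict, previous_response: dict) -> dict:
    """Copy the CSRF and checksum tokens from the last response into a request."""
    if not previous_response.get("loggedon"):
        raise PermissionError("tokens are only defined for logged-in clients")
    signed = dict(params)
    signed["token"] = previous_response["csrf"]          # CSRF token
    signed["checksum"] = previous_response["checksum"]   # checksum token
    return signed


if __name__ == "__main__":
    last = {"loggedon": True, "csrf": "abc", "checksum": "123"}
    print(with_write_tokens({"action": "reportfp"}, last))
```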
Global return values on error
Value name | Type | Description |
---|---|---|
noaccess | string | Returned when the API is inaccessible. Possible codes are "disabledinterface", "maintenance", and "Missing authorization". Respectively, they mean that the tool has been disabled by a developer, that the tool developers are performing maintenance, and that the client is not logged in to the API to execute the desired function. |
notavailable | string | Usually is passed back when a function of the API being accessed is not executable with an owner-only consumer. |
errormessage | string | Defined whenever an error occurs during a request. Contains an English description of the error and is usually accompanied by an error code for bots to identify. |
noaction | string | Default response for the API if no action is being performed. |
validationerror | string | Defined when returnpayload is set in the request. Possible values are "noheader" and "invalidheader". |
usedheader | string | Returned with validationerror and autherror. Contains the header the API attempted to use. |
autherror | string | Defined when initial login to the API failed. |
requesterror | string | Defined when a request couldn't be executed due to an issue with the request. Possible values are "invalidchecksum", "missingchecksum", "blocked", "dberror", "404", "invalidtoken", and "missingtoken" |
ratelimit | string | Default response for the API if the number of allowed requests was exceeded during the last minute. Limits are 5 requests/minute for anonymous users, 500 requests/minute for logged in users, and 5000 requests/minute for authorized bots. |
missingpermission | string | Default response for the API if the requested action requires a permission that the client lacks. Contains the permission required. |
accessibletogroups | array | Defined with missingpermission listing the usergroups that have access to the requested function. |
missingvalue | string | Defined when a required value for a request is missing, or has bad data. |
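A bot could check every response for the error values in this table before using the result. The following sketch simply looks for those keys; `find_errors` is a hypothetical name, and the companion fields usedheader and accessibletogroups are left out because they only accompany other errors.

```python
# Error keys taken from the table above.
ERROR_KEYS = (
    "noaccess", "notavailable", "errormessage", "noaction",
    "validationerror", "autherror", "requesterror", "ratelimit",
    "missingpermission", "missingvalue",
)


def find_errors(response: dict) -> dict:
    """Return the error fields present in a decoded API response."""
    return {key: response[key] for key in ERROR_KEYS if key in response}


if __name__ == "__main__":
    sample = {"loggedon": False, "requesterror": "missingtoken"}
    print(find_errors(sample))
```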
Action functions
action=runpages
This action allows automated processes to fetch the run statuses of the bot on the wikis it is configured to run on and who last toggled the run state of a given wiki.
This action has no parameters.
This action has no unique responses for successful requests.
This action has no errors specific to this action.
This action can be called without authorization.
action=getfalsepositives
This action allows automated processes to fetch reported false positives, from either bots or tool users. This action requires the "viewfpreviewpage" permission, and logically is not available to anonymous clients.
This action offers the following parameters:
Parameter | Required | Accepted values | Description |
---|---|---|---|
displayopen | Default option | Anything | Setting this parameter will return all false positives that have been reported but not acted on. |
displayfixed | No | Anything | Setting this parameter will return all false positives that were reported and fixed. |
displaydeclined | No | Anything | Setting this parameter will return all false positives that were reported and declined as an invalid report. |
The action has the following possible return values:
Value | Type | Description |
---|---|---|
openreports | int | The number of active unacted reports. |
fpreports | array | All of the reports in the request. |
This function does not return action specific errors.
action=getbotqueue
This action allows automated processes to fetch bot jobs in the bot queue submitted from either bots or tool users. This action requires the "viewbotqueue" permission, and logically is not available to anonymous clients.
This action offers the following parameters:
Parameter | Required | Accepted values | Description |
---|---|---|---|
displayqueued | No | Anything | Setting this parameter will return all bot jobs still pending completion. |
displayrunning | Default option | Anything | Setting this parameter will return all bot jobs actively being worked on. |
displayfinished | No | Anything | Setting this parameter will return all bot jobs that have been successfully finished. |
displaykilled | No | Anything | Setting this parameter will return all bot jobs that have been killed by the requesting users or tool maintainers. |
displaysuspended | No | Anything | Setting this parameter will return all bot jobs that have been suspended by the tool maintainers. |
The action has the following possible return values:
Value | Type | Description |
---|---|---|
queued | int | The number of bot jobs still pending. |
running | int | The number of bot jobs in operation. |
botqueue | array | Details of all of the bot queue jobs requested. |
This function does not return action specific errors.
action=reportfp
This action allows automated processes to report false positives to the interface and tool maintainers. This action requires the "reportfp" permission, and logically is not available to anonymous clients. This action requires the CSRF and checksum tokens to work.
This action offers the following parameters:
Parameter | Required | Accepted values | Description |
---|---|---|---|
fplist | Yes | Newline separated string | This parameter is a list of URLs separated by a newline, to be reported as false positives. |
The action has the following possible return values:
Value | Type | Description |
---|---|---|
toreport | array | The URLs that were reported to the maintainers. |
toreset | array | The URLs that were automatically corrected during reporting. |
notdead | array | The URLs already found to be alive and are being ignored. |
notfound | array | The URLs IABot hasn't encountered and are being ignored. |
alreadyreported | array | The URLs that were already reported and will not be reported again. |
This action has the following possible errors:
Value | Type | Description |
---|---|---|
reportfperror | string | Defined when an error specific to the action has occurred. |
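As an illustration, the fplist value and a summary of the returned buckets could be handled as follows. Both function names are hypothetical; the bucket names are the return values listed above, and the write tokens still have to be added to the request separately.

```python
# Return values of action=reportfp, as listed above.
BUCKETS = ("toreport", "toreset", "notdead", "notfound", "alreadyreported")


def build_fplist(urls) -> str:
    """Join URLs into the newline separated string expected by fplist."""
    cleaned = [u.strip() for u in urls if u.strip()]
    if not cleaned:
        raise ValueError("fplist is required and must not be empty")
    return "\n".join(cleaned)


def summarize_report(response: dict) -> dict:
    """Count how many URLs ended up in each bucket of the response."""
    return {name: len(response.get(name, [])) for name in BUCKETS}


if __name__ == "__main__":
    print(repr(build_fplist(["http://example.com/a", " http://example.com/b "])))
```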
action=searchurldata
This action allows automated processes to fetch URL data for any encountered URL on Wikipedia. This action is available to anonymous clients.
This action offers the following parameters:
Parameter | Required | Accepted values | Description |
---|---|---|---|
urls | No | Newline separated string | This parameter is required if urlids or any of the search filters are not set. A list of URLs, separated by newlines, to lookup and provide details about. |
urlids | No | Newline separated int | This parameter is required if urls or any of the search filters are not set. A list of URL IDs, separated by newlines, to lookup and provide details about. |
hasarchive | No | 0 or 1 | This parameter is required if urls and urlids or any of the other search filters are not set. Set to 0 to retrieve all URLs with no archive associated with it. Set to 1 to retrieve all URLs with an archive associated with it. |
livestate | No | Pipe separated string | This parameter is required if urls and urlids or any of the other search filters are not set. Filters records to the given states of the URLs. |
isarchived | No | string | This parameter is required if urls and urlids or any of the other search filters are not set. Filters records based on whether they have a known available archive in the Wayback Machine. Returns true or false in the archived field for each URL. For URLs where it's uncertain an archive exists, NULL is returned instead. Only one option can be picked. |
reviewed | No | 0 or 1 | This parameter is required if urls and urlids or any of the other search filters are not set. Set to 0 to retrieve all URLs that haven't been reviewed by a user or another bot. Set to 1 to retrieve all URLs that have been reviewed by a user or another bot. |
The action has the following possible return values:
Value | Type | Description |
---|---|---|
urls | array | Details of all of the URLs requested. |
This function does not return action specific errors.
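Since at least one of urls, urlids, or a search filter has to be set, a client-side builder could enforce that before sending the request. This is a sketch with a hypothetical function name; the values accepted by each filter are those described in the table above.

```python
FILTERS = {"hasarchive", "livestate", "isarchived", "reviewed"}


def searchurldata_params(urls=None, urlids=None, **filters) -> dict:
    """Build the parameters for action=searchurldata."""
    unknown = set(filters) - FILTERS
    if unknown:
        raise ValueError(f"unknown filters: {sorted(unknown)}")
    params = {"action": "searchurldata"}
    if urls:
        params["urls"] = "\n".join(urls)                    # newline separated
    if urlids:
        params["urlids"] = "\n".join(str(i) for i in urlids)
    params.update({name: str(value) for name, value in filters.items()})
    if len(params) == 1:
        raise ValueError("set urls, urlids, or at least one search filter")
    return params


if __name__ == "__main__":
    print(searchurldata_params(urlids=[5, 9], hasarchive=1))
```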
action=searchpagefromurl
This action allows automated processes to fetch the pages on which encountered URLs were found. This action is available to anonymous clients.
This action offers the following parameters:
Parameter | Required | Accepted values | Description |
---|---|---|---|
url | No | string | This parameter is required if urlid is not set. The URL whose containing pages should be looked up. Using urlid is recommended. |
urlid | No | int | This parameter is required if url is not set. Look up the pages based on the URL's ID. |
The action has the following possible return values:
Value | Type | Description |
---|---|---|
pages | array | A list of all pages the URL was found on. |
This function does not return action specific errors.
action=searchurlfrompage
This action allows automated processes to fetch the URLs found on given pages. This action is available to anonymous clients.
This action offers the following parameters:
Parameter | Required | Accepted values | Description |
---|---|---|---|
pageids | Yes | Pipe separated int | A list of page IDs to lookup. |
The action has the following possible return values:
Value | Type | Description |
---|---|---|
urls | array | A list of all URLs that were found on the given pages. |
This function does not return action specific errors.
Wikipedia page IDs can be retrieved using the MediaWiki API, for example for the page 'Albert Einstein'.
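A lookup along those lines could be sketched as below, using the MediaWiki Action API's action=query with the titles parameter. The helper names are hypothetical, and the response handling assumes the default JSON layout, in which pages are keyed under query.pages and missing pages carry no pageid.

```python
from urllib.parse import urlencode


def pageid_query_url(title: str, domain: str = "en.wikipedia.org") -> str:
    """Build a MediaWiki API URL that looks up a page by title."""
    query = urlencode({"action": "query", "titles": title, "format": "json"})
    return f"https://{domain}/w/api.php?{query}"


def extract_pageids(response: dict) -> list:
    """Pull the page IDs out of a decoded action=query response."""
    pages = response.get("query", {}).get("pages", {})
    return [page["pageid"] for page in pages.values() if "pageid" in page]


def pageids_param(ids) -> str:
    """Join page IDs with pipes for action=searchurlfrompage."""
    return "|".join(str(i) for i in ids)


if __name__ == "__main__":
    print(pageid_query_url("Albert Einstein"))
```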
action=modifyurl
This action allows automated processes to modify URL data IABot uses. This action requires the "changeurldata" permission and logically is not available to anonymous clients. Depending on the change, it also requires the "alteraccesstime" permission to alter the access time, the "blacklisturls" permission to add URLs to the blacklist, the "deblacklisturls" permission to remove URLs from the blacklist, the "whitelisturls" permission to add URLs to the whitelist, the "dewhitelisturls" permission to remove URLs from the whitelist, the "alterarchiveurl" permission to alter the archive URL of URLs, and the "overridearchivevalidation" permission to bypass the archive validation checks. Not all permissions are needed to perform the desired functions. This action requires the CSRF and checksum tokens to work.
This action offers the following parameters:
Parameter | Required | Accepted values | Description |
---|---|---|---|
urlid | Yes | int | The URL ID of the URL to modify. |
accesstime | No | A PHP recognized timestamp | The timestamp of the access time. The bot uses this when searching for new archives. Requires the "alteraccesstime" permission to modify. |
livestateselect | No | 0, 3, 5, 6, or 7 | The live state to set the URL to. |
archiveurl | No | string | A URL of an archive snapshot of the original URL. Requires the "alterarchiveurl" permission to modify. |
reason | No | string | An optional reason describing the changes being made and why. It's recommended to provide one. |
overridearchivevalidation | No | 1 or "on" | Bypass the checks on the archive snapshot. The snapshot will still be checked to ensure it is an archive snapshot, but making sure it matches the original will be bypassed. Requires the "overridearchivevalidation" permission to set. |
This action has no unique output responses for successful requests.
This action has the following possible errors:
Value | Type | Description |
---|---|---|
urldataerror | string | Defined when an error specific to URL modification occurred. Possible values are "illegalaccesstime", "stateblockedatdomain", "illegalstate", "invalidarchive", "urlmismatch", and "404". |
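To see which permissions a given modification needs, a client could map the parameters it sets to the permissions named in the parameter table. This is a hedged sketch with a hypothetical function name; the blacklist and whitelist permissions depend on the chosen live state and are deliberately not modelled here.

```python
# Parameter -> permission, as stated in the parameter table above.
FIELD_PERMISSIONS = {
    "accesstime": "alteraccesstime",
    "archiveurl": "alterarchiveurl",
    "overridearchivevalidation": "overridearchivevalidation",
}


def required_permissions(params: dict) -> set:
    """Return the permissions a modifyurl request is expected to need."""
    needed = {"changeurldata"}  # always required for this action
    for field, permission in FIELD_PERMISSIONS.items():
        if field in params:
            needed.add(permission)
    return needed


if __name__ == "__main__":
    print(sorted(required_permissions({"urlid": 12, "accesstime": "2020-01-01"})))
```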
action=analyzepage
This action allows automated processes to run the bot library on a page and make an edit on the client's behalf. This action requires the "analyzepage" permission, and logically is not available to anonymous clients. This action requires the CSRF and checksum tokens to work. This action is only available for fully authenticated clients. Owner-only consumers will not work.
This action offers the following parameters:
Parameter | Required | Accepted values | Description |
---|---|---|---|
pagesearch | Yes | string | The page title of the page to analyze. |
reason | No | string | An optional reason for analyzing the page. |
archiveall | No | "on" | Attempt to add archives to all non-dead references and save non-existent copies to the Wayback Machine. |
The action has the following possible return values:
Value | Type | Description |
---|---|---|
linksanalyzed | int | The number of URLs the bot found and analyzed. |
linksarchived | int | The number of URLs the bot archived to the Wayback Machine. |
linksrescued | int | The number of URLs it fixed on wiki, either through adding archives, or correcting formatting. |
linkstagged | int | The number of URLs it tagged as dead on wiki. |
pagemodified | bool | Whether the page was edited or not. |
waybacksadded | int | The number of Wayback Machine archives added to the page. |
othersadded | int | The number of other archives added to the page. |
revid | int OR bool | The revision ID of the edit. False if no edit was made. |
modifiedlinks | array | The list of links it modified on the page. |
This action has the following possible errors:
Value | Type | Description |
---|---|---|
analyzeerror | string | Defined when an error specific to the action has occurred. Possible values are "404" and "apierror". |
action=submitbotjob
This action allows automated processes to submit bot jobs for InternetArchiveBot to carry out. This action requires the "submitbotjobs" permission, and logically is not available to anonymous clients. This action requires the CSRF and checksum tokens to work. Additionally, bot jobs larger than 500 pages require the "botsubmitlimit5000" permission, bot jobs larger than 5000 pages require the "botsubmitlimit50000" permission, and bot jobs larger than 50000 pages require the "botsubmitlimitnolimit" permission.
This action offers the following parameters:
Parameter | Required | Accepted values | Description |
---|---|---|---|
pagelist | Yes | Newline separated string | A list of page titles to process, separated by newlines. |
The action has the following possible return values:
Value | Type | Description |
---|---|---|
id | int | The job ID number |
status | string | The current job run status. Possible values are "queued", "running", "complete", "killed", and "suspended" |
requestedby | string | The user that requested the bot job. |
targetwiki | string | The wiki code of the target wiki. |
queued | string | Timestamp of when the job was submitted. |
lastupdate | string | Timestamp of the last update to the job. |
totalpages | int | The total number of pages in the bot job. |
completedpages | int | The number of pages completed by the bot. |
runstats | array | An array of statistics during the run. |
This action has the following possible errors:
Value | Type | Description |
---|---|---|
bqsubmiterror | string | Defined when an error specific to the action has occurred. |
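The size thresholds can be checked on the client before submitting a job. The sketch below assumes that only the permission for the job's own size tier is needed in addition to "submitbotjobs"; the page does not say whether the lower tiers are also required. The function name is hypothetical.

```python
def submit_permissions(page_count: int) -> list:
    """Return the permissions expected for a bot job of the given size."""
    permissions = ["submitbotjobs"]
    if page_count > 50000:
        permissions.append("botsubmitlimitnolimit")
    elif page_count > 5000:
        permissions.append("botsubmitlimit50000")
    elif page_count > 500:
        permissions.append("botsubmitlimit5000")
    return permissions


if __name__ == "__main__":
    for size in (500, 501, 5001, 50001):
        print(size, submit_permissions(size))
```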
action=getbotjob
This action allows automated processes to look up a bot job previously submitted for InternetArchiveBot to carry out. This action requires the "submitbotjobs" permission, and logically is not available to anonymous clients. This action requires the CSRF and checksum tokens to work. Additionally, bot jobs larger than 500 pages require the "botsubmitlimit5000" permission, bot jobs larger than 5000 pages require the "botsubmitlimit50000" permission, and bot jobs larger than 50000 pages require the "botsubmitlimitnolimit" permission.
This action offers the following parameters:
Parameter | Required | Accepted values | Description |
---|---|---|---|
id | Yes | int | The job ID to lookup. |
The action has the following possible return values:
Value | Type | Description |
---|---|---|
id | int | The job ID number |
status | string | The current job run status. Possible values are "queued", "running", "complete", "killed", and "suspended" |
requestedby | string | The user that requested the bot job. |
targetwiki | string | The wiki code of the target wiki. |
queued | string | Timestamp of when the job was submitted. |
lastupdate | string | Timestamp of the last update to the job. |
totalpages | int | The total number of pages in the bot job. |
completedpages | int | The number of pages completed by the bot. |
runstats | array | An array of statistics during the run. |
This function does not return action specific errors.
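A bot polling a job might derive its progress from the returned values as sketched here. Treating "complete" and "killed" as final states is an assumption made for illustration (a suspended job may resume), and both function names are hypothetical.

```python
# Assumption: these statuses mean the job will not progress further.
FINAL_STATUSES = {"complete", "killed"}


def job_progress(job: dict) -> float:
    """Return the completed fraction of a bot job, between 0 and 1."""
    total = job["totalpages"]
    return job["completedpages"] / total if total else 0.0


def is_final(job: dict) -> bool:
    """Tell whether polling the job again is still worthwhile."""
    return job["status"] in FINAL_STATUSES


if __name__ == "__main__":
    job = {"status": "running", "totalpages": 200, "completedpages": 50}
    print(job_progress(job), is_final(job))
```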
action=statistics
This action allows automated processes to query statistics on the bot's on-wiki activity. This action can be used by anonymous clients.
This action offers the following parameters:
Parameter | Required | Accepted values | Description |
---|---|---|---|
time-start | No | timestamp | Do not include data points pre-dating this timestamp. |
time-end | No | timestamp | Do not include data points post-dating this timestamp. |
only-day | No | int | Only include data points made on particular day(s) of a month. Accepts a pipe-separated list of ints from 1 to 31. |
only-month | No | int | Only include data points made in particular month(s) of a year. Accepts a pipe-separated list of ints from 1 to 12. |
only-year | No | int | Only include data points made in particular year(s). Accepts a pipe-separated list of ints from 2015 to 2024. |
only-wiki | No | string | Only include data points for certain wikis. Accepts wiki codes (e.g. enwiki) as a pipe-separated list of strings. |
only-key | No | string | Only include data points for certain key values. Accepts a pipe-separated list of strings. |
min-value | No | int | Do not include data points if value of Key is less than given int. Must be greater than 0. |
max-value | No | int | Do not include data points if value of Key is greater than given int. Must be greater than 1 or min-value. |
format | No | string | Format the dataset. |
The action has the following possible return values:
Value | Type | Description |
---|---|---|
statistics | array | An array of statistics requested. |
This function does not return action specific errors.
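The pipe-separated filters could be assembled with a small builder such as the following sketch. The function name is hypothetical, and mapping Python keyword names with underscores to the hyphenated parameter names is a convenience of this example only.

```python
def statistics_params(**filters) -> dict:
    """Build the parameters for action=statistics.

    Keyword names use underscores (only_year) and are sent with
    hyphens (only-year); lists become pipe-separated strings.
    """
    params = {"action": "statistics"}
    for name, value in filters.items():
        if isinstance(value, (list, tuple)):
            value = "|".join(str(item) for item in value)
        params[name.replace("_", "-")] = str(value)
    return params


if __name__ == "__main__":
    print(statistics_params(only_year=[2022, 2023], only_wiki="enwiki"))
```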
action=logout
This action allows automated processes to log out of the API.
This action has no parameters.
This function does not return any action specific values.
This function does not return action specific errors.