User-Agent policy

From Meta, a Wikimedia project coordination wiki
Jump to: navigation, search
Comment This page is purely informative, reflecting the current state of affairs. To discuss this topic, please use the wikitech-l mailing list.

As of February 15, 2010, Wikimedia sites require a HTTP User-Agent header for all requests. This was an operative decision made by the technical staff and was announced and discussed on the technical mailing list.[1][2] The rationale is, that clients that do not send a User-Agent string are mostly ill behaved scripts that cause a lot of load on the servers, without benefiting the projects. Note that non-descriptive default values for the User-Agent string, such as used by Perl's libwww, may also be blocked from using Wikimedia web sites (or parts of the web sites, such as api.php).

User agents (browsers or scripts) that do not send a User-Agent header may now encounter an error message like this:

Scripts should use an informative User-Agent string with contact information, or they may be IP-blocked without notice.

User agents that send a User-Agent header that is blacklisted (for example, any User-Agent string that begins with "lwp", whether it is informative or not) may encounter a less helpful error message (lie) like this:

Our servers are currently experiencing a technical problem. This is probably temporary and should be fixed soon. Please try again in a few minutes.

This change is most likely to affect scripts (bots) accessing Wikimedia websites such as Wikipedia automatically, via api.php or otherwise, and command line programs.[3] If you run a bot, please send a User-Agent header identifying the bot with an identifier that isn't going to be confused with many other bots, and supplying some way of contacting you (e.g. a userpage on the local wiki, a userpage on a related wiki using interwiki linking syntax, a URI for a relevant external website, or an email address), e.g.:

User-Agent: MyCoolTool/1.1 (http://example.com/MyCoolTool/; MyCoolTool@example.com) BasedOnSuperLib/1.4

Do not copy a browser's user agent for your bot, as bot-like behavior with a browser's user agent will be assumed malicious.[4] Do not use generic agents such as "curl", "lwp", "Python-urllib", and so on. For large frameworks like pywikibot, there are so many users that just "pywikibot" is likely to be somewhat vague. Including detail about the specific task/script/etc would be a good idea, even if that detail is opaque to anyone besides the operator.[5]

For more information, please refer to the MediaWiki API Documentation.[6]

Web browsers generally send a User-Agent string automatically; if you encounter the above error, please refer to your browser's manual to find out how to set the User-Agent string. Note that some plugins or proxies for privacy enhancement may suppress this header. However, for anonymous surfing, it is recommended to send a generic User-Agent string, instead of suppressing it or sending an empty string. Note that other features are much more likely to identify you to a website — if you are interested in protecting your privacy, visit the Panopticlick project.

Browser-based applications written in Flash or JavaScript are typically forced to send the same User-Agent header as the browser that hosts them. This is not a violation of policy.

Notes[edit]

  1. The Wikitech-l February 2010 Archive by subject
  2. User-Agent: | Wikipedia | Wikitech
  3. API:FAQ - MediaWiki
  4. [Wikitech-l] User-Agent:
  5. Anomie (31 July 2014). Clarification on what is needed for "identifying the bot" in bot user-agent?. Mediawiki-api.
  6. As an example (among other examples) of how to set a user-agent, in PHP, one might use the following, if one's cURL handle is $ch:
    curl_setopt ( $ch , CURLOPT_USERAGENT , 'MyCoolTool/1.1 (http://example.com/MyCoolTool/; MyCoolTool@example.com) BasedOnSuperLib/1.4' );