Data request limitations

From Meta, a Wikimedia project coordination wiki

Wikimedia requires users to observe certain limits on data requests so that they do not put excessive load on the servers. Where practicable, users should obtain large amounts of data from WMF projects through the data dumps rather than through unnecessary API requests. A paid live feed was once available for keeping one's databases in sync with Wikimedia's, but it is no longer offered to new customers. Alternatives include the IRC feeds and the Toolserver.

Issues

Retrieving large numbers of page revisions

Referring to some of the features available via API query properties, Tim Starling writes: "You can use api.php with rvprop=content and rvcontinue to fetch the text of all revisions of a page. Please do this in a single thread with a substantial delay between requests, since this is a very expensive operation for our servers. Do not attempt to do it for a large number of pages, for that, use the XML download instead. Do not do it regularly or set up a web gateway which allows users to initiate these requests."[1]
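The pattern described in the quotation can be sketched roughly as follows. This is a minimal illustration, not an official client: the target wiki in API_URL, the USER_AGENT string, the five-second default delay, and the helper names http_get_json and iter_revisions are all assumptions made for the example. It issues one request at a time, follows the rvcontinue value returned by the API, and sleeps between requests.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumptions for illustration only: the target wiki and the contact details.
API_URL = "https://en.wikipedia.org/w/api.php"
USER_AGENT = "ExampleRevisionFetcher/0.1 (contact: you@example.org)"


def http_get_json(params):
    """Perform one GET request against the API and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_revisions(title, fetch=http_get_json, delay_seconds=5.0, batch_size=10):
    """Yield the revisions of ONE page, strictly one request at a time.

    `fetch` is injectable so the loop can be exercised without network access.
    The delay value is a guess at "substantial"; the source gives no number.
    """
    params = {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|content",
        "rvlimit": str(batch_size),
    }
    while True:
        data = fetch(params)
        for page in data.get("query", {}).get("pages", {}).values():
            yield from page.get("revisions", [])
        if "continue" not in data:
            break  # no rvcontinue returned: the history is exhausted
        # Carry the continuation values (including rvcontinue) into the next request.
        params = {**params, **data["continue"]}
        time.sleep(delay_seconds)  # pause before asking for the next batch
```

A call such as `for rev in iter_revisions("Some page"): ...` would walk a single page's history. For many pages, the quotation's advice still applies: use the XML download instead.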

Live mirrors

Live mirrors are forbidden. A live mirror is one that polls WMF for the page data every time a user requests that page from the mirror. A mirror that merely polls WMF for the data needed to keep itself up to date is not a live mirror.
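The distinction can be illustrated with a small sketch. The class name CachedMirror, the fetch_upstream callback, and the one-day refresh interval are hypothetical and chosen only for this example. The point is structural: reader requests are answered from local storage, and upstream is contacted only by a separate refresh job. In practice the bulk of such a refresh would more likely come from the data dumps.

```python
import time


class CachedMirror:
    """Toy mirror that never contacts upstream while serving a reader."""

    def __init__(self, fetch_upstream, refresh_interval_seconds=86400.0, clock=time.time):
        self._fetch_upstream = fetch_upstream      # hypothetical upstream fetcher
        self._interval = refresh_interval_seconds  # assumed value: one day
        self._clock = clock
        self._pages = {}                           # title -> (content, fetched_at)

    def serve(self, title):
        """Answer a reader's request from the local copy only (None if absent)."""
        entry = self._pages.get(title)
        return entry[0] if entry else None

    def refresh(self, titles):
        """Background job: re-fetch only pages that are missing or stale."""
        now = self._clock()
        refreshed = []
        for title in titles:
            entry = self._pages.get(title)
            if entry is None or now - entry[1] >= self._interval:
                self._pages[title] = (self._fetch_upstream(title), now)
                refreshed.append(title)
        return refreshed
```

A live mirror, by contrast, would call the upstream fetcher inside serve, so that every reader request turned into a request to WMF.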

Polling API

Users must follow the User-Agent policy. If you run your requests serially rather than in parallel, you are unlikely to put too much strain on the servers.
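As a rough sketch of both points, the following sends requests strictly one after another and attaches a descriptive User-Agent header to each. The header value, the one-second pause, and the function names build_request, default_send, and fetch_serially are assumptions for illustration; the User-Agent policy itself defines what the header must contain.

```python
import time
import urllib.request

# Hypothetical identification string; replace with your tool's name and contact.
USER_AGENT = "ExampleBot/0.1 (https://example.org/bot; bot@example.org)"


def build_request(url):
    """Create a request that identifies the client via the User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def default_send(request):
    """Send one request and return the raw response body."""
    with urllib.request.urlopen(request) as response:
        return response.read()


def fetch_serially(urls, send=default_send, pause_seconds=1.0):
    """Fetch URLs in order, never overlapping requests.

    Each request starts only after the previous one has finished, with an
    optional pause in between (the pause length is an assumed value).
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(pause_seconds)
        results.append(send(build_request(url)))
    return results
```

Because the loop is sequential, the client never has more than one request in flight, which is the behaviour the paragraph above recommends.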

InstantCommons

No policy has yet been established limiting mw:InstantCommons use. Individual wikis using the InstantCommons feature are considered unlikely to cause a significant increase in cost for the Wikimedia Foundation, since each file has to be downloaded only once and there are per-user bandwidth limitations.

Images

See Data dumps#Downloading Images. Download images from a mirror if you can.

References

See also