
Interwiki bot access protocol

From Meta, a Wikimedia project coordination wiki

Note: Most of this is obsolete, as it has been replaced by the API interface. The interwiki bots should be modified to use the existing Query API to significantly reduce server load. --Yurik 07:39, 4 September 2006 (UTC)

See also feature request #1993062 --Melancholie 23:14, 14 June 2008 (UTC)


This page documents a MediaWiki interface and implementation that lets interwiki bots get the information they need with considerably less server load and bandwidth usage.

In the present implementation, the bot requests all data through the Special:Export interface, parses each page, recognizes interwiki links and disambiguation templates, follows the interwiki links to the other sites to fetch their data as well, assembles the results, and updates the pages with the new interwiki links.
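The "recognizes interwiki links" step of the present implementation can be illustrated with a minimal sketch. It assumes the page wikitext has already been fetched, and it uses a small hypothetical set of language prefixes; a real bot would load the full list for the wiki family. The function name and the regular expression are illustrative and are not taken from any existing bot.

```python
import re

# Hypothetical subset of language prefixes; a real bot would load the full list.
KNOWN_LANGUAGE_PREFIXES = {"de", "en", "fr", "nl"}

# Matches [[prefix:Title]] with a lowercase prefix and no pipe in the link.
_LINK_RE = re.compile(r"\[\[\s*([a-z][a-z-]*)\s*:\s*([^\]\|]+?)\s*\]\]")


def extract_interwiki_links(wikitext, prefixes=KNOWN_LANGUAGE_PREFIXES):
    """Return (language, title) pairs for links whose prefix is a known language."""
    links = []
    for prefix, title in _LINK_RE.findall(wikitext):
        if prefix in prefixes:
            links.append((prefix, title))
    return links
```

Links with an unknown prefix, a capitalised namespace such as a category, or a leading colon (inline links) are ignored by this sketch.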

To substantially optimize this process, the following has been proposed:

  1. The bot should be able to request just the interwiki links for the pages it needs.
    A format similar to a Special:Export request can be used here.
    Example: http://de.wikipedia.org/w/api.php?action=parse&format=xml&text={{:Test}}{{:Bot}}{{:Haus}}&prop=langlinks (prop values: langlinks, categories, links, templates, images, externallinks)
  2. The bot needs to know whether the page is a disambiguation page.
    The bot maintains a list of all disambiguation templates for all sites. It can send the disambiguation template names as part of the request, so that the server can flag whether a given page is a disambiguation page.
    Example: http://de.wikipedia.org/w/api.php?action=query&prop=templates&titles=BKL
    Example: http://de.wikipedia.org/w/api.php?action=parse&format=xml&page=BKL&prop=templates
  3. The bot needs to know whether the page is a redirect.
    Resolving example: http://de.wikipedia.org/w/api.php?action=query&prop=langlinks&titles=Main%20Page&redirects
    Checking example: http://de.wikipedia.org/w/api.php?action=query&list=allpages&apfrom=Main_Page&aplimit=1&apfilterredir=redirects (optionally with &apfilterlanglinks=withlanglinks)
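As a rough sketch of how a bot might assemble such requests, the following builds an action=query URL for several titles at once and checks a page's template list against the bot's own disambiguation list. The helper names, the choice of format=json, and the template title in the usage note are assumptions made for illustration. The parameters action, prop, titles and redirects are the ones used in the examples above.

```python
from urllib.parse import urlencode


def build_query_url(host, titles, props=("langlinks",), resolve_redirects=True):
    """Build an api.php query URL for one or more titles (pipe-separated)."""
    params = {
        "action": "query",
        "format": "json",  # assumption: the examples above use XML or the default
        "prop": "|".join(props),
        "titles": "|".join(titles),
    }
    if resolve_redirects:
        # Sent as a value-less flag, as in the resolving example.
        params["redirects"] = ""
    return "http://{}/w/api.php?{}".format(host, urlencode(params))


def is_disambiguation(template_titles, disambig_templates):
    """Return True if the page uses any template from the bot's disambiguation list."""
    return not set(template_titles).isdisjoint(disambig_templates)
```

For instance, `build_query_url("de.wikipedia.org", ["Test", "Haus"], props=("langlinks", "templates"))` asks for both properties of both pages in a single request, and `is_disambiguation(templates, {"Template:Disambig"})` would then be applied to each page's returned template titles, where "Template:Disambig" is a placeholder name.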

More general format proposed on IRC


MediaWiki users should be able to request data by giving a list of the pages and the properties they need. For example:

  • Request

Get properties interwiki, template, followredirects for pages A,B,C

  • Response
    • A: iwlinks [x1,y1]; templates [aa1,bb1]
    • B: DOES NOT EXIST
    • C: REDIRECT to D
    • D: iwlinks [x2,y2]; templates [aa2,bb2]
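The request and response outlined above could be reduced to this summary form on the client side. The sketch below assumes a response dictionary shaped like the JSON output of a query with prop=langlinks|templates and redirects (a "redirects" list and a "pages" mapping in which missing pages carry a "missing" key). That shape and the function name are assumptions made for illustration, not part of the proposal.

```python
def summarize_pages(response):
    """Turn a query-style response dict into one summary line per requested title."""
    query = response.get("query", {})
    lines = []
    # Redirects are reported separately from the pages they point to.
    for redirect in query.get("redirects", []):
        lines.append("{}: REDIRECT to {}".format(redirect["from"], redirect["to"]))
    for page in query.get("pages", {}).values():
        title = page["title"]
        if "missing" in page:
            lines.append("{}: DOES NOT EXIST".format(title))
            continue
        iwlinks = ["{}:{}".format(link["lang"], link["*"])
                   for link in page.get("langlinks", [])]
        templates = [tpl["title"] for tpl in page.get("templates", [])]
        lines.append("{}: iwlinks [{}]; templates [{}]".format(
            title, ",".join(iwlinks), ",".join(templates)))
    return sorted(lines)


# Hand-written sample data mirroring the A/B/C/D example; not real server output.
SAMPLE_RESPONSE = {
    "query": {
        "redirects": [{"from": "C", "to": "D"}],
        "pages": {
            "-1": {"ns": 0, "title": "B", "missing": ""},
            "10": {"pageid": 10, "ns": 0, "title": "A",
                   "langlinks": [{"lang": "en", "*": "x1"}, {"lang": "fr", "*": "y1"}],
                   "templates": [{"ns": 10, "title": "aa1"}, {"ns": 10, "title": "bb1"}]},
            "11": {"pageid": 11, "ns": 0, "title": "D",
                   "langlinks": [{"lang": "en", "*": "x2"}, {"lang": "fr", "*": "y2"}],
                   "templates": [{"ns": 10, "title": "aa2"}, {"ns": 10, "title": "bb2"}]},
        },
    }
}
```

Calling `summarize_pages(SAMPLE_RESPONSE)` yields one line each for A, B, C and D, in the same spirit as the response listed above.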

All this could be incorporated into the Special:Export result format.

See http://de.wikipedia.org/w/api.php?action=parse&format=xml&page=BKL&prop=categories as an example.

See Also

  • A similar initiative for individual page data and CSV-formatted data is described on the REST page.