Recentchanges via XMPP

From Meta, a Wikimedia project coordination wiki

Recentchanges via XMPP refers to a system for distributing notifications about events on a wiki (the ones that get listed on Special:Recentchanges) using the XMPP protocol. This page describes the motivation, architecture and implementation of such a system for Wikimedia.

For the currently running demo, see the Prototype section.

Motivation

Sometimes we may want to get notified about events happening on a wiki: edits, new users, page protection, etc. Researchers may want to analyze the events, bots may want to react to them, content re-users may want to refresh their local copy, etc.

Currently, the only decent way to follow changes on a wiki is to periodically poll the web API. This kind of polling isn't very reliable (see bugzilla:24782 for instance), and it isn't very efficient either (everyone asks for the same information, over and over). A push-based approach would be nicer. Until now, this was done using IRC channels: every event on the wiki generates a message on IRC. These messages, however, are formatted for human consumption and are hard to parse, sometimes ambiguous, and occasionally even truncated (IRC has a limit of about 500 bytes per message).

XMPP is much better suited for this, as has been discussed several times before (e.g. at the Wikimedia Conference): it has no fixed limit for message size, and it conveniently allows any additional XML payload to be attached to messages, which makes it easy to distribute structured data records. XMPP is also a federated protocol, using a client-server-server-client communication structure, like email (or, in fact, IRC), which makes it much more scalable than any poll-based approach.

Being able to easily embed XML in the messages means that it becomes very simple for clients to extract and use the information. An XMPP stanza containing information about a RecentChanges event could look something like this:

<message xmlns="jabber:client" to="john@jabber.org/mars"
         from="enwiki@conference.jabber.toolserver.org/xmlrc"
         id="38" type="groupchat">
  <body>[[List of Knight's Cross recipients: Z]]  
        http://en.wikipedia.org/w/index.php?diff=378715833&amp;rcid=389791968&amp;oldid=378714319&amp;title=List+of+Knight%27s+Cross+recipients%3A+Z * 
        MisterBee1966 * (+1203) /* Recipients */</body>
  <rc comment="/* Recipients */ " newlen="26554" rcid="389791968"
      pageid="8089657" title="List of Knight's Cross recipients: Z"
      timestamp="2010-08-13T14:08:49Z" wikiid="enwiki" 
      revid="378715833" old_revid="378714319" user="MisterBee1966"
      ns="0" type="edit" oldlen="25351">
    <tags />
  </rc>
</message>

Note that the <rc> tag is exactly the same as the one generated by the web API.
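To illustrate how little work a client has to do, here is a minimal sketch that pulls the <rc> payload out of a stanza like the one above using only the Python standard library. The helper names (local_name, extract_rc, size_delta) are made up for this example and are not part of rcclient.py. The sketch assumes the <rc> element is a direct child of <message> and matches it by local name, since in the example it inherits the jabber:client default namespace.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example stanza (body text abbreviated).
STANZA = """<message xmlns="jabber:client" to="john@jabber.org/mars"
         from="enwiki@conference.jabber.toolserver.org/xmlrc"
         id="38" type="groupchat">
  <body>[[List of Knight's Cross recipients: Z]] ...</body>
  <rc comment="/* Recipients */ " newlen="26554" rcid="389791968"
      pageid="8089657" title="List of Knight's Cross recipients: Z"
      timestamp="2010-08-13T14:08:49Z" wikiid="enwiki"
      revid="378715833" old_revid="378714319" user="MisterBee1966"
      ns="0" type="edit" oldlen="25351">
    <tags />
  </rc>
</message>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_rc(stanza_xml):
    """Return the attributes of the first <rc> child as a dict, or None."""
    root = ET.fromstring(stanza_xml)
    for child in root:
        if local_name(child.tag) == "rc":
            return dict(child.attrib)
    return None


def size_delta(rc):
    """Change in page size in bytes, computed from newlen and oldlen."""
    return int(rc["newlen"]) - int(rc["oldlen"])


if __name__ == "__main__":
    rc = extract_rc(STANZA)
    print(rc["wikiid"], rc["type"], rc["user"], rc["title"], size_delta(rc))
```

Compared with parsing the human-readable <body> line, no regular expressions or guesses about separators are needed: every field arrives as a named attribute.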

Architecture

One difficulty with getting data from a PHP web application like MediaWiki to XMPP is the way PHP works: it starts a fresh run of the program for every page request. Because of this, there is no persistent process that could stay connected to the XMPP server, so a new connection would have to be made for every request. Also, if the XMPP server is broken or slow, it would slow down page delivery. Such a dependency on another service is highly undesirable for a high-load web site like Wikipedia.

The solution, already in use for the notifications in IRC, is to have MediaWiki send UDP packets that contain the relevant data, and have a standalone bridge service collect these UDP packets (for all wikis) and send them to their respective destinations (IRC channels, XMPP chat rooms, etc.).

So, the flow of information looks like this:

MediaWiki + XMLRC
   \--(UDP)--> udp2xmpp.py
                  \--(XMPP)--> XMPP server ..... XMPP server 
                                                  \--(XMPP)--> XMPP client 
                                                              (e.g. rcclient.py)
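The UDP leg of this flow can be sketched in a few lines of Python. This is not the code of udp2xmpp.py; it is a rough illustration under two assumptions: that each datagram carries exactly one serialized <rc> element, and that the bridge picks the destination chat room from the wikiid attribute (one room per wiki, as in the prototype's enwiki room). The function names are hypothetical, and the actual forwarding over XMPP is left out.

```python
import socket
import xml.etree.ElementTree as ET

# Assumption: one chat room per wiki, named after its wikiid.
ROOM_DOMAIN = "conference.jabber.toolserver.org"


def room_for(packet: bytes) -> str:
    """Choose the destination room from the wikiid attribute of an <rc> packet."""
    rc = ET.fromstring(packet.decode("utf-8"))
    return f"{rc.attrib['wikiid']}@{ROOM_DOMAIN}"


def send_event(packet: bytes, address) -> None:
    """Wiki side: fire and forget. No connection is held and no reply awaited."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, address)


def receive_one(sock: socket.socket, timeout: float = 2.0):
    """Bridge side: wait for one datagram and decide where it should go."""
    sock.settimeout(timeout)
    packet, _sender = sock.recvfrom(65535)
    return room_for(packet), packet


def demo() -> str:
    """Send one event over the loopback interface and return its target room."""
    bridge = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        bridge.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        send_event(b'<rc wikiid="enwiki" type="edit" title="Example" />',
                   bridge.getsockname())
        room, _packet = receive_one(bridge)
        return room
    finally:
        bridge.close()


if __name__ == "__main__":
    print(demo())
```

Because UDP is connectionless, the sending side returns immediately even if nothing is listening, which is the property that keeps a slow or broken bridge from delaying page delivery. The price is that packets can be lost without notice.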

Implementation

Such a system is implemented by the XMLRC extension. It consists of two main parts:

  • the XMLRC extension proper, loaded into MediaWiki, will send UDP packets containing <rc> tags describing all events happening on the wiki.
  • udp2xmpp, the bridge process, which receives UDP packets and sends them on to XMPP

There are two more components that may be useful:

  • rcclient.py, an XMPP client library for Python that makes it easy to process any RecentChanges events received via XMPP.
  • rc2udp, a service that polls the API and sends the <rc> tags as UDP packets. This allows for testing a udp2xmpp setup without having to install XMLRC in the wiki directly.
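The polling approach taken by rc2udp can be sketched roughly as follows. This is not the actual rc2udp code: the function names are invented for illustration, the choice of rcprop values is an assumption, and the network calls (fetching the URL, sending the datagrams) are omitted so that the example runs offline against a canned response. Since consecutive polls overlap, the sketch remembers which rcid values it has already emitted.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def poll_url(api_base: str, limit: int = 50) -> str:
    """Build a recent changes query against the MediaWiki web API (XML output)."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "user|comment|timestamp|title|ids|sizes",  # assumed selection
        "rclimit": limit,
        "format": "xml",
    }
    return f"{api_base}?{urlencode(params)}"


def new_packets(api_xml: str, seen: set) -> list:
    """Return one UDP payload per <rc> element that has not been seen before."""
    root = ET.fromstring(api_xml)
    packets = []
    for rc in root.iter("rc"):
        rcid = rc.get("rcid")
        if rcid in seen:
            continue  # already emitted during an earlier poll
        seen.add(rcid)
        # Serialize the element on its own, without trailing whitespace.
        packets.append(ET.tostring(rc, encoding="unicode").strip().encode("utf-8"))
    return packets


# Canned response in the shape the API returns for list=recentchanges.
SAMPLE = """<api><query><recentchanges>
  <rc type="edit" ns="0" title="Example" rcid="1" user="A" />
  <rc type="new" ns="0" title="Other" rcid="2" user="B" />
</recentchanges></query></api>"""

if __name__ == "__main__":
    seen = set()
    print(poll_url("https://en.wikipedia.org/w/api.php", limit=10))
    print(len(new_packets(SAMPLE, seen)))  # first poll: both events are new
    print(len(new_packets(SAMPLE, seen)))  # same response again: nothing new
```

Each returned payload could then be handed to a UDP socket's sendto() call, which is what makes the polled data look to udp2xmpp like packets coming from the wiki itself.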

Prototype

A prototype setup has been deployed on the Toolserver as of August 17, 2010. You can follow the changes on the English-language Wikipedia by joining the XMPP chat room enwiki@conference.jabber.toolserver.org with any Jabber client (note: some people have reported problems connecting with Google Chat accounts). If you want to play with the extra payload, download rcclient.py and use it to process the messages from that chat room.

The prototype is based on rc2udp, that is, it polls the API to get the data. This is of course only for testing. The goal is to deploy XMLRC on the Wikimedia web servers and to have a channel for every wiki.

Note that the prototype may be down every now and then, and that it will not be permanent. But please go ahead and test it!

See also