Wikimania05/Paper-JJ1

This page is part of the Proceedings of Wikimania 2005, Frankfurt, Germany.


DavWiki - the next step of WikiRPCInterfaces?

  • Author(s): Janne Jalkanen

About license

LGPL or Creative Commons Attribution-ShareAlike


Slides

The presentation slides are available at Media:Wikimania05-JJ1-presentation.sxi, in OpenOffice 1.0 format.

Paper

DavWiki – the next step of WikiRPCInterfaces?

Janne Jalkanen
Nokia Corporation
Janne.Jalkanen@nokia.com


Abstract: WebDAV is an IETF standard for exposing a remote document repository to an application over HTTP. In this presentation, we examine why this makes it a good companion to a wiki, how a wiki can expose a WebDAV interface, and how a wiki can use a remote WebDAV interface. As an example implementation, we look at JSPWiki, which recently gained a WebDAV interface. As none of the previous attempts to unify a Wiki RPC interface have been successful, we suggest adopting an industry standard that is already supported by Microsoft Office, Mac OS X and KDE.

Introduction

The question of why a wiki might even need an API is still an open one. Unlike with weblogs, no Wiki API has really become very popular, and there are no major applications for wiki APIs. However, there are many reasons why a standardized API would be useful for a wiki: most importantly, it would allow scripting of a wiki, greatly easing the maintenance work. Otherwise, the work has to be done with the aid of HTML scrapers, which are not necessarily portable even among different instances of the same Wiki engine, and which strain the rendering engine. A scripting API also allows functionality to be added to the engine without the need to actually upgrade the software, even using languages other than the one the engine is coded in.

WebDAV

Web Distributed Authoring and Versioning (WebDAV) is an IETF standard for remote content management, codified in RFC 2518. WebDAV Class 1 offers the basic traditional capabilities of a file system to any web application: by using DAV extensions to the normal HTTP commands (GET, PUT, POST and DELETE), clients can do things like move and copy files. In addition, WebDAV supports arbitrary metadata for any file or collection in the file system through the PROPFIND and PROPPATCH commands. A Class 2 server is additionally able to lock a resource.

Since WebDAV extends normal HTTP, finding the metadata of a resource consists of sending a PROPFIND instead of a GET – if you know the URL, you know how to access it with WebDAV.
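To make the PROPFIND idea concrete, here is a minimal sketch in Python that only builds the text of such a request and does not send it. The path /dav/raw/Main.txt and the host name are hypothetical; the property names getlastmodified and getcontentlength are standard DAV: properties, and the Depth header limits the query to the resource itself.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"  # the XML namespace used by WebDAV


def build_propfind_body(property_names):
    """Return the XML body of a PROPFIND asking for the named DAV: properties."""
    ET.register_namespace("D", DAV_NS)
    root = ET.Element(f"{{{DAV_NS}}}propfind")
    prop = ET.SubElement(root, f"{{{DAV_NS}}}prop")
    for name in property_names:
        ET.SubElement(prop, f"{{{DAV_NS}}}{name}")
    return ET.tostring(root, encoding="unicode")


def build_propfind_request(path, host, property_names, depth="0"):
    """Assemble the raw HTTP request text; same URL as a GET, different method."""
    body = build_propfind_body(property_names)
    lines = [
        f"PROPFIND {path} HTTP/1.1",
        f"Host: {host}",
        f"Depth: {depth}",
        'Content-Type: text/xml; charset="utf-8"',
        f"Content-Length: {len(body.encode('utf-8'))}",
        "",
        body,
    ]
    return "\r\n".join(lines)


# Hypothetical page URL, used for illustration only.
request_text = build_propfind_request(
    "/dav/raw/Main.txt", "wiki.example.org",
    ["getlastmodified", "getcontentlength"],
)
```

The only difference from an ordinary GET is the method name and the small XML body listing the wanted properties.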

WebDAV has also been extended with additional capabilities, such as versioning using the DeltaV framework (defined in RFC 3253). However, support for these is not yet widespread. In our work, we suggest that it might be beneficial to simply adopt an existing internet standard, already supported by major operating systems such as Windows, Mac OS X and Linux/KDE, as the Wiki RPC API. This approach has already been taken by content management systems like Zope.

Previous work

One of the first Wiki RPC APIs was the WikiRPCInterface [3]. It uses XML-RPC, and provides the basic Wiki capabilities. An application can fetch a page in either WikiMarkup or rendered HTML, browse the versioning history, and get the list of recently changed pages. This turned out to have several practical applications, such as a generic email notification system developed by Mahlen Morris: a computer in San Francisco, CA would “dial in” regularly to the jspwiki.org web site, get a list of changes and email them to a list of subscribers. The availability of this API meant that JSPWiki itself did not have to contain any code for email notification, providing some extra robustness. The simplicity of XML-RPC also made development of clients very easy. The work was later ported to some other popular wiki engines such as MoinMoin and TWiki by Les Orchard, among others.
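A notification client of the kind described above can be sketched in a few lines of Python. This is an illustration, not the actual notifier: the method name wiki.getRecentChanges and the record fields name, version and author follow the WikiRPCInterface description as we understand it and should be treated as assumptions, and the endpoint URL is left as a parameter.

```python
import xmlrpc.client
from datetime import datetime
from email.message import EmailMessage


def build_request_payload(since):
    """Return the XML-RPC request that asks for changes since a given time."""
    params = (xmlrpc.client.DateTime(since),)
    return xmlrpc.client.dumps(params, methodname="wiki.getRecentChanges")


def fetch_recent_changes(endpoint, since):
    """Call the wiki over the network (endpoint is supplied by the caller)."""
    proxy = xmlrpc.client.ServerProxy(endpoint)
    return proxy.wiki.getRecentChanges(xmlrpc.client.DateTime(since))


def format_digest(changes, sender, recipients):
    """Turn a list of change records into a plain-text notification e-mail."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = f"{len(changes)} wiki page(s) changed"
    lines = [
        f"{c['name']} (version {c['version']}) by {c['author']}"
        for c in changes
    ]
    msg.set_content("\n".join(lines))
    return msg
```

Because the wiki only has to answer one read-only call, all mailing logic stays outside the engine, which is the robustness argument made above.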

Unfortunately, this API also had several drawbacks:

  • The interface does not allow authoring (because of the fear that a runaway program might delete the entire site – WikiSpam might become a serious issue)
  • XML-RPC limitations forced the API to wrap all text in Base64, needlessly making the data structures bigger
  • There was very little metadata handling in the API itself
  • The interface is WikiMarkup agnostic (though this could be viewed as a feature as well), which unfortunately meant that you needed to know which type of wiki you were talking to.
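The size cost of the Base64 wrapping mentioned in the list above is easy to quantify: every 3 bytes of text become 4 bytes on the wire. A small sketch (the function name is ours, for illustration only):

```python
import base64


def base64_overhead(text):
    """Return (raw size, Base64 size) in bytes for a piece of page text."""
    raw = text.encode("utf-8")
    encoded = base64.b64encode(raw)
    return len(raw), len(encoded)


# 300 bytes of page text grow to 400 bytes once wrapped.
sizes = base64_overhead("a" * 300)
```

This is a one-third increase before the XML-RPC envelope itself is counted.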

Other WikiRPC solutions have been implemented in the VoodooPadAPI, the TikiWiki API, and the MediaWiki Special:Export API. MediaWiki is also getting a SOAP API, and Atom is likely to have a wiki component as well. However, none of these have received the same sort of attention as their blog counterparts: the Blogger API, the MetaWeblog API and Atom. Much of this is because every wiki offers a different markup variant, something that could be called a regular WikiMarkupMess. This makes any sort of interoperability between two wikis difficult. Some attempts at a unifying wiki markup have been suggested, but it is difficult to conceive how things like plugins, variables and templates (as in MediaWiki) could be translated from one wiki to another. The only common level seems to be HTML.

Implementation

WebDAV has not received a lot of publicity, as it's one of those things that grow quietly in the woodwork. There are implementations available for all major platforms and languages. For Java, the Jakarta project publishes Slide, a WebDAV Class 2 compliant platform, which also includes support for the DeltaV versioning API.

JSPWiki is a fairly typical Java-based J2EE WikiEngine. However, since most of its functionality is implemented as plugins (except for the core WikiMarkup-to-HTML translation engine), it can be hacked and extended relatively easily. It is available under the GNU Lesser General Public License from jspwiki.org.

JSPWiki has gained an experimental WebDAV API in the recent beta versions. It is implemented by extending the wiki servlets with a custom WebDAV-capable servlet class. An XPath engine handles the parsing of incoming WebDAV messages, and a custom parser based on the NekoHTML engine does the XHTML-to-WikiMarkup translation. The JSPWiki markup is a variant of the PHPWiki markup.
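The server side of the message parsing can be illustrated with a small sketch. It is written in Python with the limited XPath subset of the standard library's ElementTree, not with the Java XPath engine JSPWiki uses, and the function name is hypothetical. It extracts which properties a PROPFIND body asks for.

```python
import xml.etree.ElementTree as ET

NS = {"D": "DAV:"}  # prefix mapping used in the path expressions below


def requested_properties(propfind_body):
    """Return 'allprop' or the list of property names a PROPFIND asks for."""
    root = ET.fromstring(propfind_body)
    if root.find("D:allprop", NS) is not None:
        return "allprop"
    # Each child of <D:prop> names one wanted property; strip the namespace.
    return [el.tag.rsplit("}", 1)[-1] for el in root.findall("D:prop/*", NS)]
```

For example, a body containing `<D:prop><D:getlastmodified/></D:prop>` yields `["getlastmodified"]`, which the servlet can then look up for the addressed page.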

Advantages

WebDAV offers a file-system view towards a web resource. A wiki can be viewed as a general, typically flat file system, which has the capability of outputting every page as HTML or WikiMarkup. Therefore, a Wiki can support a limited subset of the WebDAV spec (typically Class 1).

WebDAV offers several advantages. The standard is well understood, and has already been integrated into a number of products. A typical Wiki allows one to store attachments, i.e. generic binary objects (aka blobs). But while the editing cycle of text is extremely easy (click on “edit page”, edit, preview, save), the editing cycle of an attachment is far longer (click on the attachment name, download, edit, save to the local hard drive, go to the attachment info page, upload the new version). If the Wiki page repository were exposed through a WebDAV API, it would be possible to simply click on an attachment name, open it in a local file browser window, and then manage it as if it were a regular file. Clicking on “save” in your Photoshop would immediately publish the picture to the Wiki.

The same happens with the WikiMarkup pages: you can just open a page repository in your own file browser, and open a text page in your favourite editor. You probably lose the preview ability, but it would work completely transparently. If your wiki supports XHTML import (like JSPWiki does), you could even edit the rendered version of the page, and during saving, the wiki engine would translate it back to WikiMarkup. This does, however, have several problems: typically, the editing capability of any HTML editor surpasses the expressive capability of simple WikiMarkup, and the translation also tends to produce somewhat messy WikiMarkup pages. Of course, with Wikis where the native storage format is XHTML (or some form of XML), this restriction has less effect.
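The lossy nature of XHTML import can be seen even in a toy translator. The sketch below uses Python's standard HTML parser and is not JSPWiki's NekoHTML-based parser. The markup tokens (`__bold__`, `''italic''`, `!!!` headings) are loosely modelled on JSPWiki-style markup and should be treated as assumptions. Any tag the table does not know about is silently dropped, which is exactly where an HTML editor's extra expressiveness gets lost.

```python
from html.parser import HTMLParser


class XhtmlToWikiMarkup(HTMLParser):
    """Translate a tiny subset of XHTML into wiki markup; ignore the rest."""

    OPEN = {"b": "__", "strong": "__", "i": "''", "em": "''",
            "h1": "!!!", "h2": "!!", "h3": "!"}
    CLOSE = {"b": "__", "strong": "__", "i": "''", "em": "''",
             "h1": "\n", "h2": "\n", "h3": "\n", "p": "\n\n"}

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Unknown tags (and all attributes) contribute nothing to the output.
        self.out.append(self.OPEN.get(tag, ""))

    def handle_endtag(self, tag):
        self.out.append(self.CLOSE.get(tag, ""))

    def handle_data(self, data):
        self.out.append(data)


def to_wikimarkup(xhtml):
    parser = XhtmlToWikiMarkup()
    parser.feed(xhtml)
    parser.close()
    return "".join(parser.out).strip()
```

Feeding it `<p>Some <span style='color:red'>red</span> text.</p>` returns plain `Some red text.`: the colour has no WikiMarkup equivalent in this table and disappears on save.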

Of course, the ability to access a page repository as a file system means that a lone developer, armed with Perl, sed, awk and a number of other incomprehensible words, would be able to apply every string manipulation method known to man and manage the file repository remotely, even from your friendly cron, regardless of the underlying page repository system.

A strong advantage often mentioned in a corporate environment is that using WebDAV allows you to get rid of the attachment “load, edit, store locally, upload a new version” cycle: you can just save directly to a wiki. However, many operating systems might leave traces of temporary files or backup copies on the drive, so caution must be used when enabling something like this.

An interesting application would be a wiki that is able to use a WebDAV server as a page repository. This means that wikis could be distributed: a wiki would be able to “mount” the repository of any other wiki (WikiMarkup issues notwithstanding, though generic XHTML import might help with this). This might make it easy to provide up-to-date documentation, for example: a certain namespace would always be fetched from a remote server.
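The scripting scenario mentioned above (a developer managing the repository from cron) needs nothing wiki-specific once the repository is mounted as a directory. A minimal sketch, assuming the pages appear as .txt files in a single mounted directory; the function name and the directory layout are illustrative assumptions:

```python
from pathlib import Path


def replace_in_pages(repository_root, old, new):
    """Replace a string in every page file and return the names of changed pages."""
    changed = []
    for page in sorted(Path(repository_root).glob("*.txt")):
        text = page.read_text(encoding="utf-8")
        if old in text:
            # Writing the file back is what the DAV layer turns into a page save.
            page.write_text(text.replace(old, new), encoding="utf-8")
            changed.append(page.stem)
    return changed
```

A cron job could call this on the mount point to, say, fix a renamed link across the whole wiki, without knowing which engine or storage backend sits behind the directory.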

The ability to access a page repository as a file system would also mean that it would be possible to carry a part of a wiki with you by using normal directory synchronization solutions, such as rsync or unison. This does raise the issue of conflict management, though: a user does not like being presented, upon connection, with a large number of “unified diffs” in order to determine which version is current. However, since wikipages are just text, the wikiengine could be made to simply display the changes prominently upon a conflict, and leave the conflict behind for people to resolve. After all, removing a bunch of <<<<<<:s and >>>>>>:s is no more difficult than removing WikiGraffiti. This could be called an “ungraceful merge.”
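The conflict handling described above can be sketched as a two-way, line-based comparison that keeps both variants of every differing region between visible markers. This is an illustration under our own assumptions (marker labels, function name), not a three-way merge and not JSPWiki code. It uses Python's difflib.

```python
import difflib


def merge_with_conflict_markers(server_text, client_text):
    """Keep common lines; wrap every differing region in conflict markers."""
    server = server_text.splitlines()
    client = client_text.splitlines()
    merged = []
    matcher = difflib.SequenceMatcher(a=server, b=client, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            merged.extend(server[i1:i2])
        else:
            # Both versions stay on the page for a human to tidy up.
            merged.append("<<<<<<< server")
            merged.extend(server[i1:i2])
            merged.append("=======")
            merged.extend(client[j1:j2])
            merged.append(">>>>>>> upload")
    return "\n".join(merged)
```

The resulting page is ugly but loses nothing, and anyone reading it can repair it with the ordinary edit cycle.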

Restrictions

However, WebDAV makes some assumptions that make life a bit difficult for the developer. First of all, the directory structure exposed by WebDAV is a full directory structure, whereas a Wiki name space is typically flat (i.e. no sub-pages). Many modern Wikis do, however, support a limited directory structure (usually in the form of a master page that can have any number of sub-pages, but the sub-pages don't necessarily have the ability to host other sub-pages). Therefore a WebDAV-enabled Wiki probably cannot allow arbitrary subdirectories to be created.

The other major restriction comes with WikiNames. Many wikis assume that a page name conforms to a particular set of characters or conventions: for example, a space is often a big no-no in the WikiName of a page, and it is either deleted completely or replaced by an underscore. Therefore a DAV-enabled WikiEngine needs to reject certain page names, something that the user may find utterly confusing.
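What such name handling might look like is sketched below. The allowed character set and the three space policies are illustrative assumptions; each wiki engine has its own rules.

```python
import re

# Hypothetical rule: letters, digits, underscore, dot and hyphen only.
VALID_NAME = re.compile(r"^[A-Za-z0-9_.\-]+$")


def page_name_from_filename(filename, space_policy="underscore"):
    """Map a file name written over DAV to a WikiName, or raise ValueError."""
    stem = filename[:-4] if filename.endswith(".txt") else filename
    if space_policy == "underscore":
        stem = stem.replace(" ", "_")
    elif space_policy == "delete":
        stem = stem.replace(" ", "")
    # Any other policy leaves spaces in place, so they are rejected below.
    if not VALID_NAME.match(stem):
        raise ValueError(f"not a valid WikiName: {stem!r}")
    return stem
```

With the default policy, "My Page.txt" becomes My_Page, while "What?.txt" is refused. From the user's side the refusal shows up as a file that cannot be saved, which is the confusing part.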

With attachments, ungraceful merges mostly don't work. Therefore it would be immensely useful to have locking capability within the WikiEngine; on the other hand, clients are not required to support locks. A write may therefore well fail because someone else has already changed the resource. Typically this could be handled by storing the write as a new revision that simply replaces the old one.

The “rendered content” problem is also very prominent in Wikis: a page is available in multiple formats, as WikiMarkup or as HTML (some even offer PDF versions, or plain text). Obviously, you cannot offer both from the same URL, and neither can you use HTTP parameters to return a different version. What you can do is list both versions (WikiPage.txt, WikiPage.html) in your wiki application directory (though this doubles the size of a directory listing), or you can have a separate URL space for each type. For example, JSPWiki uses /dav/raw/ for all raw text pages, and /dav/html/ for all rendered content. A PDF renderer could use /dav/pdf/ to offer read-only versions of pages in PDF format. Zope assigns a different port to the raw content: you access the rendered content at example.com/rendered, and the source at example.com:8900/rendered. Both methods have their advantages and disadvantages: the JSPWiki method allows better browsing, whereas the Zope method keeps the 1:1 correspondence between GET and WebDAV URLs.
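The separate-URL-space approach amounts to a small mapping between DAV paths and (format, page) pairs. The sketch below follows the /dav/raw/ and /dav/html/ layout described above. The file extensions and function names are assumptions made for illustration.

```python
# One URL space per representation, each with its own file extension.
FORMATS = {"raw": ".txt", "html": ".html"}


def parse_dav_path(path):
    """Split '/dav/<format>/<Page><ext>' into (format, page name)."""
    parts = path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "dav" or parts[1] not in FORMATS:
        raise ValueError(f"not a page path: {path!r}")
    fmt, filename = parts[1], parts[2]
    ext = FORMATS[fmt]
    if not filename.endswith(ext) or filename == ext:
        raise ValueError(f"unexpected file name: {filename!r}")
    return fmt, filename[: -len(ext)]


def dav_path(fmt, page):
    """Build the DAV path under which a page is offered in a given format."""
    return f"/dav/{fmt}/{page}{FORMATS[fmt]}"
```

Adding a read-only PDF space would then be a matter of one more entry in the table, plus a renderer behind it.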

Using WebDAV over a slow link can also be a problem. If the wiki namespace is visible as a single directory, a large wiki may need to move around very large XML messages: 20,000 pages generate a lot of XML when you do a simple file listing. A solution where the pages are grouped according to their first letter might be in order: e.g. /dav/raw/a/About.txt, /dav/raw/t/TextFormattingRules.txt, etc. This would, unfortunately, lose the 1:1 mapping of URLs to DAV URLs.

A big question is also how to address the idiosyncrasies of wikis that dedicated Wiki APIs offer: a RecentChanges list, a list of backlinks, similar pages, searching, diff, etc. However, many of these are doable with simple file manipulation commands: RecentChanges is roughly equal to “ls -lRt”, searching is done by any decent operating system, and the diff command can be found in the /usr/bin directory. Note also that average users would probably not be exposed to these at all, since the wiki would not be visible as a wiki. It could also be that some things, such as RecentChanges, could be exposed as “special files”. These files would not be writable, but one could read them (much like the Linux /proc file system, which allows applications to view system information as if it were stored in normal text files). The only thing that is really needed would be a standard format for describing the RecentChanges, but this is something that standard syndication formats such as Atom should be able to help with.

A big question is also that of security. If your wiki server is available as a directory, even a dim-witted script kiddie can do “mount http://www.wikipedia.org /mnt/wikipedia && rm -rf /mnt/wikipedia/* && echo "0wn3d" > /mnt/wikipedia/MainPage”, which may be a bit too easy, and should make most administrators a bit uneasy. WebDAV does provide its own access control protocol (RFC 3744), though you can still use HTTP BASIC authentication, if you want.
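Two of the ideas above, the first-letter grouping and the “ls -lRt” view of RecentChanges, can be sketched in a few lines. The function names, the default .txt extension and the assumption that the repository is visible as a local directory tree are ours, for illustration only.

```python
from pathlib import Path


def sharded_path(page_name, fmt="raw", ext=".txt"):
    """Place a page in a sub-collection named after its first letter."""
    return f"/dav/{fmt}/{page_name[0].lower()}/{page_name}{ext}"


def recent_changes(repository_root, limit=10):
    """List page names, most recently modified first (roughly 'ls -lRt')."""
    pages = list(Path(repository_root).rglob("*.txt"))
    pages.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [p.stem for p in pages[:limit]]
```

With the grouping in place, a directory listing transfers one letter's worth of pages at a time. The output of recent_changes could equally be served as a read-only “special file” in the repository root.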

An important thing noticed during the implementation phase is that many WebDAV clients seem to interpret the standard in different ways: what works with one client does not necessarily work with others. For example, Mac OS X 10.4 makes multiple, constant connections to the server, which may put quite a lot of load on it. This may be a significant hurdle in gaining acceptance for WebDAV.

Conclusions and Future Work

WebDAV could be used as a general purpose Wiki API. It does not solve the problem of the WikiMarkupMess, but it would give an underlying infrastructure for wikis to interact on a WikiMarkup level. If the wikiengine in question supported HTML import, the users could even edit Wikipages using FrontPage or another dedicated HTML editor (at the loss of some capability). However, DAV is a lot more useful in attachment handling: it removes the need for temporary local storage by allowing saves directly to the wiki. Attaching a file to a wiki is a matter of dragging and dropping a file to the proper DAV directory. The JSPWiki engine supports basic WebDAV Class 1 functionality. In the future, the DAV support will be extended, and the HTML import feature improved.

Another good question is how versioning, which is integral to any wiki, should be handled. WebDAV provides its own versioning scheme with RFC 3253, but it may be a bit complicated to implement.

About the author

Janne Jalkanen has been the lead developer of JSPWiki since its humble beginnings. He is keenly interested in wikis as a new enabling social platform – much like blogs, wikis are a way to take the internet back from the geeks. He currently works at Nokia Corporation, among other things investigating the possibilities of social software in a business environment. He blogs at http://www.ecyrd.com/ButtUgly/.

References

[1] Clemm, G., Amsden, J., Ellison, T., Kaler, C. and J. Whitehead, "Versioning Extensions to WebDAV (Web Distributed Authoring and Versioning)", RFC 3253, March 2002. Available on-line at http://www.ietf.org/rfc/rfc3253.txt
[2] Goland, Y., Whitehead, E., Faizi, A., Carter, S.R. and D. Jensen, "HTTP Extensions for Distributed Authoring -- WEBDAV", RFC 2518, February 1999. Available on-line at http://www.ietf.org/rfc/rfc2518.txt
[3] Jalkanen, J. et al., "WikiRPCInterface". Available on-line at http://www.jspwiki.org/wiki/WikiRPCInterface