Wikimedia Research Network/Rationale

As of May 23, 2005, there are 274,604 registered users on the English Wikipedia alone. According to Erik Zachte's Wikistats, there were 18,811 active user accounts (those with at least 5 edits in that month) across all Wikipedia editions in April 2005 [1].

In contrast, a StatCVS analysis of the current MediaWiki codebase shows that the top 10 developers are responsible for 90.0% of all program code. In other words, the technical infrastructure for content that is used by millions and created by many thousands has been written and maintained by only a handful of people.

The codebase itself has grown from about 35,000 lines of code (LOC) in May 2003 to 150,000 LOC in May 2005. During the four and a half years of Wikipedia's existence, the software has evolved from a simple Perl script running on a single server into a complex database application with load balancing and disk, database, memory and proxy-based caching, running on an increasingly large server farm. Those servers are, of course, maintained by many of the same people who are responsible for MediaWiki's development.

With hundreds of wikis run by Wikimedia alone, as well as hundreds of public non-Wikimedia installations and many corporate users, any change to the software has to undergo careful review for scalability, security and usability before it can be accepted. One part-time hardware assistant and one full-time developer collaborate with a team of volunteers who have lives outside Wikimedia and cannot commit to it indefinitely.

At the same time, the Wikimedia Foundation has quickly taken up one new project after another: a dictionary, a repository of source materials, a collection of reference books, a quote collection, a biological species catalog, a news site, and a media archive. An open eLearning community (Wikiversity) has been launched experimentally in German, and other projects are always under discussion.

This strategy has been effective in building a global community and in staking certain claims in the growing world of free content. However, each of Wikimedia's projects has its own technical needs. This begins with Wikipedia itself: even though a review process for Wikipedia has been under discussion since the project's inception, no reliable process is in place after more than four years.

For each of our projects, it is possible to identify enhancements that could greatly increase their usefulness to readers and editors alike, whether it is a simple news publication workflow model for Wikinews, a concept of content modularization for Wikibooks, a translation interface for Wikisource, or a data model for Wiktionary and Wikispecies.

The existing team of developers is rightly focused on adapting the codebase to the rapid growth of the Wikimedia projects, fixing bugs, and keeping the servers running. Identifying project needs is a massive task in its own right and should not be placed on the shoulders of the developers.

The process of identifying useful and necessary changes requires more than just a technical understanding of how our software works. It requires careful study of each project's processes, communication with the community, surveys, evaluation of other software solutions, and, importantly, cooperation with scientific researchers already conducting similar studies on our projects or related ones.

Beyond identifying needs, it is desirable to collaborate with outside institutions and individuals to address them: companies using MediaWiki or Wikimedia content, teachers and professors who would like to give their students interesting projects to work on, and anyone else who would like to support the development of our software. Here, the Research Team can handle a large part of the organizational work required, while leaving the developers the final say about the merits of any contributed code.