Fundraising 2007/Why Give blog/Engines of Collaboration: A Look Under the Hood of Wikimedia

This blog entry has been posted as part of the Wikimedia Fundraising 2007 blog. Please see Fundraising 2007/Why Give blog for more details on these blog posts.


What's inside the black box that makes Wikipedia work?

For many people, Wikipedia is a black box. You put in a query and get out information. Those who understand Wikipedia as a volunteer-driven project might put it differently: You put in a bunch of smart people, and you get an encyclopedia. ;-) But what's the black box, and how does it work?

A large part of Wikipedia's success can be attributed to its social policies and principles, and perhaps we'll explore those in a future post. Today, I'd like to take a look at the key technical mechanisms underlying the encyclopedia and its sister projects. Wikipedia is a wiki, a database open to revision by anyone. The "edit" link gives you instant write access to the contents of almost any article. How can this fundamental openness result in anything useful? I'd say the following technical mechanisms are critical:

  • Wiki syntax. This is the code wikis are written in. It's simpler than HTML, but more complex than plain text. If you wanted to contribute substantially to Wikipedia, you'd have to learn at least the basics of wiki syntax, but there are plenty of help pages and tutorials to get you started.
  • Eternal memory. Wikipedia preserves a record of every change to an article ever made, allowing editors to instantly revert changes if they want. (Jon Udell's Heavy Metal Umlaut video is a good visual explanation of this principle.) Beyond content changes, even administrative actions like deletions, user blocks, or protection of pages can always be undone and are fully logged, to ensure the impossibility of "lasting damage".
  • Total surveillance. OK, I'm exaggerating a bit for dramatic effect - we take privacy very seriously. But all changes to Wikipedia can be directly linked to the user account or IP address that made them, and we have tons of tools that help us in the day-to-day patrolling of the content that goes into Wikimedia projects.
  • Discussion pages. A social tool, the discussion page associated with every article is critical for developing consensus in decision making.
  • Users as toolmakers. One of the cool things about wikis is that users can create their own processes. For example, one of our key quality assurance processes, "Featured article candidates", is nothing but a wiki page where users nominate articles as high quality, and discuss these nominations. More on empowering users below.
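To make the "eternal memory" idea from the list above a little more concrete, here is a minimal sketch in Python. It is not how MediaWiki stores revisions; the `Page` class and its method names are made up for illustration. The point it shows is that a revert does not delete anything: it simply saves an older text again as a new revision, so the full record (including the vandalism and who made it) stays intact.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """A toy wiki page (hypothetical): every save appends a revision."""
    revisions: list = field(default_factory=list)  # (author, text, comment)

    def save(self, author, text, comment=""):
        # Nothing is ever overwritten; the history only grows.
        self.revisions.append((author, text, comment))
        return len(self.revisions) - 1  # id of the new revision

    def current_text(self):
        return self.revisions[-1][1] if self.revisions else ""

    def revert_to(self, rev_id, author):
        # A revert re-saves the old text as a brand new revision.
        old_text = self.revisions[rev_id][1]
        return self.save(author, old_text, comment=f"Reverted to revision {rev_id}")


page = Page()
good = page.save("Alice", "A heavy metal umlaut is a diacritic used in band names.")
page.save("192.0.2.7", "lol")      # vandalism, attributed to an IP address
page.revert_to(good, "Bob")        # the fix is one more entry in the log

print(page.current_text())
print(len(page.revisions), "revisions kept")
```

Because the history is append-only, the revert itself can be undone in exactly the same way, which is what makes "lasting damage" so hard to inflict.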

It's instructive to compare Wikipedia to its predecessor. Nupedia was Jimmy Wales' first encyclopedia project, and it failed dramatically. Unlike Wikipedia, Nupedia implemented a rigorous, top-down peer review process -- and when the project was quietly discontinued in 2003, it had produced a mere 24 articles. Wikipedia's openness is the key to its success, and it is counterbalanced by the tools and policies regulating all changes to the content. But who controls the engine that makes it all work?

Code is Law

I've always found the word "software" to be somewhat ridiculous: there's nothing particularly soft about it, nor is it any kind of "ware". Computer programs dominate so many aspects of our daily life today, yet we hide them in artificial obscurity. They are tools, sure, but they also have a regulatory function, especially in social spaces. The possible interactions of any online community are deeply affected by the computer code that underpins it. I prefer the word "code" to "software". As scholar Larry Lessig observed, computer code is comparable to legal code in its effects on (networked) society.

That makes it doubly important that code can be inspected. The core code that runs Wikipedia today is known as "MediaWiki", a deeply misguided play on the word "Wikimedia". The code is available under a free software / open source license, the GNU General Public License, which, once again, allows anyone to share and modify it, provided that any modified versions they distribute are made freely available under the same license.

The code is written in a programming language called PHP, which is also free and open, and which anyone can learn at no cost. This means anyone with the time and inclination can contribute to making the Wikipedia code better -- browse around the MediaWiki website for more information.

And this is exactly what's happened. For most of its history, Wikipedia has had no paid employees. Recently, the Wikimedia Foundation hired its two most prolific volunteer programmers, Brion Vibber and Tim Starling. Their contributions are immense; and there are countless other individuals and companies working on the code as well. Perhaps I'm exaggerating, but I often say that the MediaWiki software is as important to the future of free knowledge and open learning as the Linux kernel is to the future of computing.

Donating to the Wikimedia Foundation will allow us to hire more developers to systematically improve the MediaWiki codebase in key areas which, in turn, will improve the encyclopedia and its sister projects. But before I elaborate on some of the things we could do in the future, it might help to explain how a few key technological changes shaped our projects in the past.

Milestones of MediaWiki

Wikipedia, today, has plenty of multimedia. Images in particular adorn hundreds of thousands of pages -- some of them of truly brilliant quality. This wasn't always the case, and there were a few key improvements to our codebase that led to an explosion in the use of images on the site. For example, in March 2004, it became possible to automatically generate small and large versions of images, and features for galleries as well as vector graphics support followed.

In September 2004, we created a multimedia repository called Wikimedia Commons, which now hosts more than 2 million freely usable pictures, sound files, and videos. Technically, one of the keys to its success was the ability to instantly embed any image from Commons on any Wikimedia project in any language. Very recently, Tim Starling implemented an embedded video and audio player, and the number of videos and sounds embedded into articles has grown substantially since.

Another critical change was the implementation of a new categorization system in summer 2004, led by Magnus Manske and Brion Vibber. Today, we have a gigantic categorical index. When the category system was first implemented, it was fascinating how a single feature change led to an explosion in content: in just a few days, thousands of categories were created out of nothing.

In order to make Wikipedia available in many languages, and to improve its usability, an undeniably critical feature was the ability to edit all user interface texts (like the links in the left-hand sidebar on Wikipedia) through the wiki itself. But we take this principle of openness to revision even further: Our software can be reprogrammed by anyone, directly through the wiki! :-) Don't believe me? Take a look at Lupin's navigation popups tool, which fundamentally changes the way you browse Wikipedia.

How is it done? Essentially, our software allows you to tell your web browser (Firefox, Internet Explorer, or whatever) to execute a little script whenever you visit a Wikipedia page. These programs can be enormously complex, and make Wikipedia much friendlier to use. Of course, for security reasons, none of these scripts will be run unless you follow the explicit instructions to activate them.

Once again, code is law: If we had not given our users the ability to write these scripts, they would never have been created -- and consequently, Wikipedia would be a different place than it is today. These are just a few examples, and you can read more about the evolution of MediaWiki in its Wikipedia article. Now imagine what we could do if we employed not two, but 10 software developers. I'll help. :-)

The Future of Collaboration

Mind you, I'm not suggesting that our codebase should not continue to be improved through massive volunteer collaboration. In fact, I believe much of our effort should be focused on integrating and improving the work of others. After reading the above, it should not come as a surprise that MediaWiki can be heavily customized with plug-ins that add additional functionality. They are different from the browser-side scripts I referenced above, and potentially much more powerful still.

Take a look at the vast number of extensions out there. Some have enormous potential: The Semantic MediaWiki extension, for example, alters the way wikis handle structured data like the numerical information in infoboxes you find in Wikipedia articles. Imagine if you could use Wikipedia not just as an encyclopedia, but as a giant database, searchable in every conceivable way: "Show me countries with a population smaller than 10,000." -- "Show me the latest albums by punk rock bands." -- "Generate a graphical timeline of all Roman emperors."
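As a rough illustration of what "Wikipedia as a database" could mean, here is a small Python sketch. It does not use Semantic MediaWiki or its query syntax; the `query` helper, the property name, and the country records are all invented placeholders (the populations are not real statistics). It only shows the underlying idea: once infobox values are stored as structured properties rather than free text, a question like "countries with a population smaller than 10,000" becomes a simple filter.

```python
# Hypothetical structured facts, as they might be extracted from infoboxes.
# Names and figures are illustrative placeholders, not real data.
countries = [
    {"name": "Exampleland", "population": 8_500},
    {"name": "Samplestan", "population": 2_300_000},
    {"name": "Testonia", "population": 950},
    {"name": "Nodataria"},  # an article whose infobox lacks the property
]


def query(records, prop, predicate):
    """Return the names of records whose property satisfies the predicate."""
    return [r["name"] for r in records if prop in r and predicate(r[prop])]


# "Show me countries with a population smaller than 10,000."
small = query(countries, "population", lambda p: p < 10_000)
print(small)
```

Articles that never recorded the property are simply skipped, which hints at why consistent structured data matters as much as the query engine itself.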

Or, if that doesn't excite you, how about making Wikipedia more user-friendly? LiquidThreads, a project I am involved in, reinvents discussion pages to make them much simpler to use. There have also been many attempts to build rich-text editors for Wikipedia. Personally, I think (due to the complexity of everything that we can do with our current wiki syntax) it will take a very substantial investment of resources to really push usability a large step forward, but there are always incremental improvements we can make with less effort.

There are other cool extensions which have been lingering, sadly, unused for years. For security reasons, we have never deployed WikiTeX, which would make it easier for our editors to add musical scores, graphs and plots, chemical formulae, and similar content to Wikipedia articles.

In many of these cases, what is needed is a final push: security and scalability work, integration, testing, documentation. In other words, the parts of the work that are least exciting. Frequently, authors of MediaWiki plug-ins only seek to satisfy their own personal needs, by getting the extension to run in an independent wiki environment they have created. That's why the Wikimedia Foundation needs to be able to put some money into adapting and implementing the best and most significant tools.

There are also internal strategic priorities, projects that are so important we can't necessarily depend on volunteers to make them happen. Here are a few:

  • Flagged Revisions. This toolset will allow us to empower contributors to identify the versions of Wikipedia articles that are known to be of high quality. Readers can then choose whether they want to see the very latest version of an article (which might contain vandalism), or only the most recently reviewed one. Finalizing the implementation of FlaggedRevs is part of our quality initiative. But to give you an idea of how limited our resources are, we had to pull our developers off this project just to make sure that we could get the technical work for this fundraiser done! Truly, every donation would help us in our ability to execute key initiatives like this, making Wikipedia more useful and better for you.
  • Cross-project integration. Right now, every single Wikimedia project has a separate user account database. Want to fix an article from the German Wikipedia? If you only have an account in the English one, you'll have to create a new one! This is not an easy problem to solve: thousands of account names exist in multiple projects, so we need to merge identical accounts and split non-identical ones. Fortunately, much work on this has already been done, but more remains. And once the account databases are integrated, there is potential for many more exciting features -- like the ability to change content in Wikinews from Wikipedia, to upload pictures to Commons directly from Wikibooks, etc. In this way, we can bring the family of Wikimedia projects much closer together.
  • Wiki-to-print and export technology. Right now, we're not offering a lot of tools to make it easy for you to print or download collections of articles. This will soon change, through an exciting collaboration that will be announced within the next few weeks. It will make it easy to download high quality PDFs of selected articles. We're aiming to also support export to word processor formats. But even this is only the beginning -- there's a whole bunch of tools that would make it easier for the Wikibooks project to create high quality, open access textbooks. This technology is key for the developing world, so that we can distribute free knowledge in whatever formats are most helpful.
  • Mix & Burn Wikipedia. Related to the above, we want to make it possible to easily create your own Wikipedia/Wikimedia DVD or USB stick -- either including all articles, or a selection. This would require a reader application that runs without Internet access. Fortunately, there are already many projects in this space that, once again, just need a final push. Now, imagine that such an application would not only make it possible to read articles, but also to change them and to synchronize the changes back once you have an Internet connection -- this would enable us to make participatory Wikipedia terminals anywhere in the world.
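The reader's choice described under Flagged Revisions in the list above can be sketched in a few lines of Python. This is an assumption-laden toy, not the FlaggedRevs implementation: the `Revision` class, the `reviewed` flag, and `version_to_show` are hypothetical names, and the fallback to the latest revision when nothing has been reviewed yet is a design choice made here for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Revision:
    rev_id: int
    text: str
    reviewed: bool = False  # hypothetical flag set by a trusted contributor


def version_to_show(history: List[Revision], prefer_reviewed: bool) -> Optional[Revision]:
    """Pick the revision a reader sees, given their stated preference."""
    if not history:
        return None
    if prefer_reviewed:
        # Walk backwards to find the most recently reviewed revision.
        for rev in reversed(history):
            if rev.reviewed:
                return rev
    # Either the reader wants the latest, or nothing has been reviewed yet.
    return history[-1]


history = [
    Revision(1, "Stub."),
    Revision(2, "Well-sourced article.", reviewed=True),
    Revision(3, "Well-sourced article. lol"),  # unreviewed, possibly vandalism
]

print(version_to_show(history, prefer_reviewed=True).rev_id)
print(version_to_show(history, prefer_reviewed=False).rev_id)
```

Note that editing stays fully open in this model: revision 3 is still saved and visible to anyone who asks for the latest version; the flag only changes what a cautious reader sees by default.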

Once again, these are just a few examples. I believe that the future of collaboration is much greater still: there will be real-time collaboration on articles, even on images and video. Wikipedians will talk to each other via Voice over IP while editing articles, and Wikiversity could become a global free institution of learning using the same tools for global teacher/learner interaction, connecting people who have knowledge with those who seek it. Wikinews could turn into a global virtual newsroom, making it possible to instantly record any event as a "citizen journalist", and to collaborate with others to tell the full story.

Our donation banner proclaims: "You can help Wikimedia to change the world". Indeed, by supporting us in this fundraising drive, you will allow us to do more than just keep Wikipedia running. A donation to the non-profit Wikimedia Foundation is a donation for the future of learning. Every donation helps, and if you want to make a major gift, please contact us at: majordonors AT wikimedia DOT org