Versioning and distributed data

From Meta, a Wikimedia project coordination wiki

Live discussion next Thursday? Interest in an IRC chat: Tom Lord, Sasha Wait, IulianU...

There are a number of "semantic wiki" ideas, suggestions for p2p distribution of a big wiki, and proposals for letting different groups maintain their own wikis while pulling updated content from a favorite source from time to time.

At present, sites that pull updates from Wikipedia do so with their own ad-hoc scripts, either scraping the site with a web-crawler or waiting for the regular updates and extracting what they want from a full database dump.

Basic ideas

  • Is there a way to let people contribute to the wiki without diving into the community? For example, could a small local wiki about water purity be coordinated with the Wikipedia category on the same subject?
  • How can we efficiently allow people to download a small set of articles (with histories)?
  • How about letting them sync their local changes with Wikipedia's database?
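
One way to picture the last question is as a three-way comparison between the text a local copy was last synchronized from, the current local text, and the current upstream text. The following is a minimal sketch under that assumption. The function name `classify_sync`, the state labels, and the idea of storing a "base" text per article are all hypothetical and do not describe any existing MediaWiki feature.

```python
import hashlib


def digest(text: str) -> str:
    """Return a stable fingerprint for an article's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def classify_sync(base: str, local: str, upstream: str) -> str:
    """Decide what a sync would need to do for one article.

    base     -- text at the last successful sync (hypothetical bookkeeping)
    local    -- current text in the small local wiki
    upstream -- current text in the large source wiki
    """
    b, l, u = digest(base), digest(local), digest(upstream)
    if l == u:
        return "in-sync"  # nothing to do, even if both sides changed identically
    if l == b:
        return "pull"     # only upstream changed: take its version
    if u == b:
        return "push"     # only the local copy changed: offer it upstream
    return "merge"        # both changed differently: needs a real merge


if __name__ == "__main__":
    print(classify_sync("v1", "v1", "v2"))
    print(classify_sync("v1", "v1 local", "v1 upstream"))
```

Only the "merge" case needs human attention or a proper merge tool; the other three can be handled mechanically, which is what would make background syncing of a small article set plausible.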

Some existing models

  • BitTorrent (shared distribution of database-sized files)
    Perhaps a slow and steady version of BitTorrent, with good addressing of parts of a large database, could help people download terabyte-sized files in the background, so that their local database is never more than a few weeks out of date.
  • GNU Arch (two-way feedback, better versioning)
    Perhaps a better versioning implementation could help people in different communities and environments contribute to a single body of information, without having to merge their communities or interfaces.
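
The "good addressing of parts of a large database" idea can be illustrated with a per-piece hash manifest, similar in spirit to the way BitTorrent verifies fixed-size pieces by hash. This is only a sketch: the names `piece_hashes` and `stale_pieces` are hypothetical, the piece size is tiny for readability, and it assumes the dump changes mostly in place rather than by insertions that shift later bytes.

```python
import hashlib

PIECE_SIZE = 4  # tiny for illustration; real pieces would be far larger


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> list[str]:
    """Split data into fixed-size pieces and hash each one."""
    return [
        hashlib.sha1(data[i:i + piece_size]).hexdigest()
        for i in range(0, len(data), piece_size)
    ]


def stale_pieces(
    local: bytes, remote_manifest: list[str], piece_size: int = PIECE_SIZE
) -> list[int]:
    """Return indexes of pieces that are missing locally or differ from the remote manifest."""
    local_manifest = piece_hashes(local, piece_size)
    return [
        i
        for i, remote_hash in enumerate(remote_manifest)
        if i >= len(local_manifest) or local_manifest[i] != remote_hash
    ]


if __name__ == "__main__":
    old_dump = b"aaaabbbbcccc"
    new_dump = b"aaaaBBBBccccdddd"
    # Only the changed piece (1) and the appended piece (3) need downloading.
    print(stale_pieces(old_dump, piece_hashes(new_dump)))
```

A mirror would fetch only the small manifest on each update, then request the stale pieces in the background. Fixed-size pieces are the simplest choice but handle insertions poorly; a format that keeps each article at a stable offset, or content-defined chunking, would be needed for a dump that is rewritten freely.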

Position statements

This section is for links to pre-meeting "opening statements" from participants interested in this topic.

See also