Wikimedia Blog/Drafts/Wikipedia in one 40GB file

From Meta, a Wikimedia project coordination wiki


Title

Body

For the first time ever, we have released a complete dump of all encyclopedic articles of the English Wikipedia, with thumbnails. The ZIM file is 40 GB and contains all 4.5 million current articles along with their corresponding 3.5 million pictures: http://download.kiwix.org/zim/wikipedia_en_all.zim.torrent
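To check what a downloaded ZIM file contains without opening a reader, you can inspect its fixed-size header. The Python sketch below assumes the 80-byte little-endian header layout described in the openZIM file-format specification (magic number 72173914, then version, UUID, entry counts and pointer offsets); the `ZimHeader`, `parse_zim_header` and `read_zim_header` names are our own, not part of any Kiwix API. As we understand the format, the header's article count covers all directory entries, including images and redirects, so it should not be expected to equal the number of encyclopedic articles.

```python
import struct
from dataclasses import dataclass

# Assumption: header layout as in the openZIM specification.
ZIM_MAGIC = 72173914                 # reads as b"ZIM\x04" on disk (little-endian)
HEADER_FORMAT = "<IHH16sIIQQQQIIQ"   # no padding with "<"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 80 bytes


@dataclass
class ZimHeader:
    major_version: int
    minor_version: int
    article_count: int   # all directory entries, not only articles
    cluster_count: int
    main_page: int       # entry index of the main page


def parse_zim_header(data: bytes) -> ZimHeader:
    """Decode the first HEADER_SIZE bytes of a ZIM file."""
    if len(data) < HEADER_SIZE:
        raise ValueError("truncated ZIM header")
    (magic, major, minor, _uuid, article_count, cluster_count,
     _url_ptr_pos, _title_ptr_pos, _cluster_ptr_pos, _mime_list_pos,
     main_page, _layout_page, _checksum_pos) = struct.unpack_from(HEADER_FORMAT, data)
    if magic != ZIM_MAGIC:
        raise ValueError("not a ZIM file (bad magic number)")
    return ZimHeader(major, minor, article_count, cluster_count, main_page)


def read_zim_header(path: str) -> ZimHeader:
    """Read only the header, so even a 40 GB file is inspected instantly."""
    with open(path, "rb") as f:
        return parse_zim_header(f.read(HEADER_SIZE))
```

For example, `read_zim_header("wikipedia_en_all.zim").article_count` would report the number of entries stored in the file.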

This ZIM file can easily be used on many devices: with Kiwix on Android smartphones and on Windows, OS X and Linux PCs, or with WikiOnBoard on Symbian.

You don't need a modern computer with a powerful CPU to run your own Wikipedia, even the biggest one. With this ZIM file and kiwix-serve (the Kiwix web server), a read-only Wikipedia mirror can run on a Raspberry Pi costing about 100 USD.
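A Raspberry Pi mirror essentially boils down to starting kiwix-serve on the ZIM file. The Python sketch below assumes kiwix-serve is installed and on the PATH and that it accepts a `--port=N` option followed by the ZIM path; the helper names `kiwix_serve_command` and `start_mirror` are hypothetical.

```python
import subprocess
from pathlib import Path


def kiwix_serve_command(zim_path: str, port: int = 8080) -> list[str]:
    """Argument list that serves one ZIM file over HTTP on the given port."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid TCP port: {port}")
    # Ports below 1024 usually require root privileges on Linux.
    return ["kiwix-serve", f"--port={port}", str(zim_path)]


def start_mirror(zim_path: str, port: int = 8080) -> subprocess.Popen:
    """Launch kiwix-serve in the background (assumes it is on PATH)."""
    if not Path(zim_path).is_file():
        raise FileNotFoundError(zim_path)
    return subprocess.Popen(kiwix_serve_command(zim_path, port))
```

For example, `start_mirror("wikipedia_en_all.zim")` should make the mirror reachable at http://localhost:8080/ (or at the Pi's address on the local network) until the process is stopped.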

As always, we also provide a packaged version for the main PC operating systems, which bundles the ZIM file, a full-text search index and the Kiwix binaries: http://download.kiwix.org/portable/wikipedia_en_all.zip.torrent

It is also worth pointing out that the file was generated in less than two weeks, thanks to three recent developments:

  • the new infrastructure that produces rich HTML output for the VisualEditor;
  • a script that fetches this HTML live;
  • a tool that compiles any local HTML directory into a ZIM file.
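The first two steps of this pipeline can be sketched in Python: fetch the rendered (Parsoid) HTML of each article and save it as one file per article in a local directory, which a ZIM compiler can then turn into a ZIM file (that third step is not shown). This is a minimal sketch, not the script we actually used. It assumes today's Wikimedia REST endpoint `/api/rest_v1/page/html/{title}`; the file-naming scheme, the placeholder User-Agent string and the `mirror_articles` helper are hypothetical, and a real run would also need images, rate limiting and error handling.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: present-day REST endpoint returning Parsoid HTML; the 2014
# script may have queried the Parsoid service differently.
REST_HTML_URL = "https://en.wikipedia.org/api/rest_v1/page/html/{title}"
USER_AGENT = "zim-mirror-sketch/0.1 (contact: you@example.org)"  # placeholder


def encode_title(title: str) -> str:
    """Spaces become underscores; '/' and other reserved chars are %-encoded."""
    return quote(title.replace(" ", "_"), safe="")


def article_url(title: str) -> str:
    return REST_HTML_URL.format(title=encode_title(title))


def fetch_html(title: str) -> str:
    """Download the rendered HTML of one article (needs network access)."""
    request = Request(article_url(title), headers={"User-Agent": USER_AGENT})
    with urlopen(request) as response:
        return response.read().decode("utf-8")


def mirror_articles(
    titles: Iterable[str],
    out_dir: str,
    fetch: Callable[[str], str] = fetch_html,
) -> list[Path]:
    """Hypothetical helper: write one HTML file per article into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for title in titles:
        path = out / (encode_title(title) + ".html")
        path.write_text(fetch(title), encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    # Offline demo: a stub fetcher stands in for the network call.
    with tempfile.TemporaryDirectory() as tmp:
        for p in mirror_articles(["Kiwix", "AC/DC"], tmp, fetch=lambda t: f"<p>{t}</p>"):
            print(p.name)
```

Passing the fetch function as a parameter keeps the directory-writing logic independent of the network, so it can be tried offline first.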

We finally have an efficient way to generate new ZIM files. Now we "only" have to use it to release regular updates for all Wikimedia projects, probably the oldest and most important problem we still face at Kiwix. We are closer than ever to solving a problem that is now six years old.

All this would not have been possible without the support of:

  • Wikimedia CH and the "ZIM autobuild" project;
  • Wikimedia France and the Afripedia project;
  • Gabriel Wicke from the WMF Parsoid development team.

With enough volunteer developers, we will also be able to preserve other parts of the usual online experience:

  • table of contents on each page;
  • categories to browse pages by topic;
  • JavaScript and CSS resources.

Emmanuel