Mirroring Wikimedia project XML dumps

From Meta, a Wikimedia project coordination wiki
This is an archived version of this page, as edited by ArielGlenn (talk | contribs) at 10:58, 25 February 2013. It may differ significantly from the current version.

This page coordinates efforts to mirror Wikimedia project XML dumps around the globe on independent servers, similar to the mirror sites for GNU/Linux .iso images. See the list of current mirrors below.

Requirements

Space

  • Last 5 good dumps (most desired option): 8.1 TB, as of Dec 2012.
  • Last 2 good dumps: 2.5 TB, as of Jan 2012.
  • Only most recent good dumps: 1.3 TB, as of Jan 2012.
  • Historical archives (2 dumps per year from 2002 through 2010): 1.6 TB as of Jan 2012, with some data still missing; expect 3-4 TB in total.
  • All dumps currently hosted: about 27 TB and growing, as of Jan 2012.
    Expect slow growth; the number of dumps we keep will not grow substantially, but the number and size of projects will increase steadily.
    We are not very interested in selectively mirroring some projects or dumps.
  • "Other" (pageview and other statistics): 2.3 TB, as of Jan 2012; allow for growth to 4-5 TB.
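As a rough planning aid, the figures in the list above can be combined into a disk budget. The following is a minimal Python sketch, not an official sizing tool. It assumes decimal terabytes and the sizes quoted above; the tier names and the headroom factor of 1.25 are hypothetical and do not come from this page.

```python
# Sizes in TB, copied from the list above (dates as stated there).
SIZES_TB = {
    "last_5_good_dumps": 8.1,  # as of Dec 2012
    "last_2_good_dumps": 2.5,  # as of Jan 2012
    "most_recent_only": 1.3,   # as of Jan 2012
    "other_stats": 2.3,        # as of Jan 2012
}


def disk_budget_tb(selected, headroom=1.25):
    """Sum the selected tiers and multiply by an assumed growth headroom."""
    total = sum(SIZES_TB[name] for name in selected)
    return round(total * headroom, 2)


if __name__ == "__main__":
    # Example: the most desired option plus the "Other" statistics.
    print(disk_budget_tb(["last_5_good_dumps", "other_stats"]))
```

For the last 5 good dumps plus the "Other" files this gives roughly 13 TB; adjust the headroom to match your own expectations of growth.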

Bandwidth

Wikimedia provides about 20 MB/s via dataset2.wikimedia.org (stats) for XML dumps, as of January 2011.
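To get a feel for what this bandwidth means for an initial sync, here is a back-of-the-envelope estimate in Python. It is only a sketch: it assumes decimal units (1 TB = 1,000,000 MB), assumes the full 20 MB/s is available to a single mirror for the whole transfer (which is unlikely in practice), and combines a bandwidth figure from January 2011 with a size figure from December 2012.

```python
def transfer_days(size_tb, rate_mb_per_s):
    """Days needed to move size_tb terabytes at a sustained rate in MB/s."""
    seconds = (size_tb * 1_000_000) / rate_mb_per_s  # 1 TB = 1,000,000 MB
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    # Last 5 good dumps (8.1 TB) at the quoted 20 MB/s.
    print(f"{transfer_days(8.1, 20):.1f} days")
```

Under these assumptions the last 5 good dumps would take a little under five days to copy; real transfers will take longer when the bandwidth is shared.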

Current Mirrors

Dumps

Organisation | Contents | Location | HTTP access | FTP access | rsync URL
Wikimedia | All public data | Florida, United States | http://dumps.wikimedia.org | none | none
C3SL | Last 5 good XML dumps | Curitiba, Paraná, Brazil | http://wikipedia.c3sl.ufpr.br | ftp://wikipedia.c3sl.ufpr.br/wikipedia/ | rsync://wikipedia.c3sl.ufpr.br/wikipedia/
Masaryk University | Last 5 good XML dumps | Brno, Moravia, Czech Republic | http://ftp.fi.muni.cz/pub/wikimedia/ | ftp://ftp.fi.muni.cz/pub/wikimedia/ | rsync://ftp.fi.muni.cz/pub/wikimedia/
Your.org | All public data | ??, United States | http://dumps.wikimedia.your.org/ | ftp://ftpmirror.your.org/pub/wikimedia/dumps/ | rsync://ftpmirror.your.org/wikimedia-dumps/
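A prospective mirror could pull from one of the rsync URLs listed above. The sketch below builds such an rsync command in Python without running it. The destination directory `/srv/mirror/dumps/` is hypothetical, and this page does not say which rsync options the listed mirrors expect or permit, so the flags are an assumption; `--archive`, `--verbose`, `--delete`, and `--dry-run` are standard rsync options.

```python
import shlex


def rsync_command(source_url, dest_dir, dry_run=True):
    """Build an rsync argument list for pulling a mirror tree."""
    cmd = ["rsync", "--archive", "--verbose", "--delete"]
    if dry_run:
        # Only report what would be transferred; change nothing locally.
        cmd.append("--dry-run")
    cmd += [source_url, dest_dir]
    return cmd


if __name__ == "__main__":
    # The destination path is a placeholder, not a recommendation.
    cmd = rsync_command(
        "rsync://ftpmirror.your.org/wikimedia-dumps/",
        "/srv/mirror/dumps/",
    )
    print(shlex.join(cmd))
    # To execute for real: import subprocess; subprocess.run(cmd, check=True)
```

Defaulting to a dry run is a deliberate choice: `--delete` removes local files that no longer exist on the source, so it is worth inspecting the planned changes before the first real run.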

Media

Organisation | Contents | HTTP access | FTP access | rsync URL
Your.org | Media (current version only) | http://ftpmirror.your.org/pub/wikimedia/images/ | ftp://ftpmirror.your.org/pub/wikimedia/images/ | rsync://ftpmirror.your.org/wikimedia-images/

Media tarballs

Organisation | Contents | HTTP access | FTP access | rsync URL
Your.org | Media tarballs per project (except commons) | http://ftpmirror.your.org/pub/wikimedia/imagedumps/tarballs/ | ftp://ftpmirror.your.org/pub/wikimedia/imagedumps/tarballs/ | --

For an unofficial listing of torrents, see data dump torrents.

Pageview stats, MediaWiki tarballs, other files

Organisation | Contents | HTTP access | FTP access | rsync URL
Wansecurity.com | MediaWiki releases, pageview and other stats, historical XML archive, mwdumper | http://wikimedia.wansec.com/ | -- | rsync://wikimedia.wansec.com::wikimedia/

Who can we contact for hosting a mirror of the XML dumps?

If you are a hosting organization and want to volunteer, please send an email to ariel -at- wikimedia.org with "XML dumps mirror" somewhere in the subject line.

If you are brainstorming organizations that might be interested, see the discussion page.
