Mirroring Wikimedia project XML dumps
This page coordinates the effort to mirror Wikimedia project XML dumps around the globe on independent servers, similar to GNU/Linux ISO mirror sites. See the list of mirrors below.
- Last 5 good dumps (most desired option): 10.5 TB for 5 most recent dumps, as of March 2014.
- Last 2 good dumps: 4.2 TB, as of March 2014.
- Only most recent good dumps: 2.1 TB, as of March 2014.
- Historical archives (2 dumps per year from 2002 through 2010): 1.6 TB now (Aug 2012), missing some data; expect 3-4 TB total.
- All dumps and other data currently hosted: about 34 TB and growing, as of March 2014.
- Expect slow growth; the number of dumps we keep will not grow substantially but the number and size of projects will increase steadily.
- We are not very interested in mirrors that carry only selected projects or dumps.
- "Other" (pageview and other statistics): 5.2 TB, as of March 2014.
Compare this to the estimates from 2012.
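Mirrors like the ones below are typically kept current with a periodic rsync pull from the dumps server. Here is a minimal sketch of such a job; the rsync module path is a hypothetical placeholder, not the official endpoint, so confirm the real one with the dumps operators (see the contact address below) before depending on it:

```python
import subprocess

# Hypothetical endpoint; ask the dumps operators for the real module
# name before scheduling this.
SOURCE = "rsync://dumps.wikimedia.org/EXAMPLE-dumps-module/"
DEST = "/srv/mirror/wikimedia-dumps/"

def sync_once() -> None:
    """One rsync pass keeping the local tree identical to the source."""
    subprocess.run([
        "rsync",
        "--archive",      # recurse, preserve timestamps and permissions
        "--delete",       # remove files that upstream has rotated out
        "--partial",      # keep partial transfers for the next run
        "--timeout=600",  # abort a stalled connection after 10 minutes
        SOURCE,
        DEST,
    ], check=True)

if __name__ == "__main__":
    sync_once()
```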
| Site | Content mirrored | Location |
| --- | --- | --- |
| Wikimedia | All public data | Virginia, United States |
| C3SL | Last 5 good XML dumps | Curitiba, Paraná, Brazil |
| Your.org | All public data | Illinois, United States |
| Internet Archive | All public data | California, United States |
- Note: The media files in these mirrors may be outdated; use them with care and check the last-modified dates.
| Site | Content mirrored |
| --- | --- |
| Your.org | Media (current version only) |
| Your.org | Media tarballs per project (except Commons) |
| Internet Archive | Media tarballs per day for Wikimedia Commons |
- Notes for the wikimediacommons collection:
  - All the Commons uploads (and their description pages in XML export format) for each day since 2004, one zip file per day, one item per month. For each month there is also a text file listing various errors, plus a CSV file with metadata about every file of each day.
  - The archives are made by WikiTeam and are meant to be static; they follow an embargo of about 6 months, so that a month is mostly cleaned up before it is uploaded. Archives up to early 2013 were uploaded in August-October 2013, so they reflect the status at that time. After logging in, you can see a table with details about all items.
  - See Downloading in bulk using wget for the official HTTP download instructions (a simplified sketch follows these notes). Downloading via torrent, however, is supposed to be faster and is highly recommended (you need a client that supports webseeding to download from archive.org's 3 webseeds): there is one torrent per item, and an (outdated) torrent file for fetching all torrent files at once.
  - Please join our distributed effort: download and reseed one torrent.
  - Individual images can be downloaded as well, thanks to the on-the-fly unzipper, by looking up the specific filename in the specific zip file, e.g. for File:Quail1.PNG. A sketch of the URL shape follows.
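The on-the-fly unzipper is addressed purely by URL: the item identifier, then the zip file inside it, then the path inside the zip. A minimal sketch of building such a URL; the item and zip names below are placeholders for illustration, not the real location of File:Quail1.PNG:

```python
from urllib.parse import quote

def unzipper_url(item: str, zip_name: str, member: str) -> str:
    """Build an archive.org on-the-fly unzipper URL of the form
    https://archive.org/download/<item>/<zip>/<path-inside-zip>."""
    return "https://archive.org/download/{}/{}/{}".format(
        quote(item), quote(zip_name), quote(member)
    )

# Hypothetical item and zip names; find the real ones by locating the
# day the file was uploaded in the wikimediacommons collection.
print(unzipper_url("wikimediacommons-200401", "20040115.zip", "Quail1.PNG"))
```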
- Other notes:
  - A partial 2012 snapshot covering over 100 wikis is available at archive.org.
  - For an unofficial listing of torrents, see data dump torrents.
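For the HTTP route, the official Downloading in bulk using wget instructions amount to a shallow recursive wget against each item's download listing. Below is a simplified sketch wrapping wget from Python; the item identifier is a placeholder, and the real instructions add further flags (host spanning, item lists) worth reading before a large download:

```python
import subprocess

# Placeholder identifier: list the real items in the wikimediacommons
# collection on archive.org and substitute the ones you want.
ITEMS = ["wikimediacommons-200401"]

for item in ITEMS:
    # Recurse one level into the item's download listing and fetch
    # every linked file; --no-clobber lets an interrupted run resume.
    subprocess.run([
        "wget",
        "--recursive", "--level=1",
        "--no-parent",             # stay inside this item's directory
        "--no-host-directories",   # no archive.org/ directory locally
        "--cut-dirs=1",            # drop the leading download/ component
        "--no-clobber",
        "--execute=robots=off",    # as the official instructions do
        "https://archive.org/download/{}/".format(item),
    ], check=True)
```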
Pageview stats, MediaWiki tarballs, other files
The nd.edu site is restricted to certain institutions with Internet2/ESnet/Geant connectivity, but those with access (primarily academics and researchers) get high-bandwidth downloads.
| Site | Content mirrored |
| --- | --- |
| Your.org | Everything in the other/ directory, it seems |
| Center for Research Computing, University of Notre Dame | Wikidata entity dumps, pageview and other stats, Picture of the Year tarballs, Kiwix openzim files, other. Restricted to ESnet/Geant/I2 access only! |
If you are a hosting organization and want to volunteer, please send email to ariel -at- wikimedia.org with "XML dumps mirror" somewhere in the subject line.
If you are brainstorming organizations that might be interested, see the discussion page.
See also
- Wikipedia:Database download
- Data dumps
- wikitech:Backup procedures
- en:User:Emijrp/Wikipedia Archive