Data dumps/Finding older xml dumps

From Meta, a Wikimedia project coordination wiki

Wikimedia does not keep the dumps it produces forever. If you are looking for dumps of the wiki projects that are older than a few months, here are some options:

  • If they are really old, 2010 or earlier, check our select archives. Some files are as old as 2001!
  • If they are pretty recent, odds are good you can find them at the Internet Archive. Filter by project, language and year to narrow down the list.
  • Closed wikis are still dumped. Only if the wiki name is removed completely from our list of databases do we stop generating dumps. Example: [1]
  • Deleted wikis no longer have current dumps generated, but the last old dump is usually still around. Example: [2]
  • Some mirrors may have old copies if they have not been updated in a while, but this is very hit or miss. Check the list of current mirrors.
  • You can ask on an appropriate mailing list whether someone has saved a copy for their research; wiki-research-l is one possibility.
  • ...?
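
The Internet Archive option above can also be scripted. The following is a minimal sketch, not an official tool: it builds a query URL for the Internet Archive's advanced search endpoint, filtered by project, language and year. The collection name `wikimediadownloads` and the identifier pattern `<dbname>-<date>` are assumptions about how the dumps are filed there, and the helper names `wiki_db_name`, `build_search_url` and `fetch_identifiers` are hypothetical; check a few items by hand before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: dump items are filed under this Internet Archive collection.
ASSUMED_COLLECTION = "wikimediadownloads"


def wiki_db_name(language, project):
    """Build a database name such as 'enwiki' or 'frwiktionary'."""
    suffix = "wiki" if project == "wikipedia" else project
    return f"{language}{suffix}"


def build_search_url(language, project, year, rows=50):
    """Return an advanced-search URL for dumps of one wiki from one year."""
    db = wiki_db_name(language, project)
    # Assumption: item identifiers look like '<dbname>-<YYYYMMDD>'.
    query = f"collection:{ASSUMED_COLLECTION} AND identifier:{db}-{year}*"
    params = [
        ("q", query),
        ("fl[]", "identifier"),  # only return the item identifier
        ("rows", str(rows)),
        ("output", "json"),
    ]
    return "https://archive.org/advancedsearch.php?" + urlencode(params)


def fetch_identifiers(url):
    """Run the search over the network and return the matching identifiers."""
    with urlopen(url, timeout=30) as resp:
        data = json.load(resp)
    return [doc["identifier"] for doc in data["response"]["docs"]]
```

For example, `fetch_identifiers(build_search_url("en", "wikipedia", 2012))` would list candidate items for English Wikipedia dumps from 2012, provided the assumptions above hold. An empty result may only mean the naming assumption is wrong, so it is worth falling back to the manual filters on the site.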

Note that 'full' dumps contain the content of all old revisions, except for deleted ones, so the most recent full dump should work for you in most cases. If you are hoping to duplicate someone else's research results and need the exact same dataset, try contacting them to see whether they have saved a private copy or uploaded it somewhere.