Data dumps/Download tools
Downloading the XML dumps
Once you've decided what files to download, it's important to pick the correct server, probably one of the mirrors. Mirrors may be much closer to you and are usually less overloaded than dumps.wikimedia.org, which also enforces strict connection and speed limits.
For the download you can use any download manager, but you may prefer a standard command-line downloader such as wget or curl, which handle URL selection, resuming, retrying and so on. For instance, to download the latest full dump of a wiki (Meta-Wiki in this example) from the source server, in 7z format to save on size and decompression time:
wget --recursive --no-parent --no-directories --continue --accept 7z https://dumps.wikimedia.org/metawiki/latest/
or in short:
wget -r -np -nd -c -A 7z https://dumps.wikimedia.org/metawiki/latest/
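curl offers the same resuming and retrying for a single file. The following is only a minimal sketch: the helper names latest_dump_url and fetch_dump are made up for illustration, and the URL layout is assumed to match the one on dumps.wikimedia.org, which a mirror may not follow.

```shell
#!/bin/sh
# Hypothetical helper: build the URL of one "latest" dump file from a server
# base URL ($1), a wiki database name ($2) and a file suffix ($3).
# Assumes the <server>/<wiki>/latest/<wiki>-latest-<suffix> layout.
latest_dump_url() {
    printf '%s/%s/latest/%s-latest-%s\n' "$1" "$2" "$2" "$3"
}

# Hypothetical helper: download one URL into the current directory.
# --fail          treat HTTP errors as failures instead of saving the error page
# --location      follow redirects
# --remote-name   keep the file name used on the server
# --continue-at - resume a partially downloaded file
# --retry 5       retry up to five times on transient errors
fetch_dump() {
    curl --fail --location --remote-name --continue-at - --retry 5 "$1"
}

# Usage (commented out so that sourcing this file downloads nothing):
# fetch_dump "$(latest_dump_url https://dumps.wikimedia.org metawiki pages-meta-current.xml.bz2)"
```

If the download is interrupted, running the same fetch_dump call again in the same directory should pick up from the partial file.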
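To download a single file over several connections, you can use axel, for instance (here with a bz2 file):
axel --num-connections=3 https://dumps.wikimedia.org/metawiki/latest/metawiki-latest-pages-meta-current.xml.bz2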
If you need to download several files over multiple connections, look into xargs.
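One possible way to do this with xargs is sketched below. The helper names dump_urls and download_parallel are hypothetical, the second file name in the usage line is an assumption (check the directory listing for the real names), and the -P option, while available in both GNU and BSD xargs, is not part of POSIX.

```shell
#!/bin/sh
# Base URL taken from the wget examples on this page.
BASE_URL="https://dumps.wikimedia.org/metawiki/latest"

# Hypothetical helper: print one full URL per line for each file name
# given as an argument.
dump_urls() {
    for name in "$@"; do
        printf '%s/%s\n' "$BASE_URL" "$name"
    done
}

# Hypothetical helper: read URLs on standard input and run up to $1
# downloads at the same time.
# -n 1 passes a single URL to each wget call, -P sets how many wget
# processes may run in parallel.
download_parallel() {
    xargs -n 1 -P "$1" wget --continue --no-verbose
}

# Usage (commented out so that sourcing this file downloads nothing).
# The second file name is an assumed example, not taken from this page.
# dump_urls metawiki-latest-pages-meta-current.xml.bz2 \
#           metawiki-latest-pages-articles.xml.bz2 | download_parallel 2
```

Keep the number of parallel downloads low: as noted above, dumps.wikimedia.org enforces strict connection limits, and mirrors may have limits of their own.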
You can download media bundles for a project or use rsync to pick up media from one of our mirror sites.
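An rsync call for this might look like the sketch below. The host, module and path in MIRROR are placeholders, not a real mirror address, and sync_media is a hypothetical helper name; use the rsync location published by the mirror you picked.

```shell
#!/bin/sh
# Placeholder values: replace with the rsync address published by your
# chosen mirror and with a local directory of your choice.
MIRROR="rsync://mirror.example.org/module/path/"
DEST="./media"

# Hypothetical helper: copy the remote tree $1 into the local directory $2.
# -a         recurse and preserve timestamps and permissions
# -v         list files as they are transferred
# --partial  keep interrupted files so that a rerun can reuse them
sync_media() {
    rsync -av --partial "$1" "$2"
}

# Usage (commented out so that sourcing this file transfers nothing):
# sync_media "$MIRROR" "$DEST"
```

Because rsync only transfers files that are missing or changed, running the same call again later also serves to update an existing local copy.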
Alternatively, you can use the Wikix program to read any XML dump and create a series of parallel download scripts that run on a Linux-based system. Wikix requires curl to be installed on your Linux distribution.
The WikiTeam software provides similar capabilities.
Downloading XML dumps and access logs
The open-source package QUAC includes the scripts wp-get-dumps and wp-get-access, which use rsync to download from mirrors.