Data dumps/What's available for download

From Meta, a Wikimedia project coordination wiki

Available for download per project

See also: Database field prefixes

Everything in the list below except the media bundles is available from the main download site or one of our mirror sites. Some items are also available for download as torrents, created by users.

  • Most database tables as SQL files: each *.sql.gz file is named after the corresponding table; see database layout.
    • Page-to-page link lists (pagelinks, categorylinks, imagelinks, templatelinks tables)
    • Lists of pages with links outside of the project (externallinks, iwlinks, langlinks tables)
    • Media metadata (image, oldimage tables)
    • Info about each page (page, page_props, page_restrictions tables)
    • Titles of all pages in the main namespace, i.e. all articles (*-all-titles-in-ns0.gz)
    • List of all pages that are redirects and their targets (redirect table)
    • Log data, including blocks, protection, deletion, uploads (logging table)
    • Misc bits (interwiki, site_stats, user_groups tables)
  • Text of current or all revisions of all pages, as an XML file
  • Metadata about each page and current or all revisions, as an XML file
  • Media bundles for each project, separated into files uploaded to the project and files from Commons
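
As a rough illustration of working with two of the file types listed above, the Python sketch below streams titles from an all-titles-in-ns0 file and pages from an XML file. It assumes (this is not stated on this page) that the titles file holds one title per line, possibly preceded by a page_title header line, and that each page element in the XML contains a title element and revision elements with a text element. The function names and the sample data are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_titles(fileobj, header="page_title"):
    """Yield page titles from a text stream with one title per line.

    The first line is skipped if it equals the assumed header line.
    """
    for index, line in enumerate(fileobj):
        title = line.rstrip("\n")
        if index == 0 and title == header:
            continue
        if title:
            yield title


def iter_pages(fileobj):
    """Yield (title, text) for each <page> element in an XML stream.

    If a page has several revisions, the text of the last one read is kept.
    """
    title, text = None, None
    for _event, elem in ET.iterparse(fileobj, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # drop the page's children to limit memory use


if __name__ == "__main__":
    # Hypothetical in-memory samples standing in for downloaded files.
    titles = io.StringIO("page_title\nAlpha\nBeta\n")
    print(list(iter_titles(titles)))

    xml_sample = io.BytesIO(
        b'<mediawiki xmlns="http://example.org/sample-namespace">'
        b"<page><title>Alpha</title>"
        b"<revision><text>First page.</text></revision></page>"
        b"</mediawiki>"
    )
    for page_title, page_text in iter_pages(xml_sample):
        print(page_title, len(page_text))
```

For a real file, open it with the matching decompressor first, for example gzip.open(path, "rt", encoding="utf-8") for a .gz titles file, and pass the resulting stream to the function.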

Projects with Flagged Revisions enabled have the corresponding tables available for download.

SQL files for testing, generated from the page metadata and page content XML files, have been made available for the February 2013 dump run of the English-language Wikipedia, for use with MediaWiki 1.20 [1]. Before using them, please note that they do not have the usual drop/create table stanzas at the beginning. We hope to make these available for every project on a regular basis.

Tab-delimited files for use with MySQL's LOAD DATA INFILE, generated from the SQL files for the February 2013 dump run of the English-language Wikipedia, are also available for testing with MediaWiki 1.20 [2]. We hope to make these available for all projects on a regular basis as well.
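
If you want to inspect such tab-delimited data without loading it into MySQL, a minimal Python sketch along the following lines may help. It assumes the files follow MySQL's default LOAD DATA format (fields separated by tabs, lines ended by newlines, backslash as the escape character, \N for NULL); this page does not say which options the files were actually written with. The function name and sample rows are hypothetical, and the sketch works on a string held in memory, so it suits small samples rather than full dump files.

```python
# Escape sequences recognised by MySQL's default LOAD DATA format.
ESCAPES = {"0": "\0", "b": "\b", "n": "\n", "r": "\r", "t": "\t", "Z": "\x1a"}


def parse_rows(text):
    """Yield rows as lists of str (or None for NULL) from tab-delimited text.

    Assumes the default format: tab between fields, newline between rows,
    backslash escapes, and a field consisting of \\N meaning NULL.
    """
    row, chars, is_null = [], [], False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == "\\" and i + 1 < n:
            nxt = text[i + 1]
            if nxt == "N" and not chars:
                is_null = True  # \N at the start of a field marks NULL
            else:
                # Known escapes map to their character; any other escaped
                # character (including a literal tab or newline) is kept.
                chars.append(ESCAPES.get(nxt, nxt))
            i += 2
            continue
        if ch == "\t" or ch == "\n":
            row.append(None if is_null else "".join(chars))
            chars, is_null = [], False
            if ch == "\n":
                yield row
                row = []
        else:
            chars.append(ch)
        i += 1
    # Flush a final row that is not terminated by a newline.
    if chars or is_null or row:
        row.append(None if is_null else "".join(chars))
        yield row


if __name__ == "__main__":
    # Hypothetical sample: three columns, one escaped tab, one NULL.
    sample = "1\tFoo\\tBar\t\\N\n2\tBaz\t\n"
    for parsed in parse_rows(sample):
        print(parsed)
```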

Downloading media

Media bundles for each project are available from a mirror site via HTTP, FTP or rsync; see Media tarballs on our list of mirrors. If you want to browse or retrieve the original media as individual files, those are available too; see Media on our list of mirrors.

The Wikimedia Foundation has permission to use certain images, and many of the fair use images are borderline in terms of whether they can be used off Wikipedia. If you choose to download the image base, you do so at your own risk and assume all liability for the use of any images on the main Wikipedia site. The Wikipedia community vigorously polices the site and removes infringing images daily; however, it is always possible that some images escape this vigilance and end up on the site for a short time. As of February 2007, the entire collection of images produced a compressed tar.gz file of over 213 GB (gigabytes). As of November 2011, the image and other media files took up about 17 TB, most of it already compressed media.

Data not available for download

Some data is not available because it's private. This includes user data such as passwords, e-mail addresses, preferences, watchlists, etc. Likewise, deleted or suppressed content is not available for download; it may have contained spam, personally identifying information, copyright violations or other sensitive material.

It's not clear how a full right to fork could be guaranteed.

Wish list

Other items that people have asked for are collected on a wish list, which you can add to if you like.