Page metadata
The following page and revision information is available in XML data dumps from dumps.wikimedia.org. For the XML schema, see the appropriate version of the MediaWiki export XSD.
Which files to use
To learn which pages and content are included in each XML dump, refer to Data dumps/What's available for download#XML files.
To understand the XML filenames we use (such as "pages", "stub", and "meta"), refer to Data dumps/FAQ#What do the affixes mean?
For a sample and an explanation of the XML format used in these dumps, refer to Data dumps/Dump format.
Page metadata
- Title of the page
- Namespace of the page
- Page ID (for old versions, this is shown in the URLs of page history links)
- If the page is a redirect, the title of the redirect target
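As a minimal sketch of where these page fields sit in an export file, the following Python reads them with the standard library's `xml.etree.ElementTree.iterparse`. The sample document, its `export-0.10` namespace, and the helper names `local` and `iter_pages` are assumptions for illustration only; check which XSD version your dump declares. Namespaces are stripped generically, so the same code should tolerate other schema versions.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical hand-written sample in the style of a MediaWiki export file.
# The export-0.10 namespace is an assumption; real dumps may declare another version.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <ns>0</ns>
    <id>42</id>
    <redirect title="Example target" />
    <revision>
      <id>1001</id>
    </revision>
  </page>
</mediawiki>"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield a dict of page-level metadata for each <page> element."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        page = {"title": None, "ns": None, "id": None, "redirect": None}
        # Only direct children are inspected, so the nested <revision><id>
        # is not mistaken for the page ID.
        for child in elem:
            name = local(child.tag)
            if name == "title":
                page["title"] = child.text
            elif name == "ns":
                page["ns"] = int(child.text)
            elif name == "id":
                page["id"] = int(child.text)
            elif name == "redirect":
                page["redirect"] = child.get("title")  # redirect target title
        yield page
        elem.clear()  # release memory; real dumps can be very large


for page in iter_pages(io.BytesIO(SAMPLE.encode("utf-8"))):
    print(page)
# {'title': 'Example', 'ns': 0, 'id': 42, 'redirect': 'Example target'}
```

Using `iterparse` rather than loading the whole tree keeps memory roughly bounded by the size of one page, which matters for full dumps.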
The entire page table is also available as a dump; see SQL dumps and MediaWiki database layout for additional page metadata.
Revision metadata
- ID of the revision
- Whether the edit was marked as minor by the editor
- Date and time the edit was made
- Username and user ID, or IP address, of the editor
- Comment left by the editor when the edit was saved
- Length in bytes of the revision content
- SHA-1 hash of the revision content
- Revision ID of the previous (parent) revision
- Content model of the revision (for example, wikitext or JSON)
- Content format of the revision
Additionally, the ID of the related entry in the text table is provided.
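The revision fields listed above can be pulled out in a similar way. This is a minimal sketch, not an official parser: it assumes an `export-0.10`-style stub layout in which the text-table ID and byte length appear as `id` and `bytes` attributes on an empty `<text/>` element, and all sample values (and the helpers `local`, `parse_revision`, `iter_revisions`) are made up for illustration. Newer schema versions may place some of these fields differently.

```python
import io
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Hypothetical stub-style sample; every value here is invented.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <ns>0</ns>
    <id>42</id>
    <revision>
      <id>1001</id>
      <parentid>1000</parentid>
      <timestamp>2020-01-02T03:04:05Z</timestamp>
      <contributor>
        <username>ExampleUser</username>
        <id>7</id>
      </contributor>
      <minor />
      <comment>fix typo</comment>
      <model>wikitext</model>
      <format>text/x-wiki</format>
      <text id="555" bytes="11" />
    </revision>
  </page>
</mediawiki>"""


def local(tag):
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_revision(rev):
    """Turn one <revision> element into a dict of revision metadata."""
    out = {"minor": False}
    for child in rev:
        name = local(child.tag)
        if name in ("id", "parentid"):
            out[name] = int(child.text)
        elif name == "timestamp":
            out["timestamp"] = datetime.strptime(
                child.text, "%Y-%m-%dT%H:%M:%SZ"
            ).replace(tzinfo=timezone.utc)
        elif name == "contributor":
            # Registered editors have username + id; anonymous edits have ip.
            for c in child:
                cname = local(c.tag)
                if cname == "username":
                    out["username"] = c.text
                elif cname == "id":
                    out["user_id"] = int(c.text)
                elif cname == "ip":
                    out["ip"] = c.text
        elif name == "minor":
            out["minor"] = True  # the empty <minor/> element marks a minor edit
        elif name in ("comment", "model", "format", "sha1"):
            out[name] = child.text
        elif name == "text":
            # Assumed stub layout: byte length and text-table ID as attributes.
            if child.get("bytes") is not None:
                out["bytes"] = int(child.get("bytes"))
            if child.get("id") is not None:
                out["text_id"] = int(child.get("id"))
    return out


def iter_revisions(stream):
    """Yield (page_id, revision_dict) pairs, one per <revision>."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        page_id = next(int(c.text) for c in elem if local(c.tag) == "id")
        for child in elem:
            if local(child.tag) == "revision":
                yield page_id, parse_revision(child)
        elem.clear()  # free the finished page


for page_id, rev in iter_revisions(io.BytesIO(SAMPLE.encode("utf-8"))):
    print(page_id, rev["id"], rev["parentid"], rev["username"], rev["minor"], rev["text_id"])
# 42 1001 1000 ExampleUser True 555
```

Keeping a whole `<page>` in memory is fine for stub dumps, but for full-history content dumps of heavily edited pages you may prefer to handle each `<revision>` on its own `end` event instead.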
Content
In content dumps, almost all of the same metadata is provided, along with the full content of each included revision.
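Because content dumps carry the revision text itself, the byte length and SHA-1 listed under revision metadata can be checked against it. The sketch below assumes that the dump's `<sha1>` value uses the same encoding as MediaWiki's `rev_sha1` column (the SHA-1 digest written in base 36 and left-padded to 31 characters) and that length is counted in UTF-8 bytes rather than characters; verify both against your own dump before relying on it. `base36_sha1` and `check_revision` are hypothetical helper names.

```python
import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def base36_sha1(text):
    """SHA-1 of the UTF-8 text, in base 36, left-padded to 31 characters.

    Assumption: this matches the encoding used for revision hashes in dumps.
    A 160-bit digest always fits in 31 base-36 digits.
    """
    n = int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)
    digits = []
    while n:
        n, r = divmod(n, 36)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits)).rjust(31, "0")


def check_revision(text, expected_bytes, expected_sha1):
    """Compare revision text with the byte length and hash from the dump."""
    return (
        len(text.encode("utf-8")) == expected_bytes
        and base36_sha1(text) == expected_sha1
    )


text = "caf\u00e9"  # 4 characters, but the accented letter takes 2 bytes in UTF-8
print(len(text), len(text.encode("utf-8")))  # → 4 5
sha = base36_sha1(text)
print(check_revision(text, 5, sha))  # → True
print(check_revision(text, 4, sha))  # → False
```

The example uses a non-ASCII character on purpose: a character count and a byte count differ for such text, and the dump's length field is in bytes.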
Not available in the XML files
Other page metadata is available only in the aforementioned page table dump. It includes:
- Whether the page is protected
- Whether the page is newly created or has more than one revision
- ID of the most recent revision of the page
- Length in bytes of the content
- Content model of the page
- Content language of the page
See also
- Wikipedia:Edit summary#Places where the edit summary appears, lists of edit metadata.
- mw:Help:export
- RDF metadata