There are two types of import, both accessed through Special:Import:
- transwiki import, also called interwiki import: import pages directly from another wiki. The settings of the destination wiki determine which source wikis are enabled. The system message 'import-interwiki-text' appears. After "Transfer pages into namespace" one can specify a target namespace; the option "all" actually means "the same as the original".
- upload import: import a file in a special XML format produced by exporting pages from another wiki. The system message 'importtext' appears.
See the page Importer for information about the user groups performing these actions.
On many Wikimedia wikis transwiki import is also disabled; Special:Import then shows the system message 'importnosources': "No wikis from which to import have been defined and direct history uploads are disabled." However, pages from commons:, foundation:, w:, cs: and fr: can currently be imported to Meta, and pages from Meta can be imported to mw:. The act of importing is added to the page history and to Special:Log/import.
If an imported page has the same name as an existing page in the target wiki, the page is overwritten if the imported page is newer (according to the timestamps). If an error occurs during the import, you may find that the import is only partially complete (some pages imported, but not all). Since pages are overwritten, attempting the import again should not be a problem.
If you specified to include history information, then you should also see information about the edits in the 'history' of the imported pages, and in the user contributions. The edits will not show up in 'recent changes' (neither positioned at the time of the original edit, nor at the time of importing). The effect will be similar to a full history merge and it may be quite difficult to later determine which edits were imported.
There is an option "Include all templates", which will import the templates that are on an imported page. If this is not used they will be redlinked unless a template of the same name exists on the target wiki. If this is used, templates of the same name on the target wiki will be overwritten and the import will include not only the templates on the imported page but also any templates used within those templates. It is possible to import a collection of pages by specially creating a page that transcludes them, and importing that page, with the option on. However, this should be done very carefully or it may result in far more being imported than desired - again, it will import all templates on the transcluded pages and all templates nested within those templates.
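The reach of "Include all templates" can be estimated before importing. The following is a minimal sketch, not MediaWiki's own code: it assumes you have already collected, by hand or otherwise, a mapping from each page to the pages it transcludes directly. The function name and the sample titles are made up for illustration.

```python
from collections import deque

def pages_pulled_in(start, transclusions):
    """Return every page that would come along when `start` is imported
    with nested templates included.

    `transclusions` maps a page title to the titles it transcludes directly.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for used in transclusions.get(page, ()):
            if used not in seen:
                seen.add(used)
                queue.append(used)
    return seen

# Hypothetical wiki: a collection page that transcludes two articles.
transclusions = {
    "Collection": ["Article A", "Article B"],
    "Article A": ["Template:Infobox"],
    "Article B": ["Template:Infobox", "Template:Navbox"],
    "Template:Infobox": ["Template:Flag"],
}
for title in sorted(pages_pulled_in("Collection", transclusions)):
    print(title)
```

In this made-up case, importing the single page "Collection" brings in six pages, including a template that none of the articles uses directly.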
Useful applications of importing include:
- when a page is moved to another wiki and subsequently edited there, have the history together in the target wiki; this is especially useful if the source page becomes more difficult to find due to page moves etc.
- when a page is moved to another wiki and deleted on the source wiki, preserve the history.
- in order to have templates that exist on another wiki/subdomain.
- on certain projects such as wikisource and wiktionary, to move multilingual content between language subdomains.
To check whether your wiki has transwiki import configured and, if so, from which wikis, you can query your wiki using the API.
- <wiki url>/w/api.php?action=paraminfo&modules=import
The configuration is expressed in
<param name="interwikisource" description="For interwiki imports: wiki to import from"> …
Alternatively one can check the global configurations at http://noc.wikimedia.org/conf/highlight.php?file=InitialiseSettings.php and look in the section wgImportSources.
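The API query shown above can also be put together in a script. This is a small sketch only: the address https://example.org is a placeholder for your own wiki, and the /w/ script path follows the example above and may differ on other installations.

```python
from urllib.parse import urlencode

def api_url(wiki_url, **params):
    """Build an api.php URL from the wiki's base address and query parameters."""
    return wiki_url.rstrip("/") + "/w/api.php?" + urlencode(params)

# Placeholder address; replace it with the address of your wiki.
url = api_url("https://example.org", action="paraminfo", modules="import")
print(url)
```

The resulting address can be opened in a browser or fetched with a tool of your choice.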
If transwiki import is not configured, seek your local community's consensus to have it configured and to identify the wikis from which you may wish to import. Requests for configuration changes should be lodged in Bugzilla, where you would create a new bug under the Wikimedia section. You would be expected to link to your community's discussion in your bug request.
A query of the API at your local wiki also displays the rights of each user group:
- <wiki url>/w/api.php?action=query&meta=siteinfo&siprop=usergroups
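Once the user group query has been fetched, the groups holding a given right can be picked out mechanically. The sketch below assumes the query was made with format=json added, and that 'import' and 'importupload' are the right names of interest. The sample response is invented and much shorter than a real one.

```python
import json

def groups_with_right(siteinfo, right):
    """List the user groups whose rights include `right`."""
    groups = siteinfo["query"]["usergroups"]
    return [g["name"] for g in groups if right in g.get("rights", [])]

# Made-up response for illustration; a real wiki returns more groups and rights.
sample = json.loads("""
{"query": {"usergroups": [
  {"name": "user", "rights": ["edit", "move"]},
  {"name": "sysop", "rights": ["edit", "delete", "import"]},
  {"name": "importer", "rights": ["import", "importupload"]}
]}}
""")
print(groups_with_right(sample, "import"))
print(groups_with_right(sample, "importupload"))
```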
Assignation of transwiki import rights
Most transwiki rights are assigned to users following a successful discussion in their community's wiki, followed up by a request to stewards at Steward requests/Permission. Some wikis have requested and been granted that local bureaucrats be able to assign this right; check with your wiki for the current situation there.
How to export, and the format of exported pages, is described at Help:export. Normally any user can export wiki pages to a file, but to import pages into a wiki from a file, you must have 'Sysop' privileges on that wiki. So if you have your own MediaWiki installation, then you should be able to see the 'Special:Import' page there. Within the Wikimedia Foundation family of wiki projects, only users with the importupload user-right can import pages into a wiki from a file; this includes only members of the "importer" group and stewards.
To import wiki pages from your computer, click "Browse" on Special:Import to locate the file on your local file system.
Editing the import file
In the case of upload import, because of the simple readable file format, the XML file can easily be edited between exporting and importing. This should be done with caution and integrity: one can make antedated edits and use false user names and, in combination with deletion, one can "change history". Applications of this editing include:
- adding a note to the edit summary about the importing
- changing user names and/or page names to avoid name conflicts (just between the title tags and between the username tags or also in links and signatures)
- changing namespace names into the generic or the applicable ones (ditto)
Note that if two versions of the page have the same timestamp (because one was uploaded with the same timestamp as a preexisting version), the later (imported) version will show up in the edit history but not in the article itself.
See mw:Manual:XML Import file manipulation in CSharp for an example of working with these XML files in Visual Studio .NET C#.
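For comparison, the same kind of edit can be sketched in Python with the standard library. This is an illustration under assumptions, not a supported tool: the sample file is invented, the helper names are hypothetical, and the namespace URI is the one used by one version of the export format (yours may differ). The sketch changes only the title and username tags, not links or signatures inside the page text.

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]

def rename(root, element_name, old, new):
    """Replace `old` by `new` in every <element_name> tag; return the count."""
    count = 0
    for el in root.iter():
        if local(el.tag) == element_name and el.text == old:
            el.text = new
            count += 1
    return count

# Invented export file, trimmed to the tags that matter here.
sample = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Sandbox</title>
    <revision>
      <timestamp>2005-01-01T12:00:00Z</timestamp>
      <contributor><username>Alice</username></contributor>
      <text>Hello</text>
    </revision>
    <revision>
      <timestamp>2005-01-02T12:00:00Z</timestamp>
      <contributor><username>Bob</username></contributor>
      <text>Hello again</text>
    </revision>
  </page>
</mediawiki>"""

root = ET.fromstring(sample)
renamed_users = rename(root, "username", "Alice", "Alice (source wiki)")
renamed_titles = rename(root, "title", "Sandbox", "Imported sandbox")

# Keep the default namespace unprefixed when writing the file back out.
ET.register_namespace("", "http://www.mediawiki.org/xml/export-0.10/")
edited = ET.tostring(root, encoding="unicode")
print(edited)
```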
Merging histories and other complications
If the import includes history information, and the edits involved a user name which in the importing project is used by somebody else, then upload import should be applied, and the occurrences of the user name in the XML file should first be replaced by another name, to avoid ambiguity. If the user name was not used yet in the importing project then the user contributions are available anyway, although an account is not automatically created.
Just like when a page is referred to in a link, and/or put in a URL, generic namespace names are automatically converted, and if a prefix is not a namespace name the page will arrive in the main namespace. However, e.g. "Meta:" may be ignored (dropped) on a project that uses that prefix for interwiki linking. It may be desirable to change it in the XML file to "Project:" before importing.
If a page name exists already, importing revisions of a page with that name causes the page histories to be merged. Note that after inserting a revision between two existing revisions in the page history, the change made by the user who made the next edit seems different from what it actually has been: to see the actual change made by the user one has to take the diff between the two already existing revisions, not the diff with respect to the inserted one. Therefore this should not be done except to reconstruct the true page history.
A revision is not imported if a revision of the same date, and exactly the same time up to the second, exists already (beware that this doesn't seem to happen in all cases). In practice this occurs only when the revision has already been imported before, or when the revision one attempts to import was imported the other way around, or both were imported from a third site.
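The merging rule described in the last two paragraphs can be modelled in a few lines. This is a simplified model for reasoning about the outcome, not the code MediaWiki runs, and as noted above the real behaviour does not seem to follow the rule in all cases. Revisions are represented as plain dictionaries with invented values.

```python
def merge_history(existing, imported):
    """Merge two revision lists, skipping imported revisions whose
    timestamp (to the second) already exists on the target page."""
    taken = {rev["timestamp"] for rev in existing}
    merged = list(existing)
    for rev in imported:
        if rev["timestamp"] not in taken:
            merged.append(rev)
            taken.add(rev["timestamp"])
    # Timestamps in this fixed format sort correctly as plain strings.
    return sorted(merged, key=lambda rev: rev["timestamp"])

existing = [
    {"timestamp": "2005-01-01T10:00:00Z", "user": "Carol"},
    {"timestamp": "2005-01-01T12:00:00Z", "user": "Carol"},
]
imported = [
    {"timestamp": "2005-01-01T11:00:00Z", "user": "Dave"},
    {"timestamp": "2005-01-01T12:00:00Z", "user": "Dave"},
]
history = merge_history(existing, imported)
for rev in history:
    print(rev["timestamp"], rev["user"])
```

Dave's 11:00 revision lands between Carol's two revisions, so the diff shown for Carol's 12:00 edit is now taken against Dave's text rather than against her own earlier revision.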
An edit summary may refer to, and possibly link to, another page. This may be confusing when the page has been imported but the target page has not.
The edit summary does not automatically show that the page has been imported, but in the case of upload import that can be added to the edit summaries in the XML file before importing. That can avoid some potential sources of ambiguity and/or confusion. When editing the XML file with find/replace, note that adding a text to the edit summaries requires distinguishing between edits which already have an edit summary, hence comment tags in the XML file, and those without these tags. If there are multiple pairs of comment tags, only the last one is effective.
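With an XML parser instead of find/replace, the two cases (revision with and without comment tags) are easy to tell apart. The following sketch uses Python's standard library; the note text, the helper names and the sample file are invented. Placing a new comment tag after the contributor or minor tag is an assumption about a sensible position, so check the result against a file exported from your own wiki.

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]

def annotate_revisions(root, note):
    """Append `note` to each revision's edit summary, creating one if missing."""
    revisions = [el for el in root.iter() if local(el.tag) == "revision"]
    for rev in revisions:
        comments = [c for c in rev if local(c.tag) == "comment"]
        if comments:
            # Only the last pair of comment tags is effective, so extend that one.
            last = comments[-1]
            last.text = ((last.text or "") + " " + note).strip()
        else:
            # Reuse the namespace prefix of the <revision> tag for the new tag.
            comment = ET.Element(rev.tag[: -len("revision")] + "comment")
            comment.text = note
            children = list(rev)
            anchors = [i for i, c in enumerate(children)
                       if local(c.tag) in ("contributor", "minor")]
            position = anchors[-1] + 1 if anchors else len(children)
            rev.insert(position, comment)

# Invented sample: one revision with an edit summary, one without.
sample = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Sandbox</title>
    <revision>
      <contributor><username>Alice</username></contributor>
      <comment>typo</comment>
      <text>Hello</text>
    </revision>
    <revision>
      <contributor><username>Bob</username></contributor>
      <text>Hello again</text>
    </revision>
  </page>
</mediawiki>"""

root = ET.fromstring(sample)
annotate_revisions(root, "[imported from source wiki]")
summaries = [el.text for el in root.iter() if local(el.tag) == "comment"]
print(summaries)
```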
Without provisions for user name conflicts, the user contributions list shows:
- the edits by the person registered under the user name concerned on the project
- for each wiki from which pages have been imported, the edits of imported pages before import, by the user who on the source project has the user name concerned
If at the time of import the page did not exist yet on the target site, the two can be distinguished by comparing the time of import with the time of the edit.
If the user page and user talk page do not have a user contributions link in the page margin then the user is not registered, so all their edits are imported.
For a large-scale transfer, somebody with sufficient system privileges can move data within the server, which is more practical than sending large XML files from the server to a user's local computer and then back to the server.
Large files may be rejected for two reasons. The first is the PHP upload limit, set in the PHP configuration file php.ini:
; Maximum allowed size for uploaded files.
upload_max_filesize = 20M
The second is the hidden variable limiting the size in the input form, found in the MediaWiki source code in includes/specials/SpecialImport.php:
<input type='hidden' name='MAX_FILE_SIZE' value='20000000' />
You may also need to adjust the following four directives in php.ini:
; Maximum size of POST data that PHP will accept.
post_max_size = 20M
max_execution_time = 1000 ; Maximum execution time of each script, in seconds
max_input_time = 2000 ; Maximum amount of time each script may spend parsing request data
; Default timeout for socket based streams (seconds)
default_socket_timeout = 2000
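Because several limits apply at once, the smallest one decides how large an import file can be. The sketch below estimates that value from the settings shown above. The function names are made up, and the result is approximate, since post_max_size has to cover the whole request and not only the file. PHP reads the K, M and G suffixes as powers of 1024.

```python
def php_size_to_bytes(value):
    """Convert a php.ini size such as '20M' to a number of bytes."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    value = value.strip()
    suffix = value[-1].upper()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    return int(value)

def effective_upload_limit(upload_max_filesize, post_max_size, form_max_file_size):
    """Return the smallest of the limits that apply to an uploaded file."""
    return min(
        php_size_to_bytes(upload_max_filesize),
        php_size_to_bytes(post_max_size),
        int(form_max_file_size),
    )

# Values taken from the examples above.
limit = effective_upload_limit("20M", "20M", 20000000)
print(limit)  # 20000000: the form's MAX_FILE_SIZE is slightly below 20M (20971520)
```

With these values, raising upload_max_filesize alone would have no effect, because the hidden form field is the tighter limit.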