Data dumps/mwdumper

From Meta, a Wikimedia project coordination wiki


mwdumper is a standalone program for filtering and converting compressed XML dumps. It can write its output either as another XML dump or as SQL statements for inserting the data directly into a database using MediaWiki's 1.4 or 1.5 schema.

Future versions of mwdumper will include support for creating a database and configuring a MediaWiki installation directly, but currently it just produces raw SQL which can be piped to MySQL.

The program is written in Java and has been tested with Sun's 1.5 JRE and GNU's GCJ 4. Source is in our CVS; a precompiled .jar is available at

Be sure to review the README.txt file that is also provided; it explains the required invocation options. A friendly wiki version of the README, with a few additional hints, is available at

NOTE: mwdumper is an unsuitable choice for an already installed MediaWiki with an active database on Red Hat or Fedora Core Linux distributions. Running mwdumper against an installed MediaWiki site can result in the program reporting that it is inserting pages when in fact the MySQL database is rejecting the records. Typically, the rejections seem related to the max_allowed_packet setting (set-variable max_allowed_packet=) in the MySQL configuration file, /etc/my.cnf. To use mwdumper on any Linux distribution with MySQL 5.x or above, follow the steps below. Make certain you have installed the php-mysql extension along with MySQL before installing MediaWiki. On Fedora Core 5 and later, use the yum utility to update your system.

mwdumper installation notes

Steps to get mwdumper to work with Ubuntu, Debian and FreeBSD 6.2
  1. Empty the relevant tables (page, revision, text) from the mysql shell, e.g. mysql> TRUNCATE TABLE page;. TRUNCATE drops and recreates the table, which is MUCH faster than deleting the data from it (DELETE FROM page;, for example). There is no need to drop the entire database.
  2. Install MediaWiki as you would normally.
  3. Start mwdumper to import the data, e.g.: java -jar mwdumper.jar --format=sql:1.5 <xml dump file name> | mysql -u root -p <dbname>
  4. Get some coffee; it could take a while, depending on the size of your imported data.
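If you repeat this import often, steps 1 and 3 above can be wrapped in a small shell script. The following is a minimal sketch, assuming bash, the same root MySQL account as the command above, and a jar location supplied through a MWDUMPER_JAR variable; the variable and function names are hypothetical. Step 2 (installing MediaWiki) stays manual.

```shell
#!/usr/bin/env bash
# Sketch of the "empty the tables, then pipe mwdumper into mysql" approach.
# With pipefail, a failing java process makes the whole pipeline fail.
set -o pipefail

# Hypothetical variable: where the downloaded mwdumper jar lives.
MWDUMPER_JAR="${MWDUMPER_JAR:-mwdumper.jar}"

# Print one TRUNCATE statement per table name given.
truncate_sql() {
  local table
  for table in "$@"; do
    printf 'TRUNCATE TABLE %s;\n' "$table"
  done
}

# Step 1: empty page, revision and text (mysql prompts for the password).
empty_tables() {
  local db="$1"
  truncate_sql page revision text | mysql -u root -p "$db"
}

# Step 3: convert the dump to 1.5-schema SQL and stream it into MySQL.
import_dump() {
  local dump="$1" db="$2"
  java -jar "$MWDUMPER_JAR" --format=sql:1.5 "$dump" | mysql -u root -p "$db"
}
```

For example, run empty_tables <dbname>, install MediaWiki, then run import_dump <xml dump file name> <dbname>. Because pipefail is set, import_dump returns non-zero if Java fails, even when mysql itself exits cleanly.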
Another approach that also works on Ubuntu, Debian and FreeBSD 6.2
  1. Empty the relevant tables as in step 1 of the previous example.
  2. Install MediaWiki as you would normally.
  3. Run mwdumper and redirect its output to a flat SQL file, e.g.: java -jar mwdumper.jar --format=sql:1.5 <xml dump file name> > output_filename.sql
  4. When that completes, load the file with MySQL directly, e.g.: mysql <dbname> < output_filename.sql
  5. Pour a second cup of coffee: this will take longer, and it will not give you any progress output the way running mwdumper directly does.
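The two-stage approach can be sketched the same way. This assumes bash and the hypothetical MWDUMPER_JAR variable; sql_name_for is a hypothetical helper that only derives an output file name from the dump name, and nothing here is part of mwdumper itself.

```shell
#!/usr/bin/env bash
# Sketch of the two-stage approach: XML dump -> .sql file -> mysql.
MWDUMPER_JAR="${MWDUMPER_JAR:-mwdumper.jar}"   # hypothetical jar location

# Hypothetical helper: strip the directory and .bz2/.gz/.xml suffixes,
# e.g. /data/enwiki-pages-articles.xml.bz2 -> enwiki-pages-articles.sql
sql_name_for() {
  local base
  base="$(basename -- "$1")"
  base="${base%.bz2}"
  base="${base%.gz}"
  base="${base%.xml}"
  printf '%s.sql\n' "$base"
}

# Stage 1: write the SQL to a flat file. Only stdout is redirected, so
# mwdumper's progress messages (on stderr) still reach the terminal.
convert_dump() {
  local dump="$1" out="${2:-$(sql_name_for "$1")}"
  java -jar "$MWDUMPER_JAR" --format=sql:1.5 "$dump" > "$out"
}

# Stage 2: load the file; mysql prints no progress while it runs.
load_sql() {
  local db="$1" file="$2"
  mysql -u root -p "$db" < "$file"
}
```

For example: convert_dump <xml dump file name>, then load_sql <dbname> <the .sql file it wrote>. Keeping the intermediate file also lets you rerun stage 2 without converting the dump again.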

You can check the progress of the import from within the mysql shell with mysql> SELECT COUNT(*) FROM page; or, for a faster but less accurate query, mysql> EXPLAIN SELECT COUNT(*) FROM page;
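To watch an import without retyping the query, a small polling loop can print the page count and a rough rate. This is a sketch, assuming bash and that your MySQL credentials are stored in ~/.my.cnf so that mysql does not prompt on every poll; the function names are hypothetical. It uses the exact COUNT(*) query, so on a very large page table each poll may itself take a while.

```shell
#!/usr/bin/env bash
# Sketch: poll the page count during an import and print a rough rate.
# Assumption: credentials come from ~/.my.cnf, so mysql does not prompt.

# Integer pages per second between two counts taken $3 seconds apart.
pages_per_sec() {
  local before="$1" after="$2" seconds="$3"
  echo $(( (after - before) / seconds ))
}

# Exact count; -N drops the column header, -B gives plain batch output.
page_count() {
  mysql -N -B -e 'SELECT COUNT(*) FROM page' "$1"
}

# Print the count and rate every $2 seconds (default 60) until interrupted.
watch_import() {
  local db="$1" interval="${2:-60}" prev now
  prev="$(page_count "$db")"
  while sleep "$interval"; do
    now="$(page_count "$db")"
    echo "$now pages ($(pages_per_sec "$prev" "$now" "$interval")/sec)"
    prev="$now"
  done
}
```

For example, watch_import <dbname> 30 in a second terminal while mwdumper runs; stop it with Ctrl-C.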

Some Java optimizations that might help

There are some Java switches which you can use that may help with your import speed and performance, if you're using mwdumper (google them for details on what they do):

java -Xmx512m -Xms128m -XX:NewSize=32m -XX:MaxNewSize=64m -XX:SurvivorRatio=6 -XX:+UseParallelGC -XX:GCTimeRatio=9 -XX:AdaptiveSizeDecrementScaleFactor=1 -server -jar mwdumper.jar [...]
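If you reuse these switches, keeping them in a bash array makes them easier to tune. This sketch simply reproduces the flags from the command above; the MWDUMPER_JAR variable and run_mwdumper function are hypothetical, and the values are not tuned for any particular machine.

```shell
#!/usr/bin/env bash
# The JVM switches quoted on this page, one array element per switch.
MWDUMPER_JAR="${MWDUMPER_JAR:-mwdumper.jar}"   # hypothetical jar location

# Heap: 512 MB maximum, 128 MB initial.
# Young generation: 32 MB initial, 64 MB maximum; eden is 6x one survivor space.
# Parallel (throughput) collector; GCTimeRatio=9 targets at most 1/(1+9) of
# the time in garbage collection. -server selects the server VM.
JAVA_OPTS=(
  -Xmx512m -Xms128m
  -XX:NewSize=32m -XX:MaxNewSize=64m -XX:SurvivorRatio=6
  -XX:+UseParallelGC -XX:GCTimeRatio=9
  -XX:AdaptiveSizeDecrementScaleFactor=1
  -server
)

# Run mwdumper with these switches; any further arguments are passed through.
run_mwdumper() {
  java "${JAVA_OPTS[@]}" -jar "$MWDUMPER_JAR" "$@"
}
```

For example: run_mwdumper --format=sql:1.5 <xml dump file name> | mysql -u root -p <dbname>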

Steps to get mwdumper to work on Red Hat and Fedora Core Linux distros with MySQL 5.x
  1. Destroy any existing database, e.g.: mysqladmin -u root -p drop <dbname> (enter the root database password)
  2. Recreate the database, e.g.: mysqladmin -u root -p create <dbname> (enter the root database password)
  3. DO NOT install MediaWiki until the database is uploaded. If MediaWiki is already installed, mwdumper may not work.
  4. Load the tables.sql schema file supplied with MediaWiki, running from the MediaWiki root directory, e.g.: mysql -u root -p <dbname> < maintenance/tables.sql
  5. Start mwdumper, e.g.: java -jar mwdumper.jar --format=sql:1.5 <xml dump file name> | mysql -u root -p <dbname>
  6. You will then have to restore administrator access for the WikiSysop account: either manually grant admin status to WikiSysop and run the upgrade.php script on the MediaWiki database, or install over the previous MediaWiki installation and import the databases to activate the WikiSysop account. To recreate the account, copy the following MediaWiki PHP file into your maintenance/ directory, name it createAndPromote.php, and run the example commands provided after it. The file is for the MediaWiki 1.15 release.
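Steps 1, 2, 4 and 5 above can be combined into one function. This is a minimal sketch, assuming bash, the root MySQL account used in the steps, and a hypothetical MWDUMPER_JAR variable; because every command uses -p you will be asked for the password several times. It adds -f to the drop command only to skip mysqladmin's confirmation prompt, and it ignores a failed drop so that it also works when the database does not exist yet. Step 6 remains manual.

```shell
#!/usr/bin/env bash
# Sketch: recreate an empty database, load MediaWiki's schema, then import.
set -o pipefail
MWDUMPER_JAR="${MWDUMPER_JAR:-mwdumper.jar}"   # hypothetical jar location

# Succeed if $1 looks like a MediaWiki root (has maintenance/tables.sql).
is_mw_root() {
  [ -f "$1/maintenance/tables.sql" ]
}

rebuild_and_import() {
  local db="$1" mw_root="$2" dump="$3"
  if ! is_mw_root "$mw_root"; then
    echo "no maintenance/tables.sql under $mw_root" >&2
    return 1
  fi
  # Steps 1-2: drop (status ignored, -f skips the prompt), then recreate.
  mysqladmin -u root -p -f drop "$db"
  mysqladmin -u root -p create "$db" || return 1
  # Step 4: create the empty tables from MediaWiki's schema file.
  mysql -u root -p "$db" < "$mw_root/maintenance/tables.sql" || return 1
  # Step 5: stream the converted dump into the new tables.
  java -jar "$MWDUMPER_JAR" --format=sql:1.5 "$dump" | mysql -u root -p "$db"
}
```

For example: rebuild_and_import <dbname> /path/to/mediawiki <xml dump file name>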

file createAndPromote.php


 * Maintenance script to create an account and grant it administrator rights
 * @file
 * @ingroup Maintenance
 * @author Rob Church <>

$options = array( 'help', 'bureaucrat' );
require_once( '' );

if( isset( $options['help'] ) ) {
	exit( 1 );

if( count( $args ) < 2 ) {
	echo( "Please provide a username and password for the new account.\n" );
	die( 1 );

$username = $args[0];
$password = $args[1];

echo( wfWikiID() . ": Creating and promoting User:{$username}..." );

# Validate username and check it doesn't exist
$user = User::newFromName( $username );
if( !is_object( $user ) ) {
	echo( "invalid username.\n" );
	die( 1 );
} elseif( 0 != $user->idForName() ) {
	echo( "account exists.\n" );
	die( 1 );

# Insert the account into the database
$user->setPassword( $password );

# Promote user
$user->addGroup( 'sysop' );
if( isset( $option['bureaucrat'] ) )
	$user->addGroup( 'bureaucrat' );

# Increment site_stats.ss_users
$ssu = new SiteStatsUpdate( 0, 0, 0, 0, 1 );

echo( "done.\n" );

function showHelp() {
	echo( <<<EOT
Create a new user account with administrator rights

USAGE: php createAndPromote.php [--bureaucrat|--help] <username> <password>

		Grant the account bureaucrat rights
		Show this help information


Steps to recreate WikiSysop and add the account to groups "sysop" and "bureaucrat"

From your MediaWiki root directory, enter the following commands:

php maintenance/createAndPromote.php --bureaucrat WikiSysop <password>
php maintenance/changePassword.php --user=WikiSysop --password=<password>

You may have to set the password twice for the account to work properly, which is why changePassword is run after the account has been recreated and assigned sysop and bureaucrat status.
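The two commands can also be run as one step. This sketch assumes bash and that php is on your PATH; recreate_sysop is a hypothetical name. It passes --bureaucrat so that the bureaucrat group is requested as well, as described in the script's USAGE line. As with the commands above, the password appears on the command line, so it may be visible in the process list and your shell history.

```shell
#!/usr/bin/env bash
# Sketch: recreate an admin account and then set its password again.

recreate_sysop() {
  if [ "$#" -ne 3 ]; then
    echo "usage: recreate_sysop <mediawiki-root> <username> <password>" >&2
    return 2
  fi
  local mw_root="$1" user="$2" pass="$3"
  # Subshell, so the caller's working directory is left unchanged.
  # createAndPromote.php exits with status 1 if the account already exists,
  # in which case changePassword.php is not run.
  (
    cd "$mw_root" || exit 1
    php maintenance/createAndPromote.php --bureaucrat "$user" "$pass" &&
      php maintenance/changePassword.php --user="$user" --password="$pass"
  )
}
```

For example: recreate_sysop /path/to/mediawiki WikiSysop <password>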

mwdumper is not the correct tool for maintaining an existing wiki: on Fedora Core releases it may not work correctly if the MediaWiki databases have already been installed, and it may not report anything useful about the errors that occur. Most of these problems come from the underlying MySQL version rejecting mwdumper's SQL insert requests. You may wish to do a trial run of mwdumper on your particular OS distribution to see whether you encounter any of these problems. There are fixes for some of these issues.

Known Problems

  • mwdumper will fail with ERROR 1153 (08S01) at line 2187: Got a packet bigger than 'max_allowed_packet' bytes if your dumps contain large sections of Unicode text, such as Cherokee script, and in most cases it does not work at all with such dumps, even with MySQL's defaults set to utf8. One solution is to increase the maximum packet (request) size that MySQL accepts for INSERT commands: try setting max_allowed_packet=20M in the [mysqld] section of /etc/my.cnf (older MySQL versions use set-variable = max_allowed_packet=20M) and restart mysqld.
  • mwdumper does not report errors when uploading to a system with a database that is not freshly created.
  • mwdumper may not always complete the dump, even though it reports that it has, and even if you have followed all the procedures listed here. Because the program lacks proper error handling, it may be better to run importDump.php instead if you encounter problems with this tool.
  • If you run into problems using mwdumper to feed mysql directly on a particular Linux distribution or MySQL version, consider having mwdumper convert the XML dumps into an intermediate .sql file and then importing that file into mysql yourself, rather than letting mwdumper do so.
  • Try passing the -f (--force) switch to mysql, which makes it continue past SQL errors instead of aborting, if mysql starts rejecting updates from the mwdumper program or reports duplicate key errors.
  • You will need to repopulate some fields that mwdumper doesn't import correctly because they did not exist when the bulk of the mwdumper code was written (circa 2005). In particular, at the time of writing, at a minimum the following maintenance scripts must be run (with the --force option if necessary) after an import:
    • maintenance/populateParentId.php
    • maintenance/populateRevisionLength.php
    • maintenance/populateRevisionSha1.php
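The post-import scripts listed above could be run in sequence with something like the following. It is a sketch, assuming bash, php on your PATH, and a MediaWiki version that ships these three scripts; run_post_import is a hypothetical name, and it passes --force only when given force as its second argument, matching the "if necessary" in the list.

```shell
#!/usr/bin/env bash
# Sketch: run the post-import maintenance scripts from the MediaWiki root,
# stopping at the first failure.
POST_IMPORT_SCRIPTS=(
  maintenance/populateParentId.php
  maintenance/populateRevisionLength.php
  maintenance/populateRevisionSha1.php
)

run_post_import() {
  local mw_root="$1" force="${2:-}" script
  local -a extra=()
  if [ "$force" = "force" ]; then
    extra=(--force)
  fi
  for script in "${POST_IMPORT_SCRIPTS[@]}"; do
    echo "running $script" >&2
    ( cd "$mw_root" && php "$script" "${extra[@]}" ) || return 1
  done
}
```

For example: run_post_import /path/to/mediawiki, or run_post_import /path/to/mediawiki force to rerun scripts that were already recorded as done.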

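For the max_allowed_packet failure described in the first item of the Known Problems list, you can check the server's current limit before starting an import. This is a sketch, assuming bash and credentials in ~/.my.cnf; to_bytes and packet_limit_ok are hypothetical helpers. It reads the server's global value; changing /etc/my.cnf still requires restarting mysqld before the new value shows up.

```shell
#!/usr/bin/env bash
# Sketch: compare the server's max_allowed_packet with a target size.

# Convert a size such as 20M, 512K, 1G or a plain byte count to bytes
# (K/M/G are powers of 1024, as in MySQL option files).
to_bytes() {
  local n="${1%[KkMmGg]}"
  case "$1" in
    *[Kk]) echo $(( n * 1024 )) ;;
    *[Mm]) echo $(( n * 1024 * 1024 )) ;;
    *[Gg]) echo $(( n * 1024 * 1024 * 1024 )) ;;
    *)     echo $(( n )) ;;
  esac
}

# Current server-wide value, in bytes.
current_max_packet() {
  mysql -N -B -e 'SELECT @@global.max_allowed_packet'
}

# Succeed if the server already allows packets of at least $1 (default 20M).
packet_limit_ok() {
  local want have
  want="$(to_bytes "${1:-20M}")"
  have="$(current_max_packet)" || return 1
  [ "$have" -ge "$want" ]
}
```

For example: packet_limit_ok 20M || echo "raise max_allowed_packet before importing"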
People who survived

  • akosiaris is said to have repeatedly and semi-willingly used mwdumper
  • Kelson/WMUK
  • You? Please update the page