Multiple titles

From Meta, a Wikimedia project coordination wiki

The problem described here has been fixed by a unique index in the database in MediaWiki version 1.4. It is unknown to me (SirJective) whether it can still occur in version 1.5, due to the change in the database schema.

In almost every Wikipedia there are articles that exist in multiple instances: database entries with exactly the same title. Only one of them is accessible through the software, but all of them contribute to maintenance lists such as the list of short pages or the list of orphaned articles, producing bogus entries.

See also this entry on the wikipedia-l mailing list.

For technical details, see this and the following mails at wikitech-l.

How they are created

These doubled entries are probably created when a user tries to save a new article and clicks "Save page" several times. Instead of ignoring the repeated requests or saving them as different revisions of one article, the server creates a new article for each. The technical details are being discussed at wikitech-l under the subject "double database entries".
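The effect of the unique index mentioned above can be sketched with a small experiment. This is only an illustration using Python's sqlite3 module and a toy `cur` table, not MediaWiki's actual code: without a unique index on (namespace, title), a repeated insert silently creates a duplicate row; with the index (as added in MediaWiki 1.4), the second insert fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Without a unique index, a repeated "Save page" request can insert
# a second row with the same namespace and title.
conn.execute("CREATE TABLE cur (cur_id INTEGER PRIMARY KEY, cur_namespace INTEGER, cur_title TEXT)")
conn.execute("INSERT INTO cur (cur_namespace, cur_title) VALUES (0, 'Example')")
conn.execute("INSERT INTO cur (cur_namespace, cur_title) VALUES (0, 'Example')")
dupes = conn.execute(
    "SELECT count(*) FROM cur WHERE cur_namespace = 0 AND cur_title = 'Example'"
).fetchone()[0]
print(dupes)  # 2

# With a unique index on (namespace, title), the second insert is rejected.
conn.execute("CREATE TABLE cur2 (cur_id INTEGER PRIMARY KEY, cur_namespace INTEGER, cur_title TEXT)")
conn.execute("CREATE UNIQUE INDEX name_title ON cur2 (cur_namespace, cur_title)")
conn.execute("INSERT INTO cur2 (cur_namespace, cur_title) VALUES (0, 'Example')")
try:
    conn.execute("INSERT INTO cur2 (cur_namespace, cur_title) VALUES (0, 'Example')")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True
print(blocked)  # True
```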

How to find them

SQL queries like this:

SELECT cur_namespace, cur_title, count(*)
FROM cur
GROUP BY cur_namespace, cur_title
HAVING count(*) > 1
LIMIT 100;

yield articles with different IDs but the same namespace and title.
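The query above can be tried out on a toy `cur` table. This sketch uses Python's sqlite3 module in place of MySQL; note that two articles with the same title in different namespaces are correctly not reported as duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cur (cur_id INTEGER PRIMARY KEY, cur_namespace INTEGER, cur_title TEXT)")
conn.executemany(
    "INSERT INTO cur (cur_namespace, cur_title) VALUES (?, ?)",
    [(0, "Apple"), (0, "Apple"), (0, "Banana"), (1, "Apple")],
)
# Group by (namespace, title) and keep only groups with more than one row.
rows = conn.execute(
    """SELECT cur_namespace, cur_title, count(*)
       FROM cur
       GROUP BY cur_namespace, cur_title
       HAVING count(*) > 1
       LIMIT 100"""
).fetchall()
print(rows)  # [(0, 'Apple', 2)]
```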


Use a query like

SELECT cur_id, cur_timestamp, cur_text
FROM cur
WHERE cur_title = 'requested_title' AND cur_namespace = requested_namespace
LIMIT 100;

to see all current versions of one specified title. Some of these versions have different timestamps and texts, some have identical texts, and some even share an identical timestamp.
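The same inspection can be sketched against a toy table, again using sqlite3 as a stand-in for MySQL. The sample data below is invented to mirror the cases the text describes: two rows with identical timestamp and text, plus one later duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cur (cur_id INTEGER PRIMARY KEY, cur_namespace INTEGER,"
    " cur_timestamp TEXT, cur_text TEXT, cur_title TEXT)"
)
conn.executemany(
    "INSERT INTO cur (cur_namespace, cur_timestamp, cur_text, cur_title) VALUES (?, ?, ?, ?)",
    [
        (0, "20040101000000", "First text", "Example"),
        (0, "20040101000000", "First text", "Example"),    # identical timestamp and text
        (0, "20040102000000", "Revised text", "Example"),  # later duplicate
    ],
)
# List every "current" row stored under the one title.
versions = conn.execute(
    "SELECT cur_id, cur_timestamp, cur_text FROM cur "
    "WHERE cur_title = 'Example' AND cur_namespace = 0 LIMIT 100"
).fetchall()
print(len(versions))  # 3
```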

If you're using MySQL 4.1 or later, a better query for listing duplicate article titles is:

SELECT cur_namespace, cur_title, GROUP_CONCAT( cur_id )
FROM cur
GROUP BY cur_namespace,cur_title
HAVING count(*) > 1
LIMIT 100;
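The advantage of GROUP_CONCAT is that the IDs of every duplicate row appear directly in the result, so no second query per title is needed. A sketch using sqlite3, whose group_concat function behaves like MySQL's GROUP_CONCAT for this purpose:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cur (cur_id INTEGER PRIMARY KEY, cur_namespace INTEGER, cur_title TEXT)")
conn.executemany(
    "INSERT INTO cur (cur_namespace, cur_title) VALUES (?, ?)",
    [(0, "Apple"), (0, "Apple"), (0, "Banana")],
)
# group_concat collapses all duplicate cur_id values into one
# comma-separated field per (namespace, title) group.
rows = conn.execute(
    """SELECT cur_namespace, cur_title, group_concat(cur_id)
       FROM cur
       GROUP BY cur_namespace, cur_title
       HAVING count(*) > 1
       LIMIT 100"""
).fetchall()
print(rows)
```

Note that neither MySQL nor SQLite guarantees the order of the concatenated IDs unless you ask for one, so treat the field as a set of IDs.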

Lists of duplicates

When redirects are listed, the article in question is the redirect page itself, not the redirection target!

A list of duplicates in the English Wikipedia can be seen here:

Lists of duplicates in other Wikipedias:

How to get rid of them

1. One way is to ask a developer to run a delete query on the cur_id of the unwanted rows.

2. Another way is to delete and then undelete the article; any sysop can do this. Once the article is restored, all duplicate versions appear in the edit history. If several duplicate versions share the newest timestamp, one of them must be edited first to create a single newest version; otherwise, due to the way the undeletion function works, all of the newest versions are restored.
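The delete query from option 1 can be sketched on a toy table. This is an illustration only (sqlite3 standing in for MySQL), and which duplicate row to keep is a judgment call for the developer; here the row with the highest cur_id is kept for each (namespace, title) pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cur (cur_id INTEGER PRIMARY KEY, cur_namespace INTEGER, cur_title TEXT)")
conn.executemany(
    "INSERT INTO cur (cur_namespace, cur_title) VALUES (?, ?)",
    [(0, "Example"), (0, "Example"), (0, "Other")],
)
# Delete every row that is not the highest cur_id of its
# (namespace, title) group; only one row per title survives.
conn.execute(
    """DELETE FROM cur
       WHERE cur_id NOT IN (
           SELECT MAX(cur_id) FROM cur GROUP BY cur_namespace, cur_title
       )"""
)
remaining = [row[0] for row in conn.execute("SELECT cur_title FROM cur ORDER BY cur_id")]
print(remaining)  # ['Example', 'Other']
```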