Community Wishlist Survey 2019/Admins and patrollers/Make undeletion page ID sensitive

From Meta, a Wikimedia project coordination wiki

Make undeletion page ID sensitive

  • Problem: The practice of selecting revisions on Special:Undelete to be restored is an outdated practice that is problematic nowadays for various reasons.

    The first thing one might have noticed in MediaWiki 1.31 is that parent IDs are now preserved on undeletions. But this still has some problems. The first problem, and the most serious one, is that it can cause page histories to be very confusing. For example, when viewing the history of Draft:Marques Monroe on Wikipedia, you will see that many size differences do not look "right". Allowing rev_parent_id to point to a revision in another page's history would just make things worse (in particular, if a revision from page A had rev_parent_id pointing to a revision from page B and B is deleted, the parent ID would become broken). Another problem is that if one selects a revision without selecting the previous revision, undeletion might result in a revision with a broken parent ID (T193211). For example, on this wiki, revision 18325584 has rev_parent_id 18325581, which is a deleted revision ID. Yet another problem is that there are already many revisions, e.g. on Wikipedia, where rev_parent_id either incorrectly points to a later revision (T38976; particularly imported revisions, such as the oldest 7 revisions in the history of the "2002" article) or is incorrectly set to zero (e.g. the revisions from June 2005 in the history of the "2006 in video gaming" article), and forcing parent IDs to be preserved on undeletions would make it impossible to solve such problems. Finally, one more problem is that for revisions deleted in MediaWiki 1.18 or earlier, which did not have the ar_parent_id field filled in, the pre-1.31 behavior of using the "getPreviousRevisionId" function would still be followed, which is inconsistent with the behavior for revisions deleted in newer MediaWiki versions.

    Setting the parent ID issues aside, there are still a few more issues with the current undeletion schema. One problem is that Special:Undelete lists all deleted revisions for a given title without indicating which deleted page ID they originally belonged to. It also allows two unrelated page histories to be "mixed" or "merged" too easily, which would cause confusion regardless of what happens with the parent IDs. Another problem is that timestamps, not revision IDs, are used to identify deleted revisions, meaning that two revisions with the same timestamp could not be separated (T39465). Finally, one more problem is that for pages with thousands of deleted revisions, Special:Undelete would list too many revisions all on one page.

  • Who would benefit: Administrators, particularly those who do history merges or imports
  • Proposed solution: See the tasks below for more details. We would then have a "pagearchive" table with columns named "pa_id", "pa_namespace", "pa_title", "pa_page_id", and "pa_rev_count", and the archive table would have a new column named "ar_pa_id". Also, a script that migrates the "ar_namespace", "ar_title", and "ar_page_id" columns to the new table would then be created. Then another script would be created that would de-duplicate page IDs in the "pagearchive" table, and also de-duplicate those that duplicate the ID of existing pages or an existing rev_page field. Finally, one more script would be created that would populate missing pa_page_id fields that previously corresponded to missing ar_page_id fields.

    Special:Undelete would not list deleted revisions anymore. Instead, it would list deleted page IDs from the "pagearchive" table as radio buttons, and deleted revisions, their diffs, and deleted page histories could be viewed directly by using the "oldid" and "curid" parameters in the URL. It would also allow the page to be automatically moved to another title of the user's choice after restoration, and this would be mandatory when the title of the page one is trying to restore already exists. In the latter case, the existing page would be temporarily deleted to allow for the restoration of the selected page ID. Also, Special:DeletedContributions would be fixed to look more like Special:Contributions, by displaying size differences and, for deleted revisions where the ar_parent_id is zero, the "new page" mark.

    Importing revisions to an existing page title would be allowed only when they are either all later than the page's latest revision, all earlier than the page's oldest revision, or all fit between two consecutive revisions in the page's history. In the latter two cases, the rev_parent_id field for the first revision following the imported revisions would automatically be changed to the ID of the latest imported revision.

    Special:MergeHistory would still exist, but would also have another restriction, namely that it could not be used when all of the revisions from the source page's history would become part of the target page's history. Also, the first revisions following the merged revisions in both pages' histories would automatically have their parent IDs swapped, so that n and 0 would become 0 and n.

    Two new special pages named "Special:SplitHistory" and "Special:MergeAndMove" would be created. For the former, a "SplitHistory" class would be created that would contain a function that splits the history of a deleted page ID by changing the ar_parent_id field for a single deleted revision to zero and assigning a new page ID to that deleted revision and all later ones. For an existing page, "Special:SplitHistory" would ask you to choose a cut-off point, which title some revisions should be moved to, and whether the earlier or the later revisions should be moved there. For the latter, a "MergeAndMove" class would be created that would contain a function that first changes the rev_parent_id field for the oldest revision in the target page's (henceforth called "B") history to coincide with the page_latest field for the source page (henceforth called "A"), then changes the rev_page field for all of B's revisions to A's page ID, deletes B from the page table, and finally moves A to B to complete a history merge.

    The "populateParentId" script would be modified so that it could not only populate rev_parent_id fields, but also ar_parent_id fields. And it would also generate a new parent ID for all existing and deleted revisions that already had one to fix many parent ID problems.

    Finally, a clean-up script would be created that would permanently delete some revisions from the revision and archive tables. When duplicated revisions (those with the same rev_page or ar_pa_id, same timestamp, same comment, same minority status, and same SHA1) occur, the script would keep the smallest rev_id or ar_rev_id only, and would automatically remove the RevisionDelete status from that revision if the text, edit summary, and username or IP address are all hidden. An example of where the latter would occur is the history of the "Cecilia Skingsley" article on Wikipedia. For revisions by IP addresses, the script would also delete the revisions from the "ip_changes" table.


I note that T20493 may be relevant, in that it contemplates changing how deletion works in a much less overcomplicated manner. Anomie (talk) 12:07, 1 November 2018 (UTC)Reply[reply]

Hi GeoffreyT2000, I left a message for you and MER-C on Community Wishlist Survey 2019/Admins and stewards/Feature parity for tools dealing with deleted revisions. -- DannyH (WMF) (talk) 01:29, 14 November 2018 (UTC)Reply[reply]