Community Wishlist Survey 2021/Archive/Set maximum delay in updating category membership

From Meta, a Wikimedia project coordination wiki

Set maximum delay in updating category membership

Outside the scope of Community Tech

  • Problem: When a change is made to a template or module that involves category membership, pages that transclude that template or module require a null edit in order to update their category membership. Because of delays in the job queue, such category membership changes can take weeks, or even months.
Changes to the underlying MediaWiki software that apply categories (e.g. those in Special:TrackingCategories) do not force pages into the job queue, so category membership for affected pages can take months or years to update, or may never update at all.
These delays cause outdated information, missing information, and outright errors to be rendered for readers, and cause editors who are working on fixing problems identified by maintenance categories to be delayed in applying those fixes. When a maintenance category should be populated but is empty, it gives editors the false impression that all affected articles are working properly.
  • Who would benefit: This change can be applied to all WMF wikis. Readers would benefit from seeing up-to-date renderings of articles and fully populated categories. Editors would benefit from knowing that maintenance categories reflect the true state of affected pages, at least after a known period of time. Editors who run null edit bots and similar workarounds could spend their time doing more productive work.
  • Proposed solution: Set up a background process that tracks every page by its last edit timestamp, counting null edits as edits. Use that tracking to build a list of "stale" pages that need a null edit.
  • More comments: The "stale" age tolerance may have to be adjusted based on the size of a given wiki. A small wiki might be able to keep pages refreshed to no older than one week, but a larger wiki might need to have a limit of one month. There could be a different limit for pages in different namespaces; articles could have a shorter "maximum stale time" than user pages, for example.
There are secondary benefits to applying null edits to all stale pages periodically, including updated rendering of templates, refreshing of ages when a template calculates an age based on a birth date, and more.
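
The proposed stale-page tracking could be sketched roughly as follows. This is only an illustration under stated assumptions, not how MediaWiki or the job queue actually works: the `PageRecord` type, the `find_stale_pages` function, and the specific limits (one week for articles, one month elsewhere) are hypothetical, chosen to mirror the per-namespace "maximum stale time" idea above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical per-namespace limits; real values would be tuned per wiki.
# Namespace 0 is the main (article) namespace, 2 is the User namespace.
MAX_STALE_AGE = {
    0: timedelta(days=7),
    2: timedelta(days=30),
}
DEFAULT_MAX_STALE_AGE = timedelta(days=30)  # assumed fallback for other namespaces


@dataclass
class PageRecord:
    title: str
    namespace: int
    last_refreshed: datetime  # time of the last edit or null edit


def find_stale_pages(pages, now):
    """Return titles of pages older than their namespace limit, oldest first."""
    stale = []
    for page in pages:
        limit = MAX_STALE_AGE.get(page.namespace, DEFAULT_MAX_STALE_AGE)
        if now - page.last_refreshed > limit:
            stale.append(page)
    # Refresh the most out-of-date pages first.
    stale.sort(key=lambda p: p.last_refreshed)
    return [p.title for p in stale]


now = datetime(2020, 12, 1, tzinfo=timezone.utc)
pages = [
    PageRecord("Example article", 0, now - timedelta(days=10)),
    PageRecord("User:Example", 2, now - timedelta(days=10)),
    PageRecord("Fresh article", 0, now - timedelta(days=2)),
]
print(find_stale_pages(pages, now))  # ['Example article']
```

Here the ten-day-old article exceeds its one-week limit and is listed, while the ten-day-old user page is still within its one-month limit. A background process would then null-edit the listed pages and update their refresh time.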

Discussion

Corrections to my technical description of the problem are welcome. I may not have the exact mechanisms right. There are detailed discussions in the linked phabricator tickets. Jonesey95 (talk) 21:37, 16 November 2020 (UTC)

  • I totally agree with this issue. Category updates are sometimes still not done after months and require a null edit to be applied. An example of this behavior on WP:FR is template categories included from the documentation sub-page: some templates (but not all... I couldn't find the common element) that are categorized by their doc page do not appear in their category, leading maintenance bots to constantly report false positives for uncategorized templates. This categorization issue is very bad because, when viewing the affected page, the category does appear at the bottom of the page, but when looking at the category, the page is missing. This misleading behavior makes the error very hard to identify. Epok (talk) 07:12, 17 November 2020 (UTC)
  • There's no evil setting that delays updating; the problem is in the job queue, which always has lots of work to do, category membership or not. Several WMF teams have been working on improving the queue for years, so I'm not sure this is a good candidate for a wish that is supposed to take about a month's worth of the team's time. Max Semenik (talk) 10:34, 17 November 2020 (UTC)
    • I added a couple more related phab tickets. As I wrote above, even if the job queue were instant, I don't think this problem would be solved for updates to MediaWiki code. See the 17 Feb 2017 comment in task T157670 for details; millions of pages had gone unrefreshed for years. I am sure that I oversimplify, but it doesn't seem like a full-month task to set up a system that (1) maintains a list of stale pages, and (2) automatically null-edits pages that are stale beyond a specified age. Failing that, or if the problem really is too complex for a team of smart developers to resolve, there is a proposed workaround that would allow local bot operators to run null edits. That workaround probably would not take a month to implement. Jonesey95 (talk) 14:53, 17 November 2020 (UTC) (updated 20:05, 19 November 2020 (UTC))
  • This annoying problem has existed for years now. I ran many thousands of touches on Commons using Pywikibot to fix it partially. It's not a cosmetic problem: categories like c:Category:Broken category redirects or c:Category:Non-empty disambiguation categories (there are more examples) become unusable if new entries are not added automatically as they should be. --Achim (talk) 18:53, 19 November 2020 (UTC)
  • We agree this is a major issue that needs to be addressed; however, the engineering challenges are unfortunately beyond the scope of our team. The short-term solution might be to create a bot to make null edits to affected categories (there are several similar bots on English Wikipedia already, I think), but you can really only go so far before running into the same performance issues. As much as we'd like to, I don't think our team can help here :( Sorry! Thanks for participating in the survey, MusikAnimal (WMF) (talk) 00:57, 3 December 2020 (UTC)
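
For readers wondering what the bot workaround mentioned in this discussion might look like, here is a minimal sketch. It assumes the bot uses the MediaWiki Action API's `action=purge` with `forcelinkupdate`, which is commonly used to get the category-refreshing effect of a null edit. The `build_purge_requests` helper and the batch size of 50 titles are assumptions made for illustration. The sketch only builds the POST bodies and does not contact any wiki.

```python
from urllib.parse import urlencode

BATCH_SIZE = 50  # assumed per-request title limit; check the target wiki's API limits


def build_purge_requests(titles, batch_size=BATCH_SIZE):
    """Split titles into batches and return one URL-encoded POST body per batch."""
    bodies = []
    for start in range(0, len(titles), batch_size):
        batch = titles[start:start + batch_size]
        params = {
            "action": "purge",
            "forcelinkupdate": "1",     # ask for the links tables to be refreshed
            "titles": "|".join(batch),  # the API takes pipe-separated titles
            "format": "json",
        }
        bodies.append(urlencode(params))
    return bodies


# Hypothetical page titles, for illustration only.
titles = [f"Example page {i}" for i in range(120)]
bodies = build_purge_requests(titles)
print(len(bodies))  # 3
```

Each body would be sent as a POST request to the wiki's `api.php` by an authenticated bot, ideally with throttling, since, as noted above, large runs eventually hit the same performance limits.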