Post-parse link colouring

From Meta, a Wikimedia project coordination wiki

The problem: to colour links either red or blue, efficiently.

Currently, Skin::makeLinkObj() calls LinkCache::addLink(), which checks whether the link exists. Skin::makeLinkObj() then calls either Skin::makeKnownLinkObj() or Skin::makeBrokenLinkObj(), depending on whether the link is blue or red. Those functions return the relevant HTML.

Instead, I want Skin::makeLinkObj() to return a placeholder, containing the title to be linked to, but not containing the colour. It won't call either of the two specific functions.

Currently OutputPage::output() takes the accumulated HTML and prints it out, with the help of outputPage() of the current skin. I want it to first run the accumulated HTML through another function, before outputting the HTML with the skin. The function will assemble a list of the titles which are linked to. There will be a table in the database containing nothing but titles (prefixed DB keys, to be precise) and article IDs copied from the cur_id field. The existence of the titles will be checked by a single big query which looks like this: "SELECT ttl_title FROM titlecache WHERE ttl_title IN ('title1','title2',...)". The function will then call makeKnownLinkObj() or makeBrokenLinkObj() for each placeholder, thus replacing the temporary placeholder with the actual HTML.
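To make the placeholder pass concrete, here is a rough sketch of it. It is written in Python with SQLite purely for illustration, not as MediaWiki code. The placeholder syntax, the ttl_id column name and the HTML produced by the two link helpers are assumptions of mine; only the titlecache table, the ttl_title column and the shape of the query come from the description above.

```python
import re
import sqlite3
from html import escape

# Hypothetical placeholder syntax; the design above does not fix one.
PLACEHOLDER = re.compile(r"<!--LINK (.*?)-->")


def make_known_link(title):
    # Stand-in for Skin::makeKnownLinkObj(); the markup is illustrative only.
    return '<a href="/wiki/%s">%s</a>' % (
        escape(title, quote=True), escape(title.replace("_", " ")))


def make_broken_link(title):
    # Stand-in for Skin::makeBrokenLinkObj(); the markup is illustrative only.
    return '<a href="/wiki/%s?action=edit" class="new">%s</a>' % (
        escape(title, quote=True), escape(title.replace("_", " ")))


def replace_placeholders(text, db):
    """Resolve every placeholder in text using one existence query."""
    titles = sorted(set(PLACEHOLDER.findall(text)))
    existing = set()
    if titles:
        # One bound parameter per title, so titles are never spliced into SQL.
        marks = ",".join("?" * len(titles))
        rows = db.execute(
            "SELECT ttl_title FROM titlecache WHERE ttl_title IN (%s)" % marks,
            titles)
        existing = {row[0] for row in rows}

    def render(match):
        title = match.group(1)
        if title in existing:
            return make_known_link(title)
        return make_broken_link(title)

    return PLACEHOLDER.sub(render, text)
```

The point of the sketch is that the number of queries stays at one per page view, however many links the page contains.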

Updating the table

Update code for the title cache will have to be added to the following places:

  1. Article creation (Article.php)
  2. Deletion (Article.php)
  3. Undeletion (SpecialUndelete.php, but you should be able to put it in common code in Article.php)
  4. Page move (SpecialMovepage.php)
  5. An initialisation script for upgrading old databases, or for repairing corrupted or lost title cache tables. (script in the maintenance directory preferably called from config/index.php)
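The update points listed above could be sketched roughly as follows, again in Python with SQLite for illustration only. The function names and the ttl_id column name are hypothetical. The move handler ignores two things a real implementation would have to deal with: a redirect left behind at the old title, and a target title that already has a row.

```python
import sqlite3


def create_table(db):
    # Titles are prefixed DB keys; ttl_id holds the ID copied from cur_id.
    db.execute(
        "CREATE TABLE IF NOT EXISTS titlecache ("
        "ttl_title TEXT PRIMARY KEY, ttl_id INTEGER NOT NULL)")


def on_create(db, title, article_id):
    # Article creation; undeletion can share this code path.
    db.execute(
        "INSERT OR REPLACE INTO titlecache (ttl_title, ttl_id) VALUES (?, ?)",
        (title, article_id))


on_undelete = on_create


def on_delete(db, title):
    db.execute("DELETE FROM titlecache WHERE ttl_title = ?", (title,))


def on_move(db, old_title, new_title):
    # Simplified: assumes new_title has no row yet and no redirect is left.
    db.execute(
        "UPDATE titlecache SET ttl_title = ? WHERE ttl_title = ?",
        (new_title, old_title))


def rebuild(db, articles):
    """Initialisation or repair: reload the table from (title, id) pairs."""
    db.execute("DELETE FROM titlecache")
    db.executemany(
        "INSERT INTO titlecache (ttl_title, ttl_id) VALUES (?, ?)", articles)
```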


There are actually three types of links: broken, known and stub. What constitutes a "stub" depends on user preferences -- there is a stub threshold, and if an article is smaller than that number of bytes, links to it will be coloured brown. Marking stubs isn't very important -- if it seems too hard to implement them then just comment them out or forget them. There are basically two options:

  1. Do a "SELECT LENGTH(cur_text)" query for every link on every page view. That's how it's done now, so at least we won't have lost anything.
  2. Store the length of the article in the title cache. This is complicated, because it then means you have to update the title cache on every edit. And edits are done in a few different ways. Off the top of my head: ordinary edits, rollbacks, undeletions and log page entries.
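Under the second option, the three-way decision reduces to a lookup plus a comparison. A minimal Python sketch, assuming a hypothetical mapping from title to stored article length in bytes (missing titles mean the page does not exist):

```python
def classify_link(title, lengths, stub_threshold):
    """Return "broken", "stub" or "known" for a linked title.

    lengths maps each existing title to its article size in bytes;
    stub_threshold is the user's preference.
    """
    if title not in lengths:
        return "broken"
    if lengths[title] < stub_threshold:
        return "stub"
    return "known"
```

With a threshold of 0 no article is ever smaller than the threshold, so nothing is marked as a stub.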

Another complication is that sometimes Skin::makeLinkObj() is called from outputPage() of the skin, that is, after the transformation and link loading has happened. It does this for navigation links, in the sidebar and elsewhere. So once the transformation has occurred, you'll have to set some kind of flag telling Skin::makeLinkObj() to do things the old way.
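One way to picture the flag, as a Python sketch rather than actual skin code. The class, attribute and placeholder syntax are all hypothetical, and title_exists stands in for the per-link existence check that LinkCache does today.

```python
class LinkMaker:
    """Illustrative stand-in for the link-making part of Skin."""

    def __init__(self, title_exists):
        self.title_exists = title_exists  # callable: title -> bool
        # Set to True once the placeholder pass over the page has run.
        self.placeholders_resolved = False

    def make_known(self, title):
        return '<a href="/wiki/%s">%s</a>' % (title, title)

    def make_broken(self, title):
        return '<a href="/wiki/%s?action=edit" class="new">%s</a>' % (title, title)

    def make_link(self, title):
        if self.placeholders_resolved:
            # Old behaviour: check existence immediately, one title at a time.
            if self.title_exists(title):
                return self.make_known(title)
            return self.make_broken(title)
        # New behaviour: defer the colour decision to the later pass.
        return "<!--LINK %s-->" % title
```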

Currently, Skin::makeLinkObj() saves its results in the LinkCache object, recording which links are broken and which are known. The cache thus generated is used to update the links tables when a page is saved. This happens in a single, well-isolated location in Article::showArticle(). For more information on the rationale, see docs/linkcache.doc in CVS. The easy thing to do would be to disable the new behaviour of Skin::makeLinkObj() there and revert to the old way, just like for the navigation link problem above. The hard way would be to rethink the way we regenerate link tables.

Obsolete code

This code will make LinkCache::preFill() largely obsolete. However, it may be necessary to leave it in for the time being, to support incremental link table updates.

Optional extensions for extra efficiency

  • Have Skin::makeLinkObj() add its titles to an array stored somewhere. That way you don't have to scan the text to find which titles you need to load in OutputPage::output(). Note that this array will have to be saved to the parser cache, so perhaps it's best if the array is put in the ParserOutput object.
  • Somehow work out the string offset of each link, so that you don't have to search for them. I don't think that will be useful for ordinary page views, but it may speed things up very slightly to save this information into the parser cache. See ParserCache.php.
  • Remove LinkCache::preFill(), and find some more elegant way to perform incremental link table updates. This means removing linkscc too, and all code which invalidates or updates linkscc.
  • Somehow preload the existence of links which are commonly found in the navigation areas, and use that information when Skin::outputPage() calls Skin::makeLinkObj().
  • Currently, the parser cache is invalidated whenever any link colours on that page change, because link colours used to be saved into the parser cache. Since this feature will cause placeholders to be saved instead, the parser cache will still be valid even if a linked-to page is created or deleted. Preventing the unnecessary invalidation will require adding another field to the cur table, similar to cur_touched except updated only when there is a change other than link colours.
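The first extension above, recording titles as the placeholders are made, might look something like this. It is a Python sketch only: the ParserOutput class here is a toy with hypothetical field names, not the real ParserOutput object, and the placeholder syntax is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ParserOutput:
    """Toy stand-in for the object that is saved to the parser cache."""
    text: str = ""
    link_titles: list = field(default_factory=list)


def make_link_placeholder(output, title):
    # Record each title once, in first-seen order, then emit the placeholder.
    if title not in output.link_titles:
        output.link_titles.append(title)
    return "<!--LINK %s-->" % title
```

Because the list travels with the cached output, the output stage can build its existence query straight from link_titles instead of scanning the text for placeholders.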