Migration to the new preprocessor

From Meta, a Wikimedia project coordination wiki

On January 25, 2008, en.wiki and test.wiki switched to a new parser preprocessor. On February 22, 2008, the new preprocessor went live on all Wikimedia wikis. It was written by Tim Starling (MediaWiki developer and Wikimedia system administrator) to be more consistent and less flaky than the old one. This page gives a live demonstration.

To check which preprocessor is active on a MediaWiki wiki with Extension:ParserFunctions installed, use {{#if:{{#if:x|{{{2}}}|}}|new pp|old pp}}, which produces "new pp" on this page. Alternatively, check the HTML source of the rendered page for "NewPP".

Expected differences

These are expected differences in behavior. They are either bugs in the old parser that have been corrected or "features" of the new parser. Tim Starling has marked a change "negotiable" where restoring the old behavior may be feasible.

Many of these differences arise because the old parser let you get away with things that the new parser is stricter about.

Each entry below gives the test case, the difference from the old parser, an explanation, examples or fixes, and whether the change is negotiable.

Test case: <h2>Unlinked header</h2>
Difference from old parser: XML headers no longer make an edit link.
Explanation: Headers created with <h1>, <h2>, <h3>, etc. no longer create an editable section. They also do not create a section divide when editing sections. They do, however, add a level to the TOC as well as an anchor for linking. This allows you to add uneditable sections to a template and still have editable sections on the page, which isn't possible with __NOEDITSECTION__.
Examples/fixes: [1]
Negotiable? Negotiable

Test case: <noinclude>==Unlinked header==</noinclude>
Difference from old parser: Headers embedded inside noinclude/includeonly tags now behave as XML headers (see above).
Explanation: The new parser has stricter header requirements.
Examples/fixes: [2] [3]
Negotiable? Not really

Test case: <!-- </includeonly> -->
Difference from old parser: The includeonly/noinclude tags can now be commented out.
Explanation: The new parser allows these tags to be disabled inside comments, a change that has to be taken into account in template design.
Examples/fixes: [4] [5]
Negotiable? no

Test case: template with editable sections
Difference from old parser: Transcluded edit section links have a T- prefix.
Explanation: This allows the editable sections on a page to be a bit more informative (T-(n-1), where n is the real section number). bugzilla:6563
Examples/fixes: Template:Infobox Election
Negotiable? no

Test case: {{template|{{#if:1|a=1}}}}
Difference from old parser: Old parser: {{{a}}} == "1". New parser: {{{1}}} == "a=1".
Explanation: The equals sign between parameter and value can no longer be generated (by transclusion, parameter, parserfunction, etc.) as a delimiter in the template parameters; it is interpreted literally.
Examples/fixes: This requires moving the equals sign outside the generator. [6] [7][8] [9] [10][11] [12] [13]
Negotiable? no
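
The fix named in this entry, moving the equals sign outside the generator, might look like the following. This is a minimal sketch that reuses the placeholder names "template" and "a" from the test case; it is not taken from any of the linked examples.

```wikitext
<!-- Before: the "=" is produced by #if, so the new preprocessor passes
     the whole string as the first positional parameter: {{{1}}} is "a=1" -->
{{template|{{#if:1|a=1}}}}

<!-- After: the "=" is written literally in the call, so it delimits the
     parameter name, and only the value is generated: {{{a}}} is "1" -->
{{template|a={{#if:1|1}}}}
```

One consequence of the second form is that the parameter a is always passed: when the condition is false it is defined but empty, so a default such as {{{a|fallback}}} inside the template would not apply.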

Test case: {{#switch: {{#time:Y|blah}}
|Error: invalid time = Error
Difference from old parser: The old parser would see the #time error (which contains class="error") as a #switch case.
Explanation: An interesting bug in the old parser was inadvertently used to catch errors. The implementors thought they were catching the plain-text error of #time, but they were actually creating a new self-deleting #switch case that returned nothing. If {{#time:Y|blah}} generated an error, the error markup contained an equals sign, and #switch would then see the last parameter not as a default passthrough but as a match case, which would fail and return no output. This could also be expressed as: {{#switch:1|{{#time:Y|Z}}}} ...
Examples/fixes: A new parserfunction was created for handling parserfunctions that return errors, e.g. {{#iferror:{{#time:Y|Z}}|error}} [14]
Negotiable? no

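As a sketch of the replacement idiom, #iferror can wrap the call whose error should be caught. The parameter name "date" and the fallback text "unknown year" are hypothetical; the three-argument form (test string, value on error, value otherwise) is the one provided by ParserFunctions.

```wikitext
<!-- Shows the year of {{{date}}}, or a fallback when #time reports an error.
     "date" is a hypothetical template parameter. -->
{{#iferror: {{#time:Y|{{{date|}}}}} | unknown year | {{#time:Y|{{{date|}}}}} }}
```

Unlike the old #switch trick, this does not depend on the error markup happening to contain an equals sign.
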
Test case: {{#if:1|{{template{{!}}parameter}} }}
Difference from old parser: {{!}} no longer works as a template delimiter in parserfunctions.
Explanation: This is related to bugzilla:5678. It should not have worked and was in fact never necessary. As seen in implementations of w:Template:Self, this bug allowed parameters other than the first to pass template parameters.
Examples/fixes: [15] [16][17] [18][19]
Negotiable? not with this syntax

Test case: {{#ifeq:1|1|[[Template:{{{1}}}|foo]]}}
Difference from old parser: Renders correctly in new parser.
Explanation: bugzilla:5678
Examples/fixes: ~
Negotiable? no

Test case: {{#switch:{{{g|}}}|g=|{{{g}}}}}
Difference from old parser: Renders correctly in new parser.
Explanation: bugzilla:5678 ??
Examples/fixes: [20] [21]
Negotiable? no

Test case: {{#if:{{{2}}}|{{{2}}}|{{PAGENAME}}}}
Difference from old parser: Fails correctly in new parser.
Explanation: Bug in the old parser. Note that #if should not be used when there is no chance of the test string being empty. Note also that this use of a parserfunction is excessive when you can just use a parameter default such as {{{2|{{PAGENAME}}}}}. bugzilla:5678??
Examples/fixes: [22][23][24][25][26][27][28][29][30][31]
Negotiable? no

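The alternatives to this test case can be set side by side as below. This is a sketch, relying on the general MediaWiki behavior that a parameter with no default and no supplied value is left as the literal text {{{2}}}; the first comment is an interpretation of why the test fails, not a statement from the bug report.

```wikitext
<!-- Problem: when parameter 2 is not supplied, {{{2}}} stays as the literal
     text "{{{2}}}", which is non-empty, so the first branch is always taken -->
{{#if:{{{2}}}|{{{2}}}|{{PAGENAME}}}}

<!-- Option 1: give the tested parameter an empty default so that #if can
     see an empty string when parameter 2 is missing -->
{{#if:{{{2|}}}|{{{2}}}|{{PAGENAME}}}}

<!-- Option 2: use a parameter default and drop the parserfunction -->
{{{2|{{PAGENAME}}}}}
```

The two options differ when parameter 2 is supplied but empty: option 1 falls back to the page name, while option 2 outputs the empty value.
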
Test case: Template:Loop/deloop -> {{Template:Loop/{{{1}}}}}, called with: {{Loop/deloop|deloop}}
Difference from old parser: Detects loop in new parser.
Explanation: Loop detection is stricter in the new parser. A certain number of loops are exempt on the same page, so that examples can be included (such as in documentation in a noinclude section), but potential loops one or more levels deep, as in this example, are no longer possible (even in the old parser they were somewhat buggy).
Examples/fixes: [32]
Negotiable? Negotiable

Test case: <!-- 1 --><!-- 2 -->
Difference from old parser: Old parser: <p>x y</p>. New parser: <p>x</p><p>y</p>.
Explanation: In the old parser, multiple comments on the same line, without any separation, cause the line to be "eaten". In the new parser, line-eating is triggered only when a single comment is alone on a line. Without line-eating, the preprocessor leaves two line breaks between the x and the y, causing a paragraph break in the main pass.
Examples/fixes: [33]
Negotiable? Negotiable

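Going by this explanation, one way to keep the old rendering would be to merge the adjacent comments into a single comment that is alone on its line. The sketch below assumes the test case sits between a line containing x and a line containing y, as the quoted output suggests; it has not been checked against a live wiki.

```wikitext
x
<!-- 1, 2 -->
y
```

With a single comment alone on the line, the line should be eaten, leaving x and y on adjacent lines of the same paragraph.
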
Test case: <span title="{{#if:1|">x<span>}}">y</span>
Difference from old parser: Badly formed wikicode (e.g. a parserfunction inside an HTML tag) is cleaned up differently.
Explanation: The old parser would parse the HTML first (replaceVariables and escaping) and not parse the #if, showing a literal mess. The new parser expands the #if first and sees <span title="">x<span>">y</span>, which is cleaned (via removeHTMLtags) to <span title="">x<span>">y</span></span>.

A counterexample would be something like <span id="{{#expr:Z}}">z</span>. The old parser would run removeHTMLtags first and end up with <span>z</span>, while the new one expands the #expr first, generating <strong class="error">Expression error: Unrecognised word "z"</strong>, which causes the outer span to be escaped and produces <span id="<strong class="error">Expression error: Unrecognised word "z"</strong>">z
Examples/fixes: [34] vs [35]
Negotiable? no

Test case: {{urlencode:<_&lt;_>_&gt;_}}
Difference from old parser: {{urlencode}} and {{anchorencode}} escape < and &lt; differently between parsers.
Explanation: The new parser can now tell the difference between < > and &lt; &gt; inside these encode functions. The old parser would escape them first and cause different results.
Examples/fixes:
old: %26lt%3B_%26lt%3B_%26gt%3B_%26gt%3B
new: %3C_%26lt%3B_%3E_%26gt%3B

Test case: = Top level heading =
Difference from old parser: Section edit link missing in new parser.
Explanation: A wikitext-style level-1 heading (<h1>) is not marked with a section edit link if the heading appears in a place where the parser expects an equals sign separating a template parameter name from its value, as in {{foo | name = value}}. Note that this can happen even if the template syntax is malformed (e.g. unclosed), as in this test case.
Examples/fixes: Use a passthru template with a named parameter.
Negotiable? Negotiable

Test case: <onlyinclude><span>text</span> </onlyinclude>
Difference from old parser: Onlyinclude newline behavior changed.
Explanation: <onlyinclude> had to be completely rewritten, and the new behavior now matches that of <includeonly>/<noinclude> with regard to newlines.
Examples/fixes: ?
Negotiable? no

Test case: {{#expr:$1 + 1}} in template. $1 in transclusion to interface message.
Difference from old parser: $1-style message variables no longer work in templates that are transcluded into MediaWiki: namespace interface messages.
Explanation: That this worked was a quirk/bug of the old parser.
Examples/fixes: [37]
Negotiable? no

Pending bugs

These are bugs found in the new parser that have not yet been fixed in trunk.

Interesting bad input on a template (oldid if fixed). An apparent start of heading "\n==" breaks proper identification of template syntax on the same line, even if there is no valid end of heading.
|postcode_district = HD4
|postcode_area= HD   

==Sport|dial_code= 01484
$-style parameters in interface messages don't seem to work inside Parserfunctions unless escaped one level. (See e.g. the currently commented-out revert link on en:MediaWiki:Movepage-moved.)
$1 in w:MediaWiki:Readonlytext does not appear to resolve during intermittent database lag. (Confirmation needed.)
LabeledSectionTransclusion: sections nested in <poem> tags can no longer be transcluded. Compare S:ca:user:sanbeg and http://ca.wikisource.org/wiki/Usuari:Sanbeg?timtest=oldpp
A page-referencing magic word in a link breaks when the first character of the page name is an asterisk, semicolon, or other line-start formatting character. (example)

Note that this is similar to the behavior of [[{{uc::}}foo]] [[{{uc:*}}foo]] [[{{uc:;}}foo]] (but this is consistent across parsers)


on a page titled [[*]]
When a template (in the Template namespace) is omitted because the post-expand include size is too large, the resulting link points to the main namespace (and so shows as a link to create a nonexistent page) instead of to the template page. [38]

See also