
User:Guilrom/ParserRGVersion

From Meta, a Wikimedia project coordination wiki

General Info

Hack v0.1

I modified MediaWiki's parser.php (version 1.6.3) three months ago to improve extension handling for my personal use.
My changes address two parsing issues:

  1. limitations of the parser regarding the handling of template arguments within extensions;
  2. the parser's incorrect handling of certain cases of nested extensions.

Allowing template argument handling inside extension callback functions

This tweak is based on an earlier hack by Andreas Neumann (more info here: Talk:Extending_wiki_markup#Templates_and_Extensions).
It allows template arguments to be handled inside extension callback functions. Andreas's hack did not allow this: it performed a simple replacement before passing the text back to the wiki, but only after the extension tags had been stripped.
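To make the difference concrete, here is a standalone PHP sketch (PHP 5.3+ for closures) of what handling template arguments inside the callback means. substituteTplArgs() is a hypothetical, much simplified stand-in for Parser::replaceVariables() (flat {{{name}}} and {{{name|default}}} only, no nesting), and renderShout() is a hypothetical extension callback; neither is part of MediaWiki.

```php
<?php
// Hypothetical, simplified stand-in for Parser::replaceVariables():
// replaces {{{name}}} and {{{name|default}}} using $args. No nesting support.
function substituteTplArgs( $text, array $args ) {
	return preg_replace_callback(
		'/\{\{\{([^{}|]+)(?:\|([^{}]*))?\}\}\}/',
		function ( $m ) use ( $args ) {
			$name = trim( $m[1] );
			if ( array_key_exists( $name, $args ) ) {
				return $args[$name];
			}
			// Use the default if one was given, otherwise keep the placeholder.
			return isset( $m[2] ) ? $m[2] : $m[0];
		},
		$text
	);
}

// Hypothetical extension callback: it can only work with what it receives.
function renderShout( $input, array $params ) {
	return strtoupper( $input );
}

$templateArgs = array( '1' => 'hello', 'who' => 'world' );
$content = '{{{1}}}, {{{who}}}! {{{missing|default}}}';

// Callback sees raw placeholders: {{{1}}}, {{{WHO}}}! {{{MISSING|DEFAULT}}}
echo renderShout( $content, array() ), "\n";
// Arguments substituted first: HELLO, WORLD! DEFAULT
echo renderShout( substituteTplArgs( $content, $templateArgs ), array() ), "\n";
```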

Improving nested extension parsing

For this second tweak, I use a new array property (mStripStateTmp) added to the Parser class. It is updated within the strip() method after each extension tag has been stripped. This lets each callback correctly handle the output of nested extension calls that were stripped earlier.
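The following toy model, under simplifying assumptions, shows why the temporary strip state matters. Tags are processed in registration order, each occurrence is replaced by a marker, and the rendered output is recorded per tag; when $parseNested is true (standing in for the 'checknestedexts' parameter), a later tag restores earlier markers in its content before its callback runs. All function names and the marker format are hypothetical, and the parallel-parser path used for nested tags whose hooks come later in the loop is not modelled.

```php
<?php
// Replace every <$tag>...</$tag> with a unique marker; store the inner text.
function extractTag( $tag, $text, array &$contents, &$counter ) {
	$pattern = '/<' . $tag . '>(.*?)<\/' . $tag . '>/s';
	return preg_replace_callback( $pattern, function ( $m ) use ( $tag, &$contents, &$counter ) {
		$marker = "\x07UNIQ-$tag-" . $counter++ . "-QINU\x07";
		$contents[$marker] = $m[1];
		return $marker;
	}, $text );
}

// Replace known markers with their rendered output (what unstrip() does).
function unstripMarkers( $text, array $state ) {
	foreach ( $state as $outputs ) {
		if ( $outputs ) {
			$text = strtr( $text, $outputs );
		}
	}
	return $text;
}

function stripExtensions( $text, array $hooks, $parseNested ) {
	$stateTmp = array(); // plays the role of $mStripStateTmp
	$counter = 0;
	foreach ( $hooks as $tag => $callback ) {
		$contents = array();
		$text = extractTag( $tag, $text, $contents, $counter );
		foreach ( $contents as $marker => $content ) {
			if ( $parseNested ) {
				// Restore the output of tags handled earlier in this loop.
				$content = unstripMarkers( $content, $stateTmp );
			}
			$contents[$marker] = call_user_func( $callback, $content );
		}
		$stateTmp[$tag] = $contents; // visible to the tags processed next
	}
	// Outer markers first, so that inner markers they contain are resolved too.
	return unstripMarkers( $text, array_reverse( $stateTmp ) );
}

$hooks = array(
	'inner' => function ( $s ) { return strtoupper( $s ); },
	'outer' => function ( $s ) { return 'len=' . strlen( $s ) . ':' . $s; },
);
$text = '<outer>ab<inner>cd</inner></outer>';

echo stripExtensions( $text, $hooks, false ), "\n"; // len=21:abCD
echo stripExtensions( $text, $hooks, true ), "\n";  // len=4:abCD
```

Without the restore step, the outer callback measures the 21-byte placeholder instead of the inner tag's output, even though the final page text looks similar.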

Content

My hack of the parser.php file consists of:

  1. new PHP constant definitions;
  2. a new $mStripStateTmp array declaration;
  3. a revision of 5 Parser class functions:
    1. clearState()
    2. Strip()
    3. replace_callback()
    4. getTemplateArgs()
    5. braceSubstitution()
  4. a new simple wfUnEscapeHTMLTagsOnly() function.

My changes are clearly delimited by special comment strings:

### HACK START ###
[...]
### HACK END ###
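If you need to review or port the changes, a small helper such as the following (hypothetical, not part of the hack) can list the line ranges between these markers in a copy of the modified file. It trims each line, so indented markers and trailing spaces are still recognised.

```php
<?php
// Hypothetical helper: return 1-based [start, end] line pairs for each
// "### HACK START ###" ... "### HACK END ###" block in $source.
function findHackBlocks( $source ) {
	$blocks = array();
	$start = null;
	foreach ( explode( "\n", $source ) as $n => $line ) {
		$trimmed = trim( $line );
		if ( $trimmed === '### HACK START ###' ) {
			$start = $n + 1;
		} elseif ( $trimmed === '### HACK END ###' && $start !== null ) {
			$blocks[] = array( $start, $n + 1 );
			$start = null;
		}
	}
	return $blocks;
}

// Usage on a local copy of the modified file:
// print_r( findHackBlocks( file_get_contents( 'parser.php' ) ) );
```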

Most of the work is done inside the 'Strip()' and 'braceSubstitution()' methods. The code is neither elegant nor fully documented, sorry about that.

In addition, the whole modified parser.php file is available on the discussion page.

The code

1. New PHP constant definitions

(to be placed at the top of the parser.php file, after the other constant definitions)


## New definitions by RG - 150406 ##
/** Here we define two simple extension parameters
that must be given in the wikitext to manage / fine-tune the desired extension parsing behaviour.
Purposes:
1. Calling an extension inside a template definition:
In some cases, in order to render a proper output, an extension callback function must have
access to the real value of one or several template arguments (triple-braced variables).
This can be achieved with the 'checktplargs' parameter: it tells the
Strip() function that there are some template arguments to parse BEFORE invoking the corresponding extension callback function.

2. Nested extensions:
In some cases, when two (or more) extensions are nested, the output of the nested one needs to be
accessible to the callback function of the wrapping one.
This is possible thanks to the 'checknestedexts' parameter, which ensures a proper parsing sequence:
it tells the Strip() function to parse any nested extension calls before invoking the
corresponding extension callback function.

Note: these two generic extension parameters can also be combined.
*/
### HACK START ###
## Special constants required by the two extension management hacks by RG - 150406
define( 'PARSE_TPL_ARGS_BEFORE_EXT_CALLBACK', 'checktplargs' ); // this is a generic extension parameter
// Possible values for the 'checktplargs' parameter
define( 'ONLY_CONTENT', 1 );
define( 'ONLY_PARAMS', 2 );
define( 'CONTENT_AND_PARAMS', 3 );
define( 'PARSE_NESTED_EXTS_BEFORE_EXT_CALLBACK', 'checknestedexts' ); // another generic ext parameter to handle nested extensions
define( 'STRIP_CALLED_FROM_BS', 'stripCalledFromBS_ThisIsNotAnArgument' );
### HACK END ###
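Since CONTENT_AND_PARAMS (3) equals ONLY_CONTENT | ONLY_PARAMS (1 | 2), the mode can also be tested bitwise instead of with the explicit == comparisons used in Strip(). The sketch below is an alternative formulation, not the hack's own code; tplArgTargets() and the <mytag> tag in the comment are hypothetical.

```php
<?php
// Same values as the constants defined above.
define( 'ONLY_CONTENT', 1 );
define( 'ONLY_PARAMS', 2 );
define( 'CONTENT_AND_PARAMS', 3 );

// Hypothetical helper: which parts of an extension call should get
// template-argument substitution for a given 'checktplargs' value?
function tplArgTargets( $mode ) {
	$mode = (int)$mode; // wikitext attribute values arrive as strings, e.g. "3"
	return array(
		'content' => ( $mode & ONLY_CONTENT ) !== 0,
		'params'  => ( $mode & ONLY_PARAMS ) !== 0,
	);
}

// e.g. <mytag checktplargs="2" title="{{{1}}}">...</mytag> (hypothetical tag)
$params = array( 'checktplargs' => '2', 'title' => '{{{1}}}' );
$targets = tplArgTargets( $params['checktplargs'] );
var_dump( $targets['content'], $targets['params'] ); // bool(false) bool(true)
```

A non-numeric value casts to 0 and therefore selects nothing, which matches the behaviour of the == comparisons for unknown values.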

2. Declaring the $mStripStateTmp array

(to be placed at the top of the parser class definition)


## New declaration added by RG - 140406 ## 
## (Extension nesting management hack) ## 
### HACK START ### 
var $mStripStateTmp = array(); // will be updated within the strip() method after each extension tag stripping process
### HACK END ###

3. Changing the code of 5 parser class functions

3.1. Tweaking the clearState() function


	function clearState() {
		$this->mOutput = new ParserOutput;
		$this->mAutonumber = 0;
		$this->mLastSection = '';
		$this->mDTopen = false;
		$this->mVariables = false;
		$this->mIncludeCount = array();
		$this->mStripState = array();

		### HACK START ###
		$this->mStripStateTmp = array(); // added by RG - 110406
		### HACK END ###

		$this->mArgStack = array();
		$this->mInPre = false;
		$this->mInterwikiLinkHolders = array(
			'texts' => array(),
			'titles' => array()
		);
		$this->mLinkHolders = array(
			'namespaces' => array(),
			'dbkeys' => array(),
			'queries' => array(),
			'texts' => array(),
			'titles' => array()
		);
		$this->mRevisionId = null;
		$this->mUniqPrefix = 'UNIQ' . Parser::getRandomString();

		# Clear these on every parse, bug 4549
		$this->mTemplates = array();
		$this->mTemplatePath = array();

		wfRunHooks( 'ParserClearState', array( &$this ) );
	}

3.2. Tweaking the Strip() function

This is where most of the work is done.

Caution: do not copy this code from the page while in edit mode, because every <pre> tag in the wiki source has been written as <nowiki><</nowiki>pre>.


	function strip( $text, &$state, $stripcomments = false ) {
		$render = ($this->mOutputType == OT_HTML);
		$html_content = array();
		$nowiki_content = array();
		$math_content = array();
		$pre_content = array();
		$comment_content = array();
		$ext_content = array();
		$ext_tags = array();
		$ext_params = array();
		$gallery_content = array();

		# Replace any instances of the placeholders
		$uniq_prefix = $this->mUniqPrefix;
		#$text = str_replace( $uniq_prefix, wfHtmlEscapeFirst( $uniq_prefix ), $text );

		# html
		global $wgRawHtml;
		if( $wgRawHtml ) {
			$text = Parser::extractTags('html', $text, $html_content, $uniq_prefix);
			foreach( $html_content as $marker => $content ) {
				if ($render ) {
					# Raw and unchecked for validity.
					$html_content[$marker] = $content;
				} else {
					$html_content[$marker] = '<html>'.$content.'</html>';
				}
			}
		}

		# nowiki
		$text = Parser::extractTags('nowiki', $text, $nowiki_content, $uniq_prefix);
		foreach( $nowiki_content as $marker => $content ) {
			if( $render ){
				$nowiki_content[$marker] = wfEscapeHTMLTagsOnly( $content );
			} else {
				$nowiki_content[$marker] = '<nowiki>'.$content.'</nowiki>';
			}
		}

		# math
		if( $this->mOptions->getUseTeX() ) {
			$text = Parser::extractTags('math', $text, $math_content, $uniq_prefix);
			foreach( $math_content as $marker => $content ){
				if( $render ) {
					$math_content[$marker] = renderMath( $content );
				} else {
					$math_content[$marker] = '<math>'.$content.'</math>';
				}
			}
		}

		# pre
		$text = Parser::extractTags('pre', $text, $pre_content, $uniq_prefix);
		foreach( $pre_content as $marker => $content ){
			if( $render ){
				$pre_content[$marker] = '<pre>' . wfEscapeHTMLTagsOnly( $content ) . '</pre>';
			} else {
				$pre_content[$marker] = '<pre>'.$content.'</pre>';
			}
		}

		# gallery
		$text = Parser::extractTags('gallery', $text, $gallery_content, $uniq_prefix);
		foreach( $gallery_content as $marker => $content ) {
			require_once( 'ImageGallery.php' );
			if ( $render ) {
				$gallery_content[$marker] = $this->renderImageGallery( $content );
			} else {
				$gallery_content[$marker] = '<gallery>'.$content.'</gallery>';
			}
		}

		# Comments
		$text = Parser::extractTags(STRIP_COMMENTS, $text, $comment_content, $uniq_prefix);
		foreach( $comment_content as $marker => $content ){
			$comment_content[$marker] = '<!--'.$content.'-->';
		}

		# Extensions

		## Using the 2 small hacks by RG - 0406 ##
		## 1. - RG 110406 - to improve nested ext parsing behaviour
		## 2. - RG 140406 - to allow template argument handling inside extension callback functions

		### HACK START ###
		$stateTmp =& $this->mStripStateTmp;
		$stateTmp = array();
		$extparser = new Parser(); // a new parallel parser for nested ext
		$extparser->mTagHooks = $this->mTagHooks; // get extension hooks from the main parser
		## Intercept call from 'braceSubstitution'
		$doCheckTplArgs = 0;
		if ( $this->mArgStack && count( $this->mArgStack ) ) { // stack not empty
			$lastargs = end( $this->mArgStack );
			# isset() avoids a notice: real template args are indexed from 1 or by name
			if ( isset( $lastargs[0] ) && $lastargs[0] === STRIP_CALLED_FROM_BS ) {
				## we must remove and keep track of the last tpl args stacked
				array_pop( $this->mArgStack ); // first trash out the dummy arg
				$lastargs = array_pop( $this->mArgStack ); // then get the good one
				$doCheckTplArgs = 1;
			}
		}
		### HACK END ###
		
		foreach ( $this->mTagHooks as $tag => $callback ) {
			$ext_content[$tag] = array();
			$text = Parser::extractTagsAndParams( $tag, $text, $ext_content[$tag],
				$ext_tags[$tag], $ext_params[$tag], $uniq_prefix );
			foreach ( $ext_content[$tag] as $marker => $content ) {
				$full_tag = $ext_tags[$tag][$marker];
				$params = $ext_params[$tag][$marker];
				
				### HACK START ###

				$rcontent = $content;
				## ARGUMENT REPLACEMENT // processed only if needed
				if ( $doCheckTplArgs && array_key_exists( PARSE_TPL_ARGS_BEFORE_EXT_CALLBACK, $params ) ) {
					$parsemode = $params[PARSE_TPL_ARGS_BEFORE_EXT_CALLBACK];
					if ( ( $parsemode == CONTENT_AND_PARAMS ) || ( $parsemode == ONLY_CONTENT ) ) {
						$rcontent = $this->replaceVariables( $rcontent, $lastargs );
					}
					if ( ( $parsemode == CONTENT_AND_PARAMS ) || ( $parsemode == ONLY_PARAMS ) ) {
						# Substitute each value separately: this keeps the parameter
						# names (array keys) and does not split values containing commas.
						foreach ( $params as $pname => $pvalue ) {
							$params[$pname] = $this->replaceVariables( $pvalue, $lastargs );
						}
					}
				}
				## PARSE NESTED EXTENSIONS // processed only if needed
				if ( array_key_exists( PARSE_NESTED_EXTS_BEFORE_EXT_CALLBACK, $params ) ) {
					## first we must unstrip possible embedded extension calls which could
					## have already been stripped and replaced by a unique random code in the loop
					if ( strpos( $rcontent, 'UNIQ' ) !== false ) {
						$rcontent = $this->unstrip( $rcontent, $stateTmp );
					}
					## then we must parse possible still unstripped extension calls
					## that may be included in $content and would only be treated AFTER
					## this callback. We use a parallel parser to avoid problems.
					$output = $extparser->parse( $rcontent, $this->mTitle, $this->mOptions );
					$rcontent = wfUnEscapeHTMLTagsOnly( $output->getText() );
				}

				### HACK END ###
				
				if ( $render )
				  	### HACK START ###
					$ext_content[$tag][$marker] = call_user_func_array( $callback, array( $rcontent, $params, &$this ) );
					### HACK END ###
				else {
					if ( is_null( $content ) ) {
						// Empty element tag
						$ext_content[$tag][$marker] = $full_tag;
					} else {
						$ext_content[$tag][$marker] = "$full_tag$content</$tag>";
					}
				}
			}
			### HACK START ###
			# Update mStripStateTmp now so that the next extension tag processed
			# in the loop can use it.
			if ( array_key_exists( $tag, $stateTmp ) ) {
				$stateTmp[$tag] = $stateTmp[$tag] + $ext_content[$tag];
			} else {
				$stateTmp[$tag] = $ext_content[$tag];
			}
			### HACK END ###
			
		}

    		### HACK START ###
		//$stateTmp = array();
		# we can now unset the parallel parser
		unset($extparser);
    		### HACK END ###

		# Unstrip comments unless explicitly told otherwise.
		# (The comments are always stripped prior to this point, so as to
		# not invoke any extension tags / parser hooks contained within
		# a comment.)
		if ( !$stripcomments ) {
			$tempstate = array( 'comment' => $comment_content );
			$text = $this->unstrip( $text, $tempstate );
			$comment_content = array();
		}

		# Merge state with the pre-existing state, if there is one
		if ( $state ) {
			$state['html'] = $state['html'] + $html_content;
			$state['nowiki'] = $state['nowiki'] + $nowiki_content;
			$state['math'] = $state['math'] + $math_content;
			$state['pre'] = $state['pre'] + $pre_content;
			$state['gallery'] = $state['gallery'] + $gallery_content;
			$state['comment'] = $state['comment'] + $comment_content;

			foreach( $ext_content as $tag => $array ) {
				if ( array_key_exists( $tag, $state ) ) {
					$state[$tag] = $state[$tag] + $array;
				}
				### HACK START ###
				else { $state[$tag] = $array;
				}
				### HACK END ###
			}
		} else {
			$state = array(
			  'html' => $html_content,
			  'nowiki' => $nowiki_content,
			  'math' => $math_content,
			  'pre' => $pre_content,
			  'gallery' => $gallery_content,
			  'comment' => $comment_content,
			) + $ext_content;
		}
		return $text;
	}
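The else branch added to the final merge loop keeps the output of an extension tag that has no entry yet in the pre-existing state; without it, that output is dropped. The sketch below reproduces the loop in isolation (mergeExtState() is a hypothetical name) and also shows the array union (+) semantics both merges rely on: for a key present on both sides, the left-hand value is kept.

```php
<?php
// Standalone copy of the state-merge loop at the end of strip().
// $keepNewTags = false mirrors the loop without the hack's else branch.
function mergeExtState( array $state, array $extContent, $keepNewTags ) {
	foreach ( $extContent as $tag => $outputs ) {
		if ( array_key_exists( $tag, $state ) ) {
			// Union (+): existing markers keep their value, new markers are added.
			$state[$tag] = $state[$tag] + $outputs;
		} elseif ( $keepNewTags ) {
			$state[$tag] = $outputs;
		}
	}
	return $state;
}

$state = array( 'nowiki' => array(), 'mytag' => array( 'UNIQ1' => 'old' ) );
$extContent = array(
	'mytag'  => array( 'UNIQ1' => 'ignored', 'UNIQ2' => 'new' ),
	'newtag' => array( 'UNIQ3' => 'output' ),
);

print_r( mergeExtState( $state, $extContent, false ) ); // 'newtag' is missing
print_r( mergeExtState( $state, $extContent, true ) );  // 'newtag' is kept
```

Strip markers are unique, so the "left side wins" rule rarely matters in practice; array_merge() would not be a drop-in replacement, because for string keys it keeps the right-hand value instead.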

3.3. Tweaking the replace_callback() function


	function replace_callback ($text, $callbacks) {
		$openingBraceStack = array();	# this array will hold a stack of parentheses which are not closed yet
		$lastOpeningBrace = -1;		# last not closed parentheses

		### HACK START ###
		## need further tests - RG - 250406
		if (!(strpos($text,'UNIQ')===false)) {
			$text = $this->unstrip($text,$this->mStripStateTmp);
		}
		### HACK END ###

		for ($i = 0; $i < strlen($text); $i++) {
			# check for any opening brace
			$rule = null;
			$nextPos = -1;
			foreach ($callbacks as $key => $value) {
				$pos = strpos ($text, $key, $i);
				if (false !== $pos && (-1 == $nextPos || $pos < $nextPos)) {
					$rule = $value;
					$nextPos = $pos;
				}
			}

			if ($lastOpeningBrace >= 0) {
				$pos = strpos ($text, $openingBraceStack[$lastOpeningBrace]['braceEnd'], $i);

				if (false !== $pos && (-1 == $nextPos || $pos < $nextPos)){
					$rule = null;
					$nextPos = $pos;
				}

				$pos = strpos ($text, '|', $i);

				if (false !== $pos && (-1 == $nextPos || $pos < $nextPos)){
					$rule = null;
					$nextPos = $pos;
				}
			}

			if ($nextPos == -1)
				break;

			$i = $nextPos;

			# found opening brace, let's add it to the parentheses stack
			if (null != $rule) {
				$piece = array('brace' => $text[$i],
							   'braceEnd' => $rule['end'],
							   'count' => 1,
							   'title' => '',
							   'parts' => null);

				# count opening brace characters
				while ($i+1 < strlen($text) && $text[$i+1] == $piece['brace']) {
					$piece['count']++;
					$i++;
				}

				$piece['startAt'] = $i+1;
				$piece['partStart'] = $i+1;

				# we need to add to stack only if opening brace count is enough for any given rule
				foreach ($rule['cb'] as $cnt => $fn) {
					if ($piece['count'] >= $cnt) {
						$lastOpeningBrace ++;
						$openingBraceStack[$lastOpeningBrace] = $piece;
						break;
					}
				}

				continue;
			}
			else if ($lastOpeningBrace >= 0) {
				# first check if it is a closing brace
				if ($openingBraceStack[$lastOpeningBrace]['braceEnd'] == $text[$i]) {
					# lets check if it is enough characters for closing brace
					$count = 1;
					while ($i+$count < strlen($text) && $text[$i+$count] == $text[$i])
						$count++;

					# if there are more closing parentheses than opening ones, we parse less
					if ($openingBraceStack[$lastOpeningBrace]['count'] < $count)
						$count = $openingBraceStack[$lastOpeningBrace]['count'];

					# check for maximum matching characters (if there are 5 closing characters, we will probably need only 3 - depending on the rules)
					$matchingCount = 0;
					$matchingCallback = null;
					foreach ($callbacks[$openingBraceStack[$lastOpeningBrace]['brace']]['cb'] as $cnt => $fn) {
						if ($count >= $cnt && $matchingCount < $cnt) {
							$matchingCount = $cnt;
							$matchingCallback = $fn;
						}
					}

					if ($matchingCount == 0) {
						$i += $count - 1;
						continue;
					}

					# lets set a title or last part (if '|' was found)
					if (null === $openingBraceStack[$lastOpeningBrace]['parts'])
						$openingBraceStack[$lastOpeningBrace]['title'] = substr($text, $openingBraceStack[$lastOpeningBrace]['partStart'], $i - $openingBraceStack[$lastOpeningBrace]['partStart']);
					else
						$openingBraceStack[$lastOpeningBrace]['parts'][] = substr($text, $openingBraceStack[$lastOpeningBrace]['partStart'], $i - $openingBraceStack[$lastOpeningBrace]['partStart']);

					$pieceStart = $openingBraceStack[$lastOpeningBrace]['startAt'] - $matchingCount;
					$pieceEnd = $i + $matchingCount;

					if( is_callable( $matchingCallback ) ) {
						$cbArgs = array (
										 'text' => substr($text, $pieceStart, $pieceEnd - $pieceStart),
										 'title' => trim($openingBraceStack[$lastOpeningBrace]['title']),
										 'parts' => $openingBraceStack[$lastOpeningBrace]['parts'],
										 'lineStart' => (($pieceStart > 0) && ($text[$pieceStart-1] == "\n")), // double quotes: a real newline, not backslash + n
										 );
						# finally we can call a user callback and replace piece of text
						$replaceWith = call_user_func( $matchingCallback, $cbArgs );
						$text = substr($text, 0, $pieceStart) . $replaceWith . substr($text, $pieceEnd);
						$i = $pieceStart + strlen($replaceWith) - 1;
					}
					else {
						# null value for callback means that parentheses should be parsed, but not replaced
						$i += $matchingCount - 1;
					}

					# reset last opening parentheses, but keep it in case there are unused characters
					$piece = array('brace' => $openingBraceStack[$lastOpeningBrace]['brace'],
								   'braceEnd' => $openingBraceStack[$lastOpeningBrace]['braceEnd'],
								   'count' => $openingBraceStack[$lastOpeningBrace]['count'],
								   'title' => '',
								   'parts' => null,
								   'startAt' => $openingBraceStack[$lastOpeningBrace]['startAt']);
					$openingBraceStack[$lastOpeningBrace--] = null;

					if ($matchingCount < $piece['count']) {
						$piece['count'] -= $matchingCount;
						$piece['startAt'] -= $matchingCount;
						$piece['partStart'] = $piece['startAt'];
						# do we still qualify for any callback with remaining count?
						foreach ($callbacks[$piece['brace']]['cb'] as $cnt => $fn) {
							if ($piece['count'] >= $cnt) {
								$lastOpeningBrace ++;
								$openingBraceStack[$lastOpeningBrace] = $piece;
								break;
							}
						}
					}
					continue;
				}

				# lets set a title if it is a first separator, or next part otherwise
				if ($text[$i] == '|') {
					if (null === $openingBraceStack[$lastOpeningBrace]['parts']) {
						$openingBraceStack[$lastOpeningBrace]['title'] = substr($text, $openingBraceStack[$lastOpeningBrace]['partStart'], $i - $openingBraceStack[$lastOpeningBrace]['partStart']);
						$openingBraceStack[$lastOpeningBrace]['parts'] = array();
					}
					else
						$openingBraceStack[$lastOpeningBrace]['parts'][] = substr($text, $openingBraceStack[$lastOpeningBrace]['partStart'], $i - $openingBraceStack[$lastOpeningBrace]['partStart']);

					$openingBraceStack[$lastOpeningBrace]['partStart'] = $i + 1;
				}
			}
		}

		return $text;
	}

3.4. Tweaking the getTemplateArgs() function


	function getTemplateArgs( $argsString ) {
		if ( $argsString === '' ) {
			return array();
		}
		
		### HACK START ###
		if (!(strpos($argsString,'UNIQ')===false)) {
			$argsString = $this->unstrip($argsString,$this->mStripStateTmp);
		}
		### HACK END ###

		$args = explode( '|', substr( $argsString, 1 ) );

		# If any of the arguments contains a '[[' but no ']]', it needs to be
		# merged with the next arg because the '|' character between belongs
		# to the link syntax and not the template parameter syntax.
		$argc = count($args);

		for ( $i = 0; $i < $argc-1; $i++ ) {
			if ( substr_count ( $args[$i], '[[' ) != substr_count ( $args[$i], ']]' ) ) {
				$args[$i] .= '|'.$args[$i+1];
				array_splice($args, $i+1, 1);
				$i--;
				$argc--;
			}
		}

		return $args;
	}
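The hack only adds an unstrip step in front of the existing splitting logic. To see what that logic does with link syntax, here is a standalone copy (renamed splitTemplateArgs(), without the unstrip step). As the code stands, a '|' that comes from restored extension output is split like any other separator unless the link check re-joins it.

```php
<?php
// Standalone copy of the splitting logic in getTemplateArgs().
// $argsString starts with the first '|', as it does inside the parser.
function splitTemplateArgs( $argsString ) {
	if ( $argsString === '' ) {
		return array();
	}
	$args = explode( '|', substr( $argsString, 1 ) );
	$argc = count( $args );
	for ( $i = 0; $i < $argc - 1; $i++ ) {
		// Unbalanced '[[' means this '|' belonged to a link: re-join the pieces.
		if ( substr_count( $args[$i], '[[' ) != substr_count( $args[$i], ']]' ) ) {
			$args[$i] .= '|' . $args[$i + 1];
			array_splice( $args, $i + 1, 1 );
			$i--;
			$argc--;
		}
	}
	return $args;
}

// Two arguments: "[[Main Page|home]]" and "lang=en"
print_r( splitTemplateArgs( '|[[Main Page|home]]|lang=en' ) );
```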

3.5. Tweaking the braceSubstitution() function


	function braceSubstitution( $piece ) {
		global $wgContLang;
		$fname = 'Parser::braceSubstitution';
		wfProfileIn( $fname );

		# Flags
		$found = false;             # $text has been filled
		$nowiki = false;            # wiki markup in $text should be escaped
		$noparse = false;           # Unsafe HTML tags should not be stripped, etc.
		$noargs = false;            # Don't replace triple-brace arguments in $text
		$replaceHeadings = false;   # Make the edit section links go to the template not the article
		$isHTML = false;            # $text is HTML, armour it against wikitext transformation
		$forceRawInterwiki = false; # Force interwiki transclusion to be done in raw mode not rendered

		# Title object, where $text came from
		$title = NULL;

		$linestart = '';

		# $part1 is the bit before the first |, and must contain only title characters
		# $args is a list of arguments, starting from index 0, not including $part1

		$part1 = $piece['title'];
		# If the third subpattern matched anything, it will start with |

		if (null == $piece['parts']) {
			$replaceWith = $this->variableSubstitution (array ($piece['text'], $piece['title']));
			if ($replaceWith != $piece['text']) {
				$text = $replaceWith;
				$found = true;
				$noparse = true;
				$noargs = true;
			}
		}

		$args = (null == $piece['parts']) ? array() : $piece['parts'];
		$argc = count( $args );


		# SUBST
		if ( !$found ) {
			$mwSubst =& MagicWord::get( MAG_SUBST );
			if ( $mwSubst->matchStartAndRemove( $part1 ) xor ($this->mOutputType == OT_WIKI) ) {
				# One of two possibilities is true:
				# 1) Found SUBST but not in the PST phase
				# 2) Didn't find SUBST and in the PST phase
				# In either case, return without further processing
				$text = $piece['text'];
				$found = true;
				$noparse = true;
				$noargs = true;
			}
		}

		# MSG, MSGNW, INT and RAW
		if ( !$found ) {
			# Check for MSGNW:
			$mwMsgnw =& MagicWord::get( MAG_MSGNW );
			if ( $mwMsgnw->matchStartAndRemove( $part1 ) ) {
				$nowiki = true;
			} else {
				# Remove obsolete MSG:
				$mwMsg =& MagicWord::get( MAG_MSG );
				$mwMsg->matchStartAndRemove( $part1 );
			}
			
			# Check for RAW:
			$mwRaw =& MagicWord::get( MAG_RAW );
			if ( $mwRaw->matchStartAndRemove( $part1 ) ) {
				$forceRawInterwiki = true;
			}
			
			# Check if it is an internal message
			$mwInt =& MagicWord::get( MAG_INT );
			if ( $mwInt->matchStartAndRemove( $part1 ) ) {
				if ( $this->incrementIncludeCount( 'int:'.$part1 ) ) {
					$text = $linestart . wfMsgReal( $part1, $args, true );
					$found = true;
				}
			}
		}

		# NS
		if ( !$found ) {
			# Check for NS: (namespace expansion)
			$mwNs = MagicWord::get( MAG_NS );
			if ( $mwNs->matchStartAndRemove( $part1 ) ) {
				if ( intval( $part1 ) || $part1 == "0" ) {
					$text = $linestart . $wgContLang->getNsText( intval( $part1 ) );
					$found = true;
				} else {
					$index = Namespace::getCanonicalIndex( strtolower( $part1 ) );
					if ( !is_null( $index ) ) {
						$text = $linestart . $wgContLang->getNsText( $index );
						$found = true;
					}
				}
			}
		}

		# LCFIRST, UCFIRST, LC and UC
		if ( !$found ) {
			$lcfirst =& MagicWord::get( MAG_LCFIRST );
			$ucfirst =& MagicWord::get( MAG_UCFIRST );
			$lc =& MagicWord::get( MAG_LC );
			$uc =& MagicWord::get( MAG_UC );
			if ( $lcfirst->matchStartAndRemove( $part1 ) ) {
				$text = $linestart . $wgContLang->lcfirst( $part1 );
				$found = true;
			} else if ( $ucfirst->matchStartAndRemove( $part1 ) ) {
				$text = $linestart . $wgContLang->ucfirst( $part1 );
				$found = true;
			} else if ( $lc->matchStartAndRemove( $part1 ) ) {
				$text = $linestart . $wgContLang->lc( $part1 );
				$found = true;
			} else if ( $uc->matchStartAndRemove( $part1 ) ) {
				 $text = $linestart . $wgContLang->uc( $part1 );
				 $found = true;
			}
		}

		# LOCALURL and FULLURL
		if ( !$found ) {
			$mwLocal =& MagicWord::get( MAG_LOCALURL );
			$mwLocalE =& MagicWord::get( MAG_LOCALURLE );
			$mwFull =& MagicWord::get( MAG_FULLURL );
			$mwFullE =& MagicWord::get( MAG_FULLURLE );


			if ( $mwLocal->matchStartAndRemove( $part1 ) ) {
				$func = 'getLocalURL';
			} elseif ( $mwLocalE->matchStartAndRemove( $part1 ) ) {
				$func = 'escapeLocalURL';
			} elseif ( $mwFull->matchStartAndRemove( $part1 ) ) {
				$func = 'getFullURL';
			} elseif ( $mwFullE->matchStartAndRemove( $part1 ) ) {
				$func = 'escapeFullURL';
			} else {
				$func = false;
			}

			if ( $func !== false ) {
				$title = Title::newFromText( $part1 );
				if ( !is_null( $title ) ) {
					if ( $argc > 0 ) {
						$text = $linestart . $title->$func( $args[0] );
					} else {
						$text = $linestart . $title->$func();
					}
					$found = true;
				}
			}
		}

		# GRAMMAR
		if ( !$found && $argc == 1 ) {
			$mwGrammar =& MagicWord::get( MAG_GRAMMAR );
			if ( $mwGrammar->matchStartAndRemove( $part1 ) ) {
				$text = $linestart . $wgContLang->convertGrammar( $args[0], $part1 );
				$found = true;
			}
		}

		# PLURAL
		if ( !$found && $argc >= 2 ) {
			$mwPluralForm =& MagicWord::get( MAG_PLURAL );
			if ( $mwPluralForm->matchStartAndRemove( $part1 ) ) {
				if ($argc==2) {$args[2]=$args[1];}
				$text = $linestart . $wgContLang->convertPlural( $part1, $args[0], $args[1], $args[2]);
				$found = true;
			}
		}

		# Extensions functions
		if ( !$found ) {
			$colonPos = strpos( $part1, ':' );
			if ( $colonPos !== false ) {
				$function = strtolower( substr( $part1, 0, $colonPos ) );
				if ( isset( $this->mFunctionHooks[$function] ) ) {
					$funcArgs = array_merge( array( &$this, substr( $part1, $colonPos + 1 ) ), $args );
					$result = call_user_func_array( $this->mFunctionHooks[$function], $funcArgs );
					$found = true;
					if ( is_array( $result ) ) {
						$text = $linestart . $result[0];
						unset( $result[0] );

						// Extract flags into the local scope
						// This allows callers to set flags such as nowiki, noparse, found, etc.
						extract( $result );
					} else {
						$text = $linestart . $result;
					}
				}
			}
		}

		# Template table test

		# Did we encounter this template already? If yes, it is in the cache
		# and we need to check for loops.
		if ( !$found && isset( $this->mTemplates[$piece['title']] ) ) {
			$found = true;

			# Infinite loop test
			if ( isset( $this->mTemplatePath[$part1] ) ) {
				$noparse = true;
				$noargs = true;
				$found = true;
				$text = $linestart .
					'{{' . $part1 . '}}' .
					'<!-- WARNING: template loop detected -->';
				wfDebug( "$fname: template loop broken at '$part1'\n" );
			} else {
				# set $text to cached message.
				$text = $linestart . $this->mTemplates[$piece['title']];
			}
		}

		# Load from database
		$lastPathLevel = $this->mTemplatePath;
		if ( !$found ) {
			$ns = NS_TEMPLATE;
			# declaring $subpage directly in the function call
			# does not work correctly with references and breaks
			# {{/subpage}}-style inclusions
			$subpage = '';
			$part1 = $this->maybeDoSubpageLink( $part1, $subpage );
			if ($subpage !== '') {
				$ns = $this->mTitle->getNamespace();
			}
			$title = Title::newFromText( $part1, $ns );

			if ( !is_null( $title ) ) {
				if ( !$title->isExternal() ) {
					# Check for excessive inclusion
					$dbk = $title->getPrefixedDBkey();
					if ( $this->incrementIncludeCount( $dbk ) ) {
						if ( $title->getNamespace() == NS_SPECIAL && $this->mOptions->getAllowSpecialInclusion() ) {
							# Capture special page output
							$text = SpecialPage::capturePath( $title );
							if ( is_string( $text ) ) {
								$found = true;
								$noparse = true;
								$noargs = true;
								$isHTML = true;
								$this->disableCache();
							}
						} else {
							$articleContent = $this->fetchTemplate( $title );
							if ( $articleContent !== false ) {
								$found = true;
								$text = $articleContent;
								$replaceHeadings = true;
							}
						}
					}

					# If the title is valid but undisplayable, make a link to it
					if ( $this->mOutputType == OT_HTML && !$found ) {
						$text = '[['.$title->getPrefixedText().']]';
						$found = true;
					}
				} elseif ( $title->isTrans() ) {
					// Interwiki transclusion
					if ( $this->mOutputType == OT_HTML && !$forceRawInterwiki ) {
						$text = $this->interwikiTransclude( $title, 'render' );
						$isHTML = true;
						$noparse = true;
					} else {
						$text = $this->interwikiTransclude( $title, 'raw' );
						$replaceHeadings = true;
					}
					$found = true;
				}
				
				# Template cache array insertion
				# Use the original $piece['title'] not the mangled $part1, so that
				# modifiers such as RAW: produce separate cache entries
				if( $found ) {
					$this->mTemplates[$piece['title']] = $text;
					$text = $linestart . $text;
				}
			}
		}

		# Recursive parsing, escaping and link table handling
		# Only for HTML output
		if ( $nowiki && $found && $this->mOutputType == OT_HTML ) {
			$text = wfEscapeWikiText( $text );
		} elseif ( ($this->mOutputType == OT_HTML || $this->mOutputType == OT_WIKI) && $found ) {
			if ( !$noargs ) {
				# Clean up argument array
				$assocArgs = array();
				$index = 1;
				foreach( $args as $arg ) {
					$eqpos = strpos( $arg, '=' );
					if ( $eqpos === false ) {
						$assocArgs[$index++] = $arg;
					} else {
						$name = trim( substr( $arg, 0, $eqpos ) );
						$value = trim( substr( $arg, $eqpos+1 ) );
						if ( $value === false ) {
							$value = '';
						}
						if ( $name !== false ) {
							$assocArgs[$name] = $value;
						}
					}
				}

				# Add a new element to the template recursion path
				$this->mTemplatePath[$part1] = 1;
			}

			if ( !$noparse ) {
				# If there are any <onlyinclude> tags, only include them
				if ( in_string( '<onlyinclude>', $text ) && in_string( '</onlyinclude>', $text ) ) {
					preg_match_all( '/<onlyinclude>(.*?)\n?<\/onlyinclude>/s', $text, $m );
					$text = '';
					foreach ($m[1] as $onlyincludePiece) // don't reuse $piece: its 'lineStart' is read below
						$text .= $onlyincludePiece;
				}
				# Remove <noinclude> sections and <includeonly> tags
				$text = preg_replace( '/<noinclude>.*?<\/noinclude>/s', '', $text );
				$text = strtr( $text, array( '<includeonly>' => '' , '</includeonly>' => '' ) );

				if( $this->mOutputType == OT_HTML ) {
					# Strip <nowiki>, <pre>, etc.
					
					### HACK START ###
					# We need to update the tpl arguments stack mArgStack BEFORE the
					# strip() call below. Strip() looks for the dummy entry on top of
					# the stack; when it finds it, it pops both the dummy and the
					# template arguments, and uses those arguments for any extension
					# tag carrying the 'checktplargs' parameter.
					$dummyarg = array();
					$dummyarg[0] = STRIP_CALLED_FROM_BS;
					array_push( $this->mArgStack, $assocArgs ); // updating the 'precious' current tpl args
					array_push( $this->mArgStack, $dummyarg ); // this is an ugly intercept tweak for the Strip() function
					### HACK END ###

					$text = $this->strip( $text, $this->mStripState );
					$text = Sanitizer::removeHTMLtags( $text, array( &$this, 'replaceVariables' ), $assocArgs );
				}
				$text = $this->replaceVariables( $text, $assocArgs );

				# If the template begins with a table or block-level
				# element, it should be treated as beginning a new line.
				if (!$piece['lineStart'] && preg_match('/^({\\||:|;|#|\*)/', $text)) {
					$text = "\n" . $text;
				}
			} elseif ( !$noargs ) {
				# $noparse and !$noargs
				# Just replace the arguments, not any double-brace items
				# This is used for rendered interwiki transclusion
				$text = $this->replaceVariables( $text, $assocArgs, true );
			}
		}
		# Prune lower levels off the recursion check path
		$this->mTemplatePath = $lastPathLevel;

		if ( !$found ) {
			wfProfileOut( $fname );
			return $piece['text'];
		} else {
			if ( $isHTML ) {
				# Replace raw HTML by a placeholder
				# Add a blank line preceding, to prevent it from mucking up
				# immediately preceding headings
				$text = "\n\n" . $this->insertStripItem( $text, $this->mStripState );
			} else {
				# replace ==section headers==
				# XXX this needs to go away once we have a better parser.
				if ( $this->mOutputType != OT_WIKI && $replaceHeadings ) {
					if( !is_null( $title ) )
						$encodedname = base64_encode($title->getPrefixedDBkey());
					else
						$encodedname = base64_encode("");
					$m = preg_split('/(^={1,6}.*?={1,6}\s*?$)/m', $text, -1,
						PREG_SPLIT_DELIM_CAPTURE);
					$text = '';
					$nsec = 0;
					for( $i = 0; $i < count($m); $i += 2 ) {
						$text .= $m[$i];
						if (!isset($m[$i + 1]) || $m[$i + 1] == "") continue;
						$hl = $m[$i + 1];
						if( strstr($hl, "<!--MWTEMPLATESECTION") ) {
							$text .= $hl;
							continue;
						}
						preg_match('/^(={1,6})(.*?)(={1,6})\s*?$/m', $hl, $m2);
						$text .= $m2[1] . $m2[2] . "<!--MWTEMPLATESECTION="
							. $encodedname . "&" . base64_encode("$nsec") . "-->" . $m2[3];

						$nsec++;
					}
				}
			}
			wfProfileOut( $fname );
			return $text;
		}
	}
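The argument-stack intercept in the HACK block above can be illustrated in isolation. The standalone PHP sketch below is not the hack's actual Strip() code: the class ArgStackDemo, its methods and the constant STRIP_CALLED_FROM_BS_DEMO are hypothetical stand-ins, and the sketch assumes PHP 7.4 or later. It only shows the pattern the comments describe: braceSubstitution() pushes the template's arguments and then a dummy entry carrying a marker, and strip() recognises the marker on top of the stack and retrieves the arguments underneath when the text contains an extension tag asking for template arguments.

```php
<?php
// Hypothetical marker standing in for STRIP_CALLED_FROM_BS (the real constant is
// defined at the top of the hacked parser.php; its value is not shown here).
const STRIP_CALLED_FROM_BS_DEMO = '__strip_called_from_bs__';

final class ArgStackDemo
{
    /** @var array[] Stack of template-argument arrays, like Parser::$mArgStack. */
    public array $argStack = [];

    // Caller side (braceSubstitution): push the template arguments, then a
    // one-element dummy array whose only entry is the marker.
    public function pushFromBraceSubstitution(array $assocArgs): void
    {
        array_push($this->argStack, $assocArgs);
        array_push($this->argStack, [STRIP_CALLED_FROM_BS_DEMO]);
    }

    // Callee side (strip): if the marker is on top and the text contains a tag
    // that asks for template arguments, pop the marker and return the arguments
    // below it. Otherwise leave the stack untouched and return null.
    public function templateArgsForStrip(bool $textHasCheckTplArgsTag): ?array
    {
        $top = end($this->argStack);
        if ($top === false || ($top[0] ?? null) !== STRIP_CALLED_FROM_BS_DEMO) {
            return null; // not called from braceSubstitution()
        }
        if (!$textHasCheckTplArgsTag) {
            return null; // entries are left for a later consumer to remove
        }
        array_pop($this->argStack);        // drop the dummy marker entry
        return array_pop($this->argStack); // the template's own arguments
    }
}

$parser = new ArgStackDemo();
$parser->pushFromBraceSubstitution(['1' => 'foo', 'name' => 'bar']);
var_dump($parser->templateArgsForStrip(true)); // the two template arguments
var_dump($parser->argStack);                   // empty again
```

When no tag asks for template arguments, the sketch leaves both entries on the stack; according to the comments in the HACK block, the real parser then relies on ReplaceVariables() to remove them.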

4. Defining a new simple wfUnEscapeHTMLTagsOnly() function

(to be added at the end of the parser.php file)


### HACK START ###
## RG - 150406
/**
 * UnEscape HTML tags
 * Basically replaces the HTML entities &quot;, &gt; and &lt; with ", > and <
 *
 * @param string $in Text that might contain escaped HTML tags
 * @return string Text with the HTML tags unescaped
 *
 * Based upon the wfEscapeHTMLTagsOnly() function, which does the reverse
 * RG - 150406
 */

function wfUnEscapeHTMLTagsOnly( $in ) {
	return str_replace(
		array( '&quot;', '&gt;', '&lt;' ),
		array( '"', '>', '<' ),
		$in );
}
### HACK END ###
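A quick standalone round trip shows what the function does. The copy of wfEscapeHTMLTagsOnly() below is written from memory of MediaWiki's GlobalFunctions.php and should be treated as an assumption. Both functions are redefined here, with added type hints, only so that the snippet runs outside MediaWiki.

```php
<?php
// Standalone copies for demonstration only (type hints added).
// Assumed to mirror wfEscapeHTMLTagsOnly() from MediaWiki's GlobalFunctions.php.
function wfEscapeHTMLTagsOnly(string $in): string
{
    return str_replace(['"', '>', '<'], ['&quot;', '&gt;', '&lt;'], $in);
}

// The hack's inverse function, with the entity strings spelled out.
function wfUnEscapeHTMLTagsOnly(string $in): string
{
    return str_replace(['&quot;', '&gt;', '&lt;'], ['"', '>', '<'], $in);
}

$tag = '<ext2 checktplargs="1">';
$escaped = wfEscapeHTMLTagsOnly($tag);
echo $escaped, "\n";                         // &lt;ext2 checktplargs=&quot;1&quot;&gt;
echo wfUnEscapeHTMLTagsOnly($escaped), "\n"; // <ext2 checktplargs="1">
```

Because neither function touches the & character, the round trip is only lossless for text that does not already contain the literal strings &quot;, &gt; or &lt;: a literal &lt; in the original text comes back as <.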

Documentation

Usage

To use this parser hack, you only have to adapt the way you call extensions in the wikitext, by adding special parameters to the extension tag. See the examples below.

Examples

1. Calling an extension inside a template page definition

In some cases, when an extension is called inside a template page definition, the extension callback function needs the actual values of one or more template arguments (triple-braced variables) in order to render correct output.

This is achieved with the checktplargs parameter: it tells the Strip() function that there are template arguments to resolve BEFORE the associated extension callback function is invoked.

  • If the template argument checking only applies to the content of the extension, the checktplargs parameter should be set to 1:

Example:

<ext1 checktplargs=1> 
extvar1 = {{{1}}}
extvar2 = {{{2}}} 
</ext1>
  • If the checking only applies to the extension parameters, the checktplargs parameter should be set to 2:

Example:

<ext1 extparam={{{1}}} checktplargs=2> 
[some content]
</ext1>
  • If the checking applies to both the content of the extension and its parameters, the checktplargs parameter should be set to 3:

Example:

<ext1 extparam={{{1}}} checktplargs=3> 
extvar1 = {{{1}}}
extvar2 = {{{2}}} 
</ext1>
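The three values above behave like two independent switches, with 3 combining 1 (content) and 2 (parameters). The standalone PHP sketch below reads checktplargs as a pair of bit flags and substitutes {{{name}}} and {{{name|default}}} arguments accordingly. This bit-flag reading, the functions substituteTplArgs() and applyCheckTplArgs(), and the two constants are assumptions made for illustration; the hack's own resolution happens inside Strip() and is not reproduced here.

```php
<?php
// Hypothetical sketch: checktplargs read as two bit flags (3 = 1 | 2).
const TPLARGS_IN_CONTENT = 1; // checktplargs=1: resolve inside the tag's content
const TPLARGS_IN_PARAMS  = 2; // checktplargs=2: resolve inside the tag's parameters

// Replaces {{{name}}} and {{{name|default}}} with values taken from $args.
// Arguments that are missing and have no default are left as they are.
function substituteTplArgs(string $text, array $args): string
{
    return preg_replace_callback(
        '/\{\{\{([^{}|]+)(?:\|([^{}]*))?\}\}\}/',
        function (array $m) use ($args): string {
            if (array_key_exists($m[1], $args)) {
                return (string) $args[$m[1]];
            }
            return $m[2] ?? $m[0]; // default value, or the untouched placeholder
        },
        $text
    ) ?? $text;
}

// Applies substitution to one extension call according to its checktplargs
// value. Returns [parameters, content].
function applyCheckTplArgs(array $params, string $content, array $args): array
{
    $mode = (int) ($params['checktplargs'] ?? 0);
    if ($mode & TPLARGS_IN_PARAMS) {
        foreach ($params as $key => $value) {
            $params[$key] = substituteTplArgs((string) $value, $args);
        }
    }
    if ($mode & TPLARGS_IN_CONTENT) {
        $content = substituteTplArgs($content, $args);
    }
    return [$params, $content];
}

// Positional arguments of a hypothetical call such as {{SomeTemplate|red|blue}}.
$args = ['1' => 'red', '2' => 'blue'];
[$params, $content] = applyCheckTplArgs(
    ['extparam' => '{{{1}}}', 'checktplargs' => '3'],
    "extvar1 = {{{1}}}\nextvar2 = {{{2}}}",
    $args
);
echo $params['extparam'], "\n"; // red
echo $content, "\n";            // prints "extvar1 = red" and "extvar2 = blue"
```

With checktplargs=1 the same call would leave extparam as {{{1}}}, and with checktplargs=2 the content would keep its placeholders.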

2. Extension nesting examples

In some cases, when two (or more) extensions are nested, the output of the nested one needs to be accessible to the callback function of the wrapping one. This is made possible by the checknestedexts parameter, which ensures the proper parsing sequence: it tells the Strip() function to parse any nested extension calls before invoking the associated extension callback function. Example:

<ext1 checknestedexts=1> 
[...] 
	<ext2> 
	[...] 
	</ext2> 
[...] 
</ext1>
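The effect of checknestedexts=1 can be sketched as an inner-first rendering pass. In the hypothetical standalone PHP below, renderExtTags() renders the tags nested inside a wrapping tag before calling the wrapping tag's handler, but only when that tag carries checknestedexts=1; otherwise the handler receives the raw inner markup. The handlers, the simplified attribute parsing and the regex-based tag matching (which cannot nest two tags of the same name) are assumptions; the actual hack works inside Strip() with the help of the mStripStateTmp object.

```php
<?php
// Hypothetical sketch: render nested extension tags first when the wrapping
// tag carries checknestedexts=1. Handlers and attribute syntax are illustrative.

// Parses simple name=value attributes (optionally double-quoted, no spaces).
function parseTagParams(string $attrText): array
{
    preg_match_all('/(\w+)\s*=\s*"?([^"\s>]*)"?/', $attrText, $matches, PREG_SET_ORDER);
    $params = [];
    foreach ($matches as $pair) {
        $params[$pair[1]] = $pair[2];
    }
    return $params;
}

// Replaces every <name ...>content</name> whose name has a handler.
function renderExtTags(string $text, array $handlers): string
{
    $names = implode('|', array_map('preg_quote', array_keys($handlers)));
    // \1 forces the closing tag to match the opening name, so the match for an
    // outer tag spans any differently named tags nested inside it.
    $pattern = '/<(' . $names . ')(\s[^>]*)?>(.*?)<\/\1>/s';

    return preg_replace_callback($pattern, function (array $m) use ($handlers): string {
        $params  = parseTagParams($m[2]);
        $content = $m[3];
        if (($params['checknestedexts'] ?? '') === '1') {
            // Inner tags first: the wrapping handler sees their output.
            $content = renderExtTags($content, $handlers);
        }
        return $handlers[$m[1]]($content, $params);
    }, $text) ?? $text;
}

$handlers = [
    'ext1' => fn(string $content, array $params): string => '[ext1:' . $content . ']',
    'ext2' => fn(string $content, array $params): string => strtoupper($content),
];

echo renderExtTags('<ext1 checknestedexts=1>a <ext2>b</ext2> c</ext1>', $handlers), "\n"; // [ext1:a B c]
echo renderExtTags('<ext1>a <ext2>b</ext2> c</ext1>', $handlers), "\n"; // [ext1:a <ext2>b</ext2> c]
```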

3. Further examples

The two generic extension parameters can also be combined.

Example:

<ext1 checktplargs=1 checknestedexts=1> 
extvar1 = {{{1}}}
extvar2 = {{{2}}}
	<ext2> 
	[...] 
	</ext2>  
</ext1>

Caution with nested extensions

Nested extensions require special attention because these parameters only apply to the extension tag in which they are set:

This example works:

<ext1 checknestedexts=1> 
	<ext2 checktplargs=1> 
	extvar1 = {{{1}}}
	extvar2 = {{{2}}}
	</ext2>  
</ext1>

but this one doesn't, because checktplargs=1 on <ext1> does not reach the template arguments inside <ext2>:

<ext1 checktplargs=1 checknestedexts=1> 
	<ext2> 
	extvar1 = {{{1}}}
	extvar2 = {{{2}}}
	</ext2>  
</ext1>