User:TChin (WMF)/Using Rector On MediaWiki

From Meta, a Wikimedia project coordination wiki

Intro

These are my findings from working with Rector on MediaWiki to automatically refactor code. Rector is an automatic refactoring tool that works by constructing an abstract syntax tree for a file and then traversing each node. When it hits a targeted node, it applies a transformation, which is also, confusingly, called a rector. A few quirks popped up while working with it:

  1. It is contextless. To transform a node based on a condition of another node, you must apply the transformation to their nearest common ancestor and walk its children manually.
  2. It ignores code style. Most violations can be automatically fixed by a codesniffer, but some, like line length, have to be fixed manually.
  3. It auto-imports EVERYTHING.
  4. It treats doc blocks as attributes of a node instead of as nodes of their own.

How These Limitations Affect Its Use On MediaWiki

Let's view this in the context of my task for Rector: given global $wgVar, replace it with $var = MediaWikiServices::getInstance()->getMainConfig()->get( 'Var' );, with some exceptions.

Let's go through the limitations:

It is contextless

When I replace $wgVar with $var, I obviously want to replace all references to it as well. But I can't do it all at once because the nodes are traversed individually. In my case, to get around this the rector had to target both global nodes and variable nodes and work in two steps:

  1. Convert all the globals into variables
  2. Rename wg-prefixed variables to non-prefixed variables.
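
The naming rule behind these two steps can be sketched in plain PHP. This only illustrates the mapping and is not the rector itself: the function name wgNameToLocal is hypothetical, and it assumes every convertible global is named wg followed by an uppercase letter.

```php
<?php
/**
 * Map a global name such as "wgVar" to the config key and the local
 * variable name used by the replacement.
 * Hypothetical helper for illustration; not part of Rector or MediaWiki.
 *
 * @param string $globalName Variable name without the leading "$"
 * @return array|null [ config key, local variable name ], or null if the
 *  name does not look like a wg-prefixed config global
 */
function wgNameToLocal( string $globalName ): ?array {
    // Assumption: config globals are "wg" followed by an UpperCamelCase name.
    if ( !preg_match( '/^wg([A-Z]\w*)$/', $globalName, $matches ) ) {
        return null;
    }
    $configKey = $matches[1];
    return [ $configKey, lcfirst( $configKey ) ];
}

// "global $wgVar;" would become
// "$var = MediaWikiServices::getInstance()->getMainConfig()->get( 'Var' );"
var_dump( wgNameToLocal( 'wgVar' ) );
// A name without the wg prefix is left alone.
var_dump( wgNameToLocal( 'title' ) );
```

In the actual rector, step 1 would use the config key when it rewrites a global node, and step 2 would use the local name when it renames variable nodes.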

It ignores code style

Most style problems in rectored files can be fixed by running phpcbf:

vendor/bin/phpcbf -p example.php

Any unordered auto-imports can be fixed using PhpStorm by right-clicking on the file and selecting 'Optimize Imports'. The rest have to be fixed manually.

It auto-imports everything

Rector has a few parameters you can set to auto-import things.

Option::AUTO_IMPORT_NAMES
Option::IMPORT_SHORT_CLASSES
Option::IMPORT_DOC_BLOCKS
Option::APPLY_AUTO_IMPORT_NAMES_ON_CHANGED_FILES_ONLY
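
As a rough sketch, these options are set in rector.php. The snippet below assumes the ContainerConfigurator-style config file that Rector used around the time of writing (12/2021). The values are illustrative rather than the settings I actually used, and the rector class name is a hypothetical placeholder.

```php
<?php
// rector.php (sketch; assumes the ContainerConfigurator-based config format)

use Rector\Core\Configuration\Option;
use Symfony\Component\DependencyInjection\Loader\Configurator\ContainerConfigurator;

return static function ( ContainerConfigurator $containerConfigurator ): void {
    $parameters = $containerConfigurator->parameters();

    // Illustrative values only; adjust to what your refactoring needs.
    $parameters->set( Option::AUTO_IMPORT_NAMES, true );
    $parameters->set( Option::IMPORT_SHORT_CLASSES, true );
    $parameters->set( Option::IMPORT_DOC_BLOCKS, false );
    $parameters->set( Option::APPLY_AUTO_IMPORT_NAMES_ON_CHANGED_FILES_ONLY, false );

    $services = $containerConfigurator->services();
    // Hypothetical class name standing in for your own custom rector.
    $services->set( \MyNamespace\GlobalToConfigRector::class );
};
```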

These parameters apply globally. To be fair, most things should be imported so we don't have fully qualified names everywhere. But when we're using Rector, we only want it to touch the parts of the code we tell it to. Rector doesn't import names with an ordinary rector; it uses a post rector that isn't easily configurable, so we're going to modify its source directly.

Go to NameImportingPostRector::shouldImportName in Rector's source code and add this to the beginning:

// Skip auto-import for every name that does not contain the given string
if (\substr_count($name->toCodeString(), 'your import name here') < 1) {
  return \false;
}

However, once you do this, setting Option::APPLY_AUTO_IMPORT_NAMES_ON_CHANGED_FILES_ONLY to true breaks it, and I have no idea why. Luckily, leaving it off shouldn't matter anyway, since the only names being imported are the ones introduced by your change.

It treats doc blocks as attributes of a node instead of its own node

Now this one is a doozy. When Rector auto-imports something, it checks whether there's a namespace and adds the use statement below it; otherwise it adds it to the top of the file. Unfortunately, at the top of almost every MediaWiki file is a license header stored in a doc block. After that come the namespace and use statements, if any, and then another doc block documenting the actual file.

This means that we need to specifically tell Rector to add use statements below the license header, and since doc blocks aren't nodes, we need to do some really strange things to make it work.
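
For reference, the layout we are aiming for looks roughly like this. It is a sketch of a non-namespaced class file; the class name Example and the header text are placeholders, not taken from a real MediaWiki file.

```php
<?php
/**
 * Placeholder for the license header (the GPL notice in real files).
 *
 * @file
 */

use MediaWiki\MediaWikiServices;

/**
 * Placeholder for the documentation of the class itself.
 * In the parsed tree, both this doc block and the license header above
 * start out as comment attributes of the class node.
 */
class Example {
    /**
     * @return mixed Value of the 'Var' config setting
     */
    public function getVar() {
        return MediaWikiServices::getInstance()->getMainConfig()->get( 'Var' );
    }
}
```

The use statement has to land between the two doc blocks, which is exactly the spot Rector has no node for.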

In Rector's source code in UseImportAdder::addImportsToStmts, add these lines before the last return:

// place after other use
foreach ($stmts as $key => $stmt) {
  if ($stmt instanceof \PhpParser\Node\Stmt\Use_) {
    $nodesToAdd = $newUses;
    \array_splice($stmts, $key + 1, 0, $nodesToAdd);
    return $stmts;
  }
}

// If the class has 2 or more doc blocks in a row, assume the first is the license header and place the use statements between them
if (isset($stmts[0]) && $stmts[0] instanceof \PhpParser\Node\Stmt\Class_ && $newUses !== []) {
  $comments = (array) $stmts[0]->getAttribute(\Rector\NodeTypeResolver\Node\AttributeKey::COMMENTS);
  if (count($comments) >= 2) {
    // Prepend an empty statement that carries the license header above the use statements
    $emptyLineAboveUse = new \PhpParser\Node\Stmt\Nop();
    \array_splice($newUses, 0, 0, [$emptyLineAboveUse]);
    $newUses[0]->setAttribute(\Rector\NodeTypeResolver\Node\AttributeKey::COMMENTS, [ $comments[0] ]);
    // Leave the remaining doc block(s) on the class itself
    $stmts[0]->setAttribute(\Rector\NodeTypeResolver\Node\AttributeKey::COMMENTS, \array_slice($comments, 1));
    $emptyLineBelowUse = new \PhpParser\Node\Stmt\Nop();
    \array_splice($stmts, 0, 0, [$emptyLineBelowUse]);
    \array_splice($stmts, 0, 0, $newUses);
    return $stmts;
  }
}

What we are doing is adjusting the method that places use statements in files with no namespace. The first foreach loop looks for an existing use statement to hook onto and, if it finds one, places the new use statements right after it. If it finds none, the second block does a (terribly fragile) check for multiple doc block comments in the attributes of the first node; if there are at least two, we assume the first comment is the license header and insert the use statements after it. Unfortunately, MediaWiki is massive and there are edge cases these checks don't cover, so a manual check of the changed files is recommended.

Step-by-step

  1. Pull the wikimedia-rector repo
  2. Change the Rector version in composer.json to dev-main. This is because auto-import works on namespaced files in their dev branch, but that change hasn't made it into a tagged release yet (as of 12/2021)
  3. Run composer install
  4. Install MediaWiki and its dependencies in another folder. I personally have it in ../core
  5. Create your custom rector and $services->set it in rector.php, along with any custom options
  6. Adjust the source code as discussed before
  7. Inside your wikimedia-rector folder, you can run Rector on a file/folder using
    vendor/bin/rector process --working-dir=../core --autoload-file=vendor/autoload.php example.php
  8. Rector sometimes doesn't auto-import things even when we tell it to. Not sure if it's because of the source code modifications or if it's caused by something else. You can fix it by running Rector again, but to make it faster you can comment out your actual rector so it only runs the auto-importer. This runs Rector on all changed files:
    vendor/bin/rector process --working-dir=../core --autoload-file=vendor/autoload.php $(cd ../core && git ls-files -om --exclude-standard)
  9. Go into your MediaWiki folder and run codesniffer on changed files:
    vendor/bin/phpcbf -p $(git ls-files -om --exclude-standard)
  10. Then, in PhpStorm, right-click the includes folder, click 'Optimize Imports', and check the 'process only VCS changed files' box so it only affects our rectored files.
  11. Push the patch, let CI run and inform you of any code style violations you have to fix manually. During the time CI is running, you might as well skim over the changed files and make sure Rector didn't do anything unexpected.
  12. After fixing all CI errors, you're done :)

Things To Keep In Mind

  1. When you run Rector, there will be a lot of warnings about the 'use of undefined constants'. This is fine.
  2. Rector sometimes (again, I have no idea why) absolutely destroys a file. You can tell this has happened when it starts throwing warnings that it can't process other files, because a dependency they rely on is now broken. Once you find out which file broke, those unprocessed files will have to be reprocessed.