User:Msz2001/AbuseFilter analyzer

From Meta, a Wikimedia project coordination wiki
Figure: Evaluation tree as rendered by the gadget.
The AbuseFilter analyzer has been recognized with the Coolest Tool Award 2025!

The AbuseFilter analyzer is a user script that parses AbuseFilter conditions and evaluates them. All processing is done in the user's browser, and the values of all intermediate nodes are recorded along with the final result.

The script can be especially useful for debugging filters and for finding out why a false positive occurred.

The script is hosted in the Wikimedia GitLab repository msz2001/abusefilter-analyzer and is loaded from there. It is written in TypeScript.

How does it work

Hit details mode

When you navigate to AbuseLog and display details of a log entry, the evaluation tree will render above the attempted diff. From there you can see the value of every expression that is part of the filter conditions.

The script ships with its own AbuseFilter rule parser and evaluator, both executed in the user's browser. This reduces the dependency on AbuseFilter's limited API and makes it possible to build more complex applications on top of the library.
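
The idea of recording a value in every tree node can be pictured with a minimal sketch. The node shape (`type`, `op`, `left`, `right`, `result`), the function names, and the three supported operators are assumptions made for illustration; they are not the script's actual data model or API.

```javascript
// Apply a binary operator to two already-computed operand values.
// Only three operators are covered in this illustration.
function applyOperator(op, left, right) {
  switch (op) {
    case "<": return left < right;
    case "&": return Boolean(left) && Boolean(right);
    case "in": return String(right).includes(String(left)); // substring test
    default: throw new Error(`Unsupported operator: ${op}`);
  }
}

// Evaluate a node recursively and store the computed value on the node itself,
// so that every intermediate result can be displayed later.
function evaluate(node, variables) {
  let value;
  if (node.type === "literal") {
    value = node.value;
  } else if (node.type === "variable") {
    // Variables missing from the log entry become null.
    value = node.name in variables ? variables[node.name] : null;
  } else if (node.type === "binary") {
    const left = evaluate(node.left, variables);
    const right = evaluate(node.right, variables);
    value = applyOperator(node.op, left, right);
  } else {
    throw new Error(`Unknown node type: ${node.type}`);
  }
  node.result = value;
  return value;
}

// Hand-built tree for: user_editcount < 10 & "http" in added_lines
const tree = {
  type: "binary", op: "&",
  left: {
    type: "binary", op: "<",
    left: { type: "variable", name: "user_editcount" },
    right: { type: "literal", value: 10 },
  },
  right: {
    type: "binary", op: "in",
    left: { type: "literal", value: "http" },
    right: { type: "variable", name: "added_lines" },
  },
};

const finalResult = evaluate(tree, {
  user_editcount: 3,
  added_lines: "see http://example.org",
});
```

After the call, `finalResult` holds the overall outcome, while `tree.left.result`, `tree.right.result`, and so on hold the values of the intermediate nodes.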

In this mode, you can click any of the string matching operators (such as rlike, glob, or in) to open a Match details view, which highlights the exact text that triggered the match.
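
As a rough illustration of how the matched text can be located, the sketch below uses the standard JavaScript `RegExp.prototype.exec`, which reports the index of the match. The helper names `findMatch` and `highlight` and the `[[...]]` markers are hypothetical; the script's own Match details view may work differently.

```javascript
// Find the first match of a pattern and return its position within the subject.
function findMatch(pattern, flags, subject) {
  const match = new RegExp(pattern, flags).exec(subject);
  if (match === null) return null;
  return {
    start: match.index,
    end: match.index + match[0].length,
    text: match[0],
  };
}

// Wrap the matched range in [[ ]] markers (a stand-in for visual highlighting).
function highlight(subject, range) {
  if (range === null) return subject;
  return (
    subject.slice(0, range.start) +
    "[[" + subject.slice(range.start, range.end) + "]]" +
    subject.slice(range.end)
  );
}

const subject = "Get FREE money now";
const range = findMatch("fr[e3]+ money", "i", subject);
const marked = highlight(subject, range); // "Get [[FREE money]] now"
```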

Mass check mode

Figure: Result of the Mass check mode.

The second mode performs a batch analysis of hits of the same filter. You can use it on AbuseLog when the log is filtered by a filter identifier. In this mode, you are asked to enter the number of log entries to load and process.

The result is a syntax tree similar to the one displayed in Hit details mode. This time, however, every tree node shows the distribution of its values. For every node, the most common value is visible right away, and the exact frequencies of all encountered values are available by clicking on the node.
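
The per-node value distribution can be sketched as a simple frequency count. The function name `summarize` and the returned shape are assumptions for illustration, and the sketch assumes that node values are JSON-serializable (null, booleans, numbers, strings).

```javascript
// Count how often each value occurred for one tree node across many log entries.
function summarize(values) {
  const counts = new Map();
  for (const value of values) {
    // JSON text serves as the map key so that, for example, 1 and "1" stay distinct.
    const key = JSON.stringify(value);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  // Sort by descending frequency; ties keep their order of first appearance.
  const frequencies = [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([key, count]) => ({ value: JSON.parse(key), count }));
  return {
    mostCommon: frequencies.length > 0 ? frequencies[0].value : undefined,
    frequencies,
  };
}

// Values of one node collected from five hypothetical log entries.
const summary = summarize([true, false, true, null, true]);
```

Here `summary.mostCommon` is the value that would be shown right away, and `summary.frequencies` is the full breakdown revealed on click.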

Please note that while there is no upper limit on the number of entries that can be processed at once, performance may drop if you request tens of thousands of hits. The point at which this happens depends on the filter's complexity and the sizes of the variables.

Install

Add to your common.js file:

mw.loader.load("//pl.wikipedia.org/w/index.php?title=Wikipedysta:Msz2001/abusefilter-analyzer-primer.js&action=raw&ctype=text/javascript");

This loads the script only on the abuse log page.

Features

  • Displays the calculated value in every tree node.
  • Allows analyzing multiple hits of the same filter.
  • Evaluates the whole tree, regardless of conditional and short-circuiting operators (some of it is calculated speculatively, without any effect on the result).
  • Reports evaluation errors where they happen but continues the evaluation.
  • Highlights the part of a string that triggered a regular expression or glob pattern match.
  • Processes the PCRE syntax of regular expressions.
  • The code architecture is highly modular and designed to be reusable.
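
Evaluating past short-circuiting operators and continuing after errors can be pictured with the following sketch. The node shape, the `speculative` and `error` fields, and the choice to turn a failed node's value into null are all assumptions for illustration; only the `dividebyzero` code string is taken from this page.

```javascript
// Compute a node's value. Both operands are always evaluated, even when a
// short-circuiting evaluator would skip the right-hand side.
function compute(node, speculative) {
  switch (node.type) {
    case "literal":
      return node.value;
    case "and": {
      const left = evaluateAll(node.left, speculative);
      // The right side cannot change the result when left is falsy,
      // so it is evaluated speculatively in that case.
      const right = evaluateAll(node.right, speculative || !left);
      return Boolean(left) && Boolean(right);
    }
    case "or": {
      const left = evaluateAll(node.left, speculative);
      const right = evaluateAll(node.right, speculative || Boolean(left));
      return Boolean(left) || Boolean(right);
    }
    case "div": {
      const left = evaluateAll(node.left, speculative);
      const right = evaluateAll(node.right, speculative);
      if (right === 0) throw new Error("dividebyzero");
      return left / right;
    }
    default:
      throw new Error(`Unknown node type: ${node.type}`);
  }
}

// Evaluate a node, recording the error on the node where it happened
// instead of aborting the whole evaluation.
function evaluateAll(node, speculative = false) {
  node.speculative = speculative;
  node.error = null;
  try {
    node.value = compute(node, speculative);
  } catch (e) {
    node.error = e.message;
    node.value = null; // assumption: a failed node yields null
  }
  return node.value;
}

// Hand-built tree for: false & (1 / 0)
const filter = {
  type: "and",
  left: { type: "literal", value: false },
  right: {
    type: "div",
    left: { type: "literal", value: 1 },
    right: { type: "literal", value: 0 },
  },
};
const outcome = evaluateAll(filter);
```

In this example the division is still evaluated and its error is attached to the division node, but because that branch is speculative, the root evaluates to false without an error of its own.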

Known limitations

  • The log entry is tested against the most recent version of the filter, so the evaluation can yield false even though the filter matched when the entry was logged.
  • Some variables are generated and saved by AbuseFilter only if the filter explicitly reads them (on the specific execution path). Such variables may therefore be null in the analyzer, even though they would have had a different value during the actual execution.
  • Regular expressions are translated into JavaScript ones by a library custom-made for that purpose. The translation may be imperfect.
  • Some errors are reported in English only or as a code string such as dividebyzero.
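
To illustrate why regular expressions need translating and why the result can be imperfect, here is a deliberately naive sketch that handles two PCRE constructs: a leading `(?i)` flag and Python-style named groups `(?P<name>...)`. The function `translatePcre` is hypothetical and is not the translation library used by the script.

```javascript
// Naively convert a PCRE pattern into a JavaScript RegExp.
function translatePcre(pattern) {
  let source = pattern;
  let flags = "";
  // A leading (?i) in PCRE makes the whole pattern case-insensitive;
  // here it is moved into the RegExp flags.
  if (source.startsWith("(?i)")) {
    flags += "i";
    source = source.slice(4);
  }
  // PCRE accepts (?P<name>...) for named groups; JavaScript expects (?<name>...).
  // This textual replacement ignores context (escapes, character classes),
  // which is one way such a translation can go wrong.
  source = source.replace(/\(\?P</g, "(?<");
  return new RegExp(source, flags);
}

const translated = translatePcre("(?i)(?P<word>spam)");
const found = translated.exec("Buy SPAM");
```

With this input, `found.groups.word` contains the matched text even though the original pattern used syntax that a JavaScript engine would not accept directly.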

Report errors and ask for features

If you encounter any errors or would like to see a new feature, please contact Msz2001 or create an issue in the GitLab repository.