User:Msz2001/AbuseFilter analyzer

From Meta, a Wikimedia project coordination wiki
Evaluation tree as made by gadget

The AbuseFilter analyzer is a user script that parses AbuseFilter conditions and evaluates them. All processing is done in the user's browser, and the value of every intermediate node is recorded along with the final result.

The script can be especially useful for debugging filters and for finding out why a false positive occurred.

The source code is hosted in the GitHub repository whpac/abusefilter-analyzer, and the compiled version is hosted on plwiki at w:pl:Wikipedysta:Msz2001/abusefilter-analyzer.js. It is written in TypeScript.

How it works

Hit details mode

When you navigate to AbuseLog and display the details of a log entry, the evaluation tree is rendered above the attempted diff. There you can see the value of every expression that is part of the filter conditions.

The script ships with its own AbuseFilter rule parser and evaluator, both of which run in the user's browser. This reduces the dependency on AbuseFilter's limited API and makes it possible to build more complex applications on top of the library.
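
To illustrate the idea of recording a value in every node, here is a minimal sketch of a tree evaluator. The node shapes, the function names and the tiny operator set are assumptions made for this example and do not reflect the script's actual data model or API.

```javascript
// Hypothetical node shapes (not the script's real data model):
//   { type: 'literal', value }
//   { type: 'variable', name }
//   { type: 'binary', op, children: [left, right] }
function evaluate(node, variables) {
  let value;
  switch (node.type) {
    case 'literal':
      value = node.value;
      break;
    case 'variable':
      // Variables missing from the log entry evaluate to null.
      value = Object.prototype.hasOwnProperty.call(variables, node.name)
        ? variables[node.name]
        : null;
      break;
    case 'binary': {
      const left = evaluate(node.children[0], variables);
      const right = evaluate(node.children[1], variables);
      value = applyOperator(node.op, left, right);
      break;
    }
    default:
      throw new Error('Unknown node type: ' + node.type);
  }
  node.result = value; // keep the intermediate value for display
  return value;
}

// Only a few operators, enough for the example below.
function applyOperator(op, left, right) {
  switch (op) {
    case '<': return left < right;
    case '&': return Boolean(left) && Boolean(right);
    case 'contains': return String(left).includes(String(right));
    default: throw new Error('Unsupported operator: ' + op);
  }
}

// Tree for: user_editcount < 10 & new_wikitext contains "spam"
const tree = {
  type: 'binary', op: '&', children: [
    { type: 'binary', op: '<', children: [
      { type: 'variable', name: 'user_editcount' },
      { type: 'literal', value: 10 },
    ] },
    { type: 'binary', op: 'contains', children: [
      { type: 'variable', name: 'new_wikitext' },
      { type: 'literal', value: 'spam' },
    ] },
  ],
};

evaluate(tree, { user_editcount: 3, new_wikitext: 'buy spam now' });
console.log(tree.result, tree.children[0].result, tree.children[1].result);
```

After the call, every node carries a result property, so a renderer can walk the tree and show each value next to the expression that produced it.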

Mass check mode

Result of the Mass check mode

The second mode performs a batch analysis of hits of the same filter. You can use it on AbuseLog when the log is filtered by a filter identifier. In this mode, you are asked to enter the number of log entries to load and process.

The result is a syntax tree similar to the one displayed in Hit details mode, except that every tree node shows the distribution of its values. For every node, the most common value is visible right away, and the exact frequencies of all encountered values are available by clicking on the node.
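
The per-node value distribution can be pictured as a simple frequency counter. The sketch below is only an illustration: the class name ValueDistribution and its methods are hypothetical, and it assumes that node values can be serialized with JSON.

```javascript
// Collects how often each value was seen in one tree node across many hits.
class ValueDistribution {
  constructor() {
    this.counts = new Map(); // serialized value -> number of occurrences
    this.total = 0;
  }

  add(value) {
    // Serializing keeps 1 and "1" apart; undefined is treated as null.
    const key = JSON.stringify(value === undefined ? null : value);
    this.counts.set(key, (this.counts.get(key) || 0) + 1);
    this.total++;
  }

  // All encountered values, sorted from most to least frequent.
  entries() {
    return [...this.counts.entries()]
      .map(([key, count]) => ({
        value: JSON.parse(key),
        count,
        share: count / this.total,
      }))
      .sort((a, b) => b.count - a.count);
  }

  // The value shown right away in the node, or null if nothing was added.
  mostCommon() {
    const sorted = this.entries();
    return sorted.length > 0 ? sorted[0] : null;
  }
}

// One node evaluated over four hits:
const dist = new ValueDistribution();
for (const v of [true, false, true, true]) {
  dist.add(v);
}
console.log(dist.mostCommon()); // { value: true, count: 3, share: 0.75 }
```

In such a design, each node of the syntax tree would own one counter, and every processed log entry would add one value to it.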

Please note that while there is no upper limit on the number of entries that can be processed at once, performance may drop if you ask for tens of thousands of hits. The point at which this happens depends on the filter complexity and the sizes of the variables. If the filter you are analyzing uses ccnorm functions, please reduce the number of log entries even further, preferably to a few hundred: these functions are evaluated on the server, and we don't want to kill it with too many requests.

Install

Add to your common.js file:

mw.loader.load("//pl.wikipedia.org/w/index.php?title=Wikipedysta:Msz2001/abusefilter-analyzer-primer.js&action=raw&ctype=text/javascript");

This loads the script only on the abuse log page.
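
No extra check is needed in common.js. Purely as an illustration, a page check of this kind can be expressed with the standard wgCanonicalSpecialPageName configuration variable, as in the sketch below. The helper name isAbuseLogPage is hypothetical, and whether the primer performs its check this way is an assumption.

```javascript
// `config` is anything with a get() method, such as mw.config.
function isAbuseLogPage(config) {
  return config.get('wgCanonicalSpecialPageName') === 'AbuseLog';
}

// Outside MediaWiki (e.g. in Node) `mw` does not exist, so nothing is loaded.
if (typeof mw !== 'undefined' && isAbuseLogPage(mw.config)) {
  mw.loader.load('//pl.wikipedia.org/w/index.php?title=Wikipedysta:Msz2001/abusefilter-analyzer-primer.js&action=raw&ctype=text/javascript');
}
```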

Features

  • Displays the calculated value in every tree node.
  • Analyzes multiple hits of the same filter.
  • Evaluates the whole tree, regardless of conditional and short-circuiting operators (some parts are calculated speculatively, without affecting the result).
  • Reports evaluation errors where they happen but continues evaluation.
  • Processes PCRE regex syntax.
  • Code architecture is highly modular and designed to be reusable.
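
The combination of full evaluation, speculative branches and local error reporting can be sketched as follows. This is an illustration under assumptions, not the script's implementation: the node shapes, the speculative and error properties, and the function names are all hypothetical.

```javascript
// Hypothetical node shapes:
//   { type: 'literal', value }
//   { type: 'and' | 'or' | 'div', children: [left, right] }
function evaluateAll(node, speculative = false) {
  node.speculative = speculative;
  node.error = null;
  try {
    if (node.type === 'literal') {
      node.result = node.value;
    } else {
      const [left, right] = node.children;
      const l = evaluateAll(left, speculative);
      // A short-circuiting evaluator would skip the right operand here.
      // This one evaluates it anyway but flags it as speculative.
      const decided =
        (node.type === 'and' && !l) || (node.type === 'or' && Boolean(l));
      const r = evaluateAll(right, speculative || decided);
      node.result = combine(node.type, l, r);
    }
  } catch (e) {
    // Record the error on the node where it happened and carry on with null.
    node.error = e.message;
    node.result = null;
  }
  return node.result;
}

function combine(type, l, r) {
  switch (type) {
    case 'and': return Boolean(l) && Boolean(r);
    case 'or': return Boolean(l) || Boolean(r);
    case 'div':
      if (r === 0) throw new Error('dividebyzero');
      return l / r;
    default:
      throw new Error('Unknown node type: ' + type);
  }
}

// Tree for: false & (1 / 0)
const expr = {
  type: 'and', children: [
    { type: 'literal', value: false },
    { type: 'div', children: [
      { type: 'literal', value: 1 },
      { type: 'literal', value: 0 },
    ] },
  ],
};

evaluateAll(expr);
console.log(expr.result, expr.children[1].speculative, expr.children[1].error);
```

In this example the division is still evaluated and its error is attached to the division node, while the result of the whole expression stays false because the left operand already decided it.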

Known limitations

  • The log entry is tested against the most recent version of the filter, so the evaluation can yield false if the filter has been changed since the hit was logged.
  • Some variables are generated and saved by AbuseFilter only if they are explicitly read by the filter (in the specific execution path). Therefore some variables will be null, even though they would have had another value in the actual execution.
  • Functions from the ccnorm family are executed on the server and are available only to users who can use the abusefilterevalexpression endpoint.
  • Regular expressions are translated into JavaScript ones by a custom-made library, and the translation may be imperfect.
  • Only English interface messages are supported; some errors are reported with a code string such as dividebyzero.
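
To give an idea of why the regex translation is non-trivial, the sketch below converts two PCRE constructs that JavaScript has traditionally not accepted: leading inline options such as (?i) and Python-style named groups (?P<name>...). It is not the library used by the script; the function name pcreToJs is hypothetical, and many PCRE features (possessive quantifiers, atomic groups, \A and \z anchors, and others) are not handled.

```javascript
// Translates a small subset of PCRE-only syntax into a JavaScript RegExp.
function pcreToJs(pattern, flags = '') {
  // Leading inline options such as (?i) or (?is) become JS flags.
  const inline = /^\(\?([ims]+)\)/.exec(pattern);
  if (inline) {
    for (const f of inline[1]) {
      if (!flags.includes(f)) flags += f;
    }
    pattern = pattern.slice(inline[0].length);
  }

  let out = '';
  let inClass = false; // true while inside [...]
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === '\\') {
      // Copy escape sequences untouched so \( is never mistaken for a group.
      out += ch + (pattern[i + 1] ?? '');
      i++;
    } else if (inClass) {
      if (ch === ']') inClass = false;
      out += ch;
    } else if (ch === '[') {
      inClass = true;
      out += ch;
    } else if (pattern.startsWith('(?P<', i)) {
      out += '(?<'; // (?P<name>...) -> (?<name>...)
      i += 3;
    } else {
      out += ch;
    }
  }
  return new RegExp(out, flags);
}

const re = pcreToJs('(?i)(?P<word>viagra|casino)');
console.log(re.source, re.flags);
console.log(re.exec('Buy CASINO chips').groups.word);
```

Even this small subset needs to track escapes and character classes, which hints at how many corner cases a complete translator has to cover.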

Plans for the future

  • Implement a "regex explorer" view, similar to regex101.
  • Internationalize the gadget.