Community Wishlist Survey 2022/Archive/Purely adding keywards on Abusefilter

From Meta, a Wikimedia project coordination wiki

Purely adding keywards on Abusefilter

Duplicate of Community Wishlist Survey 2022/Anti-harassment/Expose more detailed diff information to the AbuseFilter

  • Problem: For anti-vandalism work, we want an AbuseFilter rule that detects when a keyword is purely added to an article, i.e. the user's edit actually introduces keywordA. Currently we have to check both added_lines and removed_lines, either with contains_any() (e.g. contains_any(added_lines, "keywordA", "keywordB", "keywordC") & !contains_any(removed_lines, "keywordA", "keywordB", "keywordC")) or with a regex (e.g. keywords := "keywordA|keywordB|keywordC"; (added_lines regex keywords) & !(removed_lines regex keywords)). Checking added_lines alone gives false positives: editing "This keywordA is good" to "This keywordA is good, I am added" puts "keywordA" in added_lines even though the edit does not add "keywordA". Such workarounds make filters hard to maintain, especially for less technically skilled users. Since keyword checks are very efficient and widely used for anti-vandalism, an easy-to-use function for checking that a keyword is purely added (or purely removed) would be helpful.
  • Who would benefit: Users doing anti-vandalism work
  • Proposed solution: I have two ideas. Other ideas are also welcome.
    • The lighter one is to let contains_any() accept an array of keywords as its argument (i.e. keywords := ["keywordA", "keywordB", "keywordC"]; contains_any(added_lines, keywords)). Currently it supports only variadic arguments: contains_any(added_lines, "keywordA", "keywordB", "keywordC").
    • The other one is to implement a new variable that includes only the purely added/removed words. Note that extracting words is somewhat difficult in languages that do not put spaces between words (e.g. CJK). This sample is one of the simplest.
  • More comments:
  • Phabricator tickets:
  • Proposer: aokomoriuta (talk) 02:58, 12 January 2022 (UTC)
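
To illustrate the false positive and the two ideas above, here is a rough Python sketch. It is not AbuseFilter code: the function names (naive_filter, workaround_filter, purely_added_words) are hypothetical, contains_any() here is only a stand-in that takes a list, and the word splitting assumes space-separated text, so it would not work as-is for CJK.

```python
import re
from collections import Counter

KEYWORDS = ["keywordA", "keywordB", "keywordC"]


def contains_any(text, keywords):
    """Stand-in for contains_any() that accepts a list of keywords."""
    return any(keyword in text for keyword in keywords)


def naive_filter(added_lines, removed_lines):
    # Looks at added_lines only, so it also fires when a changed line
    # already contained the keyword before the edit.
    return contains_any(added_lines, KEYWORDS)


def workaround_filter(added_lines, removed_lines):
    # The current workaround: keyword in added_lines but not in removed_lines.
    return (contains_any(added_lines, KEYWORDS)
            and not contains_any(removed_lines, KEYWORDS))


def purely_added_words(added_lines, removed_lines):
    # Hypothetical "purely added words" variable: words that occur more
    # often in added_lines than in removed_lines (multiset difference).
    # Assumption: words are runs of \w characters separated by other text.
    added = Counter(re.findall(r"\w+", added_lines))
    removed = Counter(re.findall(r"\w+", removed_lines))
    return set(added - removed)


# The edit from the problem statement: keywordA was already on the line.
before = "This keywordA is good"
after = "This keywordA is good, I am added"

false_positive = naive_filter(after, before)   # fires although nothing was added
workaround = workaround_filter(after, before)  # does not fire
new_words = purely_added_words(after, before)  # only the genuinely new words
```

With a variable like the last one, a filter would only need a single check such as contains_any(new_words, keywords) instead of the paired added_lines/removed_lines test.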

Discussion