Grants:IdeaLab/BadWords detector for AbuseFilter
What is the problem you're trying to solve?
When users are yelling at each other, it often takes too long before other users become aware of the situation, and because nobody intervenes, the conflict escalates. If other users became aware of the situation much earlier, it could be de-escalated.
What is your solution?
To make other users aware of the situation we must somehow trigger warnings early enough. One way is to detect the foul language and tag the edits; that will make people aware of the post. To do so we must be able to describe what a bad word (profanity) is, but such words can be composites that are difficult to detect with simple regexes.
What we need is probably a separate library of some kind that analyzes the post and gives it a score (a fuzzy value). That score can then be used in AbuseFilter to build confidence in whether the post should be tagged or not.
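As a rough illustration of the interface such a library could expose, the sketch below returns a fuzzy score in [0, 1] for a post; the word list and the per-word severities are invented for illustration, not part of the proposal.

```python
# Hypothetical scoring interface for the bad-word library.
# The words and weights below are made-up examples, not a real lexicon.
WEIGHTS = {"måspeis": 0.9, "hæstskjit": 0.4}  # assumed severities in [0, 1]

def score(text):
    """Score a post as the strongest bad-word hit found in it."""
    words = text.lower().split()
    return max((WEIGHTS.get(w, 0.0) for w in words), default=0.0)
```

An AbuseFilter rule could then, for example, tag an edit whenever the score of the added text exceeds some threshold.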
Note that the first part of this proposal is simply to get a functional bad-word detector; how to use the scoring in AbuseFilter is step two.
In a previous project at nowiki I tried to make a detector for foul words in Norwegian. That went quite well until I found a file with foul words from a county in Norway called Nordland. It is common in Nordland to use animal names and names of genitalia as foul words. Not all of them are foul; some can in fact be interpreted as superlatives. For example, "måspeis" (literally "dick of a seagull") is interpreted as quite offensive. On the other hand, as I was told by a "nordlending" (the demonym for a person from Nordland), "hestpeis" (literally "dick of a horse") can be interpreted as a superlative; it depends on the setting and who says it. Asked about "hestskjit" (literally "horse manure") as a foul word, he said, "have you seen the glint in a horse's eye when it nearly shits on you?" It is a foul word, but not a very strong one, and that too depends on the context.
It seems to me that a working solution is to make groups of words that might be parts of bad words, and then describe how those words are merged. A group of "animals" could consist of "mås" and "hæst", and another group of "profanity" could consist of "peis" and "skjit". In addition, those words could be modified by affix rules, such as an infix "e". That would make it possible to detect "måspeis" / "måsepeis" and "hæstskjit" / "hæsteskjit".
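The group-and-affix idea above can be sketched in a few lines; the group names, the fragments, and the single infix rule are the examples from the text, so this is a minimal illustration rather than a real lexicon.

```python
# Sketch: generate composites from fragment groups plus an optional infix,
# then detect by set membership. Fragments are the examples from the text.
from itertools import product

ANIMALS = ["mås", "hæst"]
PROFANITY = ["peis", "skjit"]
INFIXES = ["", "e"]  # optional linking vowel between the fragments

def composites(first_group, second_group, infixes=INFIXES):
    """Every composite: one fragment from each group, optionally joined
    by an infix."""
    return {a + i + b for a, i, b in product(first_group, infixes, second_group)}

BAD_WORDS = composites(ANIMALS, PROFANITY)

def detect(word):
    return word.lower() in BAD_WORDS
```

Enumerating all composites up front only works while the groups and affix rules stay small; with many groups the set explodes, which is what motivates the parsing approach below.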
It seems to me that the simplest way to make a detector is to describe how the fragments merge into composites, and to use this description as a finite-state machine (FSM). The FSM would take a composite word or phrase and try to parse it; if the parse completes, it has detected a possible bad word. This solves the detection, but it does not solve the weighting. One solution could be to learn the weights in a neural net, by noting that the fragments are indexes into a feature vector. (Because the feature vector could be long, it could be folded into a subspace, the points in the subspace could be clustered, and those clusters would then be the inputs to the network.)
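A toy version of such an FSM could look like the following; the grammar (animal, then an optional infix, then a profanity fragment) and the fragment lists are assumptions taken from the examples above.

```python
# Minimal finite-state sketch of the composite parser: states transition
# on whole fragment groups, and the word is consumed fragment by fragment.
GROUPS = {
    "animal": ["mås", "hæst"],
    "infix": ["e"],
    "profanity": ["peis", "skjit"],
}

# Each state lists (group to try, next state); the infix is optional
# because "after_animal" can also go straight to a profanity fragment.
TRANSITIONS = {
    "start": [("animal", "after_animal")],
    "after_animal": [("infix", "after_animal"), ("profanity", "accept")],
}

def parse(word, state="start"):
    """Return True if the FSM can consume the whole word."""
    if state == "accept":
        return word == ""
    for group, next_state in TRANSITIONS.get(state, []):
        for frag in GROUPS[group]:
            if word.startswith(frag) and parse(word[len(frag):], next_state):
                return True
    return False
```

Unlike the enumeration sketch, this never materializes the full set of composites, so adding a group or an affix rule stays cheap.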
In short, composite words are difficult to detect, and scoring them is hard. Once candidates are extracted, their scores could be learned through sentiment analysis.
There is a proposal, Grants:IdeaLab/Restrict words, which needs this functionality or something of a similar type.
For a more in-depth description, see Grants:IdeaLab/BadWords detector for AbuseFilter/Technical description.
About the idea creator
I've been a contributor on Wikimedia projects for more than ten years, and I have a cand.sci. in mathematics and computer science.
- Because it's easy to deploy and avoids entering insulting words Nbelohlavek (talk) 21:35, 17 June 2016 (UTC)
Expand your idea
Would a grant from the Wikimedia Foundation help make your idea happen? You can expand this idea into a grant proposal.