Grants:IdeaLab/BadWords detector for AbuseFilter

From Meta, a Wikimedia project coordination wiki
BadWords detector for AbuseFilter
To be able to detect abuse we need a much better way of detecting bad words (profanity) than the present regexes in AbuseFilter. We should be able to detect not only single words but also composites, and we should also be able to score them.
idea creator: Jeblad
researcher: Csisc
this project needs: volunteer, developer
created on: 14:39, 7 June 2016 (UTC)


Project idea[edit]

What is the problem you're trying to solve?[edit]

When users are yelling at each other, it takes too long before other people become aware of the situation, and because nobody intervenes, the whole situation escalates. If other users became aware of the situation much earlier, it could be de-escalated.

What is your solution?[edit]

To make other users aware of the situation we must somehow trigger warnings early enough. One way is to detect the foul language and tag the edits; that will make people aware of the post. To do so we must be able to describe what a bad word (profanity) is, but such words can be composites that are difficult to detect with simple regexes.

What we need is probably a separate library of some kind that analyzes the post and gives a score (a fuzzy value). That score can then be used in AbuseFilter to build confidence in whether the post should be tagged or not.
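A minimal sketch of how such a scoring library might plug into a filter. All names here (`profanity_score`, `should_tag`, the lexicon and its values) are hypothetical, and AbuseFilter itself is written in PHP, so this Python is purely illustrative:

```python
# Hypothetical interface: a scoring library returns a fuzzy value in [0, 1],
# and the filter tags the edit when the score exceeds a threshold.
# Norwegian fragments are simplified to ASCII for illustration.

LEXICON = {"maaspeis": 0.9, "haestskjit": 0.4}  # toy per-word scores

def profanity_score(text: str, lexicon: dict[str, float]) -> float:
    """Return the highest badness score of any known word in the text."""
    words = text.lower().split()
    return max((lexicon.get(w, 0.0) for w in words), default=0.0)

def should_tag(text: str, threshold: float = 0.5) -> bool:
    """Fuzzy score -> boolean tagging decision, as a filter rule would do."""
    return profanity_score(text, LEXICON) >= threshold
```

The key design point is that the library only produces a score; the decision threshold stays configurable on the AbuseFilter side.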

Note that the first part of this is simply to get a functional bad word detector; how to use the scoring in AbuseFilter is step two.

Background[edit]

In a previous project at nowiki I tried to make a detector for foul words in Norwegian. That went quite well until I found a file with foul words from a county in Norway called Nordland. It is common in Nordland to use animal names and names of genitalia as foul words. Not all of them are truly foul; some can in fact be interpreted as superlatives. For example, "måspeis" (literally "dick of a seagull") is interpreted as quite offensive. On the other hand, as I was told by a "nordlending" (demonym for a person from Nordland), "hestpeis" (literally "dick of a horse") can be interpreted as a superlative; it depends on the setting and who says it. Asked about "hestskjit" (literally "horse manure") as a foul word, he said, "have you seen the glint in a horse's eye when he nearly shits on you?" It is a foul word, but not a very strong one, and that too depends on the context.

It seems to me that a working solution is to make groups of words that might be parts of bad words, and then describe how those words are merged. A group of "animals" could consist of "mås" and "hæst", and another group of "profanity" could consist of "peis" and "skjit". In addition, those words could be modified by affix rules, like an infix "e". That would make it possible to detect "måspeis" / "måsepeis" and "hæstskjit" / "hæsteskjit".
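The group-plus-affix idea above can be sketched as follows. The group names, fragments (simplified to ASCII), and function names are all hypothetical:

```python
from itertools import product

# Hypothetical word groups; Norwegian fragments simplified to ASCII.
GROUPS = {
    "animal": ["maas", "haest"],
    "profanity": ["peis", "skjit"],
}
INFIXES = ["", "e"]  # affix rule: optional infix between the two fragments

def composite_forms() -> set[str]:
    """Generate every animal + infix + profanity composite the rules allow."""
    return {
        a + i + p
        for a, i, p in product(GROUPS["animal"], INFIXES, GROUPS["profanity"])
    }

def is_bad_word(word: str) -> bool:
    return word.lower() in composite_forms()
```

With two animals, two profanity fragments, and an optional infix, the rules above expand to eight composite forms; in a real lexicon the groups would be much larger, which is exactly why enumerating composites by hand in a regex does not scale.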

It seems to me that the simplest way to make a detector is to describe how the fragments merge into composites, and use this as the description of a finite-state machine (FSM). The FSM would take a composite word or phrase and try to parse it. If it completes, it has detected a possible bad word. This solves the detection, but it does not solve the weighting. One solution could be to learn the weights in a neural net, noting that the fragments are indexes into a feature vector. (And because the feature vector could be long, it could be folded into a subspace, the points in that subspace could be clustered, and those clusters would then be the inputs to the network.)

In short, composite words are difficult to detect, and how to score them is hard. If extracted, the scores could be learned through sentiment analysis.
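As a stand-in for that learning step, here is a sketch of the fragments-as-feature-indexes idea using a plain perceptron instead of the neural net described above. The fragment list, labels, and training data are all hypothetical toy values:

```python
# Fragments index a feature vector; a single-layer model learns a per-fragment
# weight from labeled examples (1 = offensive, 0 = not). Pure-Python perceptron.

FRAGMENTS = ["maas", "haest", "peis", "skjit"]
INDEX = {f: i for i, f in enumerate(FRAGMENTS)}

def featurize(fragments: list[str]) -> list[float]:
    """One-hot feature vector: each known fragment flips on one position."""
    vec = [0.0] * len(FRAGMENTS)
    for f in fragments:
        vec[INDEX[f]] = 1.0
    return vec

def train(samples, labels, epochs=20, lr=0.5):
    """Standard perceptron updates over (fragment-list, label) pairs."""
    w, b = [0.0] * len(FRAGMENTS), 0.0
    for _ in range(epochs):
        for frags, y in zip(samples, labels):
            x = featurize(frags)
            pred = 1.0 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0.0
            err = y - pred
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def score(fragments, w, b):
    """Raw margin; positive means the composite looks offensive."""
    return sum(wi * xi for wi, xi in zip(featurize(fragments), w, b and [b] or [b])) + b if False else \
        sum(wi * xi for wi, xi in zip(w, featurize(fragments))) + b
```

Trained on the examples from the Background section ("måspeis" offensive, "hestpeis" and "hestskjit" not, in this toy labeling), the model learns that "peis" is only offensive in combination with "mås", something a flat word list cannot express.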

There is a proposal, Grants:IdeaLab/Restrict words, which needs this functionality or something of a similar type.

For a more in-depth description, see Grants:IdeaLab/BadWords detector for AbuseFilter/Technical description.

Goals[edit]

Get Involved[edit]

About the idea creator[edit]

I've been a contributor on Wikimedia projects for more than ten years, and have a cand.sci. in math and computer sciences.

Participants[edit]

  • Researcher We can generalize the work for other languages Csisc (talk) 19:45, 17 June 2016 (UTC)

Endorsements[edit]

  • Because it's easy to deploy and avoids entering insulting words Nbelohlavek (talk) 21:35, 17 June 2016 (UTC)

Expand your idea[edit]

Would a grant from the Wikimedia Foundation help make your idea happen? You can expand this idea into a grant proposal.

Expand into a Rapid Grant
Expand into a Project Grant
(launching July 1st)