Grants talk:IdeaLab/BadWords detector for AbuseFilter

From Meta, a Wikimedia project coordination wiki

Interested in applying for a WMF Grant?

@Jeblad and Cscic: Thanks for your work on this idea during the Inspire Campaign to improve the AbuseFilter. Having read over the proposal, I wanted to ask if you were seeking funding through a Wikimedia Foundation grant: We have Rapid Grants for projects requiring up to USD 2,000 (applications are welcome anytime), and Project Grants for projects requiring more substantial funding (applications for the current round will be due Aug. 2nd). If you are interested, I wanted to offer my support in helping you develop your proposal. A few things you could do to get started include:

  • Gather some feedback on this project from nowiki, given you have some experience doing comparable work in that community
  • Specify what project you are seeking to improve with this proposal
  • Perhaps consider looking at Research:Detox to get an example of a working "bad words" detector, and how it might be improved.

As you work on your proposal, we also have sessions on Google Hangouts on July 20th, 29th, and Aug. 2nd. If you'd like to chat about your proposal at a different time, let me know and we'll try to arrange something individually. Thanks, I JethroBT (WMF) (talk) 20:45, 18 July 2016 (UTC)

I only skimmed Research:Detox, and I think my idea is more of a usable tool than a research project. It is a very simplified approach for identifying very specific language constructs that are otherwise very hard to catch with AbuseFilter. Those constructs work for some languages, but not all.
It would take some time to write a proper description of how this should be done, but it isn't very difficult. The baseline is just an FSM (finite-state machine) with words (patterns) as the edges. — Jeblad 23:24, 18 July 2016 (UTC)
Note that the current approach in Research:Detox is an n-gram model, which breaks down for agglutinative and partly for fusional languages. It is possible to compensate by increasing the amount of training material, but that has its limits, as we don't have enough vandalism for the smaller projects. It is also those projects that use languages with these properties. — Jeblad 09:35, 21 July 2016 (UTC)
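
For readers unfamiliar with the idea, the baseline mentioned above (an FSM with words, or patterns, as the edges) might look roughly like the sketch below. This is only an illustration under stated assumptions, not the proposed design: the class name `WordFSM`, the state names, and the example patterns are all hypothetical, and nothing here is part of AbuseFilter. Each edge carries a regular expression that must match one whole word, and a text is flagged when some run of consecutive words leads from the start state to an accepting state.

```python
import re
from dataclasses import dataclass, field


@dataclass
class WordFSM:
    """Hypothetical word-level state machine; edges are regex patterns."""
    start: str = "start"
    accepting: set = field(default_factory=set)
    edges: dict = field(default_factory=dict)  # state -> [(pattern, next_state)]

    def add_edge(self, src, pattern, dst):
        compiled = re.compile(pattern, re.IGNORECASE)
        self.edges.setdefault(src, []).append((compiled, dst))

    def matches(self, text):
        """Return True if any run of consecutive words reaches an accepting state."""
        active = set()
        for token in re.findall(r"\w+", text):
            next_active = set()
            # Also try the start state on every word, so a match may begin anywhere.
            for state in active | {self.start}:
                for pattern, dst in self.edges.get(state, []):
                    if pattern.fullmatch(token):
                        if dst in self.accepting:
                            return True
                        next_active.add(dst)
            active = next_active
        return False


# Illustrative rule only: "you are (filler words)* <insult>".
fsm = WordFSM(accepting={"bad"})
fsm.add_edge("start", r"you|u", "s1")
fsm.add_edge("s1", r"are|r", "s2")
fsm.add_edge("s2", r"an?|such|so", "s2")      # optional filler words (self-loop)
fsm.add_edge("s2", r"stupid|idiots?", "bad")

print(fsm.matches("I think you are such an idiot"))   # True
print(fsm.matches("The idiot box is a nickname for television"))  # False
```

The second call shows the point of using a construct rather than a bare word list: the word alone does not trigger the rule, only the surrounding pattern does. How such rules would handle merged or compounded words, and how any weights would be learned, is left open here.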

Grants to improve your project

Greetings! The Project Grants program is currently accepting proposals for funding. The deadline for draft submissions is tomorrow. If you have ideas for software, offline outreach, research, online community organizing, or other projects that enhance the work of Wikimedia volunteers, start your proposal today! Please encourage others who have great ideas to apply as well. Support is available if you want help turning your idea into a grant request.

The next open call for Project Grants will be in October 2016. You can also consider applying for a Rapid Grant, if your project does not require a large amount of funding, as applications can be submitted anytime. Feel free to ping me if you need help getting your proposal started. Thanks, I JethroBT (WMF) 22:49, 1 August 2016 (UTC)

Sorry, but I don't think I can do this for now. Perhaps I will reconsider later, but then it will probably be necessary to do some preliminary testing to see if this is feasible. In this specific case I suspect the rules could turn out to be much more involved than I had hoped. It is also an open question how the weights should be learned, and how this should be done given rules that merge words. — Jeblad 14:10, 4 August 2016 (UTC)