Research:Spambot detection via registration page behavior

From Meta, a Wikimedia project coordination wiki
Created: 21:16, 19 December 2017 (UTC)
Duration: 2017-12 – 2018-03
This page documents a completed research project.


Wikimedia's captcha system has a user failure rate of roughly 20-30%, yet it is too weak to stop even off-the-shelf OCR tools. Making it less distracting for users while keeping at least its current effectiveness against spambots might positively impact registrations. We approach this by observing behavior on the registration page (such as mouse or keyboard dynamics) to detect human-like traits, so that most real users can be let through without filling in a captcha, while inconclusive cases fall back to the traditional captcha (or possibly a hardened version of it).
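
The fallback logic described above can be sketched as a small decision function. This is only an illustration, not the project's software: the name choose_challenge, the human_score input and the 0.9 threshold are assumptions made for the example, and the score is assumed to come from some behavior classifier.

```python
from enum import Enum


class Challenge(Enum):
    NONE = "none"              # let the registration through without a captcha
    CAPTCHA = "captcha"        # the traditional captcha
    HARD_CAPTCHA = "hardened"  # a possible hardened variant


def choose_challenge(human_score, pass_threshold=0.9, fallback=Challenge.CAPTCHA):
    """Map a behavior score in [0, 1] to a registration challenge.

    human_score: hypothetical classifier estimate that the session is human.
    pass_threshold: assumed cut-off; scores at or above it skip the captcha.
    fallback: challenge shown in inconclusive cases.
    """
    if not 0.0 <= human_score <= 1.0:
        raise ValueError("human_score must be in [0, 1]")
    if human_score >= pass_threshold:
        return Challenge.NONE
    return fallback
```

For example, choose_challenge(0.95) skips the captcha, while choose_challenge(0.5, fallback=Challenge.HARD_CAPTCHA) would show the hardened variant. Since no session is ever rejected outright, a misjudged human only faces the captcha they would have seen anyway.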

The output of the project will be better monitoring tools for captcha effectiveness, a report on the viability of the concept, and - if it proves viable - software that integrates with the registration page and performs the detection.

For details on the project, see T158909 and T178463.

Methods

Information on user behavior will be collected via client-side analytics (EventLogging - see schema) and used to train a classifier, with labels obtained from block logs and similar sources. The collected data will be examined for whether it could be used to uniquely identify a user (which is something to avoid). If the method performs well, the final software will be a service that judges user registrations in real time, without recording any data.
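
To make the training step more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the two features (speed variation and path straightness of a mouse trace), the (time, x, y) sample format, the labeling convention (1 = account later blocked) and the choice of logistic regression are not taken from the project, whose actual schema and classifier are not described here.

```python
import math


def mouse_features(samples):
    """Summarise a mouse trace given as (time, x, y) tuples.

    Returns [speed variation, straightness]: the standard deviation of the
    segment speeds divided by their mean, and the start-to-end distance
    divided by the total path length. Both are scale-free.
    """
    speeds, path = [], 0.0
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        step = math.hypot(x1 - x0, y1 - y0)
        path += step
        if t1 > t0:
            speeds.append(step / (t1 - t0))
    if not speeds or path == 0.0:
        return [0.0, 0.0]  # no usable movement in the trace
    mean = sum(speeds) / len(speeds)
    std = math.sqrt(sum((s - mean) ** 2 for s in speeds) / len(speeds))
    direct = math.hypot(samples[-1][1] - samples[0][1],
                        samples[-1][2] - samples[0][2])
    return [std / mean, direct / path]


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def spam_probability(weights, bias, row):
    """Predicted probability that a feature row belongs to a spambot."""
    return _sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)


def train_logistic(rows, labels, epochs=2000, lr=0.5):
    """Fit logistic regression by batch gradient descent.

    labels: 1 if the account was later blocked (assumed convention), else 0.
    """
    weights, bias, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for row, label in zip(rows, labels):
            err = spam_probability(weights, bias, row) - label
            grad_w = [g + err * x for g, x in zip(grad_w, row)]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias
```

With this feature definition, a perfectly straight, constant-speed trace such as [(0, 0, 0), (1, 10, 0), (2, 20, 0), (3, 30, 0)] maps to [0.0, 1.0]. A real-time service along these lines would only need the fitted weights and the features of the current session, so nothing has to be stored per registration.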

Timeline

See T178463.