Research:ORES-powered TeaHouse Invites

From Meta, a Wikimedia project coordination wiki

This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.


This research aims to use machine learning to predict how "good" new Wikipedia editors are within their first few edits, and to invite them to the TeaHouse based on that prediction. The technology builds on the ORES platform, aggregating predictions about edits into higher-level predictions about edit sessions, and finally into predictions about users.


Newcomer retention is one of the largest problems facing Wikipedias today. One approach that has found success is to provide newcomer welcoming and mentoring programs such as the English Wikipedia's TeaHouse (or the French Wikipedia's Forum des nouveaux). However, getting new editors to those forums usually involves either a) inviting all newcomers, which risks overwhelming mentors and inviting vandals, or b) inviting a subset of newcomers based on heuristics, which could miss some good editors. Artificial intelligence or machine learning could potentially mitigate those problems by inviting only the best newcomers, without humans having to sort through the hundreds or thousands of newly registered editors each day.

Background on HostBot

Since 2012, HostBot has worked in tandem with the English Wikipedia TeaHouse to do the repetitive work of inviting newly registered users. To keep the number of invitees manageable for TeaHouse hosts, HostBot limits itself to inviting just 300 of the approximately 2,000 users who qualify every day.

Proposed Initial Experiment: TeaHouse invites

To test this new technology, we propose an A/B test between the current iteration of HostBot and an AI-powered prototype (HostBot-AI) to determine whether the prototype can help retain more newcomers.

How does HostBot currently work?

Every day, HostBot searches for users who meet the following criteria:

  1. User registered within the last 2 days
  2. User has made at least 5 edits
  3. User has not been blocked
  4. User is not a registered bot
  5. User page does not contain any level 4 user warnings, or keywords that indicate likely bad faith or other highly problematic behavior.[1]

It then randomly selects 300 of the users who meet those criteria and invites them to the TeaHouse.
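The selection procedure described above can be sketched roughly as follows. This is a minimal illustration, not HostBot's actual code: the `Newcomer` record, its field names, and the `select_invitees` function are hypothetical, and the real bot reads this information from the wiki's database.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Newcomer:
    """Hypothetical record holding the facts the five criteria need."""
    name: str
    registered: datetime
    edit_count: int
    blocked: bool = False
    is_bot: bool = False
    has_flagged_user_page: bool = False  # level 4 warning or bad-faith keyword


def qualifies(user, now):
    """Apply the five criteria listed above to a single user."""
    return (
        now - user.registered <= timedelta(days=2)
        and user.edit_count >= 5
        and not user.blocked
        and not user.is_bot
        and not user.has_flagged_user_page
    )


def select_invitees(users, now, limit=300, rng=None):
    """Pick up to `limit` qualifying users uniformly at random."""
    rng = rng or random.Random()
    pool = [u for u in users if qualifies(u, now)]
    return rng.sample(pool, min(limit, len(pool)))
```

Passing a seeded `random.Random` as `rng` makes a run reproducible, which could be useful when comparing invite lists from the two bots.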

How would HostBot-AI work?

HostBot-AI would perform the same operation as HostBot—inviting users to the TeaHouse—but it would prioritize the editors it invites with AI, rather than selecting randomly.

The AI would prioritize newcomers based on their predicted goodfaith-ness. That "goodfaith" definition comes from ORES, which simply predicts whether the "user was acting in goodfaith". Alternatively, it would be possible to prioritize editors by their non-damaging-ness, as ORES was trained to predict that too, but we believe that the goodfaith measure is more in line with the TeaHouse's values.
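As an illustration of how per-edit goodfaith predictions might be rolled up into a single user-level score, here is a minimal sketch. The aggregation rule (a mean within each session, then a mean across sessions), the one-hour session cutoff, and the function names are all assumptions made for this example; the actual model is described in the technical details linked at the end of this page.

```python
from datetime import datetime, timedelta
from statistics import mean

# Assumed cutoff: a gap longer than this starts a new edit session.
SESSION_GAP = timedelta(hours=1)


def split_sessions(edits, gap=SESSION_GAP):
    """Group (timestamp, goodfaith_probability) pairs into sessions."""
    sessions = []
    previous = None
    for timestamp, probability in sorted(edits):
        if previous is None or timestamp - previous > gap:
            sessions.append([])  # long pause: open a new session
        sessions[-1].append(probability)
        previous = timestamp
    return sessions


def user_goodfaith_score(edits):
    """Average per-edit scores within sessions, then across sessions."""
    sessions = split_sessions(edits)
    if not sessions:
        return None  # no edits, so nothing to predict from
    return mean(mean(session) for session in sessions)
```

Averaging per session first keeps one long burst of edits from dominating the user's score, though other aggregation rules would be equally plausible.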

Another difference is that HostBot-AI would operate more quickly than HostBot. HostBot checks daily to see whether users have crossed an edit threshold. HostBot-AI could make predictions for any user after their second or third edit (depending on the community's preference), in close to real time.
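The near-real-time trigger could look something like the sketch below, which watches a stream of edits and reports each user at the moment they reach the chosen edit count. The function name, the `min_edits` parameter, and the idea of feeding it a plain sequence of user names are assumptions for illustration only.

```python
from collections import Counter


def users_ready_for_scoring(edit_stream, min_edits=3):
    """Yield each user name once, when they make their min_edits-th edit."""
    counts = Counter()
    for user in edit_stream:
        counts[user] += 1
        if counts[user] == min_edits:
            yield user
```

Setting `min_edits=2` or `min_edits=3` corresponds to the "second or third edit" choice left to the community.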

How different would these two methods even be?

The AI-powered prototype would still respect the invite limit of 300 users per day, but it would select the 300 users whom it predicted with the highest confidence to be goodfaith. For instance, if a new user makes 5 edits on a page that are each vaguely promotional or vain, they might not be blocked, and HostBot might invite them. HostBot-AI, on the other hand, would predict that they are only moderately goodfaith and select more clearly goodfaith editors in their place. As there are about 2,000 users registering every day on English Wikipedia, this prioritization could make a substantial difference.
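The prioritization step amounts to a top-N selection over user-level scores. A minimal sketch, assuming the scores are already available as a mapping from user name to predicted goodfaith probability (the function name and the sample scores are hypothetical):

```python
import heapq


def prioritize_invitees(scores, limit=300):
    """Return up to `limit` user names, highest goodfaith score first."""
    return heapq.nlargest(limit, scores, key=scores.get)


# Hypothetical scores: the vaguely promotional editor ranks last.
example = {"promo_editor": 0.55, "helper": 0.97, "typo_fixer": 0.91}
top_two = prioritize_invitees(example, limit=2)
```

Here `top_two` contains `helper` and `typo_fixer`, so the moderately goodfaith `promo_editor` is the one left out when the limit is reached.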

Example list of differences

For an example of the differences between the invite lists, see the simulated output of the two bots on the same day at Research:ORES-powered_TeaHouse_Invites/Comparison.

What are the exact parameters of the experiment?

  1. The test conducted would be an A/B test between HostBot (A) and HostBot-AI (B).
  2. The exact statistical test and retention metric have not been finalized (we welcome your input). We would like to reuse as many of the statistical measures from previous papers[2] on the TeaHouse as possible, so that our results are comparable.
  3. The experiment would run for one month.
  4. Randomization would occur per user, and both bots would run concurrently.
  5. The experiment would be "blind", meaning that the hosts and invitees would not know which method was being used.
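One way the per-user randomization in item 4 could be implemented is by hashing each user ID into one of the two arms, so that a user always lands in the same arm however often they are checked. This is only a sketch of one possible scheme: the hashing approach, the `salt` value, and the function name are assumptions, and the experiment may assign users differently.

```python
import hashlib


def assign_arm(user_id, salt="teahouse-ab"):
    """Deterministically map a user ID to one of the two experiment arms."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # The first byte of the hash is effectively a fair coin flip per user.
    return "HostBot" if digest[0] % 2 == 0 else "HostBot-AI"
```

Because the assignment depends only on the user ID and the salt, neither hosts nor invitees can tell from an invitation which bot produced it, which fits the blinding described in item 5.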

What are the risks?

The main risk of conducting this experiment is that HostBot-AI will not be as effective as HostBot at inviting high-quality users, and that we will therefore fail to invite the more deserving editors to the TeaHouse. In the very worst-case scenario, 150 users/day * 30 days = 4,500 users would not get the TeaHouse invites they would otherwise have received. This calculation is an upper bound on the risk, as it is likely that some of the users HostBot-AI invites will be the same ones HostBot would invite, which occurred 100 times in the simulated comparison.
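The upper bound can be written as a small calculation that also shows how overlap between the two invite lists shrinks it. The function name and the `overlap_fraction` parameter are introduced here for illustration, and the one-third overlap in the second call is a purely hypothetical value, not a result from the simulated comparison.

```python
def displaced_invites(per_day=150, days=30, overlap_fraction=0.0):
    """Users who would miss an invite they would otherwise have received."""
    return per_day * days * (1 - overlap_fraction)


worst_case = displaced_invites()                      # no overlap at all
with_overlap = displaced_invites(overlap_fraction=1 / 3)  # hypothetical overlap
```

With no overlap this reproduces the 4,500 figure above; any overlap between the lists reduces it proportionally.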

Does committing to the experiment mean committing to switching to HostBot-AI?

No. For now, we only want to run the experiment. If it is successful, a second community consultation could be held about instating the bot. If the experiment is unsuccessful, it would be a learning experience for the developers of the technology.

Who is running this experiment exactly?

I, Maximilianklein, would be the principal researcher and software engineer, in my capacity as a contractor for the Wikimedia Scoring Platform team. Experiment design consultation also comes from user:Jtmorgan, maintainer of HostBot.

Technical details

More technical details can be found at mw:ORES/Newcomerquality.


  2. Evaluating the impact of the Wikipedia Teahouse on newcomer socialization and retention. Jonathan T. Morgan, Aaron Halfaker.