Research talk:Community-centered Evaluation of AI Models on Wikipedia

From Meta, a Wikimedia project coordination wiki

Community Artificial Intelligence governance & management tools

Hey folks! I hope you don't mind the ping. I'm pinging you because you have been involved in training, using, or evaluating mw:ORES (and ORES-adjacent) models in the past. If you're interested in working with us to explore the design of tools to support the collaborative auditing of these models (see the project page for details), please respond below and submit this Google Form. If you don't feel comfortable submitting the form, please feel free to email my team member, Tzu-Sheng, via the wiki. Thanks!
@pinging: User:Rotpunkt, User:Rosiestep, User:Putnik, User:Ciell, User:RonnieV, User:Isaac (WMF), User:MGerlach (WMF), User:Diego (WMF), User:Petrb, User:Vermont, User:Theklan, User:1997kB, User:Zhuyifei1999, User:Stang, User:4shadoww, User:Zache, User:Lsanabria, User:Kizule, User:Aca, User:Srđan, User:Bencemac, User:Bodhisattwa, User:MarcoAurelio, User:Lokal Profil, User:Tshrinivasan, User:Ivi104, User:Geraki, User:He7d3r, User:Krinkle, User:Kenrick95, User:YMS, User:Iluvatar, User:Superzerocool, User:SQL, User:Evad37, User:Nettrom, User:Bamyers99, User:Ragesoss, User:SD0001, User:Nirmos, User:West.andrew.g, User:Daniuu, User:Bas_dehaan, User:Dajasj, and User:Fuzheado
Note that if you don't want to participate and you'd like me to leave you alone (for this project at least), just disregard this notification and I won't ping you again. --EpochFail (talk) 20:03, 11 November 2022 (UTC)

MBH, you came recommended by another Russian Wikipedian. Would you be interested in participating in this project? See details above and on the project page. --EpochFail (talk) 17:02, 17 November 2022 (UTC)
I don't exactly understand what you want from me. Since autumn 2017 I have run a bot on ruwiki that reverts possible vandal edits using ORES scores; the ruwiki community is mostly supportive of this bot. MBH (talk) 17:30, 17 November 2022 (UTC)
We're looking to build tools that help people evaluate and improve machine learning models like the ones used in ORES. We plan to interview people involved in the production and use of tools that use ORES' predictions to understand needs and opportunities, and to build a system (probably some sort of tool, but that depends on what we learn) to help with evaluating and improving models. We'd like to interview you, learn from you, and if you're interested, have you try out the system we plan to build. --EpochFail (talk) 17:36, 17 November 2022 (UTC)
I agree. MBH (talk) 09:59, 18 November 2022 (UTC)
Hi MBH, glad that you agreed! If you're available and comfortable with sharing your experience with us via a 1-hour interview, please submit this Google Form. We will follow up to schedule a time that works best for you. Thanks! Tzusheng (talk) 23:59, 18 November 2022 (UTC)
I prefer to answer the questions by text rather than be interviewed by video. My English speaking and listening experience is not enough for this, and I don't like video chats at all. MBH (talk) 02:15, 19 November 2022 (UTC)
Thank you, and of course! I just sent the interview questions to you via email. Please kindly share your experience and feedback with us by text whenever you're available. We really appreciate it. Tzusheng (talk) 23:08, 20 November 2022 (UTC)
I answered you by email. MBH (talk) 14:54, 23 November 2022 (UTC)
Thank you very much! Tzusheng (talk) 06:05, 25 November 2022 (UTC)
@Tzusheng can I see results of your research? MBH (talk) 12:09, 29 January 2023 (UTC)
Of course! We are now in the process of system prototyping and will keep everyone posted whenever there are any updates! Tzusheng (talk) 23:23, 29 January 2023 (UTC)
@Tzusheng @EpochFail so, what are the results? MBH (talk) 09:24, 7 May 2023 (UTC)
@MBH Thank you for following up! I updated the timeline and included the results from the formative study. If you have any questions, please let me know! Meanwhile, we have almost finished building the system and plan to recruit a small group for pilot testing around June. Please stay tuned! Tzusheng (talk) 15:30, 7 May 2023 (UTC)
What "system" do you mean, and what will this system do? MBH (talk) 15:51, 7 May 2023 (UTC)
@Tzusheng MBH (talk) 08:13, 8 May 2023 (UTC)
Great question! To evaluate AI systems used in Wikipedia, we first need up-to-date data that can be used for evaluation. We are building a system that can facilitate the curation of this data. Prior tools like Wiki labels require Wikipedians to visit a standalone website for labeling data. In contrast, our system is a script that embeds a plug-in into Wikipedia's existing interface so that people may easily provide data while continuing their work on Wikipedia. More details will be available soon! Tzusheng (talk) 10:39, 8 May 2023 (UTC)
[done] 08.2023: analyze results, write paper, document the results on this research page - so, could I read the paper? MBH (talk) 04:29, 4 November 2023 (UTC)
@Tzusheng @EpochFail MBH (talk) 04:29, 4 November 2023 (UTC)
@MBH Thanks for following up! The paper is currently under review. If accepted, we will publicly share it around February. Sorry for the lengthy academic cycle. Tzusheng (talk) 04:34, 4 November 2023 (UTC)
I've also updated the timeline to reflect the wait time for the paper review. Thanks again for following up! Tzusheng (talk) 04:39, 4 November 2023 (UTC)