Research talk:Wikipedia type Articles Generated by LLM (Not for Publication on Wikipedia)

From Meta, a Wikimedia project coordination wiki


Hello, I'm pinging you because you have been involved in training, using, or evaluating mw:ORES (and ORES-adjacent) models in the past. We are working on a project exploring the capabilities of large language models (LLMs). See Research:Wikipedia type Articles Generated by LLM (Not for Publication on Wikipedia) for more details. Please fill out this form if you are interested in participating. We will be providing monetary compensation for participants. @pinging: User:Rotpunkt, User:Rosiestep, User:Putnik, User:Ciell, User:RonnieV, User:Isaac (WMF), User:MGerlach (WMF), User:Diego (WMF), User:Petrb, User:Vermont, User:Theklan, User:1997kB, User:Zhuyifei1999, User:Stang, User:4shadoww, User:Zache, User:Lsanabria, User:Kizule, User:Aca, User:Srđan, User:Bencemac, User:Bodhisattwa, User:MarcoAurelio, User:Lokal Profil, User:Tshrinivasan, User:Ivi104, User:Geraki, User:He7d3r, User:Krinkle, User:Kenrick95, User:YMS, User:Iluvatar, User:Superzerocool, User:SQL, User:Evad37, User:Nettrom, User:Bamyers99, User:Ragesoss, User:SD0001, User:Nirmos, User:West.andrew.g, User:Daniuu, User:Bas_dehaan, User:Dajasj, and User:Fuzheado

I won't ping you again, but we appreciate any participation or feedback. Terribilis11 (talk) 22:23, 28 November 2023 (UTC)[reply]

Hello, I'm glad to hear this. I just filled out the form, I'm looking forward to participating. Kizule (talk) 02:00, 29 November 2023 (UTC)[reply]
Thank you! We are hoping to begin the evaluation next week. Terribilis11 (talk) 20:39, 30 November 2023 (UTC)[reply]


If this project has been in development since 2021, I'd love to hear how your thoughts about it have changed over that time :) –SJ talk  00:31, 29 November 2023 (UTC)[reply]

Stanford's OVAL lab has been working since 2021, but this project in particular is just a few months old. Terribilis11 (talk) 20:40, 30 November 2023 (UTC)[reply]

Possibilities to align more directly with Wikipedia processes[edit]

Hey @Terribilis11 thanks for the ping. I did a read-through of the project and I had a few thoughts about how this work could more clearly provide benefits to Wikipedia. I appreciate that you clearly state that the goal is not to publish the articles, but a tool for auto-generating Wikipedia articles can still feel like its main purpose is to replicate Wikipedia without the important human input that leads to its reliability and quality. The obvious goal for this sort of tool would be to help editors generate content, but that still relies heavily on good-faith usage and careful checking of outputs. Instead, I've been thinking about more constrained use cases where this sort of tool could aid Wikipedia processes while being less likely to be abused or to lead to problematic content.

One possibility is to connect it with New Page Patrol. This is where trusted English Wikipedia (enwiki) editors vet new enwiki articles that were created by new users. There is a large backlog there because it can take a lot of work to vet a new article (as you acknowledge for the evaluation), especially when you're less familiar with a topic and so don't necessarily know what should be in the article. I wonder if your tool could be used to provide a "comparison" article for any new draft article to help New Page Patrollers more quickly evaluate these drafts (either the draft almost exactly matches the AI-generated version, which could be suspect, or it doesn't, and then the differences might still help the patroller identify errors or even improve the draft). In practice, this would mean focusing on those sorts of draft articles (which might have a different distribution than the existing articles) and maybe even building the interface to directly compare the draft with the AI-generated version.

You might find additional inspiration from WikiProject AI Cleanup (a group of editors collaborating around a specific topic or task) too. Isaac (WMF) (talk) 16:49, 5 December 2023 (UTC)[reply]

Hey @Isaac (WMF) I appreciate your feedback, and I fully agree that the use case you suggested could be a great way for the research project to be beneficial to Wikipedia. From the outputs we have been producing, our model would do a good job of producing articles that, at the very least, could help improve an existing draft through comparison. Our current step is to generate results that contextualize the scoring of our articles. These sound like great next steps for us to work on afterwards. Terribilis11 (talk) 01:24, 6 December 2023 (UTC)[reply]

Hello, I'm pinging you because you have been a frequent editor at Wikipedia:New Pages Patrol. We are working on a project exploring the capabilities of LLMs. See Research:Wikipedia type Articles Generated by LLM (Not for Publication on Wikipedia) for more details. We feel that your help would be extraordinarily valuable in scoring the outputs of our model. We aren't intending to publish these articles, but there are possibilities for integration with Wikipedia processes, as suggested by @Isaac (WMF) above. We are currently beginning the evaluation process. Please fill out this form if you are interested in participating. We will be providing monetary compensation for participants.

@pinging: @WikiOriginal-9 @Thilsebatti @Hey man im josh @Utopes @BoyTheKingCanDance @Ipigott @Stuartyeates @MPGuy2824 @Reading Beans @Aviram7 @TechnoSquirrel69 @DreamRimmer @Scope creep @Kline @Significa liberdade @SunDawn @ARandomName123 @NotAGenious @Hughesdarren @WaddlesJP13 @FULBERT @Buidhe @Clovermoss @Elmidae @Pppery @Graeme Bartlett @Ingratis @Timothytyy @MicrobiologyMarcus @Mccapra @StarTrekker @Schminnte @FuzzyMagma @Simon Peter Hughes @Pichpich @Alexandermcnabb @JackFromWisconsin @Joseywales1961 @Rosguill @Tails Wx @Slgrandson @Ratnahastin @Shellwood @ULPS Terribilis11 (talk) 19:33, 6 December 2023 (UTC)[reply]


Thank you, Terribilis11 for bringing this to my attention. According to the research page, you do not have plans to publish this work, and as Isaac has indicated, there are various concerns regarding auto-generated Wikipedia articles. Given this, I'm curious what the overarching purpose/goal of this research is. What is the objective/purpose statement, and what are the use-cases for your desired end-product? Significa liberdade (talk) 19:41, 6 December 2023 (UTC)[reply]

Hi, by "not for publication on Wikipedia" I mean that our model is not meant to produce articles that will be published on Wikipedia. Our research is intended for publication at ACL conferences; we will be submitting to their rolling review and hopefully be invited to participate.
The use case mentioned above would be future work beyond the scope of our current research, but I think it is easily attainable. Terribilis11 (talk) 20:30, 6 December 2023 (UTC)[reply]

As a former NLP researcher, I am extremely skeptical of the prospects of using LLMs for this purpose, and question whether it is a good idea for Wikipedia editors to be supporting this venture with their time and effort. I fully understand the experimental nature of what is being proposed and that none of it will hurt Wikipedia yet, but feel that advancing this line of scientific inquiry is at best a distraction and at worst the advancement of an existential threat to the construction of knowledge. signed, Rosguill talk 20:06, 6 December 2023 (UTC)[reply]

You make a great point about the growing danger of ubiquitous LLM-generated content. While our research may contribute to the development of LLM technology, we are hopeful that it can help move things in a positive and necessary direction. One of the main focuses of our research is to improve the ability of our LLM to stay grounded, that is, to make only factual claims. This is quite challenging, but to do it we have to develop the ability to determine what is factual from evidence, and I think this skill in particular can hopefully curb the damage of LLM content. Terribilis11 (talk) 07:31, 9 December 2023 (UTC)[reply]

As a new page patroller (in a broader sense than just articles) who comes into contact with promotional pages created by LLMs every day, I'm wondering if this LLM would be trained to avoid promotional editing. Your current editor scoring (as described on your sign-up form) accounts for two of Wikipedia's core content policies (no original research and verifiability) but doesn't seem to account for the equally important neutral point of view. I've been supportive of studies of AI on Wikipedia before, though limited to anti-vandalism, but I am sceptical of an article-creating LLM based on my prior experience. I would like to know if the LLM would be available for use by the public after testing, and if so, whether it is planned to have safeguards against bad actors who would use it for promotion. Schminnte (talk) 20:49, 6 December 2023 (UTC)[reply]

We are hoping to publish our research, so while our particular model will not be available for use through an app or API, our paper will make clear the steps we took to build it. As such, it will be reproducible. Terribilis11 (talk) 07:34, 9 December 2023 (UTC)[reply]

I take exception to your claim that "The primary risk of our work is that the Wikipedia articles written by our system are grounded on information on the Internet which may contain some biased or discriminative contents". The primary risk of your work is that Wikipedia and/or other textual information sources will be polluted with 'AI'-generated pap, which will in the medium to long term degrade their quality. I'm interested to hear what precautions you are taking / planning to take to mitigate this risk. Stuartyeates (talk) 20:53, 6 December 2023 (UTC)[reply]

I agree that there is a definite danger of LLM-generated content drowning out new content. While this is an issue with the development of LLM technologies in general, it is one with which we are concerned. In particular, an important aspect of our research is to develop an LLM that is grounded, that is, one that will not "hallucinate" or make up facts. For our Wikipedia articles, we additionally want the model to be able to support any claim with a source, as Wikipedia does. We are hopeful that an LLM of this type can help with the "pollution" in a couple of different ways, such as detecting unsupported claims.
There is a risk that the continued proliferation of LLM-generated content will cause a deluge of low-quality content and articles. We are hopeful that our ability to detect "hallucinations" can help curb this. Terribilis11 (talk) 07:40, 9 December 2023 (UTC)[reply]


@Terribilis11: Some of the milestones you have previously given have now passed. Are there any updates? Stuartyeates (talk) 07:21, 12 February 2024 (UTC)[reply]

@Terribilis11: Are there any updates? Stuartyeates (talk) 05:01, 9 April 2024 (UTC)[reply]

I'm not involved in the project but I saw that a paper has been released about this work: --Isaac (WMF) (talk) 18:42, 12 April 2024 (UTC)[reply]