Research:Between Prompt and Publish: Community Perceptions and Practices Related to AI-Generated Wikipedia Content
This page documents a planned research project.
Information may be incomplete and change before the project starts.
This project investigates how experienced Wikimedians across language communities perceive the growing presence of AI-generated content on Wikipedia.
Introduction
Generative AI tools such as ChatGPT and Bard are reshaping how online content is produced.[1][2] These tools are increasingly used to generate Wikipedia content,[1] raising concern that the volume and pace of AI-generated text may compromise article quality and Wikipedia’s long-standing principles of verifiability and neutrality.[3] While AI tools may lower the entry barrier for newcomers and boost the productivity of experienced editors, they also raise concerns about misinformation, bias, and the adequacy of Wikipedia’s existing quality-control mechanisms.
Each of Wikipedia’s language editions is governed by its own community, which can decide autonomously how to run its edition without external approval. This autonomy gives communities real leverage over how generative AI content is handled in their language edition. It is therefore important to understand how members of different language communities perceive generative AI content, how those perceptions may shape their communities’ AI-related policies, and whether perceptions of AI differ across cultures.
We aim to explore how experienced editors understand and engage with generative AI: which tools they use, how they evaluate AI-assisted contributions, and what opportunities or risks they associate with AI-generated content on Wikipedia. The findings will support the Wikimedia movement in developing community-informed guidelines, workflows, and policies for safeguarding knowledge integrity in the age of AI, and will contribute to shaping inclusive AI governance across the movement.
Methods
Distinguishing AI-generated content from human-written text has become increasingly difficult. Existing content verification mechanisms, including community oversight and automated tools such as ORES, were not designed for the scale and complexity that generative AI introduces, and communities often lack both the capacity and the technology to verify content at this scale with volunteer labor alone. To date, English Wikipedia has no formal policy governing AI-generated content, only guidelines on how to handle it. Communities must therefore navigate a rapidly evolving generative AI landscape with unclear expectations.
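To make concrete what automated scoring currently covers, here is a minimal sketch of querying the public ORES v3 API for a single revision's "damaging" score. The revision ID is a placeholder, and ORES is being superseded by Wikimedia's Lift Wing service, so this is illustrative rather than part of the study design.

```python
# Minimal sketch: fetch the ORES "damaging" probability for one revision.
# The revision ID below is a placeholder; swap in a real one to test.
import requests

ORES_URL = "https://ores.wikimedia.org/v3/scores/{wiki}/{rev_id}"

def damaging_probability(wiki: str, rev_id: int) -> float:
    """Return ORES's estimated probability that the revision is damaging."""
    resp = requests.get(
        ORES_URL.format(wiki=wiki, rev_id=rev_id),
        params={"models": "damaging"},
        timeout=30,
    )
    resp.raise_for_status()
    scores = resp.json()[wiki]["scores"][str(rev_id)]
    return scores["damaging"]["score"]["probability"]["true"]

print(damaging_probability("enwiki", 1143349126))  # placeholder revision ID
```

Note that models like "damaging" target vandalism-style edits, not AI authorship, which is precisely the gap this project examines.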
This project responds to this gap by exploring how experienced editors across language communities are currently dealing with generative AI. We ask:
- How informed are experienced Wikipedians about generative AI?
- What are their perceptions of the quality of AI-generated content in their language?
- Which AI tools (if any) do they use, and in what contexts?
- What workflows do they adopt when integrating AI-generated content?
- How do they address the issue of information reliability when they integrate AI into their editorial workflow?
- What opportunities and challenges do they associate with AI-generated content on Wikipedia?
We focus on the lived experiences and practices of Wikimedians themselves to address these research questions. Furthermore, we examine community perceptions across different languages and cultural contexts, aiming to provide practical, grounded insights that can inform future community guidelines and AI governance strategies.
By centering editor perspectives, this project aims to generate insights from practitioners that can guide community decisions and policy-making related to handling generative AI content on Wikipedia.
Participants
We will engage active editors from five language Wikipedias (English, Italian, Malayalam, Swedish, and Bangla, languages in which we have a high level of proficiency), focusing on administrators and experienced content creators. Participants will be recruited through direct outreach (e.g., via Telegram groups, mailing lists, and Village Pumps), and a short screening form will assess eligibility.
Sampling will follow a purposive and snowball strategy, continuing until thematic saturation is reached. We will maintain diversity in experience level, regional background, and familiarity with AI.
Semi-Structured Interviews
We will conduct in-depth interviews exploring the following themes: familiarity with generative AI, perceptions of the quality of AI-generated content, use of AI tools (if any) in content creation, techniques for verifying the sources of AI-generated content, and attitudes toward future AI integration on Wikipedia.
Informed consent will be obtained from all interviewees. All interviews will be anonymized, and the primary data gathered will be accessible only to the authors of the study.
Thematic Analysis
We will use Braun and Clarke’s approach to thematic analysis to code and interpret the interview transcripts. This flexible method allows emergent patterns and editor attitudes to surface without imposing predefined categories.
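As a rough illustration of the bookkeeping that accompanies coding (theme construction itself remains an interpretive, manual step in this method), the sketch below tallies hypothetical codes assigned to interview excerpts; all participant IDs, excerpts, and codes are invented for demonstration.

```python
# Minimal sketch: tallying codes assigned to interview excerpts. Participant
# IDs, excerpts, and codes are invented; theme construction stays manual.
from collections import Counter

coded_segments = [
    ("P01", "I run drafts through a chatbot to fix grammar.", ["copyediting", "tool_use"]),
    ("P02", "I never trust the citations it invents.", ["verifiability", "distrust"]),
    ("P03", "It helps newcomers draft their first article.", ["onboarding", "tool_use"]),
]

code_counts = Counter(code for _, _, codes in coded_segments for code in codes)
print(code_counts.most_common())  # e.g. [('tool_use', 2), ('copyediting', 1), ...]
```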
Further Content Analysis
We will identify AI-related discussions from: a) talk pages of different language Wikipedias (such as English and Italian), b) the Diff blog, c) the Wikimedia-l mailing list, and d) the Wikipedia Signpost. We will then perform content analysis on this dataset, letting the key frames and themes emerge while reporting the exact volume of data analysed.[4] Because we aim to include voices from non-English, non-European languages as well, we are actively looking for such discussions among Bangla and Malayalam Wikipedians (the two Indian languages in which we have native fluency), although our preliminary assessment indicates that such content is sparse.
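As an illustration of how such discussions could be surfaced programmatically, the sketch below uses the MediaWiki Action API's full-text search restricted to talk namespaces; the search term, namespaces, and user-agent string are assumptions for demonstration, not our final query design.

```python
# Minimal sketch: full-text search for AI-related threads in talk namespaces
# (1 = Talk, 5 = Wikipedia talk). Search term and namespaces are illustrative.
import requests

API = "https://en.wikipedia.org/w/api.php"

def find_discussions(term: str, limit: int = 20) -> list[str]:
    """Return titles of talk pages whose text matches the search term."""
    resp = requests.get(
        API,
        params={
            "action": "query",
            "list": "search",
            "srsearch": term,
            "srnamespace": "1|5",
            "srlimit": limit,
            "format": "json",
        },
        headers={"User-Agent": "AI-perceptions-study-sketch/0.1"},  # courtesy UA
        timeout=30,
    )
    resp.raise_for_status()
    return [hit["title"] for hit in resp.json()["query"]["search"]]

for title in find_discussions("AI-generated content"):
    print(title)
```

The same endpoint exists on every language edition (e.g., it.wikipedia.org, bn.wikipedia.org), which would let us reuse one collection script across communities.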
Timeline
May 2025 | Project planning
June–October 2025 | Participant recruitment and semi-structured interviews
November–December 2025 | Data analysis
January–April 2026 | Drafting of research manuscript
May–August 2026 | Drafting of a summarized version of the manuscript; dissemination of findings
August 2026 | Presentation of findings at Wikimania 2026
Policy, Ethics and Human Subjects Research
We will take care not to disrupt Wikipedians' work during recruitment, interviews, and data collection. Informed consent will be obtained from all participants, interviews will be anonymized, and primary data will remain accessible only to the authors (see Semi-Structured Interviews above). Details of ethics committee or institutional review board (IRB) approval will be added here once available.
Results
Results and their implications, including any preliminary data, will be reported here once data collection and analysis are complete.
Resources
Links to presentations, blog posts, and other dissemination outputs will be added here as they become available.
References
[edit]- ↑ a b Brooks, C., Eggert, S., & Peskoff, D. (2024). The Rise of AI-Generated Content in Wikipedia. arXiv preprint arXiv:2410.08044.
- ↑ Ooi, K. B., Tan, G. W. H., Al-Emran, M., Al-Sharafi, M. A., Capatina, A., Chakraborty, A., … Wong, L. W. (2023). The Potential of Generative Artificial Intelligence Across Disciplines: Perspectives and Future Directions. Journal of Computer Information Systems, 65(1), 76–107. https://doi.org/10.1080/08874417.2023.2261010
- ↑ Vetter, M.A., Jiang, J. & McDowell, Z.J. An endangered species: how LLMs threaten Wikipedia’s sustainability. AI & Soc (2025). https://doi.org/10.1007/s00146-025-02199-9
- ↑ White, M. D. & Marsh, E. E.(2006). Content analysis: A flexible methodology. Library trends, 55(1), 22-45.