Research:Between Prompt and Publish: Community Perceptions and Practices Related to AI-Generated Wikipedia Content
This page documents a research project in progress.
Information may be incomplete and change as the project progresses.
Please contact the project lead before formally citing or reusing results from this page.
This project investigates how experienced Wikimedians across language communities perceive the increasing use of generative AI content on Wikipedia.
Introduction
Generative AI tools such as ChatGPT and Bard are reshaping how online content is produced.[1][2] These tools are increasingly used to generate Wikipedia content,[1] raising concern that the volume and pace of AI-generated contributions may compromise article quality and Wikipedia’s long-standing principles of verifiability and neutrality.[3] Although AI tools may lower the entry barrier for newcomers and enhance the productivity of experienced editors, they also raise concerns about misinformation, bias, and the adequacy of Wikipedia’s existing quality-control mechanisms.
Each of Wikipedia’s language editions is governed by its own community, which can decide autonomously how to run its edition without consulting anyone externally. This autonomy gives each community substantial latitude in deciding how to handle generative AI content in its language edition. It is therefore important to understand how members of different language communities perceive generative AI content, how those perceptions may shape their communities’ AI-related policies, and whether perceptions of AI differ across cultures.
We aim to explore how experienced editors understand and engage with generative AI: what tools they use, how they evaluate AI-assisted contributions, and what opportunities or risks they associate with AI-generated content on Wikipedia. This research will provide insights to support the Wikimedia movement in developing community-informed guidelines, workflows, and policies for safeguarding knowledge integrity in the age of AI. Our findings will also contribute to shaping inclusive AI governance policies in the Wikimedia movement.
Research Design and Methods
Distinguishing AI-generated content from human-created text has become increasingly difficult. Existing content verification mechanisms, including community oversight and automated tools like ORES, were not designed to handle the scale and complexity introduced by generative AI. Communities often lack both the capacity and the technology to verify content at this scale using volunteer labor alone. To date, English Wikipedia has no formal policy governing AI-generated content, only guidelines on how to handle it. As a result, communities must navigate a rapidly evolving generative AI landscape with unclear expectations.
This project responds to this gap by exploring how experienced editors across language communities are currently dealing with generative AI. We ask:
- How informed are experienced Wikipedians about generative AI?
- What are their perceptions of the quality of AI-generated content in their language?
- Which AI tools (if any) do they use, and in what contexts?
- What workflows do they adopt when integrating AI-generated content?
- How do they address the issue of information reliability when they integrate AI into their editorial workflow?
- What opportunities and challenges do they see in AI-generated content on Wikipedia?
We focus on the lived experiences and practices of Wikimedians themselves to address these research questions. Furthermore, we examine community perceptions across different languages and cultural contexts, aiming to provide practical, grounded insights that can inform future community guidelines and AI governance strategies.
By centering editor perspectives, this project aims to generate insights from practitioners that can guide community decisions and policy-making related to handling generative AI content on Wikipedia.
Participants
We will engage active editors from five language Wikipedias: English, Italian, Malayalam, Swedish, and Bangla (languages in which we have a high level of proficiency), focusing on administrators and experienced content creators. Participants will be recruited through direct outreach (e.g., via Telegram groups, mailing lists, and Village Pumps), and a short screening form will assess eligibility.
Sampling will follow a purposive and snowball strategy, continuing until thematic saturation is reached. We will maintain diversity in experience level, regional background, and familiarity with AI.
Semi-Structured Interviews
We are currently conducting in-depth interviews exploring the following themes: use of generative AI tools for content creation and editing on Wikipedia and its sister projects; perceptions of the quality of AI-generated content; techniques for verifying the sources of AI-generated content; the presence of language-community-specific guidelines; and attitudes toward future AI integration on Wikipedia.
Informed consent is being obtained from all interviewees. All interviews will be anonymized, and the primary data gathered will be accessible only to the authors of the study.
As of 30 November 2025, we have conducted 15 interviews with participants from 7 countries and 5 language editions.
Thematic Analysis
We will use Braun and Clarke’s approach to thematic analysis to code and interpret interview transcripts. This flexible method allows emergent patterns and editor attitudes to surface without imposing predefined categories.
Further Content Analysis
We will identify AI-related discussions from: a) talk pages of different language Wikipedias (such as English and Italian), b) the Diff blog, c) the Wikimedia-l mailing list, and d) the Wikimedia Signpost. We will then perform content analysis on that dataset to let the key frames and themes emerge, while also reporting the exact volume of data analysed.[4] Our aim is to include voices from non-English, non-European languages as well, so we are actively looking for such discussions among Bangla and Malayalam Wikipedians (the two Indian languages in which we have native fluency), although our preliminary assessment indicates that such content is sparse.
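As one illustration of how candidate talk-page discussions could be located at scale, the sketch below queries the standard MediaWiki search API and applies a simple keyword filter. This is a hypothetical starting point, not the project's actual pipeline; the keyword list and helper names are our own assumptions.

```python
import json
import urllib.parse
import urllib.request

# Assumed keyword list for illustration only; a real study would
# develop and document its own search terms per language.
AI_KEYWORDS = ("chatgpt", "llm", "large language model", "ai-generated", "generative ai")


def is_ai_related(text: str) -> bool:
    """Heuristic filter: does a discussion title or snippet mention generative AI?"""
    lower = text.lower()
    return any(kw in lower for kw in AI_KEYWORDS)


def build_search_url(keyword: str, lang: str = "en", limit: int = 50) -> str:
    """Build a MediaWiki search-API URL restricted to article talk pages."""
    params = urllib.parse.urlencode({
        "action": "query",
        "list": "search",
        "srsearch": keyword,
        "srnamespace": 1,  # namespace 1 = article talk pages
        "srlimit": limit,
        "format": "json",
    })
    return f"https://{lang}.wikipedia.org/w/api.php?{params}"


def search_talk_pages(keyword: str, lang: str = "en") -> list:
    """Fetch matching talk-page hits and keep only AI-related ones."""
    with urllib.request.urlopen(build_search_url(keyword, lang)) as resp:
        hits = json.load(resp)["query"]["search"]
    return [h for h in hits if is_ai_related(h["title"] + " " + h.get("snippet", ""))]
```

The same pattern would extend to other language editions by changing the `lang` subdomain, though blog and mailing-list sources would need separate collection steps.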
Timeline
| Period | Activity |
|---|---|
| May 2025 | Project planning |
| June–December 2025 | Participant recruitment and semi-structured interviews |
| November 2025 – February 2026 | Collection of other data and data analysis |
| January 2026 | Abstract for WikiWorkshop |
| March–April 2026 | Drafting of research manuscript (first draft) |
| May–August 2026 | Update data, preparation of further drafts, informal dissemination of findings in the community |
| August 2026 | Dissemination of findings at Wikimania 2026 |
| September 2026 – February 2027 | Different dissemination outputs submitted following feedback from Wikimania |
Policy, Ethics and Human Subjects Research
Recognizing the value of our research participants' time, we reach out to them at least two to three weeks in advance, asking them to speak with us at a date and time most convenient for them. We share the project information sheet and consent form when we first make contact. Because of the sensitivity of the subject matter and our commitment to participant anonymity, we have decided not to name any participant in any dissemination output. The personal data we gather (such as age cohort, gender, educational background, and profession) are solely for research purposes and will likewise not be shared in any dissemination output.
Results
A preliminary analysis of interviews reveals a complex and ambivalent relationship between Wikipedia contributors and generative AI tools. While some editors describe AI as a productive collaborator, particularly useful for tasks such as translation, drafting, or sparking initial ideas, many voice deep concern over its broader implications for knowledge credibility, editorial integrity, and the primacy of human-authored content. Some editors note that AI-generated summaries, increasingly prioritized by search engines, are reshaping how users access and trust information, often bypassing the contextual richness and transparency that Wikipedia traditionally offers. Contributors highlight the ethical dilemmas posed by non-attributed AI-generated text, especially given that many large language models have been trained, in part, on Wikipedia content without community consent or credit.
Underlying these discussions is a more systemic anxiety: that of epistemic circularity. Wikipedia has long served as a foundational training resource for major language models such as OpenAI’s GPT, Microsoft’s Copilot, and Google’s Gemini. Now, as these very models are increasingly used to draft or edit Wikipedia content, contributors worry that the platform may begin to recursively train on its own AI-generated derivatives. This feedback loop raises urgent questions about knowledge reliability, source provenance, and editorial accountability.
Rather than reflecting a single perspective, our data reveals a fragmented field of opinion across language communities. Contributors range from cautious adopters to vocal skeptics, reflecting broader societal debates about AI’s role in labor, authorship, and power. Concerns also include the potential deskilling of volunteer editors, the risk of automating away collective decision-making, and the geopolitical dimensions of AI infrastructure and ownership.
We have also found that several language communities, including Bangla, Italian, and Swedish, already have their own guidelines on AI-generated content. As the project moves forward, we will work to synthesize these different understandings and measures in support of a broader community consensus.
References
1. Brooks, C., Eggert, S., & Peskoff, D. (2024). The rise of AI-generated content in Wikipedia. arXiv preprint arXiv:2410.08044.
2. Ooi, K. B., Tan, G. W. H., Al-Emran, M., Al-Sharafi, M. A., Capatina, A., Chakraborty, A., … Wong, L. W. (2023). The potential of generative artificial intelligence across disciplines: Perspectives and future directions. Journal of Computer Information Systems, 65(1), 76–107. https://doi.org/10.1080/08874417.2023.2261010
3. Vetter, M. A., Jiang, J., & McDowell, Z. J. (2025). An endangered species: How LLMs threaten Wikipedia’s sustainability. AI & Society. https://doi.org/10.1007/s00146-025-02199-9
4. White, M. D., & Marsh, E. E. (2006). Content analysis: A flexible methodology. Library Trends, 55(1), 22–45.