Grants:Programs/Wikimedia Community Fund/Rapid Fund/Wikipedia's Factual Assistant (ID: 23544802)
Applicant details
- Main Wikimedia username. (required)
عباد ديرانية
- Organization
N/A
- If you are a group or organization leader, board member, president, executive director, or staff member at any Wikimedia group, affiliate, or Wikimedia Foundation, you are required to self-identify and present all roles. (required)
I'm a board member or president of a Wikimedia Affiliate or mission-allied organization.
- Describe all relevant roles with the name of the group or organization and description of the role. (required)
- Board member of Wikimedia NYC
- Previous board member of Wikimedia Levant
- Grantee of WikiTermBase (MS Strategy grants)
- Editor on Arabic Wikipedia
Main proposal
- 1. State the title of your proposal. This will also be the Meta-Wiki page title.
Wikipedia's Factual Assistant
- 2. and 3. Proposed start and end dates for the proposal.
2025-09-25 - 2025-12-31
- 4. What is your tech project about, and how do you plan to build the product?
Include the following points in your answer:
- Project goal and problem you solve
- Product strategy or project roadmap
- Technical approach (infrastructure, tech stack, key tools and services)
- Integrations or dependencies (if any)
*Project Goal*
The goal of the project is to test a novel approach: using open-source large language models (LLMs) as fact-checkers that can provide reliable paraphrasing of Wikipedia's content through a GenAI assistant. Through this experiment, we seek to evaluate a new way of enabling more reliable uses of LLMs and to collect data on the validity of LLMs as fact-checkers for future implementations.
*Problem*
This project is inspired by my personal experience as a Wikipedia editor (since 2009) and movement organizer who currently develops chatbots for a living. Unfortunately, most commercial generative AI chatbots have few guardrails for reliability and factuality, yet generative AI is quickly taking over Wikipedia's role as the primary source of information on the internet. Wikipedia's readership has been decreasing since 2013, and the trend is more alarming for smaller languages. Integrating GenAI more into our work is probably inevitable, so it's crucial to start experimenting with ways to align it with our values and principles sooner rather than later.
*Solution*
We'll build an experimental AI assistant for readers that exclusively draws answers from Wikipedia pages, and integrates an explicit and novel fact-checking step into its architecture, inspired by Wikipedia's own fact-checking process by editors. This assistant is not intended for public use but only as a time-bound experiment, used for rigorous testing and evaluation of this model's reliability against Wikipedia's baseline of reliable information. We'll enlist the support of editors and collaborators in manually fact-checking ~500 responses, and will collect other qualitative feedback to learn about the viability of such an assistant and how it compares to non-fact-checked, off-the-shelf LLMs.
*Project Roadmap*
Setup & Basic architecture (September):
- Build initial Wikipedia-based RAG chatbot prototype.
- Choose evaluation benchmarks for factuality (Humanity’s Last Exam + others).
- Design testing plan with Wikipedian volunteers (blind evaluation to assess reliability).
- Generate responses across 3 pipelines: A) Plain LLM response generator, B) Added fact-checking step with a specialized fact-checking model (e.g. MiniCheck), C) Fact-checking step with other open source LLMs (for comparison).
- Set up on Toolforge with a user interface (e.g. Streamlit) for testing with Wikipedia users, including an interface to provide feedback on the accuracy of the answers (this will be an additional feedback collection channel in addition to more thorough testing below).
- Enlist volunteer Wikipedia editors and subject matter experts to evaluate a subset of responses for factuality, with a target of ~500 responses blind-tested across the LLM pipelines as well as native Wikipedia content as a control.
- Collect qualitative feedback on potential reliability issues, and analyze the errors to understand the in-depth strengths and limitations of the AI assistant.
- Consolidate benchmark and manual evaluation findings to find out where the experimental assistant stands compared to commercial LLMs as well as Wikipedia's content as a baseline.
- Draft conclusions on real-world applicability (Wikipedia chatbot + industry use), potentially as a white paper or research piece.
- Share preliminary findings with the Wikipedia community through Diff and the village pump.
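The blind-evaluation setup described above could be sketched as follows. This is a hypothetical illustration, not the final design: the pipeline labels, the fourth "native Wikipedia" control arm, and the round-robin assignment are assumptions for the sketch.

```python
import random

# Illustrative pipeline labels (assumption; final names/arms may differ).
PIPELINES = [
    "A_plain_llm",       # plain LLM response
    "B_minicheck",       # LLM + specialized fact-checking model
    "C_llm_factcheck",   # LLM + general open-source LLM as fact-checker
    "D_wikipedia",       # native Wikipedia content (control)
]

def build_blind_batch(questions, seed=0):
    """Assign each question to a pipeline, then hide the label from raters."""
    rng = random.Random(seed)
    batch = []
    for i, q in enumerate(questions):
        pipeline = PIPELINES[i % len(PIPELINES)]  # balanced round-robin assignment
        batch.append({"id": i, "question": q, "pipeline": pipeline})
    rng.shuffle(batch)  # raters see items in random order
    # Raters see only the question/response; the pipeline stays in a key file.
    rater_view = [{"id": item["id"], "question": item["question"]} for item in batch]
    answer_key = {item["id"]: item["pipeline"] for item in batch}
    return rater_view, answer_key

questions = [f"q{n}" for n in range(500)]  # placeholder for the ~500 target
rater_view, key = build_blind_batch(questions)
assert all("pipeline" not in item for item in rater_view)  # evaluation stays blind
```

Keeping the pipeline label in a separate answer key is what makes the test blind: raters score factuality without knowing whether a response came from an LLM pipeline or from Wikipedia itself.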
The project will build a GenAI Retrieval-Augmented Generation (RAG) chatbot that exclusively uses Wikipedia's content as a knowledge base. The chatbot will be built on open source LLMs, will be accessible through Toolforge, and will be explicitly built for the purposes of gathering feedback and testing. It's not intended as a chatbot for public use at this point. Main technical components:
- Knowledge base: The knowledge base will be an English Wikipedia dump that's pre-processed into embeddings and fed into an open source vector database, likely to be either FAISS or ChromaDB.
- RAG architecture: The RAG code will use Python, and will test various open source models for the generation task, likely to include Llama, Mixtral and/or Gemma (this is a tentative list and we'll definitely comply with any requirements by the WMF regarding open weight vs. open source, etc.). Additionally, the architecture will integrate open source fact-checking models like MiniCheck to address the reliability of the responses.
- Front-end: The front-end will likely be built with the Streamlit Python library, as is typical for chatbots.
- Hosting: While we're hoping to host the project on Wikimedia Cloud, we're aware from previous experience of the high privacy and security requirements around projects on the platform. Alternatively, we're also able to host through HuggingFace if that's an allowed option.
- Data: We won't collect any mandatory user data, but the chatbot interface will invite users to share optional qualitative feedback on the assistant's reliability issues, to help us crowdsource the evaluation process.
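The retrieve-then-fact-check flow behind these components could be sketched as below. Everything here is a toy stand-in under stated assumptions: the character-count "embedding" replaces a real sentence-embedding model, the linear similarity search replaces FAISS/ChromaDB, and the word-overlap check replaces a model such as MiniCheck.

```python
import math

def embed(text):
    """Stand-in embedding: a bag-of-letters vector (a real system would
    use a sentence-embedding model)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, store, k=2):
    """Return the k passages most similar to the query
    (stand-in for a FAISS/ChromaDB vector search)."""
    qv = embed(query)
    return sorted(store, key=lambda p: cosine(qv, embed(p)), reverse=True)[:k]

def fact_check(answer, passages):
    """Crude entailment stub: every word of the answer must appear in the
    retrieved evidence. A real system would call a fact-checking model
    such as MiniCheck here and reject or regenerate unsupported answers."""
    evidence = " ".join(passages).lower()
    return all(w in evidence for w in answer.lower().split())

store = ["Paris is the capital of France", "The Nile flows through Egypt"]
passages = retrieve("capital of France", store)
answer = "Paris is the capital of France"
assert fact_check(answer, passages)
```

The point of the sketch is the extra gate between generation and the user: an answer is only surfaced if the fact-checking step finds it supported by the retrieved Wikipedia passages, which is the explicit fact-checking step the architecture above describes.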
- 5. What is the expected impact of your project, and how will you measure success?
Include the following points in your answer:
- Milestones and progress tracking
- Project impact and success metrics
- 6. Who is your target audience, and how have you confirmed there is demand for this project? How did you engage with the Wikimedia community?
Include the following points in your answer:
- Project demand and target audience description
- Links to interaction(s) with Wikimedia community
- Evidence from community consultation such as the [Community Wishlist]
- 7. How will your team predict and manage potential user security and privacy risks, and what risks do you currently see?
Include the following points in your answer:
- The level of in-house or consulted security and privacy expertise you will have available to you during delivery of this project
- How your development, testing, and deployment processes mitigate the introduction of unnecessary security or privacy risks
- 8. Who is on your team, and what is your experience?
Include the following points in your answer:
- Your experience as a developer, relevant past projects
- Wikimedia SUL (developer), Gerrit, Github, Gitlab or other relevant public account handles
- Other team members, their roles and expertise
- 9. How will the project be maintained long-term?
Include the long-term maintenance plan with maintainer(s) in your answer. If you expect the long-term maintenance to incur expenses, please list those and the plan for long-term expense coverage.
- 10. Under what license will your code be released, and how will you ensure the product is well documented?
Include the following points in your answer:
- Code license and compatibility with Wikimedia projects
- Documentation plan
- 11. Will your project depend on or contribute to third-party tools or services?
- 12. Is there anything else you’d like to share about your project? (optional)
Budget
- 13. Upload your budget for this proposal or indicate the link to it. (required)
- 14. and 15. What is the amount you are requesting for this proposal? Please provide the amount in your local currency. (required)
- 16. Convert the amount requested into USD using the Oanda converter. This is done only to help you assess the USD equivalent of the requested amount. Your request should be between 500 - 5,000 USD.
USD
- We/I have read the Application Privacy Statement, WMF Friendly Space Policy and Universal Code of Conduct.
No
Endorsements and Feedback
Please add endorsements and feedback to the grant discussion page only. Endorsements added here will be removed automatically.
Community members are invited to share meaningful feedback on the proposal and include reasons why they endorse the proposal. Consider the following:
- Stating why the proposal is important for the communities involved and why they think the strategies chosen will achieve the results that are expected.
- Highlighting any aspects they think are particularly well developed: for instance, the strategies and activities proposed, the levels of community engagement, outreach to underrepresented groups, addressing knowledge gaps, partnerships, the overall budget and learning and evaluation section of the proposal, etc.
- Highlighting if the proposal focuses on any interesting research, learning or innovation, etc. Also if it builds on learning from past proposals developed by the individual or organization, or other Wikimedia communities.
- Analyzing if the proposal is going to contribute in any way to important developments around specific Wikimedia projects or Movement Strategy.
- Analysing if the proposal is coherent in terms of the objectives, strategies, budget, and expected results (metrics).
This is an automatically generated Meta-Wiki page. The page was copied from Fluxx, the web service of Wikimedia Foundation Funds, where the user has submitted their application. Please do not make any changes to this page because all changes will be removed after the next update. Use the discussion page for your feedback. The page was created by CR-FluxxBot.
