
Grants:Programs/Wikimedia Community Fund/Rapid Fund/Swahili Wikipedia Citation Checker (ID: 23746161)

From Meta, a Wikimedia project coordination wiki
status: Not funded
Swahili Wikipedia Citation Checker
request or grant ID: R-RF-2601-21527
proposed start date: 2026-05-01
proposed end date: 2026-07-15
requested budget (local currency): 12474000 TZS
requested budget (USD): 4912.08 USD
grant type: Individual
funding region: SSA
decision fiscal year: 2025-26
applicant: jacobgijjah
organization (if applicable): N/A

Applicant details

Main Wikimedia username. (required)

jacobgijjah

Organization

N/A

If you are a group or organization leader, board member, president, executive director, or staff member at any Wikimedia group, affiliate, or Wikimedia Foundation, you are required to self-identify and present all roles. (required)

N/A

Describe all relevant roles with the name of the group or organization and description of the role. (required)

Main proposal

1. State the title of your proposal. This will also be the Meta-Wiki page title.

Swahili Wikipedia Citation Checker

2. and 3. Proposed start and end dates for the proposal.

2026-05-01 - 2026-07-15

4. What is your tech project about, and how do you plan to build the product?

Include the following points in your answer:

  • Project goal and problem you solve
  • Product strategy or project roadmap
  • Technical approach (infrastructure, tech stack, key tools and services)
  • Integrations or dependencies (if any)

The Swahili Wikipedia is one of the largest Wikipedia editions in an African language and hosts over 82,000 articles. It faces significant challenges in maintaining reference quality and citation reliability because of resource constraints and a limited number of active editors and administrators (only 54% are primary Swahili contributors) ([1]), as an MDPI study shows ([2]). Research also shows that smaller language editions often face higher risks of broken links and outdated sources, which underlines the need for automated tools such as a citation checker to improve content accuracy and editor efficiency ([3]). For updated metrics, see the Wikimedia Statistics Dashboard for Swahili Wikipedia ([4]).

The Swahili Wikipedia Citation Checker is a tool designed to automatically detect and flag broken or outdated references on Swahili Wikipedia. Many articles on Swahili Wikipedia contain dead links, inaccessible sources, or formatting errors, which reduce content reliability and reader trust. Manual verification by editors is time-consuming and difficult to scale, especially given that only ~54% of editors are primary Swahili contributors.

Many Swahili Wikipedia articles contain broken or outdated references due to website changes, expired domains, or deleted pages. Editors often spend significant time manually checking links and formatting citations.

This leads to:

  • Reduced reliability of information,
  • Frustration for editors and readers,
  • Loss of valuable references that could be recovered from web archives.

Our goal is to improve citation quality, ensure verifiable references, and increase editor efficiency.

Product Strategy / Project Roadmap

  1. Research & design: 1 week
  2. Development (bot/parser and Wayback Machine integration): 1 month
  3. Dashboard UI for editor review: 1 week
  4. Pilot testing on a subset of Swahili Wikipedia articles: 2 weeks
  5. Community feedback, bug fixing, and iteration: 1 week
  6. Full deployment and documentation: 1 week


Key features:

- Automatically scan articles for `<ref>` tags and external URLs.

- Detect broken or inaccessible links.

- Suggest archived URLs using the Wayback Machine API.

- Identify common citation formatting errors (missing fields, invalid templates).

- Provide a Swahili-language dashboard for editors to review flagged references.
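
The first feature in the list above, scanning articles for `<ref>` tags and external URLs, could be prototyped roughly as follows. This is a minimal sketch that assumes plain regular expressions are enough for a first pass; the function name `extract_ref_urls` and the sample wikitext are illustrative assumptions, and a production parser would likely need a proper wikitext library.

```python
import re
from typing import List

# Matches <ref>...</ref> and <ref name="x">...</ref>.
# Self-closing reuses such as <ref name="x" /> are skipped by the lookbehind.
REF_PATTERN = re.compile(r"<ref(?:\s[^>]*?)?(?<!/)>(.*?)</ref>", re.IGNORECASE | re.DOTALL)

# A deliberately simple URL pattern: stops at whitespace and common wikitext delimiters.
URL_PATTERN = re.compile(r'https?://[^\s\]|<>}"]+')


def extract_ref_urls(wikitext: str) -> List[str]:
    """Return the distinct external URLs found inside <ref> bodies, in order of appearance."""
    urls: List[str] = []
    for body in REF_PATTERN.findall(wikitext):
        for url in URL_PATTERN.findall(body):
            if url not in urls:
                urls.append(url)
    return urls


if __name__ == "__main__":
    sample = (
        "Dar es Salaam ni jiji.<ref>{{cite web|url=https://example.org/a|title=A}}</ref> "
        'Pia<ref name="b">[http://example.com/b Habari]</ref><ref name="b" />'
    )
    print(extract_ref_urls(sample))
```

Keeping extraction separate from link checking means the same URL list can feed both the dead-link check and the archive lookup.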

Technical Approach (Infrastructure, Tech Stack, Key Tools and Services)

- Bot / Parser: Python-based, using libraries like `mwclient` or `pywikibot` to access and edit Wikipedia pages.

- URL verification: Python’s `requests` library to check HTTP response status codes; detect 404s, 410s, timeouts, or SSL errors.

- Wayback Machine integration: Query the API (`[5]`) to suggest archived versions automatically.

- Citation validation: Regex and template parsing to detect formatting errors in `<ref>` tags and citation templates.

- Dashboard: Web-based front-end using React.js with a backend API in Python Flask/FastAPI to display flagged references and suggested fixes.

- Database / storage: PostgreSQL or SQLite for storing scanned articles, flagged links, and audit logs.

- Deployment & hosting: Cloud hosting (Heroku, Railway, or AWS Lightsail) for scalability and reliability.

- Version control & CI/CD: GitHub for repository management, with automated testing scripts.
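
The URL verification item in the list above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: it uses the standard library's `urllib` in place of `requests` so that it runs without extra dependencies, and the verdict labels and the function names `classify_status` and `check_url` are invented for this example.

```python
import socket
import ssl
import urllib.error
import urllib.request
from typing import Optional

# Status codes the proposal names as clear signs of a dead link.
DEAD_STATUSES = {404, 410}


def classify_status(status: Optional[int], error: Optional[str] = None) -> str:
    """Map an HTTP status code, or a transport error label, to a verdict string."""
    if error is not None:
        return error  # "timeout", "ssl_error" or "unreachable"
    if status in DEAD_STATUSES:
        return "dead"
    if status is not None and 200 <= status < 400:
        return "ok"
    # Anything else (403, 405, 5xx, ...) is left for a human to review.
    return "needs_review"


def check_url(url: str, timeout: float = 10.0) -> str:
    """Send a HEAD request and classify the outcome. The User-Agent value is a placeholder."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "citation-checker-sketch/0.1"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as exc:  # must come before URLError (its parent class)
        return classify_status(exc.code)
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, ssl.SSLError):
            return classify_status(None, "ssl_error")
        if isinstance(exc.reason, socket.timeout):
            return classify_status(None, "timeout")
        return classify_status(None, "unreachable")
    except socket.timeout:
        return classify_status(None, "timeout")
```

Some servers reject HEAD requests, so a `needs_review` verdict may warrant a follow-up GET before a link is flagged.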

Integrations or Dependencies

- Wayback Machine API: for retrieving archived versions of dead URLs.

- MediaWiki API: for reading and editing Swahili Wikipedia pages programmatically.

- Pywikibot or mwclient: for secure bot operations and API interactions.

- Optional: Email or Slack notifications for editors when references are flagged (enhances community adoption).
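
A lookup against the Wayback Machine might look like the sketch below. It assumes the Internet Archive's public availability endpoint at `https://archive.org/wayback/available` and its JSON response shape (`archived_snapshots` containing a `closest` entry); the helper names `build_query`, `closest_snapshot`, and `suggest_archive` are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed endpoint: the Internet Archive's public availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build the availability query; timestamp is an optional YYYYMMDD-style hint."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def closest_snapshot(payload: dict) -> Optional[str]:
    """Pull the closest archived URL out of a decoded response, or None if there is none."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def suggest_archive(url: str, timeout: float = 10.0) -> Optional[str]:
    """Query the endpoint over the network and return a suggested archive URL, if any."""
    with urllib.request.urlopen(build_query(url), timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

Separating the response parsing from the network call lets the parser be tested offline with recorded responses.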

5. What is the expected impact of your project, and how will you measure success?

Include the following points in your answer:

  • Milestones and progress tracking
  • Project impact and success metrics

Milestones and Progress Tracking

The project will follow a milestone-based development and deployment process to ensure steady progress and measurable outcomes.

  • Milestone 1: Core system development. Development of the citation scanning engine capable of parsing `<ref>` tags, extracting external URLs, and identifying broken or inaccessible links on Swahili Wikipedia articles.
  • Milestone 2: Wayback Machine integration Successful integration of the Wayback Machine API to automatically detect and suggest archived versions of dead links.
  • Milestone 3: Citation validation logic. Implementation of checks for citation formatting issues, including missing required fields in citation templates.
  • Milestone 4: Editor review dashboard Deployment of a web-based dashboard where editors can review flagged citations, view suggested fixes, and track resolution status.
  • Milestone 5: Pilot deployment and testing Running the tool on a selected subset of Swahili Wikipedia articles, collecting editor feedback, and refining the system.
  • Milestone 6: Full deployment and documentation Scaling the tool to scan the full Swahili Wikipedia, publishing technical documentation, and onboarding editors through community training.
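
The missing-field check planned for Milestone 3 could start from something like the sketch below. The template names and required fields in `REQUIRED_FIELDS` are hypothetical placeholders, since the actual rule set would come from the templates in use on Swahili Wikipedia, and the regex only handles templates without nested templates inside them.

```python
import re
from typing import Dict, List

# Hypothetical rule set: template name (lowercase) -> fields that must be non-empty.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "cite web": ["url", "title"],
    "cite book": ["title"],
}

# Naive match for {{name|param=value|...}}; nested templates are not supported.
TEMPLATE_PATTERN = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\|([^{}]*))?\}\}")


def missing_fields(wikitext: str) -> List[dict]:
    """List known citation templates whose required fields are absent or empty."""
    problems: List[dict] = []
    for name, params in TEMPLATE_PATTERN.findall(wikitext):
        required = REQUIRED_FIELDS.get(name.strip().lower())
        if required is None:
            continue  # not a template this sketch knows about
        filled = set()
        for part in params.split("|"):
            key, sep, value = part.partition("=")
            if sep and value.strip():
                filled.add(key.strip().lower())
        missing = [field for field in required if field not in filled]
        if missing:
            problems.append({"template": name.strip(), "missing": missing})
    return problems
```

For example, `missing_fields("{{cite web|url=https://example.org|title=}}")` reports `title` as missing because the parameter is present but empty.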

Progress will be tracked using version control (GitHub commits and issues), automated logs of scanned articles and detected issues, and dashboard analytics showing tool usage and fixes applied.
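
As one possible shape for the automated logs mentioned above, the sketch below stores flagged links in SQLite using Python's built-in `sqlite3` module. The table name `flagged_links`, its columns, and the helper functions are assumptions for illustration only.

```python
import sqlite3
from typing import Optional

# Hypothetical schema for the audit log of flagged references.
SCHEMA = """
CREATE TABLE IF NOT EXISTS flagged_links (
    id INTEGER PRIMARY KEY,
    article TEXT NOT NULL,
    url TEXT NOT NULL,
    verdict TEXT NOT NULL,
    archive_url TEXT,
    fixed INTEGER NOT NULL DEFAULT 0,
    checked_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the log database; the default is an in-memory database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_flag(conn: sqlite3.Connection, article: str, url: str,
                verdict: str, archive_url: Optional[str] = None) -> int:
    """Insert one flagged link and return its row id."""
    cursor = conn.execute(
        "INSERT INTO flagged_links (article, url, verdict, archive_url) VALUES (?, ?, ?, ?)",
        (article, url, verdict, archive_url),
    )
    conn.commit()
    return cursor.lastrowid


def mark_fixed(conn: sqlite3.Connection, flag_id: int) -> None:
    """Mark a flag as resolved once an editor has fixed the citation."""
    conn.execute("UPDATE flagged_links SET fixed = 1 WHERE id = ?", (flag_id,))
    conn.commit()


def summary(conn: sqlite3.Connection) -> dict:
    """Return the total number of flags and how many have been fixed."""
    total, fixed = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(fixed), 0) FROM flagged_links"
    ).fetchone()
    return {"flagged": total, "fixed": fixed}
```

A table like this would let the dashboard analytics and the progress reports read from the same source.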

Project Impact and Success Metrics

The project aims to improve content reliability, editor efficiency, and long-term sustainability of Swahili Wikipedia.

Quantitative success metrics include:

  • Number of articles scanned by the tool.
  • Number of broken or inaccessible references detected.
  • Number of archived links successfully suggested via the Wayback Machine.
  • Number and percentage of flagged citations that are fixed by editors.
  • Reduction in unresolved dead links over time.
  • Number of active editors using the dashboard.
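
The percentage-based metrics in the list above reduce to simple ratios over the scan totals. The sketch below shows one way to compute them; the `ScanTotals` fields and the report keys are illustrative names, not a fixed reporting format.

```python
from dataclasses import dataclass


@dataclass
class ScanTotals:
    """Raw counts collected over a reporting period (hypothetical field names)."""
    articles_scanned: int
    links_flagged: int
    archives_suggested: int
    flags_fixed: int


def percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal; 0.0 when whole is zero."""
    return 0.0 if whole == 0 else round(100.0 * part / whole, 1)


def report(totals: ScanTotals) -> dict:
    """Derive the rate metrics from the raw counts."""
    return {
        "articles_scanned": totals.articles_scanned,
        "links_flagged": totals.links_flagged,
        "archive_suggestion_rate_pct": percentage(totals.archives_suggested, totals.links_flagged),
        "fix_rate_pct": percentage(totals.flags_fixed, totals.links_flagged),
        "unresolved": totals.links_flagged - totals.flags_fixed,
    }
```

Comparing `unresolved` between successive reports gives the reduction in dead links over time.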

Qualitative success indicators include:

  • Positive feedback from Swahili Wikipedia editors on usability and usefulness.
  • Increased confidence among editors in maintaining citation quality.
  • Adoption of the tool as a regular maintenance aid within the Swahili Wikipedia community.
6. Who is your target audience, and how have you confirmed there is demand for this project? How did you engage with the Wikimedia community?

Include the following points in your answer:

  • Project demand and target audience description
  • Links to interaction(s) with Wikimedia community
  • Evidence from community consultation such as the [Community Wishlist]

Project Demand and Target Audience

The primary target audience for this project is active editors and contributors of Swahili Wikipedia, particularly those involved in article maintenance, referencing, and quality improvement. This includes experienced editors, new contributors who struggle with citation maintenance, and community organizers responsible for improving content quality.

Swahili Wikipedia has a large and growing number of articles but a relatively small pool of active editors, which creates a high maintenance burden. Editors frequently encounter dead links, outdated sources, and poorly formatted citations, yet manual checking is time‑consuming and difficult to scale. As a result, citation quality issues often remain unresolved for long periods. This creates clear demand for a technical tool that automates citation checking and supports editors with suggested fixes, rather than relying solely on manual review.

Secondary beneficiaries include readers of Swahili Wikipedia, who gain access to more reliable and verifiable information, and new editors, who benefit from clearer signals about which citations need attention.

Evidence of Demand and Community Need

Demand for this project is informed by:

  • Ongoing discussions among Swahili Wikipedia administrators about article quality, reference reliability, and maintenance challenges.
  • The absence of a dedicated, active citation‑checking tool tailored to Swahili Wikipedia, despite similar tools existing for larger Wikipedias.
  • Broader Wikimedia community priorities, including the Community Wishlist, which consistently highlights needs around content reliability, automation, and maintenance tools that reduce editor workload and improve quality at scale.

Community Engagement and Consultation

The project idea was developed through direct engagement with members of the Swahili Wikimedia community, administrators, and the Global WikiEducation Initiative, including informal discussions with editors and contributors involved in article improvement and technical activities. Feedback from these interactions highlighted:

  • The difficulty of manually identifying broken or outdated references.
  • Interest in tools that provide actionable suggestions, such as archived links from the Wayback Machine.
  • The importance of a simple, editor‑friendly interface and a Swahili-language tool that helps with this work.

The project team plans to continue structured engagement by:

  • Sharing the project plan on Swahili Wikipedia and Meta‑Wiki.
  • Inviting editor feedback during the pilot phase.
  • Iterating on the tool based on community input before full deployment.
7. How will your team predict and manage potential user security and privacy risks, and what risks do you currently see?

Include the following points in your answer:

  • The level of in-house or consulted security and privacy expertise you will have available to you during delivery of this project
  • How your development, testing, and deployment processes mitigate the introduction of unnecessary security or privacy risks

The project team includes members with hands‑on experience working with Wikimedia tools, bots, and MediaWiki APIs, which provides a strong baseline understanding of Wikimedia’s security, privacy, and bot policy requirements. The team will follow established Wikimedia best practices for bot development, API usage, and data handling.

By following Wikimedia’s established technical policies, limiting data collection, avoiding unnecessary automation, and engaging the community throughout development, the project will minimize security and privacy risks while delivering a valuable maintenance tool. This cautious and transparent approach ensures that the Swahili Wikipedia Citation Checker is safe, trusted, and suitable for long‑term use within the Swahili Wikipedia ecosystem.

8. Who is on your team, and what is your experience?

Include the following points in your answer:

  • Your experience as a developer, relevant past projects
  • Wikimedia SUL (developer), Gerrit, Github, Gitlab or other relevant public account handles
  • Other team members, their roles and expertise

1. Jacob Gijjah (User: Jacobgijjah) – Project Lead & Backend Developer Jacob Gijjah is a software developer and experienced Wikimedia contributor who will serve as the Project Lead. He has been actively involved in Wikipedia, Wikidata, and Wikimedia Commons, contributing as both an editor and a technical contributor. Jacob has strong experience in backend development and API-based tools and will be responsible for:

  • Overall technical leadership and project coordination
  • Backend architecture and bot development
  • MediaWiki API and Wayback Machine API integration
  • Code quality, testing, and deployment oversight

Public accounts:

  • Wikimedia username: Jacobgijjah
  • GitHub: [6]
  • LinkedIn: [7]

2. Khalid Maumba – Frontend & UI/UX Developer Khalid Maumba is a frontend developer and UI/UX designer with experience building user‑friendly web interfaces. He will lead the design and implementation of the editor dashboard, ensuring it is intuitive and accessible for Swahili Wikipedia editors. His responsibilities include:

  • Frontend development and interface design
  • Dashboard usability and accessibility
  • Integration with backend services

Public accounts:

  • GitHub / GitLab: [8]
  • LinkedIn: [9]

3. Pellagia Njau (User:Pellagia Njau)  – Community Lead & Project Coordinator She is an active Wikimedia community organizer and contributor, with experience supporting Wikipedia and Wikidata initiatives in the Swahili‑speaking community. She will be responsible for:

  • Community engagement and consultation
  • Coordinating feedback from Swahili Wikipedia editors
  • Documentation, reporting, and grant compliance
  • Supporting testing, rollout, and community adoption

Public accounts:

4. Hussein Issa (User: Husseyn Issa) – Swahili Wikipedia Administrator & Community Advisor Hussein is a Swahili Wikipedia Administrator with deep experience in content moderation, policy enforcement, and community governance. He will serve as a community advisor, ensuring that the tool:

  • Aligns with Swahili Wikipedia policies and norms
  • Respects editorial workflows and admin practices
  • Is introduced in a way that supports trust, transparency, and long‑term adoption

Public accounts:

9. How will the project be maintained long-term?

Include the long-term maintenance plan with maintainer(s) in your answer. If you expect the long-term maintenance to incur expenses, please list those and the plan for long-term expense coverage.

Primary long‑term maintenance will be led by:

  • Jacob Gijjah (Project Lead & Backend Developer) – responsible for core code maintenance, bug fixes, dependency updates, and backend stability.
  • Pellagia Njau (Community Lead) – responsible for coordinating community feedback, documenting issues, and ensuring alignment with Swahili Wikipedia needs.
  • Hussein Issa (Swahili Wikipedia Administrator) – providing oversight to ensure continued policy compliance and appropriate use within the community.

The codebase will be hosted on GitHub under an open‑source license, allowing other Wikimedia contributors to review, report issues, and contribute improvements. Clear documentation will be provided to lower the barrier for future contributors.

Operational Sustainability

After initial deployment, the tool will operate primarily in monitoring and reporting mode, which reduces operational complexity and maintenance overhead. Maintenance activities will include:

  • Periodic updates to API dependencies (MediaWiki and Wayback Machine)
  • Monitoring for changes in citation templates or Wikimedia policies
  • Responding to bug reports and community feedback
  • Incremental improvements based on editor needs

The team will continue to study tool usage and performance, using logs and dashboard metrics to guide improvements and ensure the tool remains useful and relevant.

Long‑Term Costs and Expense Coverage

The project is intentionally designed to have minimal recurring costs. Expected long‑term expenses include:

  • Basic server or cloud hosting
  • Domain or infrastructure monitoring (if applicable)

These costs are expected to remain low and can be covered through:

  • Continued volunteer maintenance by the project team
  • Low‑cost hosting options or community‑supported infrastructure
  • Potential future Wikimedia grants if expansion or major upgrades are required

No paid subscriptions or proprietary services are required for the tool to function.

10. Under what license will your code be released, and how will you ensure the product is well documented?

Include the following points in your answer:

  • Code license and compatibility with Wikimedia projects
  • Documentation plan

The project’s code will be released under the MIT License, a permissive open‑source license that is fully compatible with Wikimedia projects and policies. The MIT License allows free use, modification, and redistribution of the software, which supports Wikimedia’s principles of open knowledge, transparency, and reuse.

Thus:

  • Other Wikimedia communities and developers can freely adapt or extend the tool.
  • The code can be reused in other Wikimedia‑related projects without legal or licensing barriers.
  • Long‑term sustainability is improved by encouraging external contributions and forks if needed.

The license will be clearly stated in the project repository (LICENSE file) and referenced in all documentation.

Documentation Plan

To ensure the product is easy to understand, maintain, and adopt, the project will follow a comprehensive documentation strategy:

  • Technical documentation
      • A detailed README.md explaining the project purpose, architecture, and setup instructions.
      • Clear instructions for installation, configuration, and deployment.
      • Documentation of APIs, data flow, and integration points (MediaWiki API and Wayback Machine API).
  • Developer documentation
      • Inline code comments explaining core logic and critical functions.
      • Contribution guidelines (CONTRIBUTING.md) to help future developers understand coding standards and workflows.
  • User and community documentation
      • Step‑by‑step guides for editors explaining how to use the dashboard and interpret flagged citations.
      • Simple usage documentation written with Swahili Wikipedia editors in mind.

All documentation will be maintained alongside the code in the public GitHub repository so that it stays up to date as the project evolves.

11. Will your project depend on or contribute to third-party tools or services?

The project will have minimal dependency on third‑party tools or services. The primary external service used will be the Wayback Machine API, provided by the Internet Archive.

The Wayback Machine API will be used to:

  • Check whether archived versions of dead or inaccessible reference URLs exist
  • Retrieve stable archived links that can be suggested to editors as replacements

This dependency is read‑only and non‑intrusive, as the project will only query the API for publicly available archived content. No data will be written to or modified on third‑party systems.

Apart from the Wayback Machine API, the project relies mainly on Wikimedia’s own infrastructure, including the MediaWiki API, and standard open‑source libraries. No proprietary services or paid third‑party platforms are required.

The Internet Archive is a well‑established, mission‑aligned organization that already collaborates closely with the Wikimedia ecosystem, making this dependency stable, appropriate, and low‑risk. If the Wayback Machine API is temporarily unavailable, the tool will continue to function by flagging broken links without archived suggestions.
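
The fallback behaviour described above, flagging a dead link even when no archive suggestion can be fetched, might be structured as in this sketch. The function `flag_dead_link`, the shape of the returned dictionary, and the stand-in `offline_lookup` are assumptions for illustration; `lookup` is any callable that takes a URL and returns an archived URL or `None`.

```python
from typing import Callable, Optional


def flag_dead_link(url: str, lookup: Callable[[str], Optional[str]]) -> dict:
    """Flag a dead link, attaching an archive suggestion only if the lookup succeeds."""
    flag = {"url": url, "verdict": "dead", "archive_url": None, "archive_lookup": "ok"}
    try:
        flag["archive_url"] = lookup(url)
    except (OSError, ValueError):
        # Network failures and timeouts are OSError subclasses;
        # malformed JSON responses raise ValueError subclasses.
        flag["archive_lookup"] = "unavailable"
    return flag


def offline_lookup(url: str) -> Optional[str]:
    """Hypothetical stand-in that simulates the archive service being unreachable."""
    raise OSError("archive service unreachable")


if __name__ == "__main__":
    print(flag_dead_link("http://example.com/gone", offline_lookup))
```

Recording `archive_lookup` separately from `archive_url` distinguishes "no snapshot exists" from "the service could not be reached", so the lookup can be retried later.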

12. Is there anything else you’d like to share about your project? (optional)


Budget

13. Upload your budget for this proposal or indicate the link to it. (required)

https://docs.google.com/spreadsheets/d/1xvZ8mREawBBq4PncMuSXxioLyDtXFMMwdBVzh5pLiAk/edit?usp=sharing


14. and 15. What is the amount you are requesting for this proposal? Please provide the amount in your local currency. (required)

12474000 TZS

16. Convert the amount requested into USD using the Oanda converter. This is done only to help you assess the USD equivalent of the requested amount. Your request should be between 500 - 5,000 USD.

4912.08 USD

We/I have read the Application Privacy Statement, WMF Friendly Space Policy and Universal Code of Conduct.

Yes

Endorsements and Feedback


Please add endorsements and feedback to the grant discussion page only. Endorsements added here will be removed automatically.

Community members are invited to share meaningful feedback on the proposal and include reasons why they endorse the proposal. Consider the following:

  • Stating why the proposal is important for the communities involved and why they think the strategies chosen will achieve the results that are expected.
  • Highlighting any aspects they think are particularly well developed: for instance, the strategies and activities proposed, the levels of community engagement, outreach to underrepresented groups, addressing knowledge gaps, partnerships, the overall budget and learning and evaluation section of the proposal, etc.
  • Highlighting if the proposal focuses on any interesting research, learning or innovation, etc. Also if it builds on learning from past proposals developed by the individual or organization, or other Wikimedia communities.
  • Analyzing if the proposal is going to contribute in any way to important developments around specific Wikimedia projects or Movement Strategy.
  • Analysing if the proposal is coherent in terms of the objectives, strategies, budget, and expected results (metrics).

Endorse


This is an automatically generated Meta-Wiki page. The page was copied from Fluxx, the web service of Wikimedia Foundation Funds, where the user has submitted their application. Please do not make any changes to this page because all changes will be removed after the next update. Use the discussion page for your feedback. The page was created by CR-FluxxBot.