Grants:Programs/Wikimedia Community Fund/Rapid Fund/Updating ArabicCategoryMaker Bot and supporting API (ID: 23552349)/Final Report
Application type: Tech project
Parts 1-3: Project and impact
1. What was built or achieved during the project, and how did it align with your original goals, milestones and technical plan? (required)
During the grant period, a comprehensive technical overhaul of the Arabic Categories Bot and its supporting infrastructure was successfully executed, fully aligning with our original goals and technical plan. I developed a complete system utilizing a multi-layered modular architecture (core library, web service, and execution bot) to ensure high performance and long-term sustainability.
The technical deliverables:
- ArWikiCats Python Library: The core engine responsible for category localization.
- The legacy codebase was entirely rewritten from scratch. This restructuring aimed to improve code organization, decouple logic, significantly increase the accuracy of Arabic naming conventions, and optimize overall performance.
- Memory consumption was drastically reduced to under 80 MB, compared to the legacy code which consumed over 2048 MB during runtime.
- Comprehensive Test Suite for the Core Library:
- Developed a massive suite of over 65,000 tests to guarantee code quality, ensure system stability, and mitigate future bugs.
- Test coverage spans core functionalities, temporal patterns, countries, nationalities, and complex categorization patterns (e.g., nationality, sport, occupation). It also covers system performance, dictionary matching, sports teams, tournaments, film/TV, and advanced job parsers.
- The test suite is robustly structured across four tiers: Unit Tests, Integration Tests, End-to-End (E2E) Tests, and Big Data Tests.
- Application Programming Interface (API): Engineered a dedicated API to expose the localization service beyond the bot itself, enabling seamless integration with Wikipedia gadgets, such as the Category Creator tool.
- Category Creation Bot: An operational bot that actively leverages the core library and the web service to automatically generate categories.
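The three layers listed above (core library, web service, execution bot) can be pictured with a minimal Python sketch. Every name in it (`translate_category`, `handle_request`, `run_bot`, and the two small lookup tables) is hypothetical and chosen only for illustration. None of it is the actual ArWikiCats API, and the real library resolves far more pattern families than a two-word lookup.

```python
import json
from typing import Callable, Iterable, Optional

# --- Layer 1: core library (pure logic, no network or wiki access) ---
# Hypothetical stand-in data, not taken from the real project.
_NATIONALITIES = {"French": "فرنسيون", "Egyptian": "مصريون"}
_OCCUPATIONS = {"painters": "رسامون", "writers": "كتاب"}


def translate_category(english: str) -> Optional[str]:
    """Return an Arabic label for '<Nationality> <occupation>', or None."""
    parts = english.split(" ", 1)
    if len(parts) != 2:
        return None
    nationality, occupation = parts
    ar_nationality = _NATIONALITIES.get(nationality)
    ar_occupation = _OCCUPATIONS.get(occupation)
    if ar_nationality is None or ar_occupation is None:
        return None
    # Arabic word order: occupation first, then the nationality adjective.
    return f"{ar_occupation} {ar_nationality}"


# --- Layer 2: web service (thin wrapper that serializes the core result) ---
def handle_request(query: dict) -> str:
    """Turn a query such as {'title': ...} into a JSON response body."""
    title = query.get("title", "")
    label = translate_category(title)
    return json.dumps({"title": title, "label": label}, ensure_ascii=False)


# --- Layer 3: bot (execution only; the side effect is injected) ---
def run_bot(titles: Iterable[str], create: Callable[[str, str], None]) -> int:
    """Call create(english, arabic) for each resolvable title; return the count."""
    created = 0
    for title in titles:
        label = translate_category(title)
        if label is not None:
            create(title, label)
            created += 1
    return created
```

Because the bot receives its `create` callback as a parameter, the same loop can be exercised in a test with a function that only records its arguments, while a production run would pass a function that edits the wiki.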
Additional Achievements & Impact:
- Published all source code on GitHub, fulfilling the goal of making the project fully open-source and community-owned.
- The bot successfully created over 15,000 new categories.
- Beyond structural upgrades, the updated library was deployed to address legacy issues on Arabic Wikipedia. We successfully corrected and refined the names of over 15,000 existing categories to ensure higher accuracy.
- Developed and provided a streamlined bug-reporting tool for Arabic naming errors, significantly facilitating community feedback and collaboration.
2. Share links that demonstrate your project's impact, usage, and technical outcomes. (required)
Required links:
- Project page on relevant Wikimedia spaces (e.g. Phabricator, Wikimedia projects, Toolforge)
- Code repository (e.g. Gerrit, GitHub or GitLab)
- Documentation or user guides
- Dashboards, metrics tools, or analytics used to track usage or contributions
Optional links you may include:
- Diff or mailing list announcements
- Community feedback
- Demos or product presentations
- Survey results or user testing feedback
- Examples of integrations or usage within Wikimedia projects
Below are the key links categorized to demonstrate the project's technical infrastructure, quantitative impact, and community integration:
Code & Technical Infrastructure:
- GitHub Repositories: github.com/ArWikiCats (Includes the complete source code and the technical development changelog from November 2025 to January 2026).
- Python Core Library (PyPI): ArWikiCats published on the official Python Package Index.
- Web Service: ArWikiCats API hosted on Wikimedia Toolforge.
Quantitative Impact & Metrics:
- New Categories Created (+15,000): Query #103646 tracking the newly generated categories.
- Categories Corrected (+15,000): Query #103365 showing the correction of existing categories, which positively affected over 258,000 pages. (See also the community's Category Move Requests Archive).
Community Integration & On-Wiki Tools:
- Main Project Page: Arabic Category Creation System on Arabic Wikipedia.
- Bug Reporting Tool: Gadget-ArWikiCatsReporter provided to the community to facilitate quick and easy error reporting.
3. What are the key lessons you learned during this project, both technical and non-technical? (required)
Technically, the experience showed that small issues in legacy tools can accumulate until they significantly harm maintainability and sustainability. Tight coupling between the components of the legacy system made further development difficult and necessitated rewriting approximately 90% of the code. This confirmed the importance of designing systems according to best practices from the start: the new system uses a modular architecture that separates components, and separates business logic from execution, which improves maintainability and facilitates future scalability.
Non-technically, it became apparent that relying on projects that are not open source is a risk to sustainability. In addition, greater transparency with the community and simple means for non-technical participation (such as bug-reporting tools) directly improve the quality and accuracy of the results.
4. How did the Wikimedia community or your target audience engage with your project during its development or release? (required)
Technical interaction from the Arabic community was limited, but it was fruitful regarding category naming, and most of it was concentrated in category move requests.
After development, the bug reporting tool enabled editors to easily contribute to improving the quality of categories, enhancing practical participation even from non-technical users.
Here are some links that include community discussions about the bot and its work:
5. What risks or challenges did you encounter (related to delivery, safety, or security), and how did you address them? (required)
The most prominent challenge was the complexity of the legacy code and the difficulty of refactoring it without affecting its continuous operation on the encyclopedia. Solution: I built a separate codebase that underwent development and experimental testing without affecting the operation of the legacy code during the development phase and before deploying the new code.
Another challenge was how to maintain the naming conventions followed in Wikipedia and control the quality of incorrect names. Solution: I provided over 65,000 tests to evaluate the quality of the Arabic names according to various category patterns.
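A pattern-based suite of this kind is often written as table-driven tests, where each row pairs an English category with the Arabic name reviewers expect. The sketch below uses only the standard-library `unittest` module. The `resolve` function, its single year pattern, and the sample rows are hypothetical placeholders rather than cases taken from the project's actual suite.

```python
import re
import unittest
from typing import Optional

# Hypothetical temporal pattern: '<year> births' or '<year> deaths'.
_YEAR_PATTERN = re.compile(r"^(\d{4}) (births|deaths)$")
_ARABIC_KIND = {"births": "مواليد", "deaths": "وفيات"}


def resolve(english: str) -> Optional[str]:
    """Stand-in for a category resolver; returns None when no pattern matches."""
    match = _YEAR_PATTERN.match(english)
    if match is None:
        return None
    year, kind = match.groups()
    return f"{_ARABIC_KIND[kind]} {year}"


# Each row: (English category, expected Arabic name or None if unresolvable).
CASES = [
    ("1990 births", "مواليد 1990"),
    ("2005 deaths", "وفيات 2005"),
    ("Not a known pattern", None),
]


class CategoryNameTests(unittest.TestCase):
    def test_expected_names(self):
        for english, expected in CASES:
            # subTest reports every failing row instead of stopping at the first.
            with self.subTest(category=english):
                self.assertEqual(resolve(english), expected)


def run_suite() -> unittest.TestResult:
    """Load and run the table-driven test case, returning the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CategoryNameTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Adding coverage for a new naming pattern then means appending rows to `CASES`, which is one way a suite can grow to tens of thousands of checks without a matching growth in test code.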
6. Who will maintain the project going forward, and what is your plan for long-term maintenance? (required)
The current developer will continue developing and maintaining the project, which is fully open-source, in collaboration with established developers on Wikipedia. Furthermore, the project's reliance on Wikimedia infrastructure (Toolforge) and open-source tools will ensure its long-term sustainability.
(questions 7-9 are skipped)
Part 4: Financial reporting
10. Please state the total amount spent in your local currency. (required)
789608
11. Please state the total amount spent in US dollars. (required)
3300
12. Report the funds spent in the currency of your fund. (required)
Provide the link to the financial report https://docs.google.com/spreadsheets/d/13LCPpv7kjF_eCAPN7mrKw06EsRaD9QVodg4DLX7ZWgw
12.2. If you have not already done so in your financial spending report, please provide information on changes in the budget in relation to your original proposal. (optional)
13. Do you have any unspent funds from the Fund?
No
13.1. Please list the amount and currency you did not use and explain why.
N/A
13.2. What are you planning to do with the underspent funds?
N/A
13.3. Please provide details of how you hope to spend these funds.
N/A
14.1. Are you in compliance with the terms outlined in the fund agreement?
Yes
14.2. Are you in compliance with all applicable laws and regulations as outlined in the grant agreement?
Yes
14.3. Are you in compliance with provisions of the United States Internal Revenue Code (“Code”), and with relevant tax laws and regulations restricting the use of the Funds as outlined in the grant agreement? In summary, this is to confirm that the funds were used in alignment with the WMF mission and for charitable/nonprofit/educational purposes.
Yes
15. If you have additional recommendations or reflections that don’t fit into the above sections, please write them here. (optional)