
Grants:Programs/Wikimedia Community Fund/Rapid Fund/WikiTermBase 2.0 (ID: 23553206)

From Meta, a Wikimedia project coordination wiki
status: Funded
WikiTermBase 2.0
request or grant ID: G-RF-2508-20079
proposed start date: 2025-10-31
proposed end date: 2026-03-30
requested budget (local currency): 4900 USD
requested budget (USD): 4900 USD
amount funded (USD): 4900
amount funded (local currency): 4900 USD
grant type: Individual
funding region: MENA
decision fiscal year: 2025-26
applicant: Michel Bakni
organization (if applicable): N/A
Review Final Report

Applicant details

Main Wikimedia username. (required)

Michel Bakni

Organization

N/A

If you are a group or organization leader, board member, president, executive director, or staff member at any Wikimedia group, affiliate, or Wikimedia Foundation, you are required to self-identify and present all roles. (required)

N/A

Describe all relevant roles with the name of the group or organization and description of the role. (required)

Main proposal

1. State the title of your proposal. This will also be the Meta-Wiki page title.

WikiTermBase 2.0

2. and 3. Proposed start and end dates for the proposal.

2025-10-31 - 2026-03-30

4. What is your tech project about, and how do you plan to build the product?

Include the following points in your answer:

  • Project goal and problem you solve
  • Product strategy or project roadmap
  • Technical approach (infrastructure, tech stack, key tools and services)
  • Integrations or dependencies (if any)

Project Goal

Our project builds on WikiTermBase, a tool we launched in March 2025 that has already gained more than 200 users organically on Arabic Wikipedia (~5% of all active users). In this second phase of development, we have two objectives:

  1. Morpho-syntactic parser: Build a parser to add rich linguistic relationships and lexicographic data to lemmas in our term base (addressing a leading demand we have from user feedback and in our project charter).
  2. Default Arabic Wikipedia gadget: Build a robust software infrastructure and conduct extensive testing that can qualify WikiTermBase to become a default gadget of Arabic Wikipedia, thus extending the benefits of the tool to the majority of the community.

Problem

●      Lack of consistency in Wikipedia’s vocabulary: many words have 2 to 5 alternatives that are used inconsistently across articles, almost entirely at the discretion of individual Wikipedians.

●      Time-consuming vocabulary translation: 5,000 words of linguistic discussions are written each month on average in Arabic Wikipedia, to define only a handful of words.

●      Difficulty of making vocabulary decisions: there is no standard way to conclude the lengthy discussions on vocabulary, nor to enforce whatever term is agreed upon. Months of discussion are spent reaching an agreement that cannot be implemented.

Solution

●      Making WikiTermBase a default gadget: WikiTermBase’s prototype is already ranked among the top 50% of Arabic Wikipedia’s most used tools by active users, based on purely organic community adoption. We're aiming to make WikiTermBase as widely adopted as possible, which means that our ultimate objective is becoming a default Arabic Wikipedia gadget. Reaching this goal requires meticulous testing and technical troubleshooting, which would require extensive software development resources.

●      Expand dictionary data: Our database currently includes 50 dictionaries with nearly 1 million terms, but we are aware that various domains (e.g. AI) are largely missing even from this fairly large collection. We aim to digitize more dictionaries or find other ways to collect data.

●      Morphological analysis and parsing: A morphological parser has been a key element in our planned architecture from the start. Arabic is morphologically very rich, and standardizing translation terminology requires morphological parsing to determine related words and context. We are working with Arabic linguists to determine the best parsers. WikiTermBase already has 300K Arabic terms from 50 dictionaries, which means it contains many duplicates and similar words. So far, we have only done simple matching of words based on spelling, but as the tool evolves, users increasingly need to access related words and see more complex linguistic relationships.
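As an illustration of the spelling-based matching mentioned above, the following is a minimal sketch, not WikiTermBase's actual code. It assumes that duplicates are detected by stripping Arabic diacritics and the tatweel character and by folding hamza-carrying alef variants into a bare alef. The function names `normalize` and `group_by_spelling` are hypothetical.

```python
import re
from collections import defaultdict

# Arabic diacritics (U+064B..U+0652) and the tatweel elongation mark (U+0640).
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")

# Assumed folding table: alef with hamza above/below and alef with madda
# are mapped to a bare alef (U+0627).
_ALEF_VARIANTS = str.maketrans({
    "\u0623": "\u0627",
    "\u0625": "\u0627",
    "\u0622": "\u0627",
})


def normalize(term: str) -> str:
    """Return a spelling key: diacritics removed, alef variants unified."""
    stripped = _DIACRITICS.sub("", term)
    return stripped.translate(_ALEF_VARIANTS).strip()


def group_by_spelling(terms):
    """Group terms whose normalized spelling is identical."""
    groups = defaultdict(list)
    for term in terms:
        groups[normalize(term)].append(term)
    return dict(groups)
```

With this kind of key, a vocalized entry from one dictionary and an unvocalized entry from another fall into the same group. It cannot relate words that share a root but differ in form, which is the gap the morphological parser is meant to close.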

Project Roadmap

As laid out in our project charter, we’re just kicking off the second phase of WikiTermBase which is projected to continue through June 2026, with the following milestones:

●      Technical preparation for default gadget (Oct – Nov 2025): Backend and frontend readiness to meet Wikimedia standards for a default tool.

●      Hand-written rules for morphosyntactic parser (Nov 2025 – Jan 2026): Draft and implement grammar and morphology rules for Arabic parsing.

●      Dictionary expansion (Dec 2025 – Mar 2026): Digitize and add missing domain dictionaries (e.g., AI, medicine) to broaden coverage.

●      Linguist feedback and iteration (Jan – Feb 2026): Arabic linguists review parser rules, provide corrections, and guide refinements.

●      User testing (March – April 2026): Community members test parser and dictionary expansions in real editing workflows.

●      Finalizing the parser and dictionary integration (April – May 2026): Consolidate testing feedback, fix edge cases, and stabilize system performance.

●      Community discussion on launching as default (May 2026): Hold open discussion on Arabic Wikipedia about enabling WikiTermBase as a default gadget.

●      Launch Version 2 (June 2026): Release WikiTermBase as a default gadget with integrated parser and expanded data.

●      Post-launch survey and reporting (June 2026): Collect community feedback, evaluate adoption, and publish impact reports.


Technical Approach

●      NLP-based Python parser: We’re collaborating with professional linguists involved in the Damascus Academy of Arabic Language and the UAE Historical Dictionary to write hand-crafted rules for morphosyntactic parsing as Python scripts. To avoid reinventing the wheel, our software developer is first rewriting the open source Al-Khalili Parser from JavaScript to Python in order to make it secure and safe to deploy on Toolforge.

●      Python library: We’re planning to create our own Python library for the WikiTermBase project, complete with lemma structures, morphosyntactic parsing, spelling mapping, and, in the future, semantic matching to support word sense disambiguation between dictionaries.

●      Toolforge hosting: Our tool’s open source code and full data (stored on a MariaDB instance) have been fully hosted on Toolforge since its launch, and we have been able to comply with various community requests to ensure code transparency, documentation, and robust security and compliance with the Wikimedia infrastructure.
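To make the idea of hand-written rules and lemma structures more concrete, here is a small sketch under stated assumptions. It is not the project's parser and not the Al-Khalili Parser. The `Lemma` record, the `PROCLITICS` rule list, and the functions `strip_proclitic` and `attach` are all hypothetical, and the single rule shown (removing a few common prefixes such as the definite article) is far simpler than real Arabic morphology.

```python
from dataclasses import dataclass, field


@dataclass
class Lemma:
    """Hypothetical lemma record; field names are illustrative only."""
    form: str
    sources: list = field(default_factory=list)  # dictionaries listing the form
    related: list = field(default_factory=list)  # surface forms linked by rules


# Illustrative hand-written rule: prefixes that may precede a noun.
# Ordered longest first so a combined prefix is tried before its parts.
PROCLITICS = ["وال", "بال", "ال", "و"]


def strip_proclitic(word: str, min_stem: int = 3) -> str:
    """Remove one known prefix if at least min_stem letters remain."""
    for prefix in PROCLITICS:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_stem:
            return word[len(prefix):]
    return word


def attach(word: str, lexicon: dict):
    """Link a surface form to its lemma in the lexicon, if the rule finds one."""
    stem = strip_proclitic(word)
    lemma = lexicon.get(stem)
    if lemma is not None and word != stem and word not in lemma.related:
        lemma.related.append(word)
    return lemma
```

The `min_stem` guard is a deliberately crude safeguard: it keeps a three-letter word that merely begins with one of the listed letters from being cut down. A real rule set reviewed by linguists would need far more context than a length check.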

5. What is the expected impact of your project, and how will you measure success?

Include the following points in your answer:

  • Milestones and progress tracking
  • Project impact and success metrics

Success metrics

●      Adoption as a default tool: In order to maximize reach in the Arabic community, our top goal is to be adopted as a default tool for all users.

●      +150 active users / month: WikiTermBase has averaged 50 active users a month. We seek to triple this number by adding new features, engaging the community more, and achieving our goal above (making it a default tool).

●      20% time saving in translation: Users self-estimated an average 15% reduction in the time needed to translate articles from foreign languages. We’re hoping to improve this further with additional functionality.

●      80% positive feedback on utility: So far, 95% of community respondents have rated the tool “useful” or “extremely useful” (across five qualitative dimensions), based on participation by 20% of the tool’s users at the time. As our user base grows, we’ll run more surveys and aim to keep these ratings high.

●      +90% morphosyntactic parsing: We have worked with linguists to exhaustively map the derivational edge cases of Arabic verbs. Our technical goal is to hand-code rules for at least 90% of these verbs into our morphosyntactic parser, making it as comprehensive as possible.
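The numeric targets above reduce to simple ratios. The sketch below shows one way they could be checked. The helper names and any counts passed to them are hypothetical, and only the 50 and 150 user figures and the 90% threshold come from the proposal.

```python
def coverage(handled: int, total: int) -> float:
    """Share of mapped verb cases that the parser has rules for."""
    if total <= 0:
        raise ValueError("total must be positive")
    return handled / total


def meets_target(handled: int, total: int, target: float = 0.90) -> bool:
    """True when rule coverage reaches the stated 90% goal."""
    return coverage(handled, total) >= target


def growth_factor(baseline: int, goal: int) -> float:
    """Multiple by which the goal exceeds the current baseline."""
    return goal / baseline


# Figures from the proposal: 50 active users a month now, 150 as the target.
USER_GROWTH = growth_factor(50, 150)
```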

6. Who is your target audience, and how have you confirmed there is demand for this project? How did you engage with the Wikimedia community?

Include the following points in your answer:

  • Project demand and target audience description
  • Links to interaction(s) with Wikimedia community
  • Evidence from community consultation such as the [Community Wishlist]

Our target audience is the Arabic Wikipedia community, which we have engaged in the following ways:

●      Exploratory research: Before starting to develop WikiTermBase, we engaged the community in preliminary user research to understand their needs and define our target features.

●      Prototype success: We launched a prototype in March 2025 which has been adopted by more than 200 Arabic Wikipedia users, nearly 5% of the active user base.

●      +90% positive feedback: We surveyed 20% of the tool's users as of April 2025, and received 90 – 95% positive ratings for usability, user experience, and impact on translation quality and time.

●      Organic use: The tool has been cited organically by the community in discussions to agree on translations for technical articles (example), and has become the officially recommended translation tool in a technical articles competition.

●      Village Pump: At every stage of development and upon prototype release, we extensively engaged the community on Arabic Wikipedia’s Village Pump, leading to engagement by dozens of users and extensive feedback loops.

●      Community wishlist: We surveyed the tool’s users in April 2025 to gather feedback and understand the next stages. While we have already been able to fix bugs and small features, this grant seeks to move us along the more challenging development requirements.

7. How will your team predict and manage potential user security and privacy risks, and what risks do you currently see?

Include the following points in your answer:

  • The level of in-house or consulted security and privacy expertise you will have available to you during delivery of this project
  • How your development, testing, and deployment processes mitigate the introduction of unnecessary security or privacy risks

Our tool is hosted exclusively on Toolforge and is accessible either through it or through Arabic Wikipedia’s gadgets. We do not collect any personal data from users during the tool’s usage or development, apart from the completely voluntary surveys and feedback that we invite users to provide. Additionally, our team includes a software engineer with a decade of professional experience who is dedicated to addressing any related concerns that come to our attention.

8. Who is on your team, and what is your experience?

Include the following points in your answer:

  • Your experience as a developer, relevant past projects
  • Wikimedia SUL (developer), Gerrit, Github, Gitlab or other relevant public account handles
  • Other team members, their roles and expertise

●      Wael Tellat: Full-stack software engineer at Servier, with an extensive history of open source contributions, including developing many Arabic NLP tools available on GitHub. He created the entire first prototype of WikiTermBase, and will be in charge of all the technical aspects of development for this grant.

●      Michel Bakni: Professor at ESTIA Institute of Technology in France, researcher, published author, and Arabic Wikipedia admin. Michel has authored multiple Arabic language dictionaries on topics including Python, and has extensive connections with Arabic lexicographers, language academics and institutions that WikiTermBase is closely working with to develop a solid Arabic morphosyntactic parser.

●      Abbad Diraneyya: AI Conversational Designer at Uber and a Wikimedian since 2009, with extensive experience in managing grant projects for Wikimedia Levant, Wikimedia NYC, Project Al-Marefa and the first prototype of WikiTermBase. He will be coordinating the project’s tasks, roadmap execution, reporting, and meeting timelines and metrics.

9. How will the project be maintained long-term?

Include the long-term maintenance plan with maintainer(s) in your answer. If you expect the long-term maintenance to incur expenses, please list those and the plan for long-term expense coverage.

The project requires minimal technical maintenance, because hosting is already provided on the Wikimedia Foundation’s native technical infrastructure without integrating any third-party tools. The project’s team is composed entirely of Wikimedians with up to 15 years of history in the movement, who will remain dedicated to maintaining it in the future. Additionally, the project’s extensive transparency, documentation and community engagement ensure that other movement developers will be able to support it in the future.

10. Under what license will your code be released, and how will you ensure the product is well documented?

Include the following points in your answer:

  • Code license and compatibility with Wikimedia projects
  • Documentation plan

Our code will be released under the GNU General Public License v3 (GPL-3.0), which is fully compatible with Wikimedia projects and aligns with the movement’s long-standing commitment to free and open-source software. This ensures that WikiTermBase’s code can be freely reused, adapted, and improved by other Wikimedia developers and community members in the future.

To guarantee strong documentation, we will follow Wikimedia’s technical documentation standards. Specifically, we will:

●      Maintain a public GitHub repository (already exists) that includes clear README files, installation/setup instructions, and contribution guidelines.

●      Invite the community to report bugs and technical issues (already being done through our official Wikipedia-based documentation and GitHub issues).

●      Create an open source Python library with all the modules we are using for our tool, including the morphosyntactic parser.

●      Engage the Arabic Wikipedia community on the Village Pump to ensure transparency and awareness, as past engagements have already helped extensively develop security measures and integrate the tool as a Wikipedia gadget by interface admins.

●      Publish a usage guide and feature overview on Arabic Wikipedia (Village Pump, project page, and Help namespace), explaining how the gadget works for both editors and admins.

●      Maintain developer documentation for future contributors, covering system architecture, database schema, and parser rule-writing methodology.

●      Document updates and release notes on-wiki and in the repository to ensure transparency and traceability.

By combining open licensing, transparent hosting on Wikimedia infrastructure, and robust technical and user-facing documentation, we will make sure WikiTermBase remains a sustainable, community-owned tool.

11. Will your project depend on or contribute to third-party tools or services?

N/A

12. Is there anything else you’d like to share about your project? (optional)

This is the second planned phase of a project that was already launched in 2025. Kindly review our full project roadmap and report on Meta for our complete vision of WikiTermBase.

Budget

13. Upload your budget for this proposal or indicate the link to it. (required)

https://docs.google.com/spreadsheets/d/1SB9mQrHjQIDmdHYNZh5qbd50M7Dgp1SA7XJzytjL-Ko/edit?usp=sharing


14. and 15. What is the amount you are requesting for this proposal? Please provide the amount in your local currency. (required)

4900 USD

16. Convert the amount requested into USD using the Oanda converter. This is done only to help you assess the USD equivalent of the requested amount. Your request should be between 500 - 5,000 USD.

4900 USD

We/I have read the Application Privacy Statement, WMF Friendly Space Policy and Universal Code of Conduct.

Yes

Endorsements and Feedback


Please add endorsements and feedback to the grant discussion page only. Endorsements added here will be removed automatically.

Community members are invited to share meaningful feedback on the proposal and include reasons why they endorse the proposal. Consider the following:

  • Stating why the proposal is important for the communities involved and why they think the strategies chosen will achieve the results that are expected.
  • Highlighting any aspects they think are particularly well developed: for instance, the strategies and activities proposed, the levels of community engagement, outreach to underrepresented groups, addressing knowledge gaps, partnerships, the overall budget and learning and evaluation section of the proposal, etc.
  • Highlighting if the proposal focuses on any interesting research, learning or innovation, etc. Also if it builds on learning from past proposals developed by the individual or organization, or other Wikimedia communities.
  • Analyzing if the proposal is going to contribute in any way to important developments around specific Wikimedia projects or Movement Strategy.
  • Analysing if the proposal is coherent in terms of the objectives, strategies, budget, and expected results (metrics).



This is an automatically generated Meta-Wiki page. The page was copied from Fluxx, the web service of Wikimedia Foundation Funds, where the user has submitted their application. Please do not make any changes to this page because all changes will be removed after the next update. Use the discussion page for your feedback. The page was created by CR-FluxxBot.