Community Resources and Partnerships/India Rapid Project/Creating a modern open source spellchecker for Tamil Language
Applicant
- Main Wikimedia username. (required)
Tshrinivasan
- Organization
Kaniyam Foundation
- If you are a group or organization leader, board member, president, executive director, or staff member at any Wikimedia group, affiliate, or Wikimedia Foundation, you are required to self-identify and present all roles. (required)
N/A
- Describe all relevant roles with the name of the group or organization and description of the role. (required)
N/A
Project
- 1. Please state the title of your proposal. This will also be the Meta-Wiki page title.
Creating a modern open source spellchecker for Tamil Language
- 2. and 3. Proposed start and end dates for the proposal.
2025-08-31 - 2026-08-30
- 4. Where will this proposal be implemented? (required)
India
- 5. Are your activities part of a Wikimedia movement campaign, project, or event? If so, please select the relevant project or campaign. (required)
Not applicable
- 6. What is the change you are trying to bring? What are the main challenges or problems you are trying to solve? Describe this change or challenges, as well as main approaches to achieve it. (required)
Despite the rich literary heritage of the Tamil language, there is currently no fully functional and freely available open-source spellchecker for Tamil. This gap has significantly impacted digital communication and content creation, particularly among the Tamil diaspora. Without reliable tools to assist with spelling and grammar, many Tamil speakers, especially younger generations and including contributors to Tamil Wikimedia projects, struggle to write accurately, leading to frequent errors and a decline in language proficiency. Addressing this need is essential to support linguistic confidence, preserve cultural identity, and promote the use of Tamil in digital spaces.
An Aspell-based spellchecker for Tamil does exist, but its capabilities are limited: it can only handle word formations with up to two levels of combination. Tamil, however, is an agglutinative language that forms highly complex words through multiple levels of morphological composition. Existing desktop-based spellchecking tools are outdated, offer limited functionality, and lack modern features such as APIs and interoperability with web and mobile platforms. As a result, they fall short of meeting the needs of contemporary users and developers.
The objective is to develop a modern Tamil spellchecker with the following features:
- Fully open source (Apache 2 license)
- Web-based interface
- REST API for integration
- Extensive collection of correct Tamil words for lookup
- Very fast word lookup
- Suggestions for misspelled words
- Sandhi (phonetic combination) checking
- Replacement of Tanglish words with pure Tamil equivalents
Once developed, the proposed spellchecker can be hosted on the Wikimedia Cloud and integrated into Tamil Wikimedia projects as a user gadget, enhancing the quality of collaborative content. It can also be deployed as a standalone website for broader public access, making high-quality spellchecking available to all Tamil users. As an open-source solution, it can be freely installed on local networks, enabling adoption by schools, libraries, NGOs, and other community institutions without licensing barriers.
- 7. What are the planned activities? (required) Please provide a list of main activities. You can also add a link to the public page for your project where details about your project can be found. Alternatively, you can upload a timeline document. When the activities include partnerships, include details about your partners and planned partnerships.
Current State of Development
- A comprehensive word list has been compiled from existing Tamil datasets.
- Fast lookup functionality is implemented using a Bloom filter.
- A web application with a Python/Flask backend has been developed.
- A user interface for spellchecking is functional.
- Suggestions are generated using the Levenshtein distance algorithm.
- The current version can be accessed at https://iyal.kaniyam.ca
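The lookup and suggestion steps listed above can be illustrated with a minimal sketch. It is not the project's actual code: the class and function names, the filter size, the number of hashes, and the distance threshold are all assumptions chosen for illustration. The edit distance here works on Unicode code points, so a Tamil vowel sign or pulli counts as its own unit; a production version might compare grapheme clusters instead.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1 << 20, num_hashes=5):
        # size_bits and num_hashes are illustrative defaults, not tuned values.
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, word):
        # Derive num_hashes bit positions by salting a SHA-256 hash of the word.
        data = word.encode("utf-8")
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(bytes([salt]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(word)
        )


def levenshtein(a, b):
    """Dynamic-programming edit distance over Unicode code points."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def suggest(word, wordlist, max_distance=2, limit=5):
    """Return up to `limit` known words within `max_distance` edits, closest first."""
    scored = sorted((levenshtein(word, w), w) for w in wordlist)
    return [w for d, w in scored if d <= max_distance][:limit]
```

In this sketch the Bloom filter answers "is this word probably in the list?" in constant time and memory, and the slower Levenshtein scan runs only for words the filter rejects. Because a Bloom filter can report false positives, a small fraction of misspelled words may pass unless the result is confirmed against the full word list.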
Next Steps with timeline
- Preserve HTML formatting during the spellcheck process.
- Log misspelled words and their suggestions in the backend for continuous improvement.
- Gather more Tamil word datasets.
- Enhance the wordlist through further data processing.
- Implement manual verification for less frequent words.
- Create a custom dictionary for frequently misspelled words.
- Incorporate sandhi grammar rules.
- Develop a dictionary for converting Tanglish to Tamil words.
- Create a program to generate derived words from base words.
- Migrate the backend to Django for improved web features and performance.
- Containerize the application and package it with Helm charts for Kubernetes deployment.
- Implement a notification system for new release updates.
- Create a custom thesaurus and integrate it into the spellchecker.
- Tamil Spellchecker Timeline.xlsx
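As one possible approach to the first two steps above (preserving HTML formatting and logging misspelled words), the sketch below splits the input on tags and checks only the text between them. It is an assumption-laden illustration, not the planned implementation: the function name `mark_misspellings`, the `is_correct` callback, and the `misspelled` CSS class are hypothetical, and the simple tag pattern does not handle script or style content or a `>` inside an attribute value.

```python
import re

TAG_PATTERN = re.compile(r"(<[^>]+>)")          # naive match for a single HTML tag
WORD_PATTERN = re.compile(r"[\u0B80-\u0BFF]+")  # runs of characters in the Tamil Unicode block


def mark_misspellings(html, is_correct):
    """Wrap unknown Tamil words in a span and leave all markup untouched.

    `is_correct` is a hypothetical callback (word -> bool), for example a
    dictionary or Bloom filter lookup. Returns the marked-up HTML and the
    list of misspelled words, which a backend could then log.
    """
    misspelled = []

    def check(match):
        word = match.group(0)
        if is_correct(word):
            return word
        misspelled.append(word)
        return f'<span class="misspelled">{word}</span>'

    pieces = TAG_PATTERN.split(html)
    # re.split with one capturing group alternates text and tags,
    # so odd-indexed pieces are the tags and are copied through unchanged.
    marked = [
        piece if index % 2 else WORD_PATTERN.sub(check, piece)
        for index, piece in enumerate(pieces)
    ]
    return "".join(marked), misspelled
```

Called with `'<p><b>தமிழ்</b> அம்ம</p>'` and a lookup that knows only `தமிழ்`, the function keeps the `<p>` and `<b>` tags in place and wraps only the unknown word.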
- 8. Describe your team. Please provide their roles, Wikimedia Usernames and other details. (required) Include more details of the team, including their roles, usernames, Wikimedia group, and whether they are salaried, volunteers, consultants/contractors, etc.
Role - Name - Wikimedia username
- UI Developer - Syed Jafer - Iamsyedjaferk
- Backend Developer - Shrinivasan - Tshrinivasan
- Tester1 - Hariharan - Hariharanumapathi
- Tester2 - Lenin Gurusamy - Guruleninn
- Data Curator - Boopalan - Boopalan28012003
- Data Validator - Kalaiarasan - Kalaiarasanpandi
- Tamil Grammar / Linguist Expert - SME - Sathyaraj - Neyakkoo
- 9. Who are the target participants and from which community? How will you engage participants before and during the activities? How will you follow up with participants after the activities? (required)
The project team is assembled from active open-source communities in Tamil Nadu. Kaniyam Foundation, with its long-standing experience in providing training on Python programming and GNU/Linux, has nurtured a strong network of contributors who have actively worked on various Tamil language projects, including open-tamil. Team members are selected based on their demonstrated interest, technical skill sets, and prior contributions to relevant projects.
We will conduct weekly online meetings to plan and monitor progress, ensuring transparency and accountability. Additionally, monthly demo sessions will be held to showcase milestones and gather feedback.
Once the core development is complete, the team will continue to be involved in fine-tuning the tool based on user feedback, ensuring that the spellchecker remains accurate, user-friendly, and aligned with community needs.
- 9.1. If your project includes in-person activities, are there any international participants travelling to India for them? (required)
- No
- 9.1.1. List all countries of participation. (required)
- 9.2. Will the project be transferring funds to any international participants? (required)
- No
- 9.2.1. List all international participants receiving funding and their countries. (required)
N/A
- 10. Does your project involve work with children or youth? (required)
- No
- 10.1. Please provide a link to your Youth Safety Policy. (required) If the proposal indicates direct contact with children or youth, you are required to outline compliance with international and local laws for working with children and youth, and provide a youth safety policy aligned with these laws. Read more here.
N/A
- 11. How did you discuss the idea of your project with your community members and/or any relevant groups? Please describe steps taken and provide links to any on-wiki community discussion(s) about the proposal. (required) You need to inform the community and/or group, discuss the project with them, and involve them in planning this proposal. You also need to align the activities with other projects happening in the planned area of implementation to ensure collaboration within the community.
We will discuss this project at the Tamil Wikipedia and Tamil Wikisource village pumps and gather input from Wikipedians.
- 12. Does your proposal aim to work to bridge any of the content knowledge gaps (Knowledge Inequity)? Select one option that most apply to your work. (required)
Language
- 13. Does your proposal include any of these areas or thematic focus? Select one option that most applies to your work. (required)
Open Technology
- 14. Will your work focus on involving participants from any underrepresented communities? Select one option that most apply to your work. (required)
Linguistic / Language
- 15. In what ways do you think your proposal most contributes to the Movement Strategy 2030 recommendations. Select one that most applies. (required)
Innovate in Free Knowledge
Metrics
- 17. What do you hope to learn from your work in this project or proposal? (required)
Through this project, we hope to gain a deeper understanding of the linguistic structure and computational challenges involved in building language tools for agglutinative languages like Tamil. Specifically, we aim to learn how to design scalable, community-driven solutions that handle complex word formations, regional variations, and user input patterns effectively.
We also hope to learn from the collaborative process: how to manage and motivate a distributed open-source team, gather meaningful feedback from real users, and iteratively improve a tool based on real-world usage. By achieving the described change, we want to better understand how open digital infrastructure can empower native language users, especially in under-resourced communities, and how technology can play a key role in preserving and promoting linguistic heritage.
- 18. What are your Wikimedia project targets in numbers (metrics)? (required)
| Other Metrics | Target | Optional description |
|---|---|---|
| Number of participants | 1000 | |
| Number of editors | 50 | |
| Number of organizers | 7 | |

| Wikimedia project | Number of content created or improved |
|---|---|
| Wikipedia | 1000 |
| Wikimedia Commons | |
| Wikidata | |
| Wiktionary | 1000 |
| Wikisource | 10000 |
| Wikimedia Incubator | |
| Translatewiki | |
| MediaWiki | |
| Wikiquote | |
| Wikivoyage | |
| Wikibooks | |
| Wikiversity | |
| Wikinews | |
| Wikispecies | |
| Wikifunctions or Abstract Wikipedia | |
- Optional description for content contributions.
N/A
- 19. Do you have any other project targets in numbers (metrics)? (optional)
No
| Main Open Metrics | Description | Target |
|---|---|---|
| N/A | N/A | N/A |
| N/A | N/A | N/A |
| N/A | N/A | N/A |
| N/A | N/A | N/A |
| N/A | N/A | N/A |
- 20. What tools would you use to measure each metrics? Please refer to the guide for a list of tools. You can also write that you are not sure and need support. (required)
Collected error-free Tamil words - 1,00,000
Custom dictionary for frequently misspelled words - 1000
Dictionary for converting Tanglish to Tamil words - 1000
Custom thesaurus - 500
Budget
- 21. Please upload your budget for this proposal or indicate the link to it. (required)
- 22. What is the amount you are requesting for this proposal? Please provide the amount in Indian Rupees. (required)
400000 INR
- 22.1. Convert the amount requested into USD using the Oanda converter. This is done to help you assess the USD equivalent of the requested amount. Your request should be between 500 - 5,000 USD. (required)
4673.16 USD
- By submitting this proposal request you agree with the Institutional Partner Privacy Policy, Application Privacy Statements, WMF Friendly Space Policy and Universal Code of Conduct.
Yes
This is an automatically generated Meta-Wiki page. The page was copied from Fluxx, the web service of Wikimedia Foundation Funds, where the user has submitted their application. Please do not make any changes to this page because all changes will be removed after the next update. Use the discussion page for your feedback. The page was created by CR-FluxxBot.