Grants:IEG/Easy Micro Contributions for Wiki Source
What is the problem you're trying to solve?
Many old books have been scanned for preservation. We could type them all and publish them on Wikisource, but because there are so many scanned books, typing them with the current tools may take years. For most languages, we do not even have good OCR.
What is your solution?
- Build an application to split the scanned pages into small chunks of single words, i.e., one small image per word.
- Store all the images with proper name/numbering.
- Create a web application and show the words one by one, for users to type easily. Users should type one word at a time.
- Make the web application mobile friendly, so that users can type from mobile too.
- Users should be scored based on their contributions.
- Save the text as users type it.
- Show all the text on a separate page as users type it.
- Once all the word images of a page are typed, publish the entire text as a page, so that users can copy it and publish it on Wikisource.
- To improve quality, show the same word to two users, collect both inputs, and compare them. Reshare the image until we get the most accurate typing.
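The double-typing check in the last step could be sketched roughly as follows. This is only an illustration, not the planned implementation: the function names, the `min_votes` threshold, and the tie rule are assumptions made for the example.

```python
from collections import Counter
import unicodedata


def normalize(text):
    """Normalize Unicode and trim whitespace so equivalent typings compare equal."""
    return unicodedata.normalize("NFC", text).strip()


def consensus(typings, min_votes=2):
    """Return the agreed text for one word image, or None if it must be reshared.

    typings   -- strings typed by different users for the same image
    min_votes -- assumed threshold: how many users must agree
    """
    cleaned = [normalize(t) for t in typings]
    counts = Counter(t for t in cleaned if t)
    if not counts:
        return None
    ranked = counts.most_common(2)
    text, votes = ranked[0]
    # A tie between the two most frequent typings is treated as "no agreement".
    tied = len(ranked) > 1 and ranked[1][1] == votes
    if votes >= min_votes and not tied:
        return text
    return None


print(consensus(["தமிழ்", "தமிழ் "]))  # both users agree after trimming
print(consensus(["a", "b"]))            # disagreement: reshare the image
```

When `consensus` returns `None`, the image would be shown to another user and the check repeated with the longer list of typings.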
The goal is to encourage more users to contribute and to get more books onto Wikisource. When contributing is easy and simple, more users contribute.
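The naming and publishing steps listed above could look something like the sketch below. The file-name pattern (`<book>_p<page>_l<line>_w<word>.png`) and the helper names are hypothetical; the proposal only asks for "proper name/numbering".

```python
import re

# Hypothetical naming scheme: <book>_p<page>_l<line>_w<word>.png
NAME_RE = re.compile(r"^(?P<book>.+)_p(?P<page>\d+)_l(?P<line>\d+)_w(?P<word>\d+)\.png$")


def chunk_name(book, page, line, word):
    """Build a sortable file name for one word image."""
    return f"{book}_p{page:04d}_l{line:03d}_w{word:03d}.png"


def parse_name(name):
    """Recover (book, page, line, word) from a chunk file name."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected chunk name: {name}")
    return m["book"], int(m["page"]), int(m["line"]), int(m["word"])


def assemble_page(typed):
    """Join the typed words of one page in reading order.

    typed -- dict mapping chunk file name to typed text (None if not typed yet)
    Returns the page text, or None while any word is still missing.
    """
    if any(text is None for text in typed.values()):
        return None
    lines = {}
    for name in sorted(typed, key=parse_name):
        _, _, line, _ = parse_name(name)
        lines.setdefault(line, []).append(typed[name])
    return "\n".join(" ".join(words) for _, words in sorted(lines.items()))


page = {
    chunk_name("book", 1, 2, 1): "c",
    chunk_name("book", 1, 1, 2): "b",
    chunk_name("book", 1, 1, 1): "a",
}
print(assemble_page(page))
```

Keeping the page, line, and word numbers in the file name means the text can be reassembled from the names alone, without a separate index.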
- Build an application to split scanned images into small images per word
- Build a web app to show the images and get them typed
- Publish the pages after all the images in a page are typed completely.
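As a rough illustration of the first goal, one common way to split a text line into words is a vertical projection profile: count the ink pixels in each column and cut wherever enough blank columns appear in a row. The sketch below works on a small hand-made binary grid; it assumes the scan has already been binarized and cut into text lines, and the `min_gap` threshold is a made-up parameter that would need tuning per script and scan quality.

```python
def split_words(line_image, min_gap=2):
    """Return (start, end) column ranges (end exclusive), one per word.

    line_image -- list of rows, each a list of 0 (background) / 1 (ink)
    min_gap    -- assumed threshold: blank columns needed between two words
    """
    if not line_image:
        return []
    width = len(line_image[0])
    # Vertical projection: number of ink pixels in each column.
    profile = [sum(row[x] for row in line_image) for x in range(width)]

    words = []
    start = None     # first ink column of the current word
    last_ink = None  # most recent ink column seen
    for x, ink in enumerate(profile):
        if not ink:
            continue
        if start is None:
            start = x
        elif x - last_ink - 1 >= min_gap:
            # Enough blank columns since the last ink: close the current word.
            words.append((start, last_ink + 1))
            start = x
        last_ink = x
    if start is not None:
        words.append((start, last_ink + 1))
    return words


def crop(line_image, start, end):
    """Cut one word image out of the line."""
    return [row[start:end] for row in line_image]


line = [[1, 1, 0, 1, 0, 0, 0, 1, 1, 0]]
print(split_words(line))  # [(0, 4), (7, 9)]
```

The single blank column at position 2 is smaller than `min_gap`, so it is treated as a gap between letters and stays inside the first word.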
Project Manager, Web app developer, Tester
Cost: 10 hrs/week, 15 USD/hr, 3 resources, 6 months (26 weeks)
Total: 11700 USD
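The total is the product of the four figures on the cost line. A quick check (the variable names are only for illustration):

```python
hours_per_week = 10
rate_usd_per_hour = 15
people = 3
weeks = 26

total_usd = hours_per_week * rate_usd_per_hour * people * weeks
print(total_usd)  # 11700
```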
The web application will be developed in Python with the Django web framework. It is a standalone application and does not need integration with MediaWiki's core.
We need a dedicated server to run the web application. Initially, we can host the application on our VPS for development and to gather initial contributions.
Then we can request Wikimedia Labs hosting for scalability and continuous support.
The web application will be developed for desktop users to contribute easily. REST API support and a mobile theme will be added to the web application, so mobile users can access the site from mobile browsers and contribute. If required, independent mobile applications can be developed in the future.
We will ask the Wikisource contributors for input on the project, the user interface, the mobile user interface design, and the scoring.
We will ask them to test the application every week, so that we can correct issues at an early stage.
Initially, the system will be developed for Tamil Wikisource. After the project is completed, it can be used for any language. It could even be used to build CAPTCHA systems with non-English characters.
It can also be integrated with existing OCR systems, to train the OCR and to compare against the OCRed characters.
Measures of success
Contributions to Wikisource increase by 30-40%, as the new system makes contributing much simpler. 3-4 books completed in the 6 months after the project is completed and released.
Tshrinivasan - I am involved in publishing ebooks in Tamil under Creative Commons licenses, and have released 110 ebooks so far at http://FreeTamilEbooks.com
I have been a Python programmer for years and created a MediaWiki uploader in Python for bulk-uploading images to Commons. https://code.google.com/p/mediawiki-uploader/
Notified in Tamil Wiki Village Pump.
Links for the discussions.
- W:ta:விக்கிப்பீடியா:ஆலமரத்தடி_(தொழினுட்பம்)#Grants:IEG.2FEasy_type_tools_for_wiki_source (Village pump - Technology)
- W:ta:விக்கிப்பீடியா:ஆலமரத்தடி#Grants:IEG.2FEasy_type_tools_for_wiki_source (Village pump - General)
Do you think this project should be selected for an Individual Engagement Grant? Please add your name and rationale for endorsing this project below! (Other constructive feedback is welcome on the discussion page).
- Probably the first of its kind on a mobile platform.
- It is an important tool.--சஞ்சீவி சிவகுமார் (talk) 10:38, 30 September 2014 (UTC)
- Very innovative solution--Sodabottle (talk) 09:50, 4 October 2014 (UTC)
- This is a good idea. This idea can be used to train the Tesseract OCR as well. balavignesh (talk) 18:45, 4 October 2014 (UTC)
- To preserve ancient Tamil literature, this unique tool has to be created. -- த♥ உழவன் +உரை.. 07:54, 5 October 2014 (UTC)
- It is a very important initiative to preserve books and manuscripts in the Tamil language. There are still many books, which can be found in Google Ngrams and date back to the 1890s, that can be converted using this tool. This will help to preserve the language's heritage and culture. Seesiva (talk) 06:26, 24 October 2014 (UTC)
- This would help the language grow and help make the original manuscripts available in electronic form to a large number of people. It will foster further research and help to retain the history and tradition of an age-old language. Seesiva (talk) 06:27, 24 October 2014 (UTC)
- Support. A good idea to tailor CAPTCHA to the Indian language context. Much needed to reach the millions in India who are getting onto the internet via mobile and bypassing the desktop/laptop. I would like to personally track the progress of this project if it comes through. Best wishes! --Visdaviva (talk) 12:49, 27 October 2014 (UTC)