Please make sure to log in to Wikimedia Meta before creating your proposal. If your submission is selected, we will contact you through your Wikipedia account, so make sure your username is correct.
- Title of the submission
Introduction to OCR4WikiSource
- Your Username (For the submission author)
- Type of presentation
- Abstract (in about 300 words)
Recently, the Tamil Virtual Academy in Tamil Nadu released around 2000 nationalized ebooks in PDF format under a Creative Commons license. To add them all to Tamil Wikisource, we need their text in plain-text format. Shrinivasan wrote a script that uses Google's OCR and pastes the text into the relevant Wikisource page. This script is being used by many Indian Wikisource communities.
It is a great collaborative development project, as many Indian Wikisource communities participated in development, testing, issue reporting, and enhancements.
So far, around 4 lakh (400,000) pages have been uploaded to Tamil Wikisource using this script. It is also being used by the Bengali, Telugu, Sanskrit, and Odia Wikisources.
He will explain and demonstrate this tool.
Source URL: https://github.com/tshrinivasan/OCR4wikisource
Interested attendees and comments
- I have used this software in Bengali Wikisource and OCRed more than 300 books. I want to know more.
Sumita Roy Dutta (talk) 19:36, 20 July 2016 (UTC)