
WikiConference India 2016/Submissions/Introduction to OCR4WikiSource

From Meta, a Wikimedia project coordination wiki
Hashtag: #WCI2016
Please make sure to log in to Wikimedia Meta before creating your proposal. If your submission is selected, we will contact you through your Wikipedia account, so make sure your username is correct.
Title of the submission

Introduction to OCR4WikiSource

Your Username (For the submission author)

Tshrinivasan

Type of presentation


Abstract (in about 300 words)

Recently, the Tamil Virtual Academy in Tamil Nadu released around 2,000 nationalized ebooks in PDF format under a Creative Commons license. To add them all to Tamil Wikisource, we need them in plain-text format. Shrinivasan wrote a script that uses Google's OCR and pastes the text into the relevant Wikisource page. This script is being used by many Indian Wikisource communities. It is a great collaborative development project, as many Indian Wikisource communities took part in development, testing, issue reporting, and enhancements.

So far, around 4 lakh (400,000) pages have been uploaded to Tamil Wikisource using this script. It is also being used by the Bengali, Telugu, Sanskrit, and Odia Wikisource communities.

He will explain and demonstrate this tool.

Links: Source URL: https://github.com/tshrinivasan/OCR4wikisource
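
The workflow described in the abstract (OCR each page of a book, then place the text on the matching Wikisource page) can be illustrated with a minimal sketch. This is not the OCR4wikisource code itself. The function names `page_title` and `prepare_uploads` are hypothetical, the OCR step is passed in as a plain callable rather than a real Google API call, and the `Page:<file>/<number>` title format is assumed from the usual Wikisource proofreading convention (some language editions localize the namespace name).

```python
from typing import Callable, Dict, Iterable


def page_title(index_file: str, page_number: int) -> str:
    """Build a Page-namespace title such as 'Page:Book.pdf/12' (assumed format)."""
    return f"Page:{index_file}/{page_number}"


def prepare_uploads(
    index_file: str,
    page_images: Iterable[bytes],
    ocr: Callable[[bytes], str],
    first_page: int = 1,
) -> Dict[str, str]:
    """Run the supplied OCR callable on each page image and map the
    recognized text to the wiki page title it should be saved under.

    `ocr` is a stand-in for whatever OCR service is used; it is an
    assumption of this sketch, not a real library call.
    """
    uploads: Dict[str, str] = {}
    for offset, image in enumerate(page_images):
        text = ocr(image).strip()
        if not text:
            # Skip blank pages so empty wiki pages are not created.
            continue
        uploads[page_title(index_file, first_page + offset)] = text
    return uploads
```

The returned dictionary could then be handed to any wiki upload client. Keeping the OCR step and the upload step separate from the title mapping makes each part easy to test or replace on its own.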



Interested attendees and comments

  • I have used this software on Bengali Wikisource and OCRed more than 300 books. I want to know more.

Sumita Roy Dutta (talk) 19:36, 20 July 2016 (UTC)