WikiConference India 2016/Submissions/Introduction to OCR4WikiSource

Hashtag: #WCI2016

WCI Submissions

Please make sure to log in to Wikimedia Meta before creating your proposal. If your submission is selected, we will contact you through your Wikipedia account, so make sure your username is correct.

Title of the submission

Introduction to OCR4WikiSource

Your Username (For the submission author)

Tshrinivasan

Type of presentation

Talk

Abstract (in about 300 words)

Recently, the Tamil Virtual Academy in Tamil Nadu released around 2000 nationalized ebooks in PDF format under a Creative Commons license. To add them all to Tamil Wikisource, we need them in plain-text format. Shrinivasan wrote a script that uses Google's OCR and pastes the resulting text into the relevant Wikisource page. This script is now used by many Indian Wikisource communities. It is a great collaborative development project, as many Indian Wikisource communities took part in development, testing, issue reporting, and enhancements.

So far, around 4 lakh (400,000) pages have been uploaded to Tamil Wikisource using this script. It is also being used by the Bengali, Telugu, Sanskrit, and Odia Wikisource communities.

He will explain and demonstrate this tool.
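
As a rough illustration of the workflow described in the abstract (OCR each scanned page, then place the text on the matching Wikisource page), here is a minimal sketch. It is not the OCR4wikisource code itself. The function names `page_title` and `build_uploads`, the stand-in `fake_ocr` function, and the file name `example_book.pdf` are all hypothetical; a real run would call an actual OCR service and save each page through the wiki's editing interface. The only convention assumed is that Wikisource page titles take the form `Page:<file name>/<page number>`.

```python
from typing import Callable, Dict, List


def page_title(index_file: str, page_number: int) -> str:
    """Build a Wikisource-style page title, e.g. 'Page:book.pdf/3'."""
    return f"Page:{index_file}/{page_number}"


def build_uploads(
    index_file: str,
    page_images: List[bytes],
    ocr: Callable[[bytes], str],
) -> Dict[str, str]:
    """Run OCR on every page and map each page title to its text.

    Pages whose OCR result is empty are skipped, so blank scans
    do not produce empty wiki pages.
    """
    uploads: Dict[str, str] = {}
    for number, image in enumerate(page_images, start=1):
        text = ocr(image).strip()
        if text:
            uploads[page_title(index_file, number)] = text
    return uploads


def fake_ocr(image: bytes) -> str:
    """Hypothetical stand-in for an OCR service: decodes bytes as text."""
    return image.decode("utf-8")


# Three "scanned pages"; the second one is blank.
pages = [b"first page ", b"", b"third page"]
uploads = build_uploads("example_book.pdf", pages, fake_ocr)

for title, text in uploads.items():
    print(title, "->", text)
```

Passing the OCR step in as a function keeps the page-numbering logic independent of any particular OCR provider, so the same loop can be tried with a different engine or with test data.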

Links: Source URL: https://github.com/tshrinivasan/OCR4wikisource