Talk:CIS-A2K/Events/Bangalore/Digitization workshop 18 August 2013

From Meta, a Wikimedia project coordination wiki

Does "creating text based documents" mean editable text - ie OCR-ed? Shyamal (talk) 12:24, 18 August 2013 (UTC)

Dear Shyamal, this workshop was meant to show participants how to create a homemade setup to scan books, edit the scanned images and make an eBook. OCR is not yet stable for Indic languages and would need a separate workshop. Here, "creating text based documents" basically means manual editing on Wikisource. Please feel free to edit and correct any mistakes you find. --Subhashish Panigrahi (talk) 10:34, 19 August 2013 (UTC)
That sounds too basic. Almost any office assistant these days is expected to know how to use a scanner and create at least a PDF. It would be good if you can post your proposed workshops. There are really excellent (and local if I may add) resources who can be proposed by the community. Shyamal (talk) 15:00, 19 August 2013 (UTC)
At this point of time, any kind of basic digitisation would be good. However, it would be great if we could collaboratively develop a better tutorial with some of the resources you know, Shyamal. AshLin (talk) 15:07, 19 August 2013 (UTC)
I agree that collaboration will produce better material, and I have also commented on this matter in the email discussion group. Since this is being planned on an India-wide scale, it would be good to set up a page for this activity; I can then suggest my workshop expectations and coverage there. Shyamal (talk) 15:36, 19 August 2013 (UTC)
For reference: the "Indic print material digitization workshop query" thread on wikimediaindia-l is useful as a followup here. Sharihareswara (WMF) (talk) 17:12, 19 August 2013 (UTC)
The workshop was meant as a DIY digitization session that does not require investing in a scanner and instead uses a simple digital camera to digitize books and documents effectively. The following were covered during the workshop by Viswaprabha, who mainly led it, along with Subhashish (who was trained by Viswaprabha beforehand) and Shiju Alex: a) best practices in capturing images using a camera and tripod, through demonstration; b) how to hold books and the need to treat old books with respect; c) discussion of image formats and some basic comparison (e.g. DjVu, PDF, JPEG, TIFF, BMP, GIF); d) introduction to and practical use of SM Tether (with a Nikon DSLR) for capturing images; e) practical demonstration of Scan Tailor (free software) for post-processing scanned pages: splitting, deskewing, rearranging borders and de-speckling; f) some basic discussion of copyright and an introduction to Wikisource; g) the importance of online archival resources, and when to redo or not redo scanning of books that are already available in scanned format; h) OCR and Indian languages. Basically, the workshop focused on how an ordinary Wikimedian without access to high-tech infrastructure can effectively digitize old books in a collaborative manner and use Wikisource to share knowledge openly. I think it was a successful workshop, especially as I have personally witnessed how much money various institutions and the government are spending on this. --Visdaviva (talk) 18:01, 19 August 2013 (UTC)