Learning patterns/Digitising books with minimal apparatus
This learning pattern was developed as a solution for the challenges faced while digitising books under WikiProject Pothi.
What problem does this solve?
Digitising books is an important part of Wikisource. However, scanning usually needs expensive apparatus that not every individual or community has access to, and the process itself is time-consuming. Most books that are scanned are text-only, and for such books an unnecessarily high scan quality is of little use. As a result of this inefficient process, rare books might not be digitised even when they are easily available. This pattern aims to solve this frequent problem by using less expensive, easily available devices such as a mobile phone and a table lamp.
You probably have something worthy of digitisation
Remain conscious of the possibility of a digitisation-worthy book or document in your surroundings. Is there an old book in your home? In the attic? As Wikisourcers, we should always be on the lookout for such stuff.
You do not need expensive equipment to contribute
Sure, expensive scanners and high-end DSLRs produce super-high-quality scans, but is it worth ignoring a book for lack of them? Or is it better to get at least a readable scan and preserve the text for posterity? That said, quality can be useful in some scenarios, and improving it should always be one of your priorities.
The text is more important than the medium
A 300 DPI scan is nice, but what if you have access to a very rare out-of-copyright book that needs preservation but have no scanner? A readable scan would be enough to save the text. Once you ensure the text's protection, further attempts to improve the quality can be made.
What is the solution?
This pattern uses a simple setup consisting of a typical mobile phone, a scanning app and a lighting setup for scanning books.
Things you'll need
- A mobile device with a decent camera (A typical Android or iOS phone works great. A tablet will be fine as well.)
- A lighting setup (A desk lamp will do.)
- The book you want to scan
- A non-reflective black surface (optional but recommended)
Things to consider
Ask yourself these questions. Only when the situation satisfies all of these points should you try this method.
- Regarding the book to be digitised: Does the book contain rich illustrations, photographs, lithographs or woodblock prints (basically, any kind of non-textual information)? If it does, or if it is printed in colour, you will probably want to use a high-quality scanner so that the scans are at least 300 DPI. This ensures that the fine lines of photographs and illustrations are captured. Most old books contain few images and consist mainly of black-and-white text; this pattern is recommended for those books. In such books the text is what matters, and a very high scan quality is simply unnecessary.
- Regarding the device camera: Does the device have a decent camera? An 8-megapixel device will work fine, and anything above that can handle this situation with ease. Keep in mind that megapixels are not everything: if the resulting image is clear and readable, you should be fine.
- Regarding the device OS: A typical Android or iOS mobile phone is recommended. Both of these operating systems have multiple free applications that allow scanning pages.
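To relate the megapixel figure to the 300 DPI guideline mentioned above, the rough arithmetic can be sketched in Python. This is only an estimate under stated assumptions: a 4:3 sensor, the long side of the book aligned with the long side of the frame, and no loss from lens quality or focus. The function name, the `fill_fraction` parameter and the 11-inch spread width are illustrative choices, not values from the original pattern.

```python
import math


def effective_dpi(megapixels, long_side_inches, aspect_ratio=4 / 3, fill_fraction=1.0):
    """Estimate pixels per inch along the long side of the photographed area.

    Assumptions (hypothetical, for illustration only):
    - the sensor has the given aspect ratio (4:3 by default);
    - the long side of the book lies along the long side of the frame;
    - the book spans `fill_fraction` of the frame in that direction.
    """
    total_pixels = megapixels * 1_000_000
    # long_px * short_px = total_pixels and long_px / short_px = aspect_ratio
    long_side_px = math.sqrt(total_pixels * aspect_ratio)
    return long_side_px * fill_fraction / long_side_inches


# Hypothetical example: an open book about 11 inches wide, framed edge-to-edge.
for mp in (5, 8, 12):
    print(f"{mp} MP -> about {effective_dpi(mp, 11.0):.0f} pixels per inch")
```

Under these assumptions an 8-megapixel camera comes close to 300 pixels per inch on a spread of that width, which is consistent with the advice that such a device is adequate for text. The estimate is an upper bound: any margin left around the book in the viewfinder lowers the figure proportionally.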
About the setup
The image to the right shows a typical setup using this method. The book to be scanned is placed below the small table, on which the mobile device (here an iPad) is placed. The device is lifted to a height such that the viewfinder shows the book edge-to-edge. In the front is a small desk lamp to illuminate the pages. Position it such that both the pages are equally lit. If you have a non-reflective black surface, spread it underneath the book and then place the book on it. This is so that there is a greater contrast between the edge of the book and the surface beneath it, which makes it easier for whatever app you are using to detect the corners.
Ensure that you have a scanning app for your device. The free CamScanner app is great on both iOS and Android. There are also Scanbot and Scanner Pro; try whichever works better for you. All of these will detect the corners of the book and crop (and, in some apps, straighten) the page, so you need not worry about that. Switching on the black-and-white scan mode is a good way to keep file sizes small. Turn the contrast all the way up and select a suitable brightness value to obtain a pure black-and-white scan. You can turn on the 'Batch Scan' mode to automate the entire process, so that all you need to do is turn the pages.
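The effect of maximum contrast combined with a chosen brightness can be thought of, roughly, as thresholding: every pixel becomes either pure black or pure white depending on which side of a cut-off it falls. The sketch below illustrates that idea in plain Python on a tiny made-up grayscale grid. It is not the algorithm used by any of the apps named above, and `SAMPLE_PAGE`, `binarize` and `ink_fraction` are hypothetical names introduced for this example.

```python
# Hypothetical 2x4 grayscale patch (0 = black, 255 = white):
# light paper with a few darker "ink" pixels, one of them faint (60).
SAMPLE_PAGE = [
    [250, 240, 60, 245],
    [235, 40, 30, 230],
]


def binarize(gray_rows, threshold=128):
    """Map each 0-255 value to pure black (0) or pure white (255)."""
    return [[255 if value >= threshold else 0 for value in row] for row in gray_rows]


def ink_fraction(binary_rows):
    """Return the share of pixels that ended up black."""
    total = sum(len(row) for row in binary_rows)
    black = sum(1 for row in binary_rows for value in row if value == 0)
    return black / total if total else 0.0


# Compare a mid-range cut-off with a much lower one.
for threshold in (128, 50):
    result = binarize(SAMPLE_PAGE, threshold)
    print(threshold, result, ink_fraction(result))
```

With the lower cut-off the faint pixel turns white, which mirrors what happens to lightly printed text when the brightness is set badly. This is why it helps to check a few pages before starting a batch scan.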
With the setup shown alongside, the average scanning speed was greater than that of an 'actual' scanner. The preliminary scan quality obtained with an iPad Air 2 and the Scanner Pro app can be observed here: PDF.
- Ideally, the device should be positioned such that the plane of the device and the page to be scanned are parallel. This ensures minimum distortion.
- If the planes are not parallel, the rectangular shape of the page will be distorted and will need plane/distortion correction. Note that the more the plane is tilted, the lower the quality will be towards one end of the page. Hence, checking a few scans at the beginning is advisable. That said, minor distortions can be easily corrected.
- How does one use distortion correction? Apps such as CamScanner and Scanner Pro include features that can handle this.
- Mobile apps generally give better cropping results when scanning books. These services rely on machine-learned cropping algorithms that are fed by a heavy inflow of data, so their development is more rapid than that of desktop equivalents.
- Older books frequently have chopped edges and uneven sides. These can confuse cropping algorithms, leading them to crop straight through text. Remain aware of these issues and correct them manually.
- The lighting setup should be overhead to avoid shadows. Older type printed with metal blocks can leave deep imprints that cast shadows (and in some cases cause aliasing) when lit perpendicularly. Avoid this situation with overhead lighting.
- Pictures and lithographs can be separately photographed with a DSLR and then added onto the pages.
- You might need to break the binding of a book in some situations. This is better in the long run and in some cases reveals text that was hidden by the binding. Seek permission from the owner before breaking the binding, and re-bind the book afterwards.
- Use the Book template in Commons while uploading books; this allows you to add several fields relevant to books. This will be useful later while importing books into Wikisource.
- Customise the setup to your scenario; nothing works for all situations.
- great idea! Kritzolina (talk) 13:57, 15 April 2017 (UTC)
- Excellent and much-needed. Tito Dutta (talk) 14:05, 30 January 2018 (UTC)
- It's amazing , helpful for others Aliva Sahoo (talk) 03:28, 1 February 2018 (UTC)
- Interesting. --Regards, Krishna Chaitanya Velaga (talk — mail) 05:21, 16 February 2018 (UTC)