Digitization/Capture devices

From Meta, a Wikimedia project coordination wiki

Capture devices differ mainly in the quality they let you achieve under a given set of variables. Quality is always the result of a combination of factors: a good device can do great things for you, but if you use it incorrectly you will still get poor results.

In this section, we will analyze the main points that you need to consider when buying a camera or a scanner. Of course, these points also depend on the budget that you have for the work that you want to do. If image quality is an important factor for you, you need to look at high-end cameras, such as medium format or DSLR cameras. If you only want to extract the content of the material that you are scanning, your possibilities expand: destructive scanning or most mobile cameras will do the job, provided that you control certain aspects such as dewarping of the pages.

Another important aspect to consider, no matter which device you end up buying, is that quality takes time. There is no way around it. If you choose a high-end flatbed scanner or a Hasselblad camera, each capture is going to take a significant amount of time compared to other devices. Whether this is worth it is for you to assess. If you make a good capture from the beginning, it is likely that you won't need to repeat it later.

Which device to choose

Digital cameras

When you work with digital cameras, bear in mind that an external component will greatly affect your capture: lighting. While high-end cameras (such as DSLR or medium format) can compensate for poor lighting conditions up to a certain point, with most compact cameras you need to make sure that you have good lighting that can compensate for the fact that you are scanning with a cheap camera. We'll go into lighting further on.

What makes a camera good

Before going into specific details, note that there are several ways to assess the quality of a camera. The best source for assessing quality and for comparing cameras with each other is the website DxOMark, an independent source that runs laboratory tests on cameras and publishes its protocols, results and comparisons publicly on the web.

The quality of a camera is determined by these factors:

  • sensor (including its size)
  • optics & lenses

The sensor is the element, device or surface that captures light. It is composed of photodiodes that absorb light and convert it into an electrical signal. The sensor itself is colorblind, so different methods are applied to capture color. The most important concept is that in the vast majority of cases the sensor has some sort of filter that lets it capture color in three channels, Red, Green and Blue (RGB), and that the method used (combined with other elements) determines the degree of quality you get. The bigger the sensor, the bigger its surface and the more photodiodes it has to capture light, and therefore the more quality it can achieve.

Lenses are the optical elements or devices used to bend light beams. Lenses (or optics) are a large field of study in themselves, because the aberrations that result from combining lenses also determine which lenses (or frames for capture devices) you will need to capture different materials (e.g. maps or artwork).

Types of cameras

Concepts to be explained: optical path & optical path length

Medium format cameras

  • Sensor larger than 24x36 mm.
  • Interchangeable lenses.

DSLR cameras

  • 24x36 mm (full-frame) sensor on high-end models; many other models use a smaller APS-C sensor.
  • Interchangeable lenses.
  • More setting options.

Bridge cameras

  • No interchangeable lenses (fixed lens).

Compact cameras

  • Small sensor (varying sizes).
  • No interchangeable lenses.
  • No eyepiece.
  • Fewer setting options.

Although there are a lot of options in the compact camera market, the reason why many members of the DIY Book Scanner community use Canon compact cameras is straightforward: CHDK. The Canon Hack Development Kit is an unofficial firmware add-on that is loaded from the camera's SD card (it doesn't modify the camera's own firmware). It lets you hack the camera in multiple ways, enabling options that are disabled by the manufacturer (such as access to RAW images) and scripting inside the camera. Several other pieces of software have been built around it, such as CHDKPTP, which uses the Picture Transfer Protocol (PTP) to transfer images from CHDK cameras directly to a computer.

This has made these cameras a really inexpensive option for tethered control of two cameras.

What to look for when buying a camera

Aside from asking yourself the specific questions below, we recommend that you also visit the DxOMark website, especially if you are planning to buy a compact camera. If you are buying a high-end camera, such as a DSLR or a Hasselblad (medium format) camera, there are only going to be slight differences between models. But if you are going to buy a compact camera or use your mobile camera, that website is the only way to understand your trade-offs.

The main questions that you need to ask yourself when buying a camera are the following:

How many megapixels do you need?

If you are going to scan books or text, make sure that you have enough megapixels to reach at least 300 DPI. Although in other sections you will read about the ambiguity and confusion around DPI and PPI, for text it is important to reach at least 300 DPI; otherwise OCR (Optical Character Recognition) will perform poorly and produce many mistakes.

Calculating the megapixels you need is a unit conversion, since DPI is defined per inch while an image is measured in pixels (which in turn determine the image size). To do the conversion:

  • Measure your book. As an example, let's say that your book is 9x11 inches.
  • Multiply 9*300 and 11*300. That gives you the number of pixels that you need along each side (2,700 and 3,300 in this example).
  • Multiply both results together and divide by one million. That gives you the number of megapixels that you need in order to scan at 300 DPI (8.91 in this example). Add 20-30% to that final number to make sure that you are meeting your needs.

If you need more DPI than that (for example, if you are digitizing a map), just replace the number 300 with the DPI that you want to capture (say, 600). The easiest way to calculate this is using this calculator.
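The steps above can also be sketched in a few lines of Python. This is only an illustration: the function name and the default 25% safety margin (a value inside the 20-30% range suggested above) are assumptions, not part of any existing tool.

```python
def required_megapixels(width_in, height_in, dpi=300, margin=0.25):
    """Return (width_px, height_px, megapixels) needed to capture a page.

    width_in, height_in: page size in inches.
    dpi: target resolution (300 for text, more for maps or artwork).
    margin: assumed safety margin added on top of the exact result.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    megapixels = width_px * height_px / 1_000_000
    return width_px, height_px, megapixels * (1 + margin)


# The 9x11 inch book from the example, at 300 DPI and at 600 DPI.
for dpi in (300, 600):
    w, h, mp = required_megapixels(9, 11, dpi=dpi)
    print(f"{dpi} DPI: {w} x {h} px, about {mp:.1f} MP including margin")
```

For the example book this gives roughly 11 megapixels at 300 DPI and roughly 45 megapixels at 600 DPI, which shows how quickly the requirement grows when you double the resolution.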

Another important aspect of megapixels is that with large-format material (such as maps, posters, illustrations, etc.) you might not be able to get the number of megapixels that you need for a proper scan, even if you buy a really good camera. In those cases, you will need to capture your material in pieces and then join them using stitching software (we will go into software details later).
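To get a rough idea of how many pieces a large item needs, you can estimate it from the image size of your camera and the DPI that you are aiming for. The sketch below is a simplified estimate: the function name is illustrative, and the 20% overlap between neighbouring shots is an assumed value (stitching software needs some overlap, but the right amount depends on the tool).

```python
import math


def tiles_needed(width_in, height_in, dpi, image_px_w, image_px_h, overlap=0.2):
    """Estimate how many shots (columns, rows) cover an item at a target DPI.

    image_px_w, image_px_h: pixel dimensions of one camera image.
    overlap: assumed fraction shared between neighbouring shots.
    """
    # Area covered by a single shot, in inches, at the target DPI.
    tile_w = image_px_w / dpi
    tile_h = image_px_h / dpi

    def count(length, tile):
        if length <= tile:
            return 1
        step = tile * (1 - overlap)  # new area gained by each extra shot
        return math.ceil((length - tile) / step) + 1

    return count(width_in, tile_w), count(height_in, tile_h)


# A 36x24 inch map at 600 DPI with a 6000x4000 px (24 MP) camera.
cols, rows = tiles_needed(36, 24, 600, 6000, 4000)
print(cols, rows, cols * rows)  # 5 5 25
```

In this example a single map needs 25 captures, which is a good reminder that large-format material takes considerably more time than books.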

How much control do you want?

There are two important aspects to consider here.

  • One of them is intrinsic to your capture device: the more high-end the camera, the more control you will have over a lot of variables. The variables that you need to control are the following:
  1. Aperture
  2. Shutter speed
  3. ISO
  4. White balance
  5. Flash on/off
  6. Any custom image processing (sharpening, color enhancements, etc)
  7. Focus (ideally being able to lock focus)
  8. Exposure compensation
  9. Zoom
  • The other is related to the way in which different pieces of hardware talk to each other and the software tools used for that. High-end cameras will normally allow you to do remote capture and to work with two devices, while compact cameras need additional software tools for that. The problem is that what you are mainly trying to do is make a computer talk to a camera. What can go wrong? It turns out that pretty much everything can, even if you are not using two cameras.
There are several software tools for this, and they keep getting better. We will discuss software tools later, but if you want to go with a compact camera, you might want to choose a Canon camera that supports CHDK (Canon Hack Development Kit). To see which cameras are CHDK compatible you can check here. Other compact cameras might work with gphoto2, but that might require coding.
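As an illustration of the kind of coding that gphoto2 may require, the sketch below builds the command lines for capturing an open book with two cameras. It is a minimal example under several assumptions: the gphoto2 command-line tool is installed, your cameras support remote capture through it, the port names are placeholders, and the function names and the `page_NNNN.jpg` naming scheme are made up for this example. By default it only prints the commands (`dry_run=True`) so that you can check them first.

```python
import subprocess


def capture_command(port, filename):
    """Build a gphoto2 command that captures one image and downloads it."""
    return [
        "gphoto2",
        "--port", port,                  # which camera to talk to
        "--capture-image-and-download",  # trigger the shutter and fetch the file
        "--filename", filename,          # where to save it on the computer
    ]


def capture_page_pair(left_port, right_port, pair_number, dry_run=True):
    """Capture the left and right page of an open book with two cameras."""
    commands = [
        capture_command(left_port, f"page_{2 * pair_number:04d}.jpg"),
        capture_command(right_port, f"page_{2 * pair_number + 1:04d}.jpg"),
    ]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)  # raises if gphoto2 fails
    return commands


# Placeholder port names; list the real ones with `gphoto2 --auto-detect`.
for command in capture_page_pair("usb:001,004", "usb:001,005", pair_number=0):
    print(" ".join(command))
```

Calling `capture_page_pair(..., dry_run=False)` runs the commands one after the other, so the two cameras do not fire at exactly the same moment; for book scanning this is usually acceptable because the pages are not moving.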

How important is color for you?

If most of the scanning that you are going to do involves plain-text books, then preserving color is not really an issue, and you can go with a compact camera that will do the job just fine and won't cost you a lot of money. However, if preserving color really matters (because you have maps, illustrated books, illustrations, paintings, etc.), you do need a good camera that can capture RAW and work in different color spaces.

How much money do you have?

Of course, this is an obvious question, but not an irrelevant one. From the cheapest to the most expensive, here are your options:

  • Mobile cameras
  • Compact cameras (where you will find a lot of price variation)
  • DSLR cameras
  • Medium format cameras

You could also buy a machine vision camera if you feel like spending a lot of money just to experiment.

Mobile phone cameras

As we mentioned, one of the main factors that determines the quality of your camera is its sensor. In this sense, the cameras built into mobile phones mostly don't have big sensors, but the tight relationship between hardware and software in these devices increases their quality greatly.

Another important aspect to consider is that mobile phone cameras are pushing the market in new and unexpected directions. One example is the development of the High Efficiency Image File Format (HEIF), a new format developed by the Moving Picture Experts Group (MPEG) that is likely to replace JPEG, since it can store about twice as much information as a JPEG of the same size while having better quality.

Other cameras

In this section we have included cameras such as webcams and the Camera Module of the Raspberry Pi.

Although their price and the fact that they can be easily controlled from a computer make them very attractive, for the most part they are not recommended for scanning (even if they are the only solution available). Their small sensors and poor optics cannot capture much resolution (or level of detail). You want to avoid using any of these.


The realm of table scanners is not only vast but confusing, in part because of the idea that the higher the DPI, the higher the quality you get. Many things matter in a table scanner, but the most important number to look for when assessing quality is the DMAX (also called optical depth, pixel depth, density range, dynamic range, etc., depending on the manufacturer of the device). This number describes the range between the darkest and the brightest tones that the scanner can distinguish.
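Density figures such as DMAX are normally given on a logarithmic scale, which makes them hard to compare by intuition. The sketch below assumes the standard definition of optical density, D = -log10(fraction of light transmitted or reflected), which this page does not spell out; the function names are illustrative only.

```python
import math


def optical_density(fraction):
    """Optical density of an area that lets through (or reflects)
    the given fraction of the incoming light (0 < fraction <= 1)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    return -math.log10(fraction)


def density_range(brightest_fraction, darkest_fraction):
    """Difference between the densest and the clearest area (Dmax - Dmin)."""
    return optical_density(darkest_fraction) - optical_density(brightest_fraction)


# A dark area of a slide that lets through only 0.1% of the light
# has a density of about 3.0.
print(round(optical_density(0.001), 2))

# Each extra unit of density means ten times less light: a scanner rated
# 4.0 has to cope with shadows ten times darker than one rated 3.0.
print(round(density_range(0.5, 0.0005), 2))
```

This is why a small difference in the published number (for example 3.4 versus 3.8) can mean a clearly visible difference in shadow detail, especially with transparencies and film.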

Film scanners

Flatbed scanners

Automatic document feeder scanners

Photocopiers or multifunction devices

Drum scanners

Handheld scanners

What to look for in a scanner

Relationship between Image Quality Features - Type of document

Image Quality / Document        Glass plates  Transparencies  Color photos  B&W photos  Color drawings  B&W drawings  Text for OCR
Imaging device
  Resolution                    critical      high            medium        medium      critical        critical      medium
  Scaling                       critical      high            medium        medium      critical        critical      medium
  Dynamic range                 critical      critical        critical      high        medium          low           low
  Image noise                   medium        high            medium        medium      low             low           low
Color management
  Color fidelity                -             critical        critical      low         high            low           low
  Color registration            medium        medium          medium        low         high            low           medium
  Grayscale accuracy - balance  high          medium          high          critical    medium          medium        low
  Shadow details                medium        critical        high          high        low             low           low
  Dropout colors                medium        medium          low           low         critical        low           medium
  Aliasing                      low           low             low           medium      critical        critical      high
  Flare                         critical      high            high          high        low             low           low
  Scratches/dust                critical      critical        high          high        low             low           low
Performance enhancement
  Uniform lighting              high          medium          high          high        medium          medium        medium
  Scanning speed                low           medium          medium        medium      medium          medium        high

This table combines several tables, in particular the one in Gann, R. (1999), "A look at Scanners", in Desktop scanners: Image Quality Evaluation, New Jersey: Hewlett Packard & Prentice Hall, and the one in Williams, D. (2000), "Selecting a scanner", in Guides to Quality in Visual Resource Imaging, Council on Library and Information Resources.

Values for the table:

  • low: 1-2 points, not important
  • medium: 3 points, important
  • high: 4 points, very important
  • critical: 5 points, critically important
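One possible way to use these values is to turn a column of the table into a short list of the features that matter most for your material. The sketch below does this for the "Text for OCR" column. It is only an illustration: the function name is made up, and since the table gives "low" a range of 1-2 points, the value 2 is an arbitrary choice here.

```python
# Points per importance level; "low" is 1-2 in the table, 2 is assumed here.
POINTS = {"low": 2, "medium": 3, "high": 4, "critical": 5}

# The "Text for OCR" column of the table above.
TEXT_FOR_OCR = {
    "Resolution": "medium",
    "Scaling": "medium",
    "Dynamic range": "low",
    "Image noise": "low",
    "Color fidelity": "low",
    "Color registration": "medium",
    "Grayscale accuracy - balance": "low",
    "Shadow details": "low",
    "Dropout colors": "medium",
    "Aliasing": "high",
    "Flare": "low",
    "Scratches/dust": "low",
    "Uniform lighting": "medium",
    "Scanning speed": "high",
}


def top_features(importance, minimum="high"):
    """Return the features rated at least `minimum`, most important first."""
    threshold = POINTS[minimum]
    ranked = sorted(importance.items(),
                    key=lambda item: POINTS[item[1]],
                    reverse=True)
    return [name for name, level in ranked if POINTS[level] >= threshold]


print(top_features(TEXT_FOR_OCR))  # ['Aliasing', 'Scanning speed']
```

For text that will go through OCR, the list is short: avoiding aliasing and scanning fast matter more than color or shadow detail, which matches the advice given for cameras earlier on this page.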