Grants:Project/Documenting digitization practices in Wikimedia communities

From Meta, a Wikimedia project coordination wiki
Documenting digitization practices in Wikimedia communities
summary: I want to document and help improve the current and future digitization practices of Wikimedia communities. This project will also help understand the current landscape and strategies for community-centered digitization projects.
target: Wikisource, Wikimedia Digitization User Group, Wikimedia affiliates doing digitization projects and prospective digitization projects inside the Wikimedia space; Wikimedia GLAM partners
type of grant: research
type of applicant: individual
created on: 16:25, 21 November 2018 (UTC)

Project idea

What is the problem you're trying to solve?

What problem are you trying to solve by doing this project? This problem should be small enough that you expect it to be completely or mostly resolved by the end of this project. Remember to review the tutorial for tips on how to answer this question.

For GLAM institutions, contributing to Wikimedia Commons always presupposes that the institution's heritage is digitized, well catalogued and has some metadata in place. While this might be true in several parts of the United States and Western Europe (be it because institutions do it themselves or rely on the help of other partners), the reality in emerging communities varies greatly. Openness is an abstract concept if you don’t have the resources to put your content in the open first.

These institutions see and feel the need to put workflows for digital content in place, but they lack the knowledge and face technical challenges in setting them up. To address this issue, several Wikimedia Affiliates have put together digitization projects in recent years. But digitization in the Wikimedia context also has a number of specific challenges: the projects are mainly volunteer driven, they depend on Wikimedia Commons copyright rules, and producing high quality scans and metadata for Wikimedia’s needs can be difficult.

Despite that, an increasing number of Wikimedia communities are expressing interest in supporting these kinds of projects. The solutions developed by the Wikimedia Affiliates rely heavily on the DIY Book Scanner project, but each affiliate has designed its own approach to deliver this solution. These projects and the documentation of their practices are very decentralized. For Wikimedians, open advocates and GLAM partners wanting to get into digitization, it is often very unclear where the best place to start is. And with the current models, GLAM institutions rely on Wikimedia Affiliates to deliver either the machinery or the paid or unpaid labor, which limits the outreach capacity of a program like this.

DIY digitization and the larger body of knowledge for small-scale, volunteer-driven digitization projects already have a community of practice around the world. But documentation is still scarce and scattered, so for a Wikimedia community wanting to initiate a volunteer-driven and open-technology digitization program, it's very hard to figure out best practices. The Telegram group of the Digitization User Group has shown that there is considerable demand among Wikimedians for help with their digitization questions.

Wikimedia volunteers over the years have tried to build documentation around scanning and digitization. We can cite several examples. The Learning Pattern: Digitizing Archival Records is useful but incomplete; the Commons Help:Scanning is a very good how-to guide, although it might require some re-organization; and the Wikisource Help:Digitising texts and images for Wikisource seems like a half-finished effort. And these pages neither document the practices of Wikimedia Affiliates nor are translated into languages other than English.

There is already documentation about digitization and digitization practices. As an example, we can cite the FADGI guidelines. These guidelines always assume a use-neutral approach, meaning that you will always aim to achieve the best quality possible with the highest quality technologies available, something that in the context of emerging communities can be quite difficult due to the lack of financial resources, among others. Even materials that pay attention to the emerging community use case still have as their baseline the notion that you will have the highest quality technology, which will allow you to make good quality scans. You can still achieve good quality even if you don’t have a lot of resources in place, but for that you need to make informed decisions.

These kinds of decision-making tools will have applications beyond Wikimedia communities: if you are an underfunded institution working in an emerging community, you need documents that provide accurate information about the trade-offs of choosing one technology over another, and about the small, less expensive tools that you can implement to help you do your work better.

Moreover, one of the most important challenges has to do with the way in which this information is currently being provided. Most of it relies on being discovered by really self-driven and motivated staff in a GLAM institution, in situations where language barriers play a major role. Digitization is absent as an issue in most LIS curricula around the world, and there is no sustained effort to try to move this knowledge forward to the front line in emerging communities, and with GLAM partners.

In short, both Wikimedia communities and GLAM partners in emerging communities need to level up their knowledge regarding digitization practices. For that, we need to build good teaching materials about digitization and good documentation of Wikimedia practices in the context of digitization, and focus on delivering those materials both through Wikimedia projects and through face-to-face training.

What is your solution?

For the problem you identified in the previous section, briefly describe how you would like to address this problem. We recognize that there are many ways to solve a problem. We’d like to understand why you chose this particular solution, and why you think it is worth pursuing. Remember to review the tutorial for tips on how to answer this question.

The solution to this problem is to document the processes and practices being used by Wikimedia Affiliates, identify the knowledge gaps around digitization, and identify opportunities to improve those practices through consultation of the established literature and digitization communities. This research will allow the development of technical recommendations, workflow and planning recommendations and teaching materials about digitization. That would later inform a “train the trainers” toolkit that will allow Wikimedia Affiliates to deliver training to GLAM institutions that want to start digitization projects but don’t know how. These materials would allow alliances with other partners in the GLAM space that could provide support, input and feedback on the content and the initiative on a rolling basis.

For this grant, I plan to expand and improve the work I started during my Fellowship at the Harvard Library Innovation Lab. This was the initial project for which I got the Fellowship, and this was my closing presentation (video here). I’ve been a long-time contributor to the DIY Book Scanner project, working directly with the founder, Daniel Reetz. I manage the DIY Book Scanner forum (alongside Jonathon Duerig, now in charge of the project), and in that capacity I've re-organized the forum, translated the website into Spanish, written an English glossary and then a Spanish one, and written tutorials such as the ScanTailor tutorial in Spanish, among other activities, all oriented towards making information more accessible for people. I have assembled at least 15 DIY Book Scanners for several GLAM organizations, including Wikimedia Uruguay, and I started and ran the digitization project at Wikimedia Argentina for 3 years. In 2016, a group of colleagues and I started running digitization workshops with libraries, archives and museums, training more than 200 professionals and community archives, and I’ve been improving the teaching materials (2016, 2017, 2018) with each iteration of the course, based on the feedback that we have systematically gathered from participants. I have also given shorter versions of the same course in other locations, at the request of libraries and archives. We also have a community of Spanish-speaking practitioners gathered around our Facebook page, and we could also expand that interest to our Facebook group.

Building a set of open tactics focused on the Wikimedia space in turn makes Wikimedia a more attractive platform for sharing knowledge from communities around the world that is not yet available digitally. Having documentation, standard practices and training materials can also attract the attention of funding partners and coalitions that want to support reliable digitization work in emerging communities, in partnership with GLAM institutions and Wikimedia communities.

Project goals

What are your goals for this project? Your goals should describe the top two or three benefits that will come out of your project. These should be benefits to the Wikimedia projects or Wikimedia communities. They should not be benefits to you individually. Remember to review the tutorial for tips on how to answer this question.

  • Document some of the practices that the Wikimedia Affiliates have been putting in place to help GLAM partners digitize their content, especially in emerging communities. For building this documentation I will take into consideration the results of the survey around “How do you like your GLAM-Wiki documentation” being done by the GLAM-Wiki team.
  • Make recommendations on how to improve digitization practices in the Wikimedia community, so that the Wikimedia movement can better address the Knowledge Equity and Knowledge as a Service goals of the Wikimedia movement direction.
  • Help build capacity among Wikimedia Affiliates to train GLAM partners to digitize their content and build proper digital workflows in their institutions.
  • Identify technical gaps in the open digitization tooling ecosystem in order to make recommendations addressing technical challenges to build a more streamlined, standard process for digitization in the larger open community.
  • Help build a global network of support for digitization projects in the Wikimedia space that helps them get digitization projects off the ground.
  • Provide an outline of grants and programs outside the Wikimedia space financially and technically supporting digitization.

Project impact

How will you know if you have met your goals?

For each of your goals, we’d like you to answer the following questions:

  1. During your project, what will you do to achieve this goal? (These are your outputs.)
  2. Once your project is over, how will it continue to positively impact the Wikimedia community or projects? (These are your outcomes.)

For each of your answers, think about how you will capture this information. Will you capture it with a survey? With a story? Will you measure it with a number? Remember, if you plan to measure a number, you will need to set a numeric target in your proposal (e.g. 45 people, 10 articles, 100 scanned documents). Remember to review the tutorial for tips on how to answer this question.

  1. We will have put together a centralized page offering information about digitization and digitization practices inside the Wikimedia movement, one that can be easily found and updated by anyone.
  2. We will have built teaching materials and tactics to deliver this content as a practical, step-by-step guide for beginners and GLAM partners, and trained chapters and affiliates to deliver the content.
  3. We will have invited more supporters for Wikimedia digitization projects, especially outside partners.
  4. We will have a shortlist and a set of recommendations on the hardware and software needs of open tools for digitization.

Do you have any goals around participation or content?

Are any of your goals related to increasing participation within the Wikimedia movement, or increasing/improving the content on Wikimedia projects? If so, we ask that you look through these three metrics, and include any that are relevant to your project. Please set a numeric target against the metrics, if applicable. Remember to review the tutorial for tips on how to answer this question.

I want to interview:

  • At least 10 participants from the Wikimedia community, drawn from (tentative list):
      1. Communities in India
      2. Communities in Latin America (Argentina, Uruguay)
      3. Wikimedia Indonesia
      4. The GLAM-Ghana team
      5. Wikimedia Bulgaria
      6. Wikimedia Serbia
      7. Wikimedia Armenia
  • Around 7-10 other practitioners doing community-centered and open digitization. A tentative list includes:
      1. Indigitization
      2. Our Digital World
      3. Freedom Archives
      4. CLIR - Council on Library and Information Resources
      5. People from the DIY Book Scanner community
  • Around 3-4 other GLAM institutions doing digitization at the higher-quality end of the spectrum. Ideally these would include organizations doing:
      1. paper format scanning (books, paper photos, etc.)
      2. video & film scanning
      3. audio scanning
      4. other format scanning (such as glass plates, etc.)

At the end of the grant I would like to train at least 10 Wikimedia organizers through an on-site workshop at Wikimania and, iterating on those materials, offer an online course that could include a more decentralized group of participants.

Material created during this grant will help Wikimedia communities create higher quality and more content as part of digitization programs. In particular, it will help communities that are working with under-resourced and underrepresented communities, since they are also the ones that face more challenges when trying to digitize their content properly.

Project plan


Tell us how you'll carry out your project. What will you and other organizers spend your time doing? What will you have done at the end of your project? How will you follow-up with people that are involved with your project?

The primary activities that this project will carry out are:

Months 1-2

  • Identify & interview Wikimedia communities and similar community-driven environments that have attempted digitization in the past few years, in order to evaluate the needs of the Wikimedia community in this space and understand the best practices from practitioners of community-centered digitization projects.
  • Identify and list all the common problems and questions that have been shared and asked through the DIY Book Scanner forum, the diybookscanner [at] account (which I'm in charge of), the Telegram of the Digitization User Group, and several other communication channels.
  • Identify partners in the space that could provide help or support to Wikimedia communities with regards to digitization practices.

Months 3-6

  • Develop teaching materials for digitization.
  • Develop a guide that covers the technical and planning aspects that are often hard to grasp or learn from the existing body of documentation, with a special focus on building visual materials. Some of these headings are already at the Digitization portal. This is not an exhaustive list and is not arranged in any particular order; right now it covers only concepts related to book and paper formats, and in the process of making these materials I will make sure to add other types of materials as well. The guide will include the following:
      • Understanding the concept of digitization
      • What is the process of digitization
      • Understanding your goals & needs for a digitization project in relationship to the money that you have.
      • What's the "use-neutral approach" and why it is important.
      • What can you do with the reality of your context (lack of resources, training, etc.).
      • How to digitize different types of materials according to the goals & needs of your project.
      • An update of the glossary that I developed here.
      • How to choose a proper capture device:
          • Concepts such as "image quality" and how important they are in relationship to the project you have (some of that has been already developed here, but could be expanded into a better guide).
          • How to read a technical specifications sheet and what things are important (and which things aren't) when buying a scanner.
          • How to pick a camera.
          • Understanding which DIY Book Scanner model might be useful for you according to your needs, the type of materials that you need to digitize, and your budget.
          • How to budget for a DIY Book Scanner: things to consider.
      • Other aspects that also have an impact on good quality digitization:
          • Environmental control
          • Color concepts
          • Image quality control
          • Lighting
          • File formats, compression methods, etc.
          • Basic concepts on digital preservation
      • Planning:
          • Running tests & building an incremental project. How to get your project off the ground
          • How many people you need for the goals you are setting for the project
          • How to calculate/estimate how much time a digitization project will take you. Is your timeline realistic?
          • How to budget for a digitization project and not fall short. Is your budget realistic?
      • An ultimate guide to all the FLOSS tools that exist for doing digitization and how they pair up (or not) with their proprietary equivalents.
  • Produce a list of the existing bibliographical resources on the topic, both in English & Spanish.
  • Make a FAQ regarding typical questions around digitization.
  • Develop the @digitization_bot and the @adigitalizar_bot in Telegram to easily answer questions regarding digitization (fed by the FAQ), in part by referring folks to the new documentation.
  • Make the spreadsheets & cheatsheets needed to carry on with the project (including budget, timelines spreadsheets, etc., among others).
  • Overhaul the portal at Digitization with those materials.
  • Provide a workshop as part of Wikimania.
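
One of the planning questions listed above, how long a digitization project will take, comes down to simple arithmetic. The following is only a minimal sketch: the function name, the capture rate, the weekly hours and the overhead fraction are all hypothetical placeholders, not figures from this proposal or from any affiliate's project.

```python
import math


def estimate_weeks(total_pages, pages_per_hour, hours_per_week, overhead=0.25):
    """Rough number of weeks needed to capture `total_pages`.

    All inputs are assumptions supplied by the planner:
    - pages_per_hour: measured during a small test run
    - hours_per_week: volunteer or staff time actually available
    - overhead: extra fraction of time for setup, quality control and re-scans
    """
    if pages_per_hour <= 0 or hours_per_week <= 0:
        raise ValueError("rates must be positive")
    capture_hours = total_pages / pages_per_hour
    total_hours = capture_hours * (1 + overhead)
    # Round up: a partly used week is still a week on the calendar.
    return math.ceil(total_hours / hours_per_week)


# Hypothetical example: 30,000 pages, 300 pages/hour, 10 hours/week.
weeks = estimate_weeks(total_pages=30_000, pages_per_hour=300, hours_per_week=10)
print(f"Estimated duration: {weeks} weeks")
```

Running a short test first and feeding the measured capture rate into a calculation like this is one way to check whether a proposed timeline is realistic before committing to it.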

Months 7-9

  • Build a “train the trainers” curriculum so affiliates can locally help GLAM institutions in doing this work. This would include an outline of the main concepts that need to be taught, the content, a set of slideshows and scripts for delivering the content of the course, pointers to further reading resources to help deal with questions that might arise during the training, and pedagogical approaches to help deliver the training. Basic materials will include:
      • Outline
      • Modules
          • Learning outcomes for each module
          • Summary of module
          • Basic concepts that will be taught in this module
          • Recap of module
      • Pedagogical approaches to teaching technical concepts
          • How to introduce technical concepts to people new to the field
          • Don't use the word "easy"
          • You don't have to know everything
          • There might be someone that already knows what you are explaining
          • Be simple
          • Teaching is not about proving how much you know
          • Managing your time to go through the explanations
      • How to run successful tests in challenging contexts
          • Make sure you test beforehand
          • Check that you have all the resources you need (with checklist)
  • Deliver at least one online “train the trainers” course for Wikimedia communities focused on using the teaching materials to train other partners.
  • Review and iterate on the documentation based on feedback.


How will you use the funds you are requesting? List bullet points for each expense. (You can create a table later if needed.) Don’t forget to include a total amount, and update this amount in the Probox at the top of your page too!
This request seeks mainly to fund a half-time dedication for 9 months. Previous experiences and conversations around this project seem to demonstrate that it is difficult (if not impossible) to develop strong documentation only on a volunteer basis.

Budget items

  • Professional time dedicated to developing the documentation. This will cover the time needed to contact the interviewees, conduct the interviews, take notes, systematize them, do the literature review, design and write the documentation, put together the valuable pieces of information needed for improving practices, design the training materials and deliver the trainings. Budget: $2,000 per month × 9 months = $18,000.
  • Books and literature around digitization. While there is significant literature available online, for certain materials (especially technical ones), information might not be available online. Since it is difficult to estimate such costs, the proposal is to establish a fixed rate of $1,000 for buying literature. Once the project is done the material will be donated to a public library TBD. Budget: $1,000.
  • Travel costs to deliver a workshop as part of Wikimania.
  • Overhead costs. These costs cover any unexpected event during the project that might affect its development. Planned overhead costs: $500. If for whatever reason these funds don't get used, I will buy some equipment (like color checkers) to donate to a community digitization project.
  • Total requested: $20,540
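
The budget items above can be cross-checked against the requested total with a few lines of arithmetic. This is a sketch only: the dictionary keys are made-up labels, and because the travel item carries no amount in the list, the last figure is simply the remainder implied by the stated total, not a confirmed travel budget.

```python
# Budget items whose amounts are stated in the list above (USD).
stated_items = {
    "professional_time": 2_000 * 9,  # $2,000 per month for 9 months
    "books_and_literature": 1_000,
    "overhead": 500,
}
total_requested = 20_540

stated_subtotal = sum(stated_items.values())
# The travel line has no amount; this is only what the total leaves over.
implied_remainder = total_requested - stated_subtotal

print(f"Stated subtotal: ${stated_subtotal:,}")
print(f"Remainder implied for travel: ${implied_remainder:,}")
```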

Community engagement

Community input and participation helps make projects successful. How will you let others in your community know about your project? Why are you targeting a specific audience? How will you engage the community you’re aiming to serve during your project?

This project aims to involve at least three different communities of practice:

  • Wikimedia communities and affiliates doing on-the-ground digitization work that need to document and improve their workflows;
  • potential GLAM partners and GLAM professionals in emerging communities interested in digitization, with a special focus on DIY digitization;
  • partners, funders and professional associations working globally that could support digitization work of Wikimedia communities and GLAM professionals in emerging communities.

Apart from that, we would count on the input of the following contacts to serve as advisors for the project and as main liaisons with the Wikimedia communities and potential partners.

I also plan to send regular updates on the current state of the materials through the several Facebook pages that exist around GLAM-Wiki and the Telegram group of the Digitization User Group, and to stay in communication with the GLAM-Wiki Team at the Wikimedia Foundation.

Get involved


Please use this section to tell us more about who is working on this project. For each member of the team, please describe any project-related skills, experience, or other background you have that might help contribute to making this idea a success.

  • Scann - Evelin Heidel. Creative Commons member, where I lead the OpenGLAM Community Platform. I'm part of the DIY Book Scanner project, I lead a project to build a database of the public domain in Argentina (property P4158), and I was a 2018 Harvard Fellow at the Harvard Library Innovation Lab, part of the Harvard Law School Library, where I started and presented a project on these topics. Read my full CV here and read more about my work with the DIY Book Scanner community above in the section "What is your solution to this problem?"!
  • Volunteer As somebody engaged in the translation of FADGI guidelines (into Estonian) I would like to work on making contacts between the GLAM digitization community and Wikimedia volunteers. Puik (talk) 11:23, 24 November 2018 (UTC)

Community notification

Please paste links below to where relevant communities have been notified of your proposal, and to any other relevant community discussions. You are responsible for notifying relevant communities of your proposal, so that they can help you! Depending on your project, notification may be most appropriate on a Village Pump, talk page, mailing list, etc. Need notification tips?


Do you think this project should be selected for a Project Grant? Please add your name and rationale for endorsing this project below! (Other constructive feedback is welcome on the discussion page).

  • Support Documentation and how-to's are very useful to develop efficient digitization workflows. Open educational resources on this subject will allow greater access to this type of specialized technical knowledge, and will also allow translations. --Pepe piton (talk) 21:04, 21 November 2018 (UTC)
  • Support --Jalu (talk) 17:50, 22 November 2018 (UTC)
  • Support The project aims to professionally document digitization processes and design a curriculum for the training of trainers. These are useful inputs that will support the work of wikimedians and allied institutions. I think it is a necessary project, given the growing interest in GLAM and digitization in the community. --Señoritaleona (talk) 01:30, 24 November 2018 (UTC)
  • Support. I believe the topic is very important. But I do not agree with the implication about FADGI guidelines "These guidelines always assume a use-neutral approach, meaning that you will always aim to achieve the best quality possible with the highest quality technologies available, something that in the context of emerging communities can be quite difficult due to the lack of financial resources, among others." FADGI guidelines (NB! published under CC0) are mostly about the full process of digitization with special attention to quality control. The guidelines also define quality levels so that the focus is on quality measurement and awareness of the parameters that can and need to be measured and not about high end hardware. Therefore I believe it is important to also make contacts to FADGI (Tom Rieger from LOC as the main editor of the last edition) and discuss the options of localization of FADGI guidelines (we are about to finalize the translation of the guidelines into Estonian). --Puik (talk) 10:55, 24 November 2018 (UTC)
Hey Puik, thanks for that feedback. I think you're right, and that might have slipped in the proposal a bit. My feeling with regards to the FADGI guidelines is that you still have to understand several concepts before that are on the base, alongside with other things that we could definitely discuss in another place (such as what sort of myths we end up unintentionally creating around quality, quality control, etc.). I can re-write that section to better reflect what you're saying (and what I think) and of course I totally agree with you that having the input and feedback of people from the LOC would be awesome. Scann (talk) 15:18, 24 November 2018 (UTC)
  • Support Hi -- my name is Matías Raia. I'm not an active member of any Wikimedia project, but I've been working with Scann for a long time and I'm part of the group that delivers the digitization workshop that she mentions in her proposal. One of the reasons that I think moved her to do the digitization workshop (and embark several of us into it!) is that there's an important knowledge gap in terms of technical understanding on how to do the digitization work, that in turns creates the false idea that if you don't have a multi-million dollar budget you can't do anything. And the need to build this knowledge base is so big that we can't rely on a very reduced or selected group of people to deliver this sort of training -- I think that she's hitting the sweet spot on growing the replicability of teaching and training, and for that you need to count with resources and to know which ones are useful for your work. Mhraia (talk) 12:21, 24 November 2018 (UTC)
  • "Hi, I'm Loren Fantin and I run Our Digital World, one of the organizations proposed to be interviewed here. We have been working with community archives for over ten years and we know that they face several challenges regarding digitization. We would love to share our experience in building a microfiche DIY scanner. Even when I'm not part of any Wikimedia communities, I think that the commitment that Scann has with this idea and her consistent track record on the field will help a larger body of communities and projects that are trying to take their heritage from their analog format and bring it online. A consistent effort to document and to share these resources will help Wikimedia to make a more ambitious plan on the digitization front that hopefully will help a lot of communities in the ground". 17:02, 27 November 2018 (UTC)
  • Support Scann is an able professional, and one of the few people in the Wikimedia Movement whom I count on around content digitization issues. I think having her share her unique knowledge is a way to ensure future GLAM digitization projects will have a resource to count on, and that's super worthwhile! Alleycat80 (talk) 22:42, 4 December 2018 (UTC)