OpenSpeaks/Community language documentation and archiving training
This curriculum is developed by OpenSpeaks to train people who document and archive low-resourced languages.
Last updated 26 April 2026 (v.1.0.0-beta).
Purpose
This one-day workshop helps people with some experience in archiving or documentation learn the full workflow for language documentation.
Who this is for
- Native-speaker archivists and other community researchers
- Wikimedians
- People working with nearby or related languages
What participants will learn
- How to plan a language documentation project.
- How to handle consent, credit, and payment fairly.
- How to record clear audio and video.
- How to manage files safely.
- How to edit media for sharing.
- How to prepare media and metadata for Wikimedia Commons, Wikidata, and Wikipedia.
- How to share materials with archives, libraries, and communities.
The trainer will share tools and editable templates at the end of the workshop. Participants can use these templates to start documenting and archiving.
Session plan
1. Planning before documenting
- How to choose the language or language community.
- How to decide what kind of material to record.
- How to set a realistic goal for the day or project.
- How to find funding to pay people fairly for their labour.
2. Consent, ethics and payment
- How to explain the project to the community in simple language.
- How to get consent from speakers and contributors.
- How to discuss credit, shared ownership, and payment for labour.
- How to decide what can be shared openly and what should stay private.
3. Recording
- How to choose the right device.
- How to test audio before recording.
- How to keep notes on speaker name, date, place, and topic (metadata).
- How to record in a quiet place.
4. File management
- How to use clear folder names.
- How to save raw files and edited files separately.
- How to make backups.
- How to keep a simple log of all files.
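For teams comfortable with a little scripting, the folder and log steps above can be automated. The sketch below is in Python, chosen only for illustration; the folder names (`01_raw`, `02_edited`, `03_metadata`, `04_consent`), the log file name, and the log columns are hypothetical examples, not an OpenSpeaks standard. It creates a project folder that keeps raw and edited files apart, then writes a CSV log listing every file with its size and a SHA-256 checksum.

```python
import csv
import hashlib
from pathlib import Path

# Hypothetical folder layout (an assumption, not a standard):
# raw and edited files are kept in separate folders.
SUBFOLDERS = ("01_raw", "02_edited", "03_metadata", "04_consent")


def make_project_folders(root: Path, project: str) -> Path:
    """Create root/project with one subfolder per stage; safe to run twice."""
    base = root / project
    for name in SUBFOLDERS:
        (base / name).mkdir(parents=True, exist_ok=True)
    return base


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks to save memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_file_log(base: Path, log_name: str = "file_log.csv") -> Path:
    """List every file under base (except the log itself) in a CSV log."""
    log_path = base / log_name
    rows = []
    for p in sorted(base.rglob("*")):
        if p.is_file() and p != log_path:
            rows.append({
                "path": p.relative_to(base).as_posix(),
                "bytes": p.stat().st_size,
                "sha256": sha256_of(p),
            })
    with log_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["path", "bytes", "sha256"])
        writer.writeheader()
        writer.writerows(rows)
    return log_path
```

Running `write_file_log` on the original folder and again on a backup copy, then comparing the two logs, is one simple way to check that a backup is complete and that no file has changed.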
5. Processing audio/video
- How to trim only what is needed.
- How to improve sound quality.
- How to export in stable formats.
- How to create transcripts or subtitles when possible (see guide).
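Trimming and exporting can be done with many tools. As one hedged example, assuming the free command-line tool FFmpeg is installed, the Python sketch below builds an FFmpeg command that keeps only the part of a recording between two timestamps and saves the audio as FLAC, a lossless, openly documented format. The function names `trim_and_export_command` and `trim_and_export` are hypothetical. The raw file is only read, never overwritten, so raw and edited files stay separate.

```python
import subprocess
from pathlib import Path


def trim_and_export_command(src: Path, dst: Path, start: str, end: str) -> list[str]:
    """Build an FFmpeg command that keeps src between start and end as FLAC audio.

    start and end are timestamps such as "00:01:30".
    """
    if src.resolve() == dst.resolve():
        raise ValueError("Refusing to overwrite the raw file; choose a new output path.")
    return [
        "ffmpeg",
        "-n",              # never overwrite an existing output file
        "-i", str(src),    # input: the raw recording (read only)
        "-ss", start,      # start of the part to keep (output option: accurate cut)
        "-to", end,        # end of the part to keep
        "-vn",             # drop any video stream, keep audio only
        "-c:a", "flac",    # encode the audio as lossless FLAC
        str(dst),
    ]


def trim_and_export(src: Path, dst: Path, start: str, end: str) -> None:
    # check=True raises CalledProcessError if FFmpeg exits with an error
    subprocess.run(trim_and_export_command(src, dst, start, end), check=True)
```

For example, `trim_and_export(Path("01_raw/interview.wav"), Path("02_edited/interview.flac"), "00:00:05", "00:02:00")` would keep the first two minutes after a five-second lead-in. Building the command as a list first makes it easy to print and check before running.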
6. Publication
- How to prepare short project and file descriptions.
- How to add language name, speaker name, date, and place.
- How to choose the correct license.
- How to upload to Wikimedia Commons.
- How to add participant and project details to Wikidata.
- How to use files in Wikipedia articles.
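A Commons-ready metadata draft usually becomes the text of the file description page. The Python sketch below turns the notes collected during recording (language, speaker, date, place, topic) into wikitext using the Commons `{{Information}}` template and a `{{self|...}}` license tag. The `RecordingMetadata` fields, the default license, and the wording of the description are hypothetical choices for illustration; check the current Commons templates, and the license everyone agreed to, before uploading.

```python
from dataclasses import dataclass


@dataclass
class RecordingMetadata:
    language: str
    speaker: str
    date: str       # ISO 8601 date, e.g. "2026-04-26"
    place: str
    topic: str
    author: str     # the person who made the recording
    license_template: str = "cc-by-sa-4.0"  # assumption: the license agreed with contributors


def commons_information(meta: RecordingMetadata) -> str:
    """Return a draft Commons file description as wikitext."""
    description = (
        f"{meta.topic}, spoken in {meta.language} by {meta.speaker}, "
        f"recorded in {meta.place}."
    )
    lines = [
        "== {{int:filedesc}} ==",
        "{{Information",
        f"|description={{{{en|1={description}}}}}",  # English description via {{en}}
        f"|date={meta.date}",
        "|source={{own}}",                          # the uploader's own work
        f"|author={meta.author}",
        "}}",
        "",
        "== {{int:license-header}} ==",
        f"{{{{self|{meta.license_template}}}}}",      # e.g. {{self|cc-by-sa-4.0}}
    ]
    return "\n".join(lines)
```

The draft can be pasted into the upload form and then extended, for example with a second description in the recorded language.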
7. Sharing and deposit
- How to contact archives and libraries and deposit media with them for long-term preservation.
- How to share copies with the community.
- How to plan follow-up support after the workshop.
Hands-on outputs
- One documentation plan
- One consent and rights note
- One file-folder structure
- One cleaned media sample
- One Commons-ready metadata draft
Guide for trainers
The sections below give details for each part of the session plan. Use this guide to prepare examples, stories, and activities from your own experience.
1. Planning before documenting
- Trainer goal
Help participants plan carefully instead of simply going out to record. By the end, each participant should have a simple written plan covering who they will work with, what they will document, why it matters, and where the recordings might be kept in the long term.
1.1 Connect to archives and communities
- Explain what a "language archive" is in community language documentation: a community-controlled collection, a Wikimedia project, or a formal archive that can preserve language materials for the long term.
- Use a short comparison:
- Academic archives focus on long-term preservation, fixed technical standards, and detailed metadata.
- Community documentation must balance preservation with local ownership, consent, and easy access.
- Ask participants: "Who should be able to hear and see this material in 5–10 years?" Use this to talk about which archives or platforms might be appropriate (local, national, Wikimedia, or specialist language archives).
1.2 Define collections and bundles
- Explain how to build a collection: a set of recordings, photos, texts, and notes that belong together because of a shared theme, place, family, or project.
- Ask each participant to choose:
- One focus language (or neighbouring language).
- One main theme (example: contemporary issues, oral histories, songs, everyday conversations, farming vocabulary, craft knowledge).
- One realistic time frame (example: "3 interviews in the next 6 months").
- Emphasise that small, well-described collections are more useful than large, messy ones.
- Explain that a collection can contain recordings in several languages, and that each language or dialect can have its own bundle (a smaller group of related files within the collection).
1.3 Plan for ethics, consent, and labour
- Connect planning to ethics: before recording, participants should think about consent, power, and labour.
- Ask them to list:
- Roles (speaker, interviewer, camera person, transcriber, community reviewer).
- Risks (sensitive topics, political risk, gender-based risk, young speakers).
- Benefits (local use in community, teaching, advocacy, community memory).
- Encourage participants to write down:
- How they will ask for and record consent (a written form or a recorded audio statement).
- How contributors will be credited and, when possible, paid.
- How they will sort content into three groups: open to all (Wikimedia, Internet Archive, ELAR), limited access (most archives), or community-only (selected archives). (Read Key Approach 8, "Protecting indigenous linguistic heritage and communities", in Digital initiatives for Indigenous languages.)
- Share tips on researching grants, writing grant proposals, asking for money, and planning.
1.4 Trainer activity
- Activity: Draft a mini project plan
Ask each participant (or pair) to write one page with:
- Project title and short description.
- Language(s), place, community.
- Types of materials (audio, video, photos, text).
- Intended uses (local community, Wikimedia, school, archive).
- Possible archive(s) or platforms where the collection could be stored.
- Possible donors or funders.
Participants can edit this plan online after the session, share it in the group chat, and ask others for feedback. Finish this task before the next session.
2. Consent, ethics and payment
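- Trainer goal
Explain how consent, rights, payment, and ethical practices based on the FAIR principles (data that is Findable, Accessible, Interoperable, and Reusable) and the CARE principles (Collective benefit, Authority to control, Responsibility, and Ethics) work in practice.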
2.1 Asking for and recording consent
- Explain how recorded consent decides what is and is not permitted, and how archives set access levels based on that consent.
- Emphasise:
- What informed consent is, and how to ask for and record it. Participants must explain where the recordings might go (an archive, Wikimedia projects, or commercial reuse).
- Consent can be individual, household, or community-level, depending on context.
2.2 Rights, licenses, and ownership
- Introduce basic terms in simple language:
- Copyright: who legally controls how others can copy and reuse.
- License: a written permission for how others can reuse the recording.
- Open license: a license that lets anyone reuse the material; for example, the Creative Commons licenses used on Wikimedia projects.
- Explain the tension:
- Archival best practice encourages clear copyright and stable licenses.
- Community practice may prefer collective or shared ownership of traditional knowledge, and may limit what goes online.
- Suggest practical recommendations:
- For Wikimedia projects: choose an appropriate free license (from CC0 to CC BY-SA) that everyone agrees to.
- For sensitive material: keep a “community-only” copy in a local or restricted archive.
2.3 Payment and labour
- Explain labour and fair payment:
- How to collect contributors' details and credit everyone who is seen or heard in the video or audio.
- Why time and knowledge count as labour.
- Ask participants to think:
- Who is giving time (speakers, elders, translators, people lending spaces)?
- What non-monetary benefits are possible (copies, credit, public thanks, sharing outcomes, food, transport)?
- When is payment appropriate and possible, and who decides?
2.4 Trainer activity
- Activity: Consent and access circle

Draw four circles on a board with levels such as:
- Level 1: Private, kept in the community only.
- Level 2: Restricted, in an archive with controlled access.
- Level 3: Public, but not widely promoted.
- Level 4: Public and open, on Wikimedia, websites, and social media.
Ask participants to place example recordings at different levels and to discuss what consent would look like for each.
About
This is an open educational resource (OER), designed for training a small group of community archivists, native speakers, and Wikimedia contributors. It is released under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).
Binding principle: This curriculum is intentionally written in simple language, using short sections, reducing jargon and acronyms, defining necessary technical terms, and separating universal guidance from context-specific examples. This is to ensure it's easy to translate into other languages or adapt for a particular language community. Each module uses a repeatable structure: what this topic is, why it matters, what to prepare, steps to follow, common mistakes, practice exercise, and links to open tools or examples.
This resource can be used to move from "recording" and "uploading" to a full documentation workflow: planning, ethical preparation, audio/video recording, file management, processing, publishing, and linking media across Wikimedia and GLAM (galleries, libraries, archives, and museums) ecosystems. The OpenSpeaks toolkit frames documentation as a community-first process, covering consent, audiovisual recording, metadata/publication, and accessibility.
The workshop should end with each participant producing a complete mini-workflow: one documentation plan, one consent/rights decision, one or two sample recordings, and one publication-ready metadata draft.
Further reading
- "Guidelines for PARI contributors". People's Archive of Rural India. 2016-06-26. Retrieved 2026-04-26.
- Kung, Susan Smythe; Sullivan, Ryan; Pojman, Elena; Niwagaba, Alicia (2020). "Archiving for the Future: Simple Steps for Archiving Language Documentation Collections". Retrieved 2026-04-27.
- Llanes-Ortiz, Genner (2023). Digital initiatives for indigenous languages. Paris: UNESCO & Global Voices. ISBN 978-92-3-100617-3.
- Seyfeddinipur, Mandana; Rau, Felix (September 2020). "Keeping it real: Video data in language documentation and language archiving". Language Documentation & Conservation 14. University of Hawaii Press. ISSN 1934-5275. Retrieved 2026-04-26.
- Daigneault, A. L.; Udell, D. B.; Tcherneshof, K.; Anderson, G. D. S. (2022). "Language Sustainability Toolkit" (PDF). Wikitongues & Living Tongues Institute for Endangered Languages.