OpenSpeaks/Archives

From Meta, a Wikimedia project coordination wiki

OpenSpeaks Archives is an open and public language multimedia archive optimised for Wikipedia and Wikimedia projects, focusing on lesser-resourced languages.

Phase 1: July 2025–June 2026

This is the first phase of a three-phase project. The oral histories of the majority of the world's peoples, told in their respective languages, are not widely recognised as a form of knowledge, which reduces their use to the merely representational. We challenge this status quo, particularly within the Wikimedia movement, and work towards practical ways to bring in oral history as a source of knowledge.

Language community members will be a part of this project, bringing their stories and knowledge into Wikimedia projects. As an outcome, high-quality, accessible audiovisual media will be published in over ten low-resourced South Asian languages. Additionally, we will create open source tools, workflows and open educational resources (OER) to help train both Wikimedia and language speaker communities. We will also collaborate with GLAM (galleries, libraries, archives and museums) institutions to ensure oral history is considered a reliable source of information within Wikimedia projects. As a result, the oral history media will be citable, translatable, and usable across Wikimedia platforms.

Why this project?

Despite being multilingual and diverse in certain aspects, Wikimedia projects lack knowledge from the majority world and its speaker communities. For example, Wikipedia articles often cover fictional languages in detail, while many living languages and their speakers remain invisible.

We identify three major barriers within Wikimedia projects:

  1. Content gaps: Indigenous and other low-resource languages are poorly documented or entirely missing.
  2. Lack of tech tools: Open, affordable, cross-platform, offline-friendly tools for community-led language documentation do not exist or are locked behind proprietary systems.
  3. Citation bias: Oral histories and Indigenous knowledge face barriers to being accepted as sources in Wikimedia projects.

This project tackles these barriers by:

  1. Publishing accessible, multilingual oral histories that can be cited.
  2. Developing the missing tools and workflows for documentation.
  3. Training and mentoring archivist-Wikimedians and language experts.
  4. Partnering with GLAM institutions to ensure the material is discoverable and reusable.

Our Approaches

Theory of Change
The Wikimedia movement can serve low-resourced language communities by embedding community knowledge in their own languages, by providing tools for local assertion, and by ensuring accessibility for wider audiences.
High-quality, accessible language media
Selected recordings (oral histories, descriptive videos) from over ten languages will be edited, subtitled, and published. Subtitles will be bi/multilingual (a local dominant language + English) for accessibility and translation support.
GLAM partnerships and citations
Collaborating with GLAM institutions to ensure oral histories are integrated into catalogues and made citable for Wikipedia, Wikidata, and beyond.
Tools, workflows, and OER
Identifying and co-developing missing tools for language documentation, releasing them as open source, and documenting them for archivists.
Community and capacity building
Training archivist-Wikimedians through workshops (e.g. WikiConference India, Celtic Knot), mentorship, and campaigns like Wiki Loves Languages.

Strategies and Activities

Content creation
Process and publish oral history recordings in 10 languages with subtitles and metadata, ensuring they can be cited in Wikimedia projects.
Tools & OER
Develop and document open-source workflows for audiovisual language media, so archivists can replicate and extend the model.
Training & community building
Host in-person and remote workshops, develop training curricula, and create a peer-mentor network among language experts.
Campaigns
Co-lead Wiki Loves Languages to encourage communities to enrich Wikipedia and other Wikimedia projects with content about languages and their speakers.
GLAM collaborations
Work with partner GLAM institutions to catalogue and cite oral histories, enhancing their reach and credibility.

Phases of Work

Phase 1 (ongoing; July 2025–June 2026)
Community building, training, media processing, tool development, OER creation, GLAM partnerships.
Phase 2
Expand self-paced learning modules (e.g. WikiLearn), grow community-led content documentation, deepen local collaborations.
Phase 3
Showcase tangible documentation of oral cultures, broaden GLAM partnerships, and strengthen train-the-trainer and peer-learning networks.

Languages and people

The project is guided by OpenSpeaks Fellows, who are native speakers of the focus languages. The focus languages are divided into three clusters: Nepal, northern India, and eastern-southeastern India. Apart from their larger advocacy role, the Fellows will contribute in three important ways:

  • Reviewing and subtitling media, in consultation with other community members
  • Ensuring community ownership and consent
  • Acting as mentors for new archivist-Wikimedians
Focus languages
Cluster | Language (ISO 639 code) | Fellow/Coordinator | Resource persons/other collaborators and interviewees
Northern India | Marcha (dial. Rongpo - rnp) | Kimmi Pal | Bimla and K.S. Bharwal
Northern India | Johari (dial. Kumaoni - kfy) | Surendra Singh Pangtey | Bhuppi Pangtey; Surendra Singh Pangtey
Northern India | Jaunpuri (dial. Garhwali - gbm) | Arun Gour | Sampati, Bhagwandi, Suchita
Northern India | Jaunsari (jns) | TBD | Deepak Joshi
Northern India | Bangani (dial. Garhwali - gbm) | TBD | Jaiprakash Chauhan
Eastern-Southeastern India | Sora (srb) | Opino Gomango | Ramani Dalbehera, Namad Dalbehera, Opino Gomango
Eastern-Southeastern India | Juray (juy) | TBD | Dinabandhu Gomango, Manjula Sabar, Srinivas Gomango
Eastern-Southeastern India | Juang (jun) | Opino Gomango |
Eastern-Southeastern India | Gorum/Parengi (pcj) | Opino Gomango |
Eastern-Southeastern India | Lambadi (lmn) | Nenavath Mohan | Nenavath Mohan and Meghavath Sathish
Nepal | Saptariya Tharu (thq) | Sanjib Chaudhary |
Nepal | Raji (rji) | Uday Raj Aaley |
Nepal | Kumhali (kra) | Uday Raj Aaley |

OpenSpeaks Fellows

The first phase of the OpenSpeaks Fellowship was awarded to seven community members, who will co-lead media identification, subtitling, translation and publication. The Fellows are:

  • Kimmi Pal is a communication designer and a Marcha-Rongpo speaker from Uttarakhand, India, who will coordinate the Rongpo archive.
  • Surendra Singh Pangtey is a noted author, former administrator, and Johari-language speaker based in Dehradun, Uttarakhand.
  • Arun Gour is an environmental activist from Bangsil, Uttarakhand, where he founded the Devalsari Paryavaran Sanrakshan Awam Tekniki Vikas Samiti, and is a speaker of the Jaunpuri dialect of Garhwali.
  • Opino Gomango is a researcher and activist of the Sora-cluster languages from Palakhemundi, Odisha, India.
  • Nenavath Mohan is a speaker of the Lambadi language and is based in Telangana, India.
  • Sanjib Chaudhary is an author and Saptariya/Eastern Tharu-language activist from Nepal.
  • Uday Raj Aaley is a language researcher, lexicographer, writer, and activist from Nepal.

Output

Media

2025
  • July:
    • Discussed the project plan with archivists.
    • Discussed collaboration on tools development with the Indic MediaWiki Developers User Group.
    • Coordinated early funding, resource persons and other logistics.
  • August:
Surendra Singh Pangtey narrating a joke in Johari
    • Decided to call all language leads OpenSpeaks Fellows.
    • Organised a workshop (mentioned above) in Dehradun.
    • Developed prototype tools that directly address technical gaps identified in last year's pilot: tools to create and edit subtitles, inspect media properties, compress files for sharing and editing, and batch-calculate the total media duration inside folders for project planning and budgeting.
    • Shared the prototypes with the Indic MediaWiki Developers User Group and discussed collaborating on tools development.
    • Conducted in-person field translation for Johari in Dehradun, India. One output video is used in multiple Wikipedia articles.
    • To plan for subtitling and translation, met in person with Arun Gour, OpenSpeaks Fellow for Jaunpuri, who flagged the need for a tutorial to understand subtitling.
    • Kimmi Pal, Fellow for Rongpo, finished the first draft of subtitles for the interviews of Bimla and K.S. Bharwal.
  • September:
    • Seven OpenSpeaks Fellows confirmed participation.

Tools

For further information, see OpenSpeaks/Tools.
OpenSpeaks Subtitler
OpenSpeaks Subtitler is a proposed webapp for creating audio/video subtitles both offline and online.

Updates

Screenshot of a prototype of OpenSpeaks Subtitler in action
  • Prototype created and tested by multiple archivists
  • Prototype and technical design brief shared with Indic MediaWiki Developers User Group
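
As an illustration of the kind of output a subtitling workflow produces, the following is a minimal sketch in Python of writing bilingual subtitle cues (a local dominant language line plus an English line) in the widely used SubRip (.srt) text format. It is not the OpenSpeaks Subtitler code: the choice of SubRip as the output format, and the names `Cue`, `srt_timestamp` and `to_srt`, are assumptions made only for this example, and the cue texts are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One subtitle cue (hypothetical structure, for illustration only)."""
    start: float       # seconds from the start of the media
    end: float         # seconds from the start of the media
    local_text: str    # line in the local dominant language
    english_text: str  # English translation of the same line


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SubRip."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[Cue]) -> str:
    """Render cues as SubRip text: a numbered block per cue, blank line between blocks."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}\n"
            f"{cue.local_text}\n"
            f"{cue.english_text}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Placeholder cue texts; real cues would come from a reviewed transcript.
    cues = [
        Cue(0.0, 2.5, "(line in the local language)", "(English translation)"),
        Cue(2.5, 6.0, "(next line in the local language)", "(English translation)"),
    ]
    print(to_srt(cues))
```

Keeping both languages in one cue keeps the two lines on the same timing. Writing one file per language would be an equally reasonable choice when each language needs to be switched on or off separately.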

Awareness & capacity building

In media

Supported By