OpenSpeaks/Text Style Guide
The OpenSpeaks Text Style Guide helps language documenters and archivists transcribe the spoken words in audio and video as text.
It covers captioning, subtitling, and transcribing.
- Captioning: writing spoken words and non-speech elements as on-screen text for deaf and hard-of-hearing viewers, in the language spoken in the audio or video
- Subtitling: translating captions into a different language
- Transcribing: creating a written record of the spoken words; unlike captions, a transcript is not tied to time-codes
1. Captioning
Captioning helps people who are deaf or hard of hearing: they cannot hear the audio, but they can read the same words on screen. This makes recordings more accessible. Captioning only works for languages that have a script and for people who can read that script. If a language lacks a script, you can caption the audio or video in another script; use the script of a neighbouring language that is taught in school.
- 1.1 Accuracy
- Write what you hear: Captions should match the spoken words exactly. Include slang and informal language if the speaker talks that way naturally.
- Language mixing: If a speaker uses words from a different language, put them in italics.
- 1.2 Timing and Format
- Keep it short: Write a maximum of 30–35 characters per line so the text is easy to read. If a sentence is too long for one line, break it into two lines. If it still does not fit, break the sentence into 2–3 meaningful parts.
- Example:
We are keeping the crops in a closed bin in winter.
should be:
We are keeping the crops
in a closed bin in winter.
- Text lines: Never show more than 2 lines of text on screen at a time.
- Timing: Adjust the timing so the text appears when the speaker begins and disappears when they stop. Make sure the text stays on screen long enough to be read comfortably (between 3 and 7 seconds).
- Meaningful breaks: For long sentences, break where the speaker pauses. Don't break in the middle of a phrase.
- 1.3 Give background
- Sound effects: Use brackets to describe sounds that are important to the story. Example: (DRUMMING), (LAUGHING).
- Speaker labels: If more than one person is speaking, put the speaker's name at the start of their speech. If the full name is too long, shorten the middle and last names to initials. Example: use Surendra S.P. instead of Surendra Singh Pangtey. In such cases, add a separate note with the full names in the audio/video description text.
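The line-length rule in 1.2 can be checked mechanically. The sketch below is one possible way to do it, not part of the guide: the function name `split_caption` and the "break at the space nearest the middle" strategy are assumptions. It only finds a break that keeps both lines within the limit; whether the break is meaningful still needs a human check.

```python
def split_caption(text: str, max_chars: int = 35) -> list[str]:
    """Split a caption into at most two lines of max_chars each.

    Returns one line if the text already fits. Otherwise breaks at the
    space closest to the middle, so the two lines are roughly balanced.
    Raises ValueError if no single break makes both lines fit.
    """
    text = " ".join(text.split())  # collapse runs of whitespace
    if len(text) <= max_chars:
        return [text]
    # Candidate break points: every space in the text.
    spaces = [i for i, ch in enumerate(text) if ch == " "]
    middle = len(text) / 2
    for i in sorted(spaces, key=lambda pos: abs(pos - middle)):
        first, second = text[:i], text[i + 1:]
        if len(first) <= max_chars and len(second) <= max_chars:
            return [first, second]
    raise ValueError("caption needs more than two lines; split it into separate captions")


for line in split_caption("We are keeping the crops in a closed bin in winter."):
    print(line)
```

If the function raises an error, the sentence is too long for one caption and should be divided into separate captions by hand.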
Subtitling
The subtitle guide is based on the BBC Subtitle Guidelines (Version 1.2.5, March 2026), which is itself based on the EBU-TT-D standard. This convention covers multilingual interviews, in which the interviewer and interviewee speak different languages and switch between languages within the same recording.
File format
We recommend the SRT (.srt) or VTT (.vtt) format for subtitles. Wikimedia Commons uses TimedText, which is similar to SRT. Open an SRT file in a text editor (right click > open in <Text Editor name>) and copy the text into the TimedText page. Each SRT subtitle block must contain:
- a sequential index number
- a timecode in the format HH:MM:SS,mmm --> HH:MM:SS,mmm
- one or two lines of subtitle text
- a blank line separating each block
Example:
1
00:00:01,292 --> 00:00:02,001
Your name?

2
00:00:02,210 --> 00:00:03,086
Sukra Dhangdamajhi
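The block structure described above can also be generated by a script. This is a minimal sketch, assuming cue times are held in milliseconds; the helper names `format_timecode` and `build_srt` are hypothetical and not part of any library.

```python
def format_timecode(ms: int) -> str:
    """Convert milliseconds to the SRT timecode form HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def build_srt(cues: list[tuple[int, int, str]]) -> str:
    """Build SRT text from (start_ms, end_ms, text) cues.

    Each block gets a sequential index, a timecode line, the subtitle
    text, and a blank line separating it from the next block.
    """
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        if end <= start:
            raise ValueError(f"cue {index}: end time must be after start time")
        if len(text.splitlines()) > 2:
            raise ValueError(f"cue {index}: more than two lines of text")
        timing = f"{format_timecode(start)} --> {format_timecode(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


# The two cues from the example in this section.
print(build_srt([
    (1292, 2001, "Your name?"),
    (2210, 3086, "Sukra Dhangdamajhi"),
]))
```

Keeping times as integers avoids rounding problems when the timecodes are written out.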
Language layers
Each recording is subtitled in at least two layers:
- Source language: the primary language spoken in the recording. Subtitles transcribe what is said as closely as possible, including code-switches (see Code-switching).
- Translation language: a more widely spoken language into which the source subtitles are translated.
Each language layer is a separate SRT file. File naming follows the convention:
LanguageCode-RecordingID.LayerLanguageCode.srt
Example: Bfw-Munaremo-SukraDhangdamajhi.or.srt (Odia (or) layer of a Bonda (Bfw) recording)
Use the ISO 639 code for widely documented languages. For languages without an assigned code, use the Glottolog code or the full language name. Dialects are not often captured in subtitles.
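A small sketch of how a layer file name could be assembled. It follows the worked example, which puts a dot before the layer language code; that reading, the function name `layer_filename`, and its parameter names are assumptions rather than part of the convention.

```python
def layer_filename(language_code: str, recording_id: str, layer_code: str) -> str:
    """Build a subtitle file name such as Bfw-Munaremo-SukraDhangdamajhi.or.srt."""
    parts = {
        "language_code": language_code,
        "recording_id": recording_id,
        "layer_code": layer_code,
    }
    for label, value in parts.items():
        # Reject empty parts and whitespace, which would break the pattern.
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"{label} must be non-empty and contain no spaces")
    return f"{language_code}-{recording_id}.{layer_code}.srt"


print(layer_filename("Bfw", "Munaremo-SukraDhangdamajhi", "or"))
```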
Identifying speakers
When a recording has more than one speaker, identify each speaker by full name the first time they appear in the subtitles. After that, use the name only when the speaker changes or when there is a gap of 30 seconds or more.
Place the speaker's name on a separate line above the speech. Add a colon (:) after the name, then add a line break by adding <br> after the colon. If you are writing in Latin script (used for English, French, Khasi, Bahasa Indonesia, Igbo, and other languages), write the name in capital letters:
1
00:00:01,292 --> 00:00:02,001
GOBARDHAN PANDA:
Your name?

2
00:00:02,210 --> 00:00:03,086
SUKRA DHANGDAMAJHI:
Sukra Dhangdamajhi
Do not repeat a speaker's name on their second or later appearances, unless it is unclear who is speaking.
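The labelling rule in this section can be sketched as a short routine. This is an illustration under stated assumptions: the `Cue` class and `label_speakers` function are hypothetical, the gap is measured from the end of the previous cue to the start of the next, and names are upper-cased because the output is assumed to be in Latin script.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float   # seconds
    end: float     # seconds
    speaker: str   # full name
    text: str


def label_speakers(cues: list[Cue], gap_seconds: float = 30.0) -> list[str]:
    """Return subtitle text for each cue, adding a speaker line when needed.

    A label is added on the first cue, when the speaker changes, or when
    at least gap_seconds have passed since the previous cue ended.
    """
    labelled = []
    previous = None
    for cue in cues:
        needs_label = (
            previous is None
            or cue.speaker != previous.speaker
            or cue.start - previous.end >= gap_seconds
        )
        if needs_label:
            labelled.append(f"{cue.speaker.upper()}:\n{cue.text}")
        else:
            labelled.append(cue.text)
        previous = cue
    return labelled


cues = [
    Cue(1.292, 2.001, "Gobardhan Panda", "Your name?"),
    Cue(2.210, 3.086, "Sukra Dhangdamajhi", "Sukra Dhangdamajhi"),
]
for text in label_speakers(cues):
    print(text)
```

A script cannot tell when it is unclear who is speaking, so labels for those cases still have to be added by hand.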
Off-screen speakers
When you can hear a speaker who is not visible on screen, put a single quote (') before and after their speech. If the speech runs over multiple lines, add one single quote before the first line and another after the last line.
7
00:00:05,755 --> 00:00:07,090
'What all do you do?'
Inaudible speeches
When speech cannot be heard for any reason, explain the cause in the subtitles. If writing in Latin script, use capital letters (ALL CAPS).
POURS WATER INTO GLASS. INAUDIBLE SPEECH.
Whisper
Label whispered speech.
WHISPERS: I knew it.
If the subtitle is very long, put brackets around it.
(We often pickled raw mangoes in summer and ate throughout the year.)
If whispering continues for a long time, use the label or brackets only the first time and leave them out after that.
Code-switching
Code-switching is when a speaker shifts from one language to another within the same speech (within a sentence or within part of a recording). This is common when interviewers and interviewees speak a language other than the target language of the documentation. Clearly naming the languages spoken (in ALL CAPS in Latin script) helps both translators and readers.
IN NEPALI: Our worship rituals...
If a switch happens mid-line, label the start of the switch inline in curly brackets ({TEXT}):
We speak Raji, {NEPALI} during our worship rituals.
Add a line break at the point where the speaker returns to the primary language of the documentation.
{NEPALI} during our worship rituals,
We speak Raji.
If an off-screen speaker code-switches, use the speaker's name (when they first appear in the recording) and the language name.
UDAY AALEY:
And in which places {NEPALI} do you use the Raji language?
Use these labels only when the speaker has clearly shifted to a different language, not when they use a single loanword or the name of a person, place, or thing.
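When filling in the metadata for a subtitle file, it can help to list every language named in a code-switch label. The sketch below assumes the two label forms shown in the examples above, an inline tag such as {NEPALI} and a line that starts with IN NEPALI:, and the function name `code_switch_languages` is hypothetical.

```python
import re

# Inline tag such as {NEPALI}; block label such as "IN NEPALI:" at a line start.
INLINE_TAG = re.compile(r"\{([A-Z][A-Z ]*)\}")
BLOCK_LABEL = re.compile(r"^IN ([A-Z][A-Z ]*):", re.MULTILINE)


def code_switch_languages(subtitle_text: str) -> list[str]:
    """List the languages named in code-switch labels, in order of first use."""
    matches = list(INLINE_TAG.finditer(subtitle_text))
    matches += list(BLOCK_LABEL.finditer(subtitle_text))
    found = []
    for match in sorted(matches, key=lambda m: m.start()):
        name = match.group(1).strip()
        if name not in found:
            found.append(name)
    return found


print(code_switch_languages("We speak Raji, {NEPALI} during our worship rituals."))
```

Upper-case community terms such as DISARI are not matched, because they are neither wrapped in brackets nor introduced by IN.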
Hesitations, fillers, and overlapping speech
Keep meaningful hesitations (Hmm..., Okay...). Do not transcribe every filler when the same one occurs many times, because that makes the subtitle unreadable.
Indigenous and community-specific terms
You might find Indigenous or community-specific terms that lack translations in other languages. Keep the original names. To distinguish them from other words in the subtitle language, write them in capital letters.
I am the DISARI of this village.
Line length and reading speed
Recommended:
- Maximum two lines per subtitle.
- Maximum 42 characters per line (including spaces).
- Minimum display duration: 1 second.
- Target reading speed: 160–180 words per minute for general audiences; give more time for recordings with technical, cultural, or Indigenous words.
Break lines naturally, especially when breaking mid-sentence:
We built this small house
to keep our domestic animals.
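The recommended limits can be checked automatically before a file is uploaded. This is a rough sketch: the function name `check_subtitle` is hypothetical, words are counted by splitting on whitespace (an assumption that suits space-separated scripts only), and 180 words per minute is treated as an upper bound.

```python
def check_subtitle(text: str, start: float, end: float,
                   max_lines: int = 2, max_chars: int = 42,
                   min_duration: float = 1.0, max_wpm: float = 180.0) -> list[str]:
    """Return a list of problems found in one subtitle (empty if none).

    start and end are in seconds.
    """
    problems = []
    lines = text.splitlines()
    duration = end - start
    if len(lines) > max_lines:
        problems.append(f"{len(lines)} lines (maximum {max_lines})")
    for line in lines:
        if len(line) > max_chars:  # len() counts spaces too
            problems.append(f"line has {len(line)} characters (maximum {max_chars})")
    if duration < min_duration:
        problems.append(f"shown for {duration:.3f} s (minimum {min_duration} s)")
    if duration > 0:
        wpm = len(text.split()) / duration * 60
        if wpm > max_wpm:
            problems.append(f"reading speed {wpm:.0f} wpm (target at most {max_wpm:.0f})")
    return problems


# The timings here are made up for illustration.
print(check_subtitle("We built this small house to keep our domestic animals.", 0.0, 4.0))
print(check_subtitle("We built this small house\nto keep our domestic animals.", 0.0, 4.0))
```

For recordings with technical, cultural, or Indigenous words, pass a lower `max_wpm` to give readers more time.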
Metadata for each subtitle file
It is helpful to include extra information about the subtitles (metadata). For Wikimedia Commons, you will need to include this in the file information.
Recommended fields to keep:
- Primary language: Language spoken (with ISO code)
- Subtitle language: Language of this SRT file
- Speaker(s): Full names preferred
- Interviewer(s): Full names preferred
- Subtitler(s): Full names and method (e.g. "transcribed from Bonda by X; translated to Odia by Y; English translation and Odia editing by Z")
- Code-switch language(s), if any: Any languages appearing in curly-bracket tags, with codes
- Notes: Any additional notes, such as departures from this convention or terms that need extra explanation
Transcribing
TBD
Notes
- We strongly recommend the "Video stories" section of the Archive of Rural India guidelines for documenting people's stories. It is short and clear.
- See Oral History Framework for more about areas of language documentation
- OpenSpeaks details all the steps of community-based language documentation