WikiProject Language samples

From Meta, a Wikimedia project coordination wiki

Wikipedia exists in many languages, and it has articles about many languages. Many of these articles are already of high quality and tell you about the number of speakers, the vocabulary, stem and grammar of the language. But one very natural question about a language is often left unanswered: "How does this language sound?".

This project seeks to change this. Its goal is to add a short sample of text, spoken by a native speaker, to every article about a language. The first article of the Universal Declaration of Human Rights (UDHR) is an appropriate choice for this, since it has been translated into "all" languages of the world, is in the public domain, and is the right length for inclusion in a Wikipedia article.

This is an example of how this could look and sound for the Japanese language:

 
すべての人間(にんげん)は、生(う)まれながらにして自由(じゆう)であり、かつ、尊厳(そんげん)と権利(けんり)とについて平等(びょうどう)である。人間(にんげん)は、理性(りせい)と良心(りょうしん)とを授(さず)けられており、互(たが)いに同胞(どうほう)の精神(せいしん)をもって行動(こうどう)しなければならない。
subete no ningen wa, umarenagara ni shite jiyū de ari, katsu, songen to kenri to ni tsuite byōdō de aru. ningen wa, risei to ryōshin to o sazukerarete ori, tagai ni dōhō no seishin o motte kōdō shinakereba naranai.
All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.

Project History

The project benefited from a LibriVox project that also aimed to create recordings of the Universal Declaration of Human Rights in over 50 languages. MichaelSchoenitzer imported those recordings to Wikimedia Commons, edited the files (see below) and extracted the first article. Through a project on the German Wikipedia, they were added to the articles there.

Now it is time to turn this into an international community project: Wikipedians all around the globe can record the first article (or even the whole document) in their mother tongue, and the Wikipedia communities can add the recordings to their language articles.

How to create a recording

Do you want to read the first article of the UDHR in your mother tongue? Or can you convince someone else to do so? Awesome.

First, get a microphone. Cheap microphones naturally have lower sound quality, but if you follow the instructions below, even a cheap microphone will give reasonable results. If you live in a country with a local chapter, you can ask whether it can help you get a microphone; in Germany, for example, you can borrow one from Wikimedia Deutschland.

You can get the translation of the UDHR in your language at OHCHR.org. Find the first article, copy it into an editor or text program, and format it so that you can read it comfortably. Before you start recording, read it aloud two or three times and drink some water. If you misread a word, simply read the word or group of words again and cut out the wrong version later.

Very important: when recording, make sure you also record at least 5 seconds of silence at the beginning or end of the recording; this is needed for editing. If you have never made a recording before, we recommend the free software Audacity. If you have a passive microphone (without its own power supply), activate the microphone boost and set the volume control to maximum. For an active microphone, make sure the input level is not so high that the signal reaches the maximum level while recording.

After recording, select the silent part you recorded, go to Effect -> Noise Reduction and click the Get Noise Profile button. Then select the whole recording (Edit -> Select -> All or the hotkey CTRL + A), go to Effect -> Noise Reduction again and click the OK button. After that you can remove the silence and any misread sections by selecting them and pressing Del. Then apply the filters Compressor, Leveller and Normalizer from the Effect menu, in this order. The default settings should be fine. When you are done, go to File -> Export audio, choose Ogg Vorbis as the format and save the file.
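The two checks described above (enough silence for the noise profile, and an input level that never hits the maximum) can also be verified outside Audacity. The following is only a minimal sketch using the Python standard library. It assumes you exported an unprocessed copy of the recording as a 16-bit mono WAV file, which is not part of the workflow above. The function names and the 2 % quietness threshold are illustrative choices, not values recommended by this project.

```python
import array
import sys
import wave

FULL_SCALE = 32768  # magnitude of the most negative 16-bit sample


def read_mono_16bit(path):
    """Return (samples, sample_rate) for a 16-bit mono WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected a 16-bit mono WAV file")
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
        rate = wav.getframerate()
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return samples, rate


def peak_ratio(samples):
    """Largest absolute sample as a fraction of full scale (1.0 = clipping)."""
    if len(samples) == 0:
        return 0.0
    return max(abs(s) for s in samples) / FULL_SCALE


def quiet_seconds(samples, rate, threshold=0.02, from_end=False):
    """Length in seconds of the quiet stretch at the start (or end).

    A sample counts as quiet while its magnitude stays at or below
    `threshold` times full scale. The default threshold is an assumption;
    room noise means real "silence" is never exactly zero.
    """
    limit = threshold * FULL_SCALE
    ordered = reversed(samples) if from_end else samples
    count = 0
    for s in ordered:
        if abs(s) > limit:
            break
        count += 1
    return count / rate
```

To use it, load a file with `read_mono_16bit` (for example a hypothetical `article1_raw.wav`), then check that `quiet_seconds` reports at least 5 for the start or the end and that `peak_ratio` stays clearly below 1.0. If the peak reaches 1.0, the input level was probably too high and the recording should be repeated.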

Upload your recording to Wikimedia Commons, put it in the category Audiorecordings of Article 1 of the Universal Declaration of Human Rights and add it to the list below.

More tips for high-quality audio samples can be found in: A short guide to the recording of high-quality audio samples for Wiktionary

Project Status

So far we have recordings in the following languages:

| Language | Full recording | Recording of Article 1 | German Wikipedia | your Wikipedia… |
|---|---|---|---|---|
| Afrikaans | Done | Done (link 1) | no | |
| Arabic | Done | Done (link 1) | Done | |
| Acehnese | Done | Done (link 1) | Done | |
| Balinese | Done | Done (link 1) | ?? | |
| Basque | Not done | Done (link 1) | Done | |
| Brazilian Portuguese | Done | Done (link 1) | Done | |
| Buginese | Done | Done (link 1) | Done | |
| Bulgarian | Done | Done (link 1) | Done | |
| Catalan | Done | Done (link 1) | Done | |
| Chinese (Mandarin) | Done, 2 versions | Done (link 1) | | |
| Czech | Done | Done (link 1) | Done | |
| Danish | Done | Done (link 1) | Done | |
| Dutch | Done, 2 versions | Done (link 1, link 2) | Done | |
| English | Done, 2 versions | Done (link 1) | | |
| Esperanto | Done | Done (link 1) | Done | |
| Faroese | Done | Done (link 1) | no | |
| Finnish | Done | Done (link 1) | no | |
| French | Done, 3 versions | Done (link 1, link 2, link 3) | Done | |
| German | Done | Done (link 1) | no | |
| Modern Greek | Done | Done (link 1) | | |
| Hebrew | Done, 2 versions | Done (link 1) | Done | |
| Hindi | Done | Done (link 1) | no | |
| Hungarian | Done | Done (link 1) | Done | |
| Indonesian | Done, 2 versions | Done (link 1, link 2) | no | |
| Italian | Done, 2 versions | Done (link 1) | Done | |
| Japanese | Done | Done (link 1, link 2) | Done | |
| Javanese | Done | Done (link 1) | Done | |
| Javanese (Semarang) | Done | To do | no article | |
| Kapampangan | Done | Done (link 1) | Done | |
| Korean | Done | Done (link 1) | Done | |
| Latin | Done, 2 versions | Done (link 1) | no | |
| Latvian | Done | Done (link 1) | Done | |
| Luxembourgish | Done | Done (link 1) | Done | |
| Malay | Done | Done (link 1, link 2) | | |
| Minangkabau | Done | Done (link 1) | Done | |
| Nynorsk | Done | Done (link 1) | To do | |
| "plain" ??? | Done | To do | ??? | |
| Occitan (Languedocien) | Done | Done (link 1) | no | |
| Oriya | Done | Done (link 1) | Done | |
| Persian | Not done | Done (link 1) | | |
| Polish | Done, 2 versions | Done (link 1) | Done | |
| Portuguese | Done, 2 versions | Done (link 1, link 2) | Done | |
| Romanian | Done (very bad quality) | To do | | |
| Russian | Done | Done (link 1) | Done | |
| Swedish | Done, 2 versions | Done (link 1) | Done | |
| Slovak | Done | Done (link 1) | Done | |
| Serbian | Not done | Done (link 1) | To do | |
| Sesotho / South Sotho | Not done | Done (link 1) | Done | |
| Spanish | Done, 2 versions | Done (link 1, link 2) | Done | |
| Sundanese | Done | Done (link 1) | Done | |
| Tagalog | Done | Done (link 1) | Done | |
| Tamil | Done, 2 versions | Done (link 1) | no | |
| Turkish | Not done | Done (link 1, bad quality) | To do | |
| Ukrainian | Done | Done (link 1) | Done | |
| Urdu | Done | Done (link 1) | Done | |
| Walloon | Done | Done (link 1) | Done | |
| West Frisian | Done | Done (link 1) | Done | |
| Yiddish | Done | Done (link 1) | no | |



Open Questions and Tasks

  • Should we also make recordings in different dialects?
  • How do we link the audio files on Wikidata?
  • How do we reach native speakers of small languages?
  • Design a logo for this project