Recommendation # 10: Wikioral, a Project with Voice Recordings

Q1. What is your Recommendation?

A Wikimedia project that aims to gather short voice recordings made by volunteers in order to support the other projects’ content (sections and entire articles), languages with different degrees of oral use, and blind users.

Q2. What assumptions are you making about the future context that led you to make this Recommendation?

In order to propose a Wikimedia project based on voice recordings, we assume the following:

One of the recurring themes at Wikimedia conferences is how to better support audiovisual content. Even though it is assumed that some Internet users of all ages prefer other content types, there is no clear solution for integrating them into Wikipedia, because such content is difficult to create, edit, and update as it becomes obsolete. Nonetheless, the need for audio and video and our lack of a platform to host them is a problem when other sites like YouTube offer such knowledge. If our Movement aims to be the essential infrastructure of knowledge, we must include audiovisual works, which are an undeniable source of knowledge for many topics.
Another important discussion that appears in diversity feedback and Wikimedia-based discussions is how we can help oral languages. It is estimated that thousands of languages (about 3000) are unwritten. This number may be higher when we count languages whose standardization was not accepted by all their speakers, or whose speakers have high levels of illiteracy (e.g. Berber).
As we know, Wikipedia language editions have often been created to allow the use of multiple dialects of one language (e.g. Portuguese, German). Failure to recognize language plurality has meant that some varieties have been neglected in favor of others, and different sorts of tensions keep appearing. In some dialects the written language is similar but the pronunciation is very different. Knowledge based on dialectology - phonetics and diction - is not stored in Wikipedia or any other Wikimedia project. If we gather the sum of human knowledge, we should feel responsible for language and knowledge in all its forms.
We have largely assumed that the Wikimedia user is a ‘reader’, possibly because “Wikipedia” has been defined as an encyclopaedia. However, there are different types of users who, for multiple reasons, are not able or not used to acquiring knowledge in written form, whether because of illiteracy, visual impairment, or preference, as is the case with audiobooks. One specific kind of user that Wikimedia has not paid much attention to is the blind user. Even though there are extensions that help blind readers and editors (e.g. Wikispeech, a 2015-2017 project), a good way of experiencing the content would undeniably be listening to it in real people’s voices.
One of the main keys to Wikimedia's success is the reuse of its content. Many external apps reassemble and remix the content in order to provide services based on geolocation, descriptions, and so on. Up to now this content has been article text and (Commons) images. Voice recordings of people reading article sections, or sentences built from several Wikidata statements, could be a different source of knowledge for such external services (Alexa, Google Home, HomePod) and could prevent mid-sized languages from being left out (e.g. Catalan, Finnish, Czech).
Displaying the number of languages, and which languages, a Wikimedian speaks or understands is common among users with a certain experience. Many Wikimedians do it in order to show their availability for work. At the same time, some of them admit they have learnt languages by reading articles in language editions other than their mother tongue. Collecting audio narrations for specific parts of the content could help these editors, and all kinds of users, to learn languages by listening to voice recordings. Multiple uses of Wikimedia in education could spring from this innovation.

Q3a. What will change because of the Recommendation?

The outcome of this recommendation is a new Wikimedia project for every language, containing voice recordings for content in Wikipedia and Wikidata (extensible to Wiktionary) as well as new original content. It would complement the rest of the projects (for those languages which have a Wikipedia) and would be the single place of knowledge for oral languages. Original material not linked to existing content, such as interviews, could also be uploaded.
We envision a Wikioral article as linked to a Wikidata item and containing a collection of recordings that contributors can upload. In a similar way to Wikidata, we could understand each edit as a triplet of “Article name - Voice recording title/content description - Voice Recording”. Additional parameters for each recording would also be collected (date of recording, dialect, etc.).
A recording should specify in the “content description” the content to which it refers. For instance, the article related to the item “car” in English Wikioral would have a recording with the title “First lines” and a link to the enwiki article “Car”. The recording would be: “A car (or automobile) is a wheeled motor vehicle used for transportation. Most definitions of car say they run primarily on roads, seat one to eight people, have four tires, and mainly transport people rather than goods.” A recording could also cover a group of Wikidata property-value pairs or a narration based on them.
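The triplet and its extra parameters could be modelled roughly as follows. This is only a sketch: the class names, field names, the file name, the `enwiki:Car` link notation, the date, and the `Q_PLACEHOLDER` item ID are all hypothetical and are not part of any existing Wikimedia software.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Recording:
    """One edit: article name, content description, and the audio itself."""
    article: str                        # Wikioral article name, e.g. "car"
    description: str                    # recording title / content description
    audio_file: str                     # hypothetical name of the uploaded file
    source: Optional[str] = None        # content being read aloud, if any
    recorded_on: Optional[date] = None  # additional parameter: recording date
    dialect: Optional[str] = None       # additional parameter: dialect


@dataclass
class Article:
    """A Wikioral article: tied to one Wikidata item, holding many recordings."""
    name: str
    wikidata_item: str                  # placeholder ID, no real lookup is made
    recordings: List[Recording] = field(default_factory=list)

    def add(self, recording: Recording) -> None:
        # A recording may only be filed under the article it names.
        if recording.article != self.name:
            raise ValueError("recording belongs to a different article")
        self.recordings.append(recording)


car = Article(name="car", wikidata_item="Q_PLACEHOLDER")
car.add(Recording(
    article="car",
    description="First lines",
    audio_file="car-first-lines.ogg",
    source="enwiki:Car",
    recorded_on=date(2019, 1, 1),       # arbitrary illustrative date
    dialect="en-GB",
))
```

Leaving `source` optional reflects the idea that original material, such as interviews, need not point to existing content.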
Whenever the original source changes (e.g. the enwiki article “Car”), a (!) sign should appear next to the voice recording to indicate that it should be re-recorded. In some articles, content can be quite static (e.g. a poem included in an author biography). Ideally, articles could be listened to in full, section by section, along with their Wikidata properties, using a “play all recordings” interface button.
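One possible way to decide when the (!) sign should appear is to store a fingerprint of the source text at recording time and compare it with the current text. The sketch below assumes this approach; the recommendation itself does not say how changes would be detected, and the function names are hypothetical.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash of the source text, ignoring differences in whitespace only."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def needs_rerecording(fingerprint_at_recording: str, current_text: str) -> bool:
    """True when the source text no longer matches what was read aloud."""
    return fingerprint(current_text) != fingerprint_at_recording


def label(title: str, fingerprint_at_recording: str, current_text: str) -> str:
    """Recording title as shown in the interface, with (!) when outdated."""
    if needs_rerecording(fingerprint_at_recording, current_text):
        return f"{title} (!)"
    return title


original = "A car (or automobile) is a wheeled motor vehicle used for transportation."
stored = fingerprint(original)           # saved together with the recording
edited = original.replace("wheeled", "four-wheeled")
print(label("First lines", stored, original))
print(label("First lines", stored, edited))
```

Fingerprinting only the section that was read, rather than the whole article, means that edits elsewhere in the article would not flag the recording.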
In the case of oral languages, recordings can also be simultaneous spoken translations of sections of articles in other languages or of Wikidata property-value pairs. Where there is no source material to refer to, the recording title/description could itself be a recording that poses a question (e.g. in an Australian aboriginal language, a question could be “what is Dreamtime?”). It is important to chunk content into small pieces that can be updated and verified. Wikioral would need to be flexible enough to incorporate the sources of the original material (e.g. sections or entire articles in enwiki, Wikidata, etc.) as well as to introduce new ones.
In any article there can be multiple versions of the same recording. In this case, it would be necessary to establish a voting system in order to give preference to one contributor over another in the ‘default’ version of the website. Once logged in, users of the website could also indicate which contributor or dialect (when specified in the metadata of the recording) they prefer to listen to when available. The quality of the voice recordings is something that needs to be addressed.
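As an illustration of how a default version might be chosen, the sketch below applies a logged-in listener's preferred contributor first, then their preferred dialect, and otherwise falls back to the version with the most votes. The ordering of these rules, the names, and the vote counts are assumptions made for the example, not part of the recommendation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Version:
    contributor: str
    dialect: Optional[str]   # None when the metadata does not specify one
    votes: int


def pick_version(
    versions: List[Version],
    preferred_contributor: Optional[str] = None,
    preferred_dialect: Optional[str] = None,
) -> Version:
    """Choose which recording to play among versions of the same content."""
    if not versions:
        raise ValueError("no recordings available")
    # Highest-voted first; versions with equal votes keep their list order.
    by_votes = sorted(versions, key=lambda v: v.votes, reverse=True)
    if preferred_contributor is not None:
        for v in by_votes:
            if v.contributor == preferred_contributor:
                return v
    if preferred_dialect is not None:
        for v in by_votes:
            if v.dialect == preferred_dialect:
                return v
    return by_votes[0]       # the 'default' version for anonymous listeners


versions = [
    Version("Ana", "pt-BR", 3),
    Version("Rui", "pt-PT", 5),
    Version("Bia", "pt-BR", 4),
]
default = pick_version(versions)
by_dialect = pick_version(versions, preferred_dialect="pt-BR")
by_contributor = pick_version(versions, preferred_contributor="Ana")
```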
Besides the ‘play all’ button, the audio contained in each Wikioral article or section could be retrieved using a search engine with voice input or a voice assistant. For this reason, it would also be necessary to have recordings of the section titles, so that voice recognition and machine learning algorithms can retrieve the desired recording. This is in line with the voice.mozilla.org initiative.

Q3b. How does this Recommendation relate to the current structural reality? Does it keep something, change something, stop something, or add something new?

It adds a new type of content, and it relates to all the structured content in Wikidata and the text content in Wikipedia. It opens up many possibilities and supports diversity of different kinds.

Shouldn’t this just be a category in Commons instead?

This should indeed be a category in Commons. The project I am suggesting, however, is not a database for oral recordings but a place to 1) facilitate the recording process, 2) relate recordings to knowledge from Wikipedia and Wikidata, and 3) listen to these recordings.

Recordings in oral languages would not be able to relate to a written counterpart (a Wikipedia in the same language) but only to Wikidata. Alternatively, they could relate to another Wikipedia language edition and specify that they are a translation.

How long should the recordings be?

The length of the recordings matters because they may become outdated when the Wikipedia or Wikidata section they refer to is updated with newer information. It is easier to update a recording of a section than one of an entire article. Also, I believe that a recording of a section, or of certain groups of Wikidata statements, makes sense as a "meaning entity" that can be heard. For instance, the first snippet of an article about a city always covers location, name, etc.

It is true that it may be better to suggest short recordings than to restrict recordings to a short length. We could listen to a single recording of an entire article, but we could also listen to the full article as the sum of different recordings. The downside of listening to the sum of recordings is that we might hear different voices within one article. This could be a bit distracting, as it might provide "unnecessary" information.

Does every recording need to be linked to articles content or sections?

No, a recording wouldn't necessarily need to be related to an existing article or section of an article. As in Wikidata, you should be able to create a recording and link it to some other content, but linking would not be a requirement.

If, for example, public-domain (PD) recordings of oral interviews could be stored and played from the site, that would be helpful. As an example, an affiliate obtained funds from the Foundation and sponsored oral interviews with a group of people to record their history. In fact, I guess this would be a usual case for some content in oral languages: speakers wouldn't just be reading part of an English article with simultaneous translation into the oral language.

Are there other attempts to create a non-textual wiki?

Yes, there is the project Videowiki (videowiki.wmflabs.org). This project embraces both audio and video. While it encompasses both because they are non-text, the interaction with audio differs from the interaction with video.

Video can complement text by contributing new visual information (after reading an article or parts of it), while audio can completely substitute for text (sometimes making it unnecessary to read the article, as the content is basically the same in terms of words).

Therefore, we cannot assume that video and audio should be on the same Wikimedia project platform. Instead, considering the audio supplementary, we believe that there is a need for an audio platform to ease the processes of a) recording and uploading recordings, b) relating them to existing content (Wikidata and Wikipedia), and c) listening to them.

Q4a. Could this Recommendation have a negative impact/change?

Having this kind of knowledge can be positive. However, the voice is something much more personal and a sign of identity. Some users may not feel comfortable, or may have second thoughts, after contributing voice recordings.

Q4b. What could be done to mitigate this risk?

The system should allow reverts so that any contributor can easily withdraw their recordings.

Q5. Why this Recommendation? What assumptions are you making?

We have assumed that there are different needs that can be addressed with more types of content (in this case, audio). It is only necessary to provide 1) a way to structure this content so that it can be verified and updated, and 2) an interface that enables users to upload their recordings with a certain ease.

Q6. How is this Recommendation connected to other WGs?

It connects to product and technology.

Q7. Does this Recommendation connect or depend on another of your Recommendations?

No, it does not depend on another recommendation, but it ties in with the recommendation Digitization and Resource Prioritization for Marginalized Groups.

Q^. What is the timeframe of this Recommendation in terms of when it should be implemented? 2020, 2021, etc. Does it have an urgency or priority? Does this timeframe depend on other Recommendations being implemented before or after it?

The earlier the better.

Q8. Who needs to make a decision on this Recommendation?

The product team could accept this idea in order to design and develop a prototype so the first ‘user research’ experiments with real users can be run.

Q9. How should the decision be made?

The knowledge and experience accumulated with Wikidata and other WMF projects can possibly help in planning the development of this recommendation.

Q10. What type of Recommendation is it?

The implementation is between complex and chaotic, as it is an entirely new project. Some concerns may appear in terms of privacy, data storage, etc.

Q12. What are the concerns, limiting beliefs, and challenges for implementing this Recommendation?

It would be necessary to test this new website with speakers of oral languages. The interface would have to be in an auxiliary language, and this may be one of the design challenges.

QXX. How much money is needed to implement this recommendation?

The cost would be that of a team dedicated to its development. It is hard to estimate beforehand, but previous projects could be indicative of the cost.

Q^. How should the implementation of this Recommendation be monitored and evaluated? By who?

The implementation should be open to evaluation by any Wikimedian, as each one is a potential user (contributor or consumer) of this site's content.