Africa Growth Pilot/Online self-paced course/Module 2/Contributing labels and descriptions to Wikidata

From Meta, a Wikimedia project coordination wiki

Labels and descriptions on Wikidata: we contribute them in order to teach Wikidata to be more multilingual. Wikidata is multilingual, but how multilingual is it? It's as multilingual as we make it. If speakers of French don't provide French labels and descriptions, Wikidata won't know French. And if speakers of Ga don't provide Ga labels, it will not speak Ga. The same goes for Igbo or Zulu or any other language.

So here's how this works: You can tell Wikidata what languages you speak by putting a "Babel box" on your user page. I'm going to show you my user page. This is my volunteer account. That's why it doesn't have WMF in it. I also have a staff account, but this is my volunteer account and it gives some information about me. And, by the way, it also has links to the Wikidata tutorials that I mentioned.

But anyway, what I want to show you right now is this, the "Babel box", this little box here that shows you some of the languages I speak. I'm a native Hebrew and English speaker, and I speak these other languages at a beginner level. Okay, these are the languages that are relevant for me. You will notice, for example, that Zulu is not here because I don't speak any Zulu at all.

Now why does this matter? Before we say why it matters, let's look at what it looks like if I edit this page. This is the relevant line here. You see this line? We just use language codes, you know, the two- or three-character language codes, to show what languages are relevant for me.
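To make that line concrete, here is a minimal sketch of how such a Babel line can be assembled. It assumes the `{{#babel:...}}` parser-function syntax, where each entry is a language code followed by a proficiency level (`N` for native, `0` to `5` otherwise). The helper name `babel_wikitext` is made up for this illustration, and the languages and levels in the example are only guesses based on what is said in the talk, not a copy of the speaker's actual user page.

```python
def babel_wikitext(languages):
    """Build one Babel line from (language_code, level) pairs.

    level is "N" for native, or a digit from "0" to "5".
    The function name is hypothetical; only the output syntax matters.
    """
    valid_levels = {"N", "0", "1", "2", "3", "4", "5"}
    parts = []
    for code, level in languages:
        if level not in valid_levels:
            raise ValueError(f"unknown proficiency level: {level!r}")
        parts.append(f"{code}-{level}")
    return "{{#babel:" + "|".join(parts) + "}}"


# Illustrative values only: two native languages, two at beginner level.
example = babel_wikitext([("he", "N"), ("en", "N"), ("ar", "1"), ("fr", "1")])
print(example)  # {{#babel:he-N|en-N|ar-1|fr-1}}
```

The printed line is what you would paste into your own Wikidata user page, with your own language codes in place of these.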

And why is this relevant? It's relevant because if we go to some item on Wikidata, say Douglas Adams, "English science fiction writer and humorist", you can see which languages I'm shown. I'm also shown Douglas Adams in Arabic and in German and Spanish and French and Hebrew and Italian and Russian and Ukrainian, because these are precisely the languages I listed in my Babel box.

If you make sure you have a Babel box like that on your Wikidata user page, Wikidata will show you the languages that are relevant for you. And why is this helpful? Because if a description were missing in one of the languages I can speak, I could just click edit here and add it! Add the missing description. By the way, at the bottom here there is this link called All Entered Languages. And here I can find all the other languages that are not shown above. So there's Afrikaans here for example, which does have a description, but Akan does not have a description. So those of you who speak this language could just click edit and type the Akan equivalent of "English writer and humorist".

And that can help. In some languages, we literally don't even have a label. Let's look for one. Twi, for example, doesn't even have a label. Meaning we don't even know how to write the name of Douglas Adams in this language. This is a fairly well-known author and a fairly early Wikidata item, so there's plenty of information here. But if we take someone much less well known, like, I don't know, this Hebrew poet, for example, you will see that a lot is missing. This Hebrew poet doesn't even have the Arabic version of his name here. Someone needs to add it.
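The same check can be done outside the web page. The sketch below assumes the public Wikidata API at `https://www.wikidata.org/w/api.php` and its `wbgetentities` action, which returns only the labels and descriptions that exist for the requested languages. The function names and the User-Agent string are made up for this example, and the small `sample` entity is hand-written to mirror what the talk describes, not live data.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"


def build_query_url(item_id, languages):
    """Build a wbgetentities URL asking for labels and descriptions only."""
    params = {
        "action": "wbgetentities",
        "ids": item_id,
        "props": "labels|descriptions",
        "languages": "|".join(languages),
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def missing_terms(entity, languages):
    """List the languages that have no label or no description."""
    labels = entity.get("labels", {})
    descriptions = entity.get("descriptions", {})
    return {
        "labels": [lang for lang in languages if lang not in labels],
        "descriptions": [lang for lang in languages if lang not in descriptions],
    }


def fetch_entity(item_id, languages):
    """Fetch one item over the network (not called in this sketch)."""
    request = urllib.request.Request(
        build_query_url(item_id, languages),
        # Hypothetical User-Agent; use one that identifies your own script.
        headers={"User-Agent": "label-gap-sketch/0.1 (example script)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["entities"][item_id]


# Hand-written sample in the shape of an API response, for illustration only.
sample = {
    "labels": {
        "af": {"language": "af", "value": "Douglas Adams"},
        "ak": {"language": "ak", "value": "Douglas Adams"},
    },
    "descriptions": {
        "af": {"language": "af", "value": "(an Afrikaans description)"},
    },
}

print(missing_terms(sample, ["af", "ak", "tw"]))
```

To check a real item, you could call `fetch_entity("Q42", ["af", "ak", "tw"])` (Q42 is Douglas Adams) and pass the result to `missing_terms`.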

Okay, so adding labels and descriptions can really change people's experience. And I want to demonstrate this by changing my interface language to, let's say Yoruba. Okay. And now Wikidata will do its best to speak Yoruba to me. You can see that the search box, for example, has changed and is speaking Yoruba to me. And the talk page here is now speaking Yoruba etc.

By the way, the names of the languages have also been translated. And the statements here, are... This, I guess, is the equivalent of "instance of", of type. And I guess this means "human", right? And this means "image". I'm just guessing, because I see the content type here. So for these various properties there is a Yoruba label.

But what happened here? This says "country of citizenship" and I can understand it, which means it is not Yoruba, because I understand absolutely no Yoruba. So why does this say "country of citizenship"? Because nobody who speaks Yoruba has told Wikidata how to say "country of citizenship" in Yoruba. Any one of you who does speak the language could fix this, and only one person needs to do it, once: click on this property, click edit, and in the label field type the Yoruba equivalent of "country of citizenship" (which obviously I cannot do), then press Enter. That's it. Nobody will ever have to teach Wikidata again how to say "country of citizenship" in Yoruba. And you saw that I didn't have to look hard to find a very basic term that isn't available yet in Yoruba. "Pseudonym" isn't available yet either, and "languages spoken, written or signed" has not yet been translated into Yoruba. So you can really help by making sure that all of these terms exist in whatever languages you speak natively, and this can tremendously help other people using Wikidata.
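For anyone who prefers scripts to clicking, the same one-time edit can be expressed as an API request. This is only a sketch: it assumes the Wikidata API's `wbsetlabel` action and that "country of citizenship" is property P27. It only builds the request parameters and sends nothing, because a real edit also needs a logged-in session and a CSRF token. The function name and the placeholder label and token are made up for this example; the actual Yoruba wording has to come from a Yoruba speaker.

```python
def build_set_label_params(property_id, language, label, csrf_token):
    """Build the POST parameters for a wbsetlabel request.

    Nothing is sent here; the function name is hypothetical.
    """
    cleaned = label.strip()
    if not cleaned:
        raise ValueError("label must not be empty")
    return {
        "action": "wbsetlabel",
        "id": property_id,       # e.g. "P27", country of citizenship
        "language": language,    # e.g. "yo" for Yoruba
        "value": cleaned,
        "token": csrf_token,     # obtained from a logged-in session
        "format": "json",
    }


# Placeholder values only: the label and token are not real.
params = build_set_label_params(
    "P27", "yo", "<Yoruba label goes here>", "<csrf token goes here>"
)
print(params["action"], params["id"], params["language"])
```

For a single label, clicking edit in the web interface as shown in the talk is simpler. A script like this only starts to pay off when many labels need to be added.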

It can help Wikidata generate knowledge in any language. By the way, maybe some of you have heard that the latest project the Wikimedia Foundation is working on is called Abstract Wikipedia. It's an extremely ambitious project. It is not ready yet, but once it is ready, it will enable us to compose facts and knowledge in a language-neutral way. And then, once that language-neutral model exists, to generate text, native-sounding text, in absolutely any language that Abstract Wikipedia has been taught.

So that will be a huge feature for languages with smaller communities. Not necessarily small languages, right? There are, for example, 90 million speakers of Punjabi on Earth, but it's still a relatively small Wikipedia. So communities that have not yet built a very big Wikipedia could benefit from this machine-generated (not machine-translated!) native text, produced from the language-independent model. I realize this sounds extremely abstract. It's literally called Abstract Wikipedia!

This talk isn't an introduction to Abstract Wikipedia. I'm just mentioning it because your work in translating labels and descriptions will feed into that future project.

So okay, suppose you want to add some labels in Twi or Igbo. How do you know what is most needed so that you're not just contributing very obscure things? These links will take you to lists of significant items, items with a lot of links, items that are presumably about more central or more viewed topics, that don't have labels or descriptions in these languages. So you can click on these links. This link is for Igbo, but if you change the language code from Igbo to Yoruba, you will have that list for Yoruba, or for whatever other language you want.

So I encourage you to start contributing in this way as well. Again, no special skills are needed to do this.