Celtic Knot Conference 2020/Submissions/Hands-on Wikidata and lexicographical data/Discussions
The below content is an archive of the note-taking pad that was used for this session.
Welcome to the note-taking pad of one of the Celtic Knot Conference 2020 sessions! This space is dedicated to collaborative note-taking, comments and questions to the speaker(s). You can edit this document directly, and use the chat feature in the bottom-side corner.
✨⏯️ Session details
- Name: Hands-on Wikidata and lexicographical data
- Speaker: Léa Lacroix, Nicolas Vigneron
- Link to the video/replay: https://www.youtube.com/watch?v=oDM5QJAJzNc
- More details: Celtic Knot Conference 2020/Submissions/Hands-on Wikidata and lexicographical data
- See also: Wikidata helpdesk Celtic Knot Conference 2020/Wikidata helpdesk
Feel free to add questions here, while or after watching the session. Please add your (user)name in bracket after the question. The host of the session will pick a few questions to ask them during the livestream. The speaker or other participants will answer on this pad (asynchronously: the answer may come in a few hours or days).
- (From YouTube): Is no reference needed when you add information to a Lexeme?
- It's not mandatory but very welcome :) references = <3
- Is it possible to include pronunciation (audio file) for a lexeme?
- P443 ("pronunciation audio")
- Thanks
- in a community with little access to the internet or computer, is there a possibility of a pleasant printed format for use at school for example?
- you can build a tool or do a SPARQL query.
- I know Theklan is working on something like that for Basque
- Is P5972 (translation) meant to be used in the same way as old interwikilinks (ie. all items links to all translations)?
- The answer to this question is not entirely defined yet ;) The community is still discussing the best way to structure the data, but yes, in most cases, that's what they are aiming for. (Léa)
- Limerick Q133315 has only one reliable source used as a reference. Are there any strategies to help increase the referencing on articles?
- We have different strategies to improve references on Wikidata:
- More eyes on the content! Displaying data & references in other projects, like Wikipedia, helps people notice that references are wrong/outdated/missing, and hopefully encourages them to fix them, especially once the Wikidata Bridge allows editing/adding references directly from Wikipedia: https://www.mediawiki.org/wiki/Wikidata_Bridge
- Semi-automated tools: the development team has been working on a tool that scrapes the web looking for references. For now it takes the form of a game where people get suggested references and can add them if they look good: https://tools.wmflabs.org/wikidata-game/distributed/#game=73 Hopefully these kinds of tools will perform even better in the future.
- Hope that helps :) Léa
- Does anyone have an idea what Limerick should be in Cornish? [User:DavydhT]
- I looked it up in NJW and it has Lymryk. That probably wants to be SWFified but I wasn't quite sure how so I put it in as-is for now. [User:Gwikor Frank]
- not sure, in Breton we just took the Irish name ;)
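On the printed-format question above, one possible shape for "build a tool" is a small script that turns lexeme data into a plain-text word list that can be printed. The sketch below is only an illustration under assumptions: the record layout (`lemma`, `category`, `gloss`) and the names `Entry` and `format_word_list` are hypothetical and not part of any existing Wikidata tool, and the two sample entries are placeholders based on the words mentioned in the session, not an export from Wikidata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One line of the printed word list (hypothetical layout)."""
    lemma: str
    category: str
    gloss: str


def format_word_list(entries, title):
    """Return a plain-text word list, sorted by lemma, ready to print."""
    if not entries:
        return title + "\n(no entries)"
    # Sort case-insensitively so capitalised lemmas do not jump to the top.
    ordered = sorted(entries, key=lambda e: e.lemma.casefold())
    # Pad lemmas to a common width so the columns line up on paper.
    width = max(len(e.lemma) for e in ordered)
    lines = [title, "=" * len(title)]
    for e in ordered:
        lines.append(f"{e.lemma.ljust(width)}  ({e.category})  {e.gloss}")
    return "\n".join(lines)


# Placeholder entries; a real list would come from a query or an export.
SAMPLE = [
    Entry("koad", "noun", "woods"),
    Entry("kazh", "noun", "cat"),
]

if __name__ == "__main__":
    print(format_word_list(SAMPLE, "Breton word list"))
```

The formatting is kept separate from the data source on purpose, so the same function could be fed from a SPARQL result, a dump, or a hand-made list.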
🖊️🔗 Collaborative note-taking
Feel free to take notes about the session here, add some useful links, etc.
- This session will include a basic introduction to Wikidata; but see also Mohammed's presentation from yesterday 
- Looking at a Wikidata article — it looks like a normal wiki but a bit different. It stores information in a structured, multilingual form. You can access information in your own language on the same website.
- Looking at the item about Limerick (Q133315 - https://www.wikidata.org/wiki/Q133315). It is currently in English, but you can change the language and it immediately adapts to that language. It can do that because the content of Wikidata is provided in many languages.
- This is done from the "term box" at the top of articles, showing labels, descriptions and aliases for each item. You can display all the languages in a big list if you want, but it's collapsed by default. You can see many descriptions are missing — perhaps you can populate those?
- To describe properly what Limerick is, though, we need more information: the "statements". For example, the property "instance of" answers the question "what is it?"; here it is a city. You can see it is a link to Q515, the item about a city, which provides the same kind of information. This is one of the main strengths of Wikidata: items are linked to each other.
- We have many statements here: the name in its native language, which administrative areas it is in, the population. You can see that the population has several values, because it was calculated at several points in time. So here we use another piece of information, a "qualifier", showing the point in time (P585) at which the data was calculated.
- [Something about references; see https://www.wikidata.org/wiki/Help:Sources ]
- You can see that on the Limerick item there are several statements without references — another place you can help 😊
- Moving on, there are external identifiers, which point to various "authority control" databases through the unique identifier each of them uses; in each case these are links to those external sources. Wikidata has its own unique identifiers too: the Q ID in the URL and at the top of the page (Q133315 for Limerick, as mentioned above). This means there is a permanent URL for Limerick even if, for example, the name changes in the future.
- [Site links, to other Wikimedia projects]
- [Edit buttons]
- Maybe it would be more interesting if we create a new item. Given the Celtic theme, we thought it would be nice to create an item for a Breton dictionary:
- Nicolas created https://www.wikidata.org/wiki/Q97122546
- [instructions on creating the initial label and description should go here]
- At first this will only be in 1 language, but we can add the labels in other languages by clicking edit and populating the term box
- Comment: For seeing Wikidata labels and descriptions in more languages than your own, the "labelLister" gadget is useful; activate it in your Wikidata "Preferences"
- You can add a statement "instance of" [P31], this is an online dictionary — notice that it suggests that as you type, so that it links to the appropriate Q ID.
- Then you can add the official website [P856] and paste the URL. You save that and you can see it shows us an error flag, because we need to go back and add the language of this URL.
- But there are 2 URLs for this dictionary, as it is also available in French, so we can add another value for the "official website" with the French URL
- [Added another property for language of work]
- Someone mentioned in the chat that there is a tool called "VIPs labels", which means you don't need to copy/paste the label into all the different languages; it allows you to set the name for all languages in the Latin alphabet at the same time.
- Comment: VIP Labels: https://www.wikidata.org/wiki/User:Pigsonthewing/Setup#Scripts
- [something else I missed?]
- So that was the creation of an item — something describing a concept in Wikidata — but someone told me that we can create words here.
- [Description of the difference between Q items and L lexemes that I was too interested in to type up]
- So let's add a new lexeme for "kazh-koad", the Breton term for a house cat (L305362). So you type the term, pick the language and the lexical category, in this case noun.
- You can add statements such as that this term is made up of 2 parts: "kazh" meaning cat and "koad" meaning woods.
- You can add statements about it being "described by source" [P1343] TermOfis (Q97122546), the Breton dictionary we added earlier
- You can also add "senses" [I didn't understand this]
- Then you can add a statement "item for this sense" [P5137] and link to the concept item (the Q ID) for the house cat.
- Now you can add "forms", so you can add the singular and plural forms, for example.
- So now you can see how to add a lexeme. You can add references for each statement; I only added the "described by source" property here for the sake of speed. If you look at kazh (L458) you can see that page numbers have been added.
- You can also look at visualisations. This is Wikidata Lexeme Graph Builder: we can look at a "root entity" (L305362) and see that it is composed of the parts "kazh" and "koad". "kazh" (L458) already existed and had information about the etymology, with the Indo-European roots *kaθ [L###] and *kattā [sp? L###?]
- Another tool I like is Ordia (https://ordia.toolforge.org) which gives another way of displaying lexemes. I particularly like that you can look at languages and see how many lexemes exist for each language. We can see that Russian has the most lexemes, but there are other languages here: English, Latin, Hebrew, Basque and so on.
- Comment: from Nikki: Or feel free to ask me to adapt it for you :)
- There are a lot of other tools as well.
- One other thing, when you go to your own User: page, it is very important to include your Babel code, the first line in the source of d:User:VIGNERON. This is very important not just for telling other people what you speak, but it also determines which languages are shown in the term box for you.
- Comment: How to set up Babel = https://www.wikidata.org/wiki/User:Pigsonthewing/Babel
- A question (from User:OwenBlacker) in the YouTube chat asked about Celtic mutations; those are usually added in Forms at the moment — if you look at "ki" (L69) you can see the Forms include all the mutations. In case you didn't know, Celtic languages change the first letters of words in certain circumstances and you can see them in the Forms.
- A questioner asked about pronunciation information. There are properties such as "pronunciation audio" and "IPA transcription"; you can see these on L99 "Luftballon" (an example of a Wikidata Easter egg)
- [A question I missed]
- Jens asked about free dictionaries online and what people think about mass uploading dictionaries for specific languages. Léa reminded us that the licence for Wikidata is CC-Zero, so we can only upload content that is also CC-Zero or Public Domain. Even where we can, it is also important to consider whether we should. Before doing any import, it is important to ensure you have the people to take care of that import and that the information is imported accurately.
- With regard to how Lexemes relate to Abstract Wikipedia, there is some information in this paper: https://storage.googleapis.com/pub-tools-public-publication-data/pdf/9d16c3ed6771b62c8f66785479a734529d1a2f75.pdf
- "How far can we go with entering lexemes? Can we enter localised forms?"
- Looking at Luftballon (L99) we can see there's the language code "de", but not all languages have language codes. If you look at "ama" (L1), you can see that Sumerian doesn't have a language code recognised by Wikidata yet, so you can use a language code that links to the QID of the language: "mis-x-Q36790" for Sumerian here. And you can have various scripts, so the Cuneiform is shown here; we saw the same thing with Serbian earlier, displaying both
- "Can I see a list of all the lexemes in a given language?"
- Yes; there are several ways, for example SPARQL, also on Ordia https://ordia.toolforge.org/language/Q7737
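The SPARQL route mentioned in the last answer could look roughly like the sketch below. It only builds the query text and a request URL for the public Wikidata Query Service; it does not send anything. The function names `lexemes_in_language_query` and `query_url` are hypothetical, and Q7737 is simply the language item taken from the Ordia link above. The query relies on the `wd:`, `dct:` and `wikibase:` prefixes being predefined by the query service, which is an assumption worth checking if you run it against another endpoint.

```python
import re
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def lexemes_in_language_query(language_qid, limit=100):
    """Build a SPARQL query listing lexemes (with lemmas) for one language item."""
    # Accept only well-formed Q IDs so nothing else ends up in the query text.
    if not re.fullmatch(r"Q[1-9][0-9]*", language_qid):
        raise ValueError(f"not a Q ID: {language_qid!r}")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return (
        "SELECT ?lexeme ?lemma WHERE {\n"
        f"  ?lexeme dct:language wd:{language_qid} ;\n"
        "          wikibase:lemma ?lemma .\n"
        "}\n"
        f"LIMIT {limit}"
    )


def query_url(query):
    """Return a GET URL asking the endpoint for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    # Print the URL for the first 10 lexemes of the language item Q7737.
    print(query_url(lexemes_in_language_query("Q7737", limit=10)))
```

The query text can also be pasted directly into the query service's web interface; swapping the Q ID for another language item gives the list for that language.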
More information about the Celtic Knot Conference 2020: https://meta.wikimedia.org/wiki/Celtic_Knot_Conference_2020
The Friendly Space Policy also applies on this space: https://meta.wikimedia.org/wiki/Celtic_Knot_Conference_2020/Friendly_Space_Policy