Wikimedia Diversity Conference 2013/Documentation/Wikidata

From Meta, a Wikimedia project coordination wiki

Session: Gerard Meijssen // Wikidata as a tool to bring initial information

Abstract

All Wikipedias suffer from a lack of information about the "global south". When this information is known in Wikidata, it is relatively easy to bring it as a stub to many languages. This is already done on some Wikipedias. In this presentation, Gerard Meijssen will explain what is needed to bring information to a language. He will also indicate possibilities that open up as Wikidata matures.

Starting point / Insights

Leveraging data as information for all languages, cultures and interests

Wikidata:

  • All about making statements about items. e.g. Nelson Mandela - nationality South Africa.
  • Place where all the inter-language links are stored.
  • Covers all information and all data, and provides multilingual support
  • Wikidata grows really rapidly
  • Started with the interwiki-links
  • For example, one person can add all the labels referring to Nelson Mandela in their native language (e.g. Setswana)
  • This made it powerful, because it had lots of data to work with.
  • Many bots harvest info from WP, put it into Wikidata.
  • 60% of the items in Wikidata don't have any statements linked to them at the moment.
  • Statements can be derived from any Wikipedia (e.g. Stasi information from the Russian WP, or ko-wiki) - only the local link is needed, not much info.
  • Effect: hard to manipulate information, more "neutral" information for more people
  • Terminology: item (is a subject like Nelson Mandela), label (string of text, describes item in the language - e.g. horse in English and Pferd in German), link (hyperlink to Wikipedia e.g. en.wikipedia.org/wiki/Horse, Wikicommons etc.)

An item is something like a subject. It can have a label or no label, and can be connected to more than one language. Each language will have its own label (e.g. "Horse" in English; "Pferd" in German). For example, http://en.wikipedia.org/wiki/Horse would be a link. A label doesn't have to be English. Creating new labels: there is a function for this in the Wikidata interface.
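The terminology above (item, label, link, statement) can be pictured with a small data structure. This is only a toy sketch, not the real Wikidata schema: the class name `Item`, its fields, and the identifiers `ITEM-HORSE` and `ITEM-MANDELA` are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    """Toy model of an item; field names are illustrative assumptions."""
    item_id: str                                     # placeholder identifier
    labels: dict = field(default_factory=dict)       # language code -> label
    sitelinks: dict = field(default_factory=dict)    # wiki code -> page title
    statements: list = field(default_factory=list)   # (property, value) pairs

    def label_in(self, lang: str, fallback: str = "en") -> Optional[str]:
        """Return the label in `lang`, else the fallback label, else None."""
        return self.labels.get(lang) or self.labels.get(fallback)


horse = Item(
    item_id="ITEM-HORSE",
    labels={"en": "horse", "de": "Pferd"},
    sitelinks={"enwiki": "Horse"},
)

mandela = Item(
    item_id="ITEM-MANDELA",
    labels={"en": "Nelson Mandela"},
    statements=[("nationality", "South Africa")],
)

print(horse.label_in("de"))    # label exists in German
print(mandela.label_in("tn"))  # no Setswana label yet, falls back to English
```

Adding a Setswana label would only mean adding one more entry to `labels`; the statements and links stay shared across languages.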

Using WD for search: can be used to find things in Commons. Important for education. For example, a young child can search in their own language: if WD labels are attached to pictures, the search will work in all languages.
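A rough sketch of how label-based search could work: every label in every language points to the same item, and pictures are tagged with the item rather than with a word. The file names, identifiers, and function names here are hypothetical, and this is not how Commons search is actually implemented.

```python
def build_label_index(item_labels):
    """Map each lower-cased label (in any language) to the items that carry it."""
    index = {}
    for item_id, labels in item_labels.items():
        for label in labels.values():
            index.setdefault(label.casefold(), set()).add(item_id)
    return index


def search_pictures(term, index, pictures):
    """Return picture names whose tagged item has `term` as a label."""
    item_ids = index.get(term.casefold(), set())
    return sorted(name for name, item_id in pictures.items() if item_id in item_ids)


# Hypothetical sample data.
item_labels = {"ITEM-HORSE": {"en": "horse", "de": "Pferd"}}
pictures = {"Horse_in_field.jpg": "ITEM-HORSE", "Old_bridge.jpg": "ITEM-BRIDGE"}

index = build_label_index(item_labels)
print(search_pictures("Pferd", index, pictures))  # found via the German label
print(search_pictures("paard", index, pictures))  # no Dutch label yet, nothing found
```

The second search shows the flip side: until someone adds the label in a language, searches in that language find nothing.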

Challenges

  • Have links without labels - e.g. an article in English, but Wikidata doesn't have a link to it. Bots are fixing this.
  • Have items without labels in given languages (please add labels, they'll be found)
  • Articles which are not subjects in their own right, e.g. lists; categories. Will be solved in future.
  • More than 50% of labels don't have statements associated with them, and some will have labels in some languages and not in others. Better to have these issues than to have nothing to work with, and things are improving month by month.
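The gaps listed above can be measured once the data is available. A minimal sketch, assuming items are plain dictionaries with `labels` and `statements` keys (an invented layout, with invented sample data):

```python
def missing_label(items, lang):
    """Ids of items that have no label in the given language."""
    return [item["id"] for item in items if lang not in item["labels"]]


def share_without_statements(items):
    """Fraction of items that carry no statements at all."""
    if not items:
        return 0.0
    empty = sum(1 for item in items if not item["statements"])
    return empty / len(items)


# Hypothetical sample: five items, three of them without statements.
items = [
    {"id": "I1", "labels": {"en": "horse", "de": "Pferd"}, "statements": [("instance of", "animal")]},
    {"id": "I2", "labels": {"en": "bridge", "tn": "borogo"}, "statements": [("instance of", "structure")]},
    {"id": "I3", "labels": {"en": "painter"}, "statements": []},
    {"id": "I4", "labels": {"de": "Bahnhof"}, "statements": []},
    {"id": "I5", "labels": {}, "statements": []},
]

print(missing_label(items, "tn"))
print(share_without_statements(items))
```

A list like the one from `missing_label` is the kind of worklist a volunteer could pick up for their own language.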

Issues with search

  • If the search item is not in a certain language, search doesn't work. Reasonator gives useful info on people, like links to relatives, etc. if WD has info on the person.

WD will show what people are looking for, but not finding. Can show impact of adding labels in certain languages.

User participation: Build on the data that we have. Example: an Excel file on female painters is now on WD, with a query that has been built for the question, and it's being used.

Q: Does this get done during edits? A: People include geodata on articles of railway stations, and it's working!

Q: Collecting search data? Why not? A: Low priority from WMF, but maybe if they see the impact, they might do it.

Babel: Add translations - if you add your language skills you can enter labels and descriptions also in other languages which you have defined via Babel

Concept clouds: expand the scope of an article by using info that is available from Wikidata. Many labels are missing from the concept clouds.

Can provide basic infoboxes automatically from Wikidata info. This will make us more enabled in terms of sharing the sum of human knowledge.

  • Makes it easier for people to add labels
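The automatic infoboxes mentioned above could, in the simplest case, just print an item's statements using whatever labels exist in the reader's language. The sketch below assumes a made-up label table and item layout; the fallback to English and then to the raw key is also an assumption, not a description of any existing tool.

```python
def render_infobox(item, labels, lang, fallback="en"):
    """Render an item's statements as plain text in `lang`, with fallbacks."""

    def label(key):
        names = labels.get(key, {})
        # Prefer the requested language, then the fallback, then the raw key.
        return names.get(lang) or names.get(fallback) or key

    lines = [label(item["id"])]
    for prop, value in item["statements"]:
        lines.append(f"{label(prop)}: {label(value)}")
    return "\n".join(lines)


# Hypothetical label table: key -> {language code: label}.
labels = {
    "ITEM-MANDELA": {"en": "Nelson Mandela"},
    "nationality": {"en": "nationality", "de": "Staatsangehörigkeit"},
    "South Africa": {"en": "South Africa", "de": "Südafrika"},
}
mandela = {"id": "ITEM-MANDELA", "statements": [("nationality", "South Africa")]}

print(render_infobox(mandela, labels, "de"))
```

Every label added in a language improves every infobox that uses it, which is why adding labels has such a wide effect.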

Ideas

Compile a "concept cloud"

Categories are ontologies in their own right. That means that you can leverage the info in categories in WP.

  • Category is similar to query in Wikidata
  • This can provide redlinks in a list. Then we have all that it takes to link to statements on the subject.

Can make smaller Wikipedias more advanced by leveraging this info. We know of millions of subjects, and have info in Wiktionary, so we could expand available info dramatically.

Statistics:

  • 60% of items don't have statements.
  • As we add data, this will not affect the trend.
  • Gender ratios in WD also refer to things that don't have articles.
  • Gender ratios can be tracked, for example.
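Tracking a ratio like this is a matter of counting one statement across items, including items that have no article and items where the statement is missing. A minimal sketch, assuming each item is a dictionary with an optional "sex or gender" key (sample data is invented):

```python
from collections import Counter


def gender_counts(items):
    """Count values of the 'sex or gender' statement; missing ones count as 'unknown'."""
    return Counter(item.get("sex or gender", "unknown") for item in items)


def ratios(counts):
    """Turn counts into fractions of the total."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()} if total else {}


# Hypothetical sample of four person items.
people = [
    {"id": "P1", "sex or gender": "female"},
    {"id": "P2", "sex or gender": "male"},
    {"id": "P3", "sex or gender": "female"},
    {"id": "P4"},  # statement not added yet
]

counts = gender_counts(people)
print(counts)
print(ratios(counts))
```

Keeping an explicit "unknown" bucket matters here, since many items have no statements yet.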

What sex is a eunuch? Also a state that can associate with sex.

WMF stats - not really relevant or up-to-date. Magnus's stats are updated more frequently, and show growth in statements, items with links/labels etc. See progress that way. Near real-time - list of people with birth dates before 1900, no date of death. When date of death is added, the list is updated immediately.
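The worklist described above (born before 1900, no date of death) is a simple filter that can be re-run whenever the data changes. A sketch with invented people and an invented record layout:

```python
from datetime import date


def missing_death_date(people, cutoff_year=1900):
    """Names of people born before `cutoff_year` with no recorded date of death."""
    return [
        person["name"]
        for person in people
        if person["born"] is not None
        and person["born"].year < cutoff_year
        and person["died"] is None
    ]


# Hypothetical sample data.
people = [
    {"name": "Person A", "born": date(1850, 1, 1), "died": None},
    {"name": "Person B", "born": date(1880, 6, 1), "died": date(1940, 3, 2)},
    {"name": "Person C", "born": date(1920, 9, 9), "died": None},
    {"name": "Person D", "born": None, "died": None},
]

before = missing_death_date(people)
people[0]["died"] = date(1910, 1, 1)  # someone adds the missing date
after = missing_death_date(people)

print(before)
print(after)
```

Because the list is computed from the data rather than maintained by hand, an entry drops off as soon as the missing date is filled in.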

  • Statistics on failed searches (have no priority so far)

Visualisation:

  • Helps people understand the data; motivates people to add data, labels, statements. May even motivate people to write WP articles.

Q: Are relationships bidirectional? E.g. if I link Chelsea Clinton to Hillary, does the reverse happen? A: No; also, some statements are not right, so good not to be too automatic.
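Since the reverse statement is not created automatically, one cautious option is to list missing reverse statements as suggestions for a person to check, rather than writing them directly. The property names and the inverse mapping below are assumptions for illustration only.

```python
# Assumed mapping from a property to its reverse; not an official list.
INVERSE = {"mother": "child", "father": "child"}


def suggest_inverses(statements, inverse=INVERSE):
    """Return reverse statements that are missing, for human review."""
    suggestions = []
    for subject, prop, obj in sorted(statements):
        reverse_prop = inverse.get(prop)
        if reverse_prop and (obj, reverse_prop, subject) not in statements:
            suggestions.append((obj, reverse_prop, subject))
    return suggestions


statements = {("Chelsea Clinton", "mother", "Hillary Clinton")}

before = suggest_inverses(statements)
statements.add(("Hillary Clinton", "child", "Chelsea Clinton"))  # reviewer accepts
after = suggest_inverses(statements)

print(before)
print(after)
```

Keeping a human in the loop fits the caveat in the answer: if the original statement is wrong, an automatic reverse would copy the error.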

Gender info not that relevant, but as more connections are added, WD gets more useful.

Start with what you care about. Make sure you enjoy yourself. Use the tools (like Reasonator). It doesn't matter what you add data about, because it's all good.

Questions / Next steps recommendations

Q: "Stone bridges in Belgium" would be an intersection of "made of stone", "bridge" and "Belgium". [what was the question?] A: If you do this as a query, you don't just get the WP article, but also things that don't have articles in WP.

Q: Why not have categories in WD? A: If you don't think of WD as just for Wikipedia, but serving other projects, then categories aren't that valid as objects. Response: Map categories to queries? Reply: Stone railway bridge in Belgium category would then be intersection of "made of stone", "located in Belgium", "railway bridge"
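The idea of a category as an intersection of statements can be sketched with plain sets. The bridge names, property names, and the `has_article` set are all invented; a real query would run against Wikidata itself rather than an in-memory dictionary.

```python
def query(items, *conditions):
    """Names of items whose statements include every given (property, value) pair."""
    wanted = set(conditions)
    return sorted(name for name, statements in items.items() if wanted <= statements)


# Hypothetical items, each with a set of (property, value) statements.
items = {
    "Bridge A": {("instance of", "railway bridge"), ("made of", "stone"), ("located in", "Belgium")},
    "Bridge B": {("instance of", "railway bridge"), ("made of", "stone"), ("located in", "Belgium")},
    "Bridge C": {("instance of", "railway bridge"), ("made of", "steel"), ("located in", "Belgium")},
    "Bridge D": {("instance of", "railway bridge"), ("made of", "stone"), ("located in", "France")},
}
has_article = {"Bridge A", "Bridge C"}  # items that have a Wikipedia article

stone_railway_bridges_in_belgium = query(
    items,
    ("made of", "stone"),
    ("located in", "Belgium"),
    ("instance of", "railway bridge"),
)
without_article = [name for name in stone_railway_bridges_in_belgium if name not in has_article]

print(stone_railway_bridges_in_belgium)
print(without_article)  # candidates for redlinks
```

Unlike a category, the result includes items that have no article yet, which is the point made in the answer to the previous question.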

Wikidata will be used to curate women scientist info (list). Can be used to combat systematic bias, because it's easier to add someone to Wikidata than to write an article on WP.

  • all women chemists all over the world --> helps "cure" the bias in information about women scientists

85 women botanists listed on WP, but only 20 listed on Wikidata.

Q: Search for "actress/scientist/woman" didn't work: why? A: Data is probably not yet in Wikidata. Label probably not there yet.

Q: How many women scientists are missing from Wikidata? A: Only about 1/4 of women scientists on Wikipedia are on Wikidata. It shouldn't take too long to "fix" this; adding information may only take 2 mins per person.

Key take-home message is that when you add to Wikidata, it helps all languages.