
User:Alan Ang (WMDE)/How to: Wikidata/Basics

From Meta, a Wikimedia project coordination wiki
Wikidata nodes in blue

Basics


Welcome to the basics of Wikidata. This section aims to give you a basic understanding of Wikidata. It assumes you already know what Linked Open Data is and have a fundamental understanding of what CC0 means.

A basic knowledge of Wikidata will require you to understand the following:

  • Wikidata is a knowledge graph that lies at the heart of the Linked Open Data web. If this is new to you, please take this quick course on Databases and Linked Data.
  • Everything on Wikidata is published under the CC0 license, meaning anyone is free to reuse the data on Wikidata.
  • The data on Wikidata is structured as triples, namely Item, Property and Value.
  • Wikidata is multilingual and collaborative, and made for both humans and machines to read and access.
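
The Item, Property and Value structure mentioned in the list above can be sketched in Python as follows. This is only an illustration of the idea, not how Wikidata stores data internally. The `Statement` class and the `describe` function are hypothetical names; the example uses the item Q42 (Douglas Adams), the property P31 (instance of) and the value Q5 (human).

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One triple: an item, a property and a value (hypothetical helper)."""
    item: str   # Q-number of the item being described
    prop: str   # P-number of the property
    value: str  # another item's Q-number, or a literal such as a date


# "Douglas Adams (Q42) is an instance of (P31) human (Q5)."
statement = Statement(item="Q42", prop="P31", value="Q5")


def describe(s: Statement) -> str:
    """Render a triple as a small arrow diagram."""
    return f"{s.item} --{s.prop}--> {s.value}"


print(describe(statement))
```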


As a basic user, you should know how to:

  • create an item on Wikidata.
  • make edits to an existing item on Wikidata.
  • create multiple items or make edits to multiple items on Wikidata via QuickStatements.
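
As a rough idea of what a QuickStatements batch looks like, the sketch below builds the text for creating one item. It assumes the tab-separated "V1" command syntax (`CREATE`, then `LAST` lines for the label, description and statements); the function name and the example label are made up for illustration, so please check the QuickStatements help page before running a real batch.

```python
def build_create_batch(label_en: str, description_en: str,
                       statements: list[tuple[str, str]]) -> str:
    """Return QuickStatements V1 commands that create one new item.

    Hypothetical helper: fields are separated by tabs, and text values
    (label, description) are wrapped in double quotes.
    """
    lines = ["CREATE"]
    lines.append(f'LAST\tLen\t"{label_en}"')        # English label
    lines.append(f'LAST\tDen\t"{description_en}"')  # English description
    for prop, value in statements:
        lines.append(f"LAST\t{prop}\t{value}")      # one statement per line
    return "\n".join(lines)


batch = build_create_batch(
    "Example person",
    "placeholder item used for illustration",
    [("P31", "Q5")],  # instance of: human
)
print(batch)
```

The resulting text would be pasted into QuickStatements rather than sent from the script.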


If you are ready, then let's begin!

Step 1: do this course.

Step 2: check out these links to some basic training resources:

  • User:Epìdosis/Wikidata-intro (a collection of links that serve as an introduction to Wikidata). If this page looks overwhelming to you, there is a more familiar interface which you can also check out:
  • Wikidata Tours: a useful page for beginners

If you are a French speaker, then this would be really useful for you:


If you prefer to watch rather than read, then check out this YouTube playlist of Wikidata Tutorials (short, bite-sized videos for busy people), contributed by User:Masssly.

Finally, if you are unable to follow any of the above courses or links, the sections below will guide you through the basics of Wikidata.

I. Introduction


Note: the following content is sourced from the Introduction to Wikidata course by WMF.

What is Wikidata

Wikidata is an open, multilingual structured knowledge base that can be read and edited by both humans and machines.

Open

All content on Wikidata is available under a CC0 license, which means both the information and the data structures are in the public domain. You may freely use, share, and remix CC0-licensed data. Because both the data and the data structure are open, people are encouraged to combine their own data with Wikidata, and tools can be used to manipulate and interpret the data in a variety of ways.

Multilingual

Data entered in any language is immediately available in all other languages. Editing in any language is strongly encouraged.

Structured Knowledge Base

Wikidata records not just statements, but also the relationships between those statements. This makes it possible to search for information in a structured way.

These relationships between things make Wikidata a powerful resource for the entire internet, illustrating how millions of pieces of data relate to each other.
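
To give a feel for what searching "in a structured way" can look like, here is a small sketch that builds a SPARQL query for the Wikidata Query Service and the URL that would carry it. It only constructs text and does not contact the service. The helper names are hypothetical, and the example asks for items that are an instance of (P31) house cat (Q146).

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(class_qid: str, limit: int = 5) -> str:
    """Build a SPARQL query listing items that are an instance of class_qid."""
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f"  ?item wdt:P31 wd:{class_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {limit}"
    )


def build_request_url(query: str) -> str:
    """Encode the query into a GET URL that asks for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


query = build_query("Q146")
url = build_request_url(query)
print(query)
```

The same query text can also be pasted directly into the query service's web interface.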

Read and Edited by Humans and Machines

Unlike some databases, everything on Wikidata is stored in a way that can be read and understood by humans.

It's also editable - you can click the “edit” button to make changes to the database! Behind the scenes, there's a community of people who make contributions to Wikidata. We'll talk more about this community later on.

Every item is identified by a unique Q-number and is labeled in different languages and variations. For example, Wikidata records 220 variations of the name of the former Libyan leader Muammar Gaddafi, but because they are all tied to a single Q-number, there is no confusion.
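
The idea that many names point to one Q-number can be sketched like this. The data is illustrative only (a handful of labels and aliases for Q42, Douglas Adams, rather than a real export from Wikidata), and `build_name_index` is a hypothetical helper.

```python
# Illustrative data: one item, labels in a few languages, and some aliases.
ITEM = {
    "id": "Q42",
    "labels": {"en": "Douglas Adams", "fr": "Douglas Adams", "ru": "Дуглас Адамс"},
    "aliases": {"en": ["Douglas Noel Adams", "Douglas N. Adams"]},
}


def build_name_index(item: dict) -> dict[str, str]:
    """Map every label and alias (case-insensitively) to the item's Q-number."""
    index = {}
    for label in item["labels"].values():
        index[label.casefold()] = item["id"]
    for alias_list in item["aliases"].values():
        for alias in alias_list:
            index[alias.casefold()] = item["id"]
    return index


index = build_name_index(ITEM)
print(index["douglas n. adams"])
```

However a name is spelled, the lookup lands on the same identifier, which is what keeps the variations from causing confusion.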

Machine-readability also allows bots to edit Wikidata.

This makes Wikidata important to the internet...

Machine-readability allows digital assistants (Siri, Alexa, Google) and bots to add and pull information from Wikidata and present it to their users. Human-readability allows people to do this as well.

This can make data from libraries, cultural institutions, and civic organizations more accessible.

Since Wikidata uses a CC0 license, many companies developing AI use Wikidata as part of their AI's knowledge base. Just as you can think of Wikipedia as an encyclopedia of encyclopedias, you can think of Wikidata as a database of databases. Other databases link their items to Wikidata items through identifiers, allowing more than 700 other databases to help describe items. This creates a centralized location for information about the same concepts, things, or people.

...and important to Wikipedia

Wikipedia exists in more than 300 languages, each with its own unique articles. There are countless articles that exist in one language, but not another. Wikidata, on the other hand, is language-agnostic.

For example, the University of South Africa has individual Wikipedia articles in more than 20 languages. As enrollment changes from year to year, the figure needs to be updated on each Wikipedia language version or it will quickly go out of date. If each language version instead pulls the figure from a single Wikidata item, the data needs to be updated only once, and the person updating it does not need to be comfortable with all, or even any, of the languages in which the article appears. Every version then stays up to date, as long as Wikidata's value is up to date.
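
The "update once, show everywhere" idea can be sketched with a toy model. Everything here is hypothetical: the item ID `Q_EXAMPLE_UNIVERSITY`, the `students_count` key, the enrollment numbers and the three infobox templates are invented for illustration and are not taken from Wikidata or Wikipedia.

```python
# Hypothetical central store: one item, one value, shared by every language version.
central_store = {"Q_EXAMPLE_UNIVERSITY": {"students_count": 300000}}

# Hypothetical per-language infobox templates.
TEMPLATES = {
    "en": "Enrollment: {n}",
    "fr": "Effectif étudiant : {n}",
    "de": "Studierende: {n}",
}


def render_infobox(language: str, item_id: str) -> str:
    """Render the enrollment line by reading the shared central value."""
    value = central_store[item_id]["students_count"]
    return TEMPLATES[language].format(n=value)


# A single edit to the central value...
central_store["Q_EXAMPLE_UNIVERSITY"]["students_count"] = 310000

# ...is reflected in every language version the next time it is rendered.
pages = {lang: render_infobox(lang, "Q_EXAMPLE_UNIVERSITY") for lang in TEMPLATES}
print(pages)
```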

The structure of Wikidata also makes it easier to identify gaps in certain language Wikipedias and allows the article-creation process to happen faster by calling data from a central location.

Since data on Wikidata is centralized, users from any language can add to and access it. This has implications for equity, collaboration, and access on a global scale.

Data Structure

Wikidata is a document-oriented database. Facts are stored as structured data within documents. Each document is indexed by a unique identifier, the Q-number, which distinguishes it from every other document in the database. This format allows for detailed, expressive querying, and makes the data machine-readable.

At the heart of each document is the statement, a three-part structure that connects an item, a property and a value.
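
A minimal sketch of such a document, and of reading statements out of it, might look like the following. The sample is a heavily shortened imitation of the JSON that Wikidata serves for an item, so treat its exact shape as an assumption; `statements_from_document` is a hypothetical helper.

```python
import json

# Shortened, hand-written sample modelled on Wikidata's entity JSON (assumed shape).
SAMPLE = """
{
  "entities": {
    "Q42": {
      "id": "Q42",
      "labels": {"en": {"language": "en", "value": "Douglas Adams"}},
      "claims": {
        "P31": [
          {
            "mainsnak": {
              "snaktype": "value",
              "property": "P31",
              "datavalue": {
                "value": {"entity-type": "item", "id": "Q5"},
                "type": "wikibase-entityid"
              }
            },
            "type": "statement",
            "rank": "normal"
          }
        ]
      }
    }
  }
}
"""


def statements_from_document(doc: dict, qid: str) -> list[tuple]:
    """Flatten one document into (item, property, value) triples."""
    entity = doc["entities"][qid]  # the document is looked up by its Q-number
    triples = []
    for prop, claims in entity["claims"].items():
        for claim in claims:
            snak = claim["mainsnak"]
            if snak["snaktype"] != "value":
                continue  # skip "unknown value" / "no value" statements
            value = snak["datavalue"]["value"]
            if isinstance(value, dict) and "id" in value:
                value = value["id"]  # the value is another item
            triples.append((qid, prop, value))
    return triples


document = json.loads(SAMPLE)
triples = statements_from_document(document, "Q42")
print(triples)
```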

Presenting facts, not context

Wikipedia presents information in a plain-language format that is easy to read but difficult to query. Comparisons across a whole range of articles are especially difficult.

Where Wikipedia is great at capturing the context around the subject of an article, Wikidata is designed to express facts.

Creating Items


Not every item in the world should be on Wikidata. Wikidata, in its first phases, has two main goals, namely, to centralize interlanguage links across Wikimedia projects, and to serve as a general knowledge base for the world at large. An item is acceptable if and only if it fulfills at least one of these two goals.

See:

Understanding Properties


Knowing and understanding what "properties" mean in Wikidata is fundamental to navigating the Wikidata knowledge graph. It will also help you think about how to model your data later on.

See also

Editing Items: OpenRefine


See OpenRefine course by the WMF

Importing an inventory into Wikidata


See How to import an inventory