Grants:IdeaLab/COOL-WD

COOL-WD
To complete or not to complete, that is the question
idea creator
Fadirra
volunteer
Amachishiro
this project needs...
volunteer
developer
created on 12:12, 29 March 2016 (UTC)


Project idea

What is the problem you're trying to solve?

Wikidata is a great platform for collecting information, and the high-quality work of many authors yields very reliable information. Still, a challenge for users of Wikidata is that there is no way to see whether all data on a certain topic is in Wikidata. For instance, it is easy to see that Malia and Sasha are children of Obama, but there is no way to specify that these are all of his children. More generally, Wikidata stores many facts, but it stores no information about the topics for which it contains all facts.

What is your solution?

Screenshot of COOL-WD

We have developed a prototype that allows users to add and manage completeness information on Wikidata. With our prototype, called COOL-WD (Completeness Tool for Wikidata), one can:

  1. See completeness statements for Wikidata facts
  2. Add, remove, aggregate and filter completeness statements
  3. See how completeness statements allow conclusions about the completeness of SPARQL queries over Wikidata.
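To make the operations above concrete, here is a minimal sketch of how completeness statements could be represented and managed in memory. It is an illustration under stated assumptions, not COOL-WD's actual implementation: the names `CompletenessStatement` and `StatementStore` are hypothetical, and a statement is simplified to an (item, property) pair. The item ID Q1202 comes from the Saxony demo URL on this page; P150 is used here as an illustrative property for administrative divisions.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class CompletenessStatement:
    """Claims that Wikidata holds ALL values of `prop` for `subject`."""
    subject: str  # Wikidata item ID, e.g. "Q1202" (from the Saxony demo URL)
    prop: str     # Wikidata property ID (P150 is an illustrative choice)


class StatementStore:
    """Hypothetical in-memory store; COOL-WD itself may work differently."""

    def __init__(self) -> None:
        self._statements: Set[CompletenessStatement] = set()

    def add(self, stmt: CompletenessStatement) -> None:
        self._statements.add(stmt)

    def remove(self, stmt: CompletenessStatement) -> None:
        self._statements.discard(stmt)  # no error if the statement is absent

    def filter(self, subject: Optional[str] = None,
               prop: Optional[str] = None) -> Set[CompletenessStatement]:
        """Return the statements matching the given subject and/or property."""
        return {
            s for s in self._statements
            if (subject is None or s.subject == subject)
            and (prop is None or s.prop == prop)
        }


store = StatementStore()
saxony_divisions = CompletenessStatement("Q1202", "P150")
store.add(saxony_divisions)
store.add(CompletenessStatement("Q652186", "P161"))
```

Calling `store.filter(subject="Q1202")` would return only the Saxony statement, while `store.filter()` with no arguments returns everything in the store.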

COOL-WD is available at http://cool-wd.inf.unibz.it/ and a 3-minute demo video is also available (http://cool-wd.inf.unibz.it/coolwd-hd.mp4). The tool employs various libraries and services, most importantly GWT, Apache Jena, SQLite, and the Wikidata API. The formal background and a description of the tool, including an indexing technique for completeness statements, have been accepted as a research paper at ICWE 2016 (http://icwe2016.inf.usi.ch/). The paper is available for download at: https://dl.dropboxusercontent.com/u/5622977/permalinks/16_03_01%20-%20COOL-WD%20Paper.pdf

Below are some naive ideas of how completeness could be useful to users:

  • Use Case: Managing Completeness of Geographical Data

Rido is a geographer who would like to contribute data about the administrative divisions of regions to Wikidata. He cares a great deal about data quality, especially data completeness, and is collaborating with Simon, another geographer. However, when completing data on Wikidata, there is currently no way to mark which data is complete. Rido and Simon must keep their notes about completeness manually in, say, a Google Doc. Worse still, Wikidata users cannot appreciate the effort Rido and Simon put into completing the data, since to the users' eyes there is no difference between complete and incomplete data on Wikidata.

Demo: Wikidata is complete for all administrative divisions of Saxony (http://cool-wd.inf.unibz.it/?p=Q1202)

Complete for administrative divisions of Saxony
  • Use Case: Movie Application Optimization

Jen is a developer of a moviegoer application. She usually integrates data from multiple sources, including Wikidata. If some movies on Wikidata have completeness statements, she could optimize her application so that it does not search other data sources for those movies.
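The optimization Jen has in mind could look roughly like the following sketch. Everything here is hypothetical: `collect_values`, the lookup callables, and the placeholder person names are made up for illustration, and completeness statements are simplified to a set of (item, property) pairs. The item and property IDs are the ones used in the demo query on this page.

```python
from typing import Callable, Iterable, List, Set, Tuple

Lookup = Callable[[str, str], Iterable[str]]


def collect_values(item: str, prop: str,
                   complete_pairs: Set[Tuple[str, str]],
                   wikidata_lookup: Lookup,
                   fallback_lookups: List[Lookup]) -> List[str]:
    """Gather values for (item, prop), skipping other sources when
    Wikidata is marked complete for that pair."""
    values = list(wikidata_lookup(item, prop))
    if (item, prop) in complete_pairs:
        return values  # marked complete: no need to ask anyone else
    for lookup in fallback_lookups:
        for value in lookup(item, prop):
            if value not in values:  # avoid duplicates across sources
                values.append(value)
    return values


# Stub data sources with placeholder names (not real Wikidata content).
WIKIDATA = {("Q652186", "P161"): ["Person A", "Person B"],
            ("Q652186", "P58"): ["Person C"]}
OTHER_DB = {("Q652186", "P58"): ["Person C", "Person D"]}
fallback_calls = []


def wikidata_lookup(item: str, prop: str) -> List[str]:
    return WIKIDATA.get((item, prop), [])


def other_db_lookup(item: str, prop: str) -> List[str]:
    fallback_calls.append((item, prop))  # record that the fallback was used
    return OTHER_DB.get((item, prop), [])


complete = {("Q652186", "P161")}  # assume only the cast is marked complete
cast = collect_values("Q652186", "P161", complete,
                      wikidata_lookup, [other_db_lookup])
writers = collect_values("Q652186", "P58", complete,
                         wikidata_lookup, [other_db_lookup])
```

In this sketch the fallback source is consulted only for the screenwriters, because the cast is covered by a completeness statement.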

Demo: Her app queries COOL-WD at http://cool-wd.inf.unibz.it/?p=query for the cast and screenwriters of the movie Before Sunset (http://cool-wd.inf.unibz.it/?p=Q652186):

SELECT * WHERE {
  wd:Q652186 wdt:P161 ?c .  # cast member
  wd:Q652186 wdt:P58 ?s     # screenwriter
}

Her app gets not only query answers but also the completeness information of her query.
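One simplified way to think about the completeness of such a query: if every triple pattern has a fixed item and property, the query can be considered complete when each (item, property) pair is covered by a completeness statement. The sketch below implements only this simplified rule; the function name `query_completeness` is hypothetical, and the reasoning in COOL-WD may well be more general than this.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (item ID, property ID)


def query_completeness(patterns: List[Pair],
                       complete_pairs: Set[Pair]) -> Tuple[bool, List[Pair]]:
    """Return (is_complete, uncovered_patterns) for a conjunctive query
    whose triple patterns all have a fixed item and property."""
    uncovered = [p for p in patterns if p not in complete_pairs]
    return (not uncovered, uncovered)


# The two patterns of the Before Sunset query above.
patterns = [("Q652186", "P161"), ("Q652186", "P58")]

# Assumed statements for illustration: cast and screenwriters both complete.
all_covered = query_completeness(
    patterns, {("Q652186", "P161"), ("Q652186", "P58")})

# With only the cast marked complete, the query cannot be guaranteed complete.
partly_covered = query_completeness(patterns, {("Q652186", "P161")})
```

Returning the uncovered patterns alongside the verdict lets an application see which part of the query lacks a completeness guarantee.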

Complete for administrative divisions of Saxony

Goals

We would be excited to achieve the following goals:

  • Possible native support in managing completeness on Wikidata.
  • Community consensus on a policy for managing completeness information. What is completeness? When is information complete? What types of information can be said to be complete? We would like to hear what the Wikidata community says about these questions.
  • Adding a feature to better crowdsource completeness information
  • A more scalable system
  • Better support for provenance of completeness information
  • Better analytics features for completeness information (e.g., how complete are we for the languages of all the cantons in Switzerland?)
  • And so on (we would love to hear your feedback!)
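As a rough illustration of the analytics goal above, the following sketch computes what fraction of a group of items has a completeness statement for a given property. It is only a sketch under assumptions: `completeness_coverage` is a hypothetical helper, the canton labels stand in for real Wikidata item IDs, the choice of P37 for languages is an assumption, and the statements listed are invented for the example.

```python
from typing import Iterable, Set, Tuple


def completeness_coverage(items: Iterable[str], prop: str,
                          complete_pairs: Set[Tuple[str, str]]) -> float:
    """Fraction of `items` that have a completeness statement for `prop`."""
    items = list(items)
    if not items:
        return 0.0  # avoid division by zero for an empty group
    covered = sum(1 for item in items if (item, prop) in complete_pairs)
    return covered / len(items)


# Placeholder labels instead of real item IDs; statements are invented.
cantons = ["Zurich", "Bern", "Ticino"]
statements = {("Zurich", "P37"), ("Ticino", "P37")}

coverage = completeness_coverage(cantons, "P37", statements)
print(f"Languages marked complete for {coverage:.0%} of the listed cantons")
```

A real analytics feature would presumably read both the group of items and the statements from Wikidata and COOL-WD rather than from hard-coded lists.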

Get Involved

About the idea creator

I am Fariz Darari, a PhD student at the Free University of Bolzano in Italy. My research area is data quality on the Semantic Web. This idea is joint work with Werner Nutt, Sebastian Rudolph, Simon Razniewski, and Radityo Eko Prasojo.

Participants

  • Developer: Data completeness is an important issue! I would like to contribute by developing more ideas and features for COOL-WD. I am also interested in developing automated ways to populate COOL-WD with completeness statements. Radityoeko (talk) 16:57, 29 March 2016 (UTC)
  • Volunteer: I'd be happy to help with the modeling or reasoning side of this project. Amachishiro (talk) 16:58, 29 March 2016 (UTC)

Endorsements

  • I would love to see Wikidata featuring completeness information. Fadirra (talk) 13:17, 29 March 2016 (UTC)
  • I like the idea behind it. It can make the world better through more reliable and valid data. Hafizhudani (talk) 04:01, 30 March 2016 (UTC)
  • Great project. Hafizhudani (talk) 04:01, 30 March 2016 (UTC)
  • The idea is very good and useful. The completeness information will help us decide whether we still need further information or not. Conanx (talk) 10:04, 30 March 2016 (UTC)

Expand your idea

Would a grant from the Wikimedia Foundation help make your idea happen? You can expand this idea into a grant proposal.
