WikiConference India 2011/Submissions/DBpedia

From Meta, a Wikimedia project coordination wiki
Timestamp
16:38, 14 August 2011 (UTC)
Title of the submission

DBpedia - Querying Wikipedia like a database

Type of submission (workshop, tutorial, or presentation)

Presentation

Author of the submission

Jinesh Shaji George
Wiki user name - Jinesh.george

E-mail address or username (if username, please confirm email address in Special:Preferences)

Email : jine.george@gmail.com

State of your origin (Country, if you are not based in India)

New Delhi, India

Affiliation, if any (organization, company etc.)


Personal homepage or blog


Abstract (maximum 500 words)

Wikipedia contains an enormous amount of information. But how can one use it to answer simple questions such as “Who are the German musicians born in Berlin in the 19th century?” or “Which cities in India have more than 1,00,00,000 inhabitants?” The information is present in Wikipedia, but it is probably spread across many wiki pages, so answering such questions would take quite a bit of research. What if a system could answer them directly? That system is DBpedia.

DBpedia is a project undertaken by the Free University of Berlin and the University of Leipzig (where I interned for two months). DBpedia aims to extract the knowledge in Wikipedia and to structure and classify it using Semantic Web constructs such as the DBpedia Ontology. The ontology represents knowledge by identifying the properties of an object and its relationships with other objects. For example, the DBpedia Ontology contains classes such as Resource, Place, Person and Organisation. The Resource Description Framework (RDF) is the technology that makes it possible to model knowledge in this manner.
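To make the RDF idea concrete, here is a minimal sketch of knowledge stored as (subject, predicate, object) triples and looked up by pattern. It is plain Python rather than a real RDF library, and every name with the `ex:` prefix, as well as the helpers `match` and `born_in`, is made up for illustration and is not part of DBpedia.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical data: all "ex:" names are invented for this sketch.
TRIPLES: List[Triple] = [
    ("ex:Berlin", "rdf:type", "ex:Place"),
    ("ex:Alice", "rdf:type", "ex:Person"),
    ("ex:Alice", "ex:birthPlace", "ex:Berlin"),
    ("ex:Bob", "rdf:type", "ex:Person"),
    ("ex:Bob", "ex:birthPlace", "ex:Leipzig"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def born_in(triples: List[Triple], place: str) -> List[str]:
    """Return the subjects whose ex:birthPlace is the given place."""
    return [s for (s, _, _) in match(triples, p="ex:birthPlace", o=place)]


if __name__ == "__main__":
    print(born_in(TRIPLES, "ex:Berlin"))  # prints ['ex:Alice']
```

A class such as Person or Place appears here simply as the object of an `rdf:type` triple, which is roughly how an ontology class attaches to a resource.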

DBpedia uses a PHP-based extraction framework that understands wiki markup and extracts information from various parts of a wiki page, such as the infoboxes (which use a template mechanism), and creates RDF triples from it. 29 million triples have originated from the infoboxes alone! This information is made available on the Web under an open licence and can be queried with the SPARQL query language. To answer questions like those posed in the first paragraph, one just needs to execute a query at the SPARQL endpoint. Hence it is possible to query the knowledge in Wikipedia just like a database. For an example, please see http://en.wikipedia.org/wiki/DBpedia#Example. More details about DBpedia are available at http://dbpedia.org/About.
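As a rough illustration of such a query, the sketch below builds a SPARQL query for musicians born in a given place within a range of years and encodes it into a request URL. It only constructs the text and does not contact any server. The endpoint address `http://dbpedia.org/sparql`, the `format` parameter, and the terms `dbo:MusicalArtist`, `dbo:birthPlace` and `dbo:birthDate` are assumptions about the public DBpedia service and ontology, and the helper names `build_query` and `build_url` are hypothetical. The query also leaves out the nationality condition from the first question to stay short.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed address of the public DBpedia SPARQL endpoint.
ENDPOINT = "http://dbpedia.org/sparql"


def build_query(place: str, first_year: int, last_year: int,
                limit: int = 20) -> str:
    """Build a SPARQL query for musicians born in `place` in a year range.

    The class and property names are assumed DBpedia Ontology terms.
    """
    return f"""PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?person ?birth WHERE {{
  ?person a dbo:MusicalArtist ;
          dbo:birthPlace dbr:{place} ;
          dbo:birthDate ?birth .
  FILTER (?birth >= "{first_year}-01-01"^^xsd:date &&
          ?birth <= "{last_year}-12-31"^^xsd:date)
}}
ORDER BY ?birth
LIMIT {limit}"""


def build_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Encode the query as a GET request URL for a SPARQL endpoint."""
    params = {
        "query": query,
        # Assumed parameter asking the server for JSON results.
        "format": "application/sparql-results+json",
    }
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    query = build_query("Berlin", 1801, 1900)
    print(query)
    print(build_url(query))
```

Pasting the printed query into the endpoint's web form, or fetching the printed URL, would be the step that actually runs it; the results depend on the data set that is live at the time.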

This is just one use case of DBpedia; it has many applications in the context of the Semantic Web. In my presentation, I would like to give an introduction to DBpedia and its use cases, followed by a demo if time permits.


Track (Community/Knowledge/Outreach/Technology)

Knowledge Technology

Will you attend Wikiconference if your submission is not accepted?

Yes

Slides or further information (optional)