
Talk:Wikidata/Data model/JSON

From Meta, a Wikimedia project coordination wiki
Latest comment: 12 years ago by Denny Vrandečić (WMDE) in topic Why are language codes repeated?

No IRIs?


Not including URIs is neither Linked Data nor RESTful. Is this a good idea? BarryNorton (talk) 07:33, 5 April 2012 (UTC)

We will provide an RDF export that covers Linked Data, and which will include IRIs. The JSON export is aiming at a different clientele. --Denny Vrandečić (WMDE) (talk) 19:37, 5 April 2012 (UTC)

(definition of an IRI at [[1]])

JSON-LD


An alternative representation in JSON-LD could maintain full RDF fidelity. In JSON-LD, the @id key is used to denote the subject IRI. It can be a term, compact IRI, or CURIE. Values are expressed either as simple strings, or with more structured information such as language or datatype. A context (@context) is used to provide a definition for terms and prefixes used in the body, and to provide implicit type coercion.

 {
   "@context": {
     "titles": "http://purl.org/dc/terms/title",
     "description": "http://purl.org/dc/terms/description",
     "label": "http://www.w3.org/2000/01/rdf-schema#label",
     "en": "http://meta.wikimedia.org/language#en",
     "de": "http://meta.wikimedia.org/language#de",
     "value": "@value",
     "entity": "@id"
   },
   "entity" : "q7",
   "titles" : {
     "en" : {
       "@language" : "en",
       "value" : "Georgia_(country)"
     },
     "de" : {
       "@language" : "de",
       "value" : "Georgien"
     }
   },
   "label" : {
     "en" : {
       "@language" : "en",
       "value" : "Georgia"
     },
     "de" : {
       "@language" : "de",
       "value" : "Georgien"
     }
   },
   "description" : {
     "en" : {
       "@language" : "en",
       "value" : "A central-asian country"
     },
     "de" : {
       "@language" : "de",
       "value" : "Land im Kaukasus"
     }
   }
 }

In this case, "entity" is an alias for @id, so its value ("q7") would be interpreted as either a relative IRI or a term. The other terms, "titles", "label" and "description", are defined in the context with their appropriate IRIs, here dc:title, rdfs:label and dc:description. Ideally, to be linked data, the document would also reference related subjects, such as cities, location, and so forth, allowing for follow-your-nose navigation between different subjects.

This results in the following Turtle:

 @prefix wml: <http://meta.wikimedia.org/language#> .
 @prefix dc: <http://purl.org/dc/terms/> .
 @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
 <q7> rdfs:label [
     wml:de "Georgien"@de;
     wml:en "Georgia"@en];
   dc:title [
     wml:de "Georgien"@de;
     wml:en "Georgia_(country)"@en];
   dc:description [
     wml:de "Land im Kaukasus"@de;
     wml:en "A central-asian country"@en] .
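To show how the context drives that mapping, here is a minimal Python sketch. It is not a JSON-LD processor: it handles only the shape of the q7 example (term-to-IRI mappings from the context, the "entity" and "value" keyword aliases, and nested objects without a value becoming blank nodes). The function name `to_triples` and the `_:bN` labels are illustrative assumptions, and the document is abridged to two properties.

```python
from itertools import count

# The q7 example from above, abridged to the "titles" and "label" properties.
doc = {
    "@context": {
        "titles": "http://purl.org/dc/terms/title",
        "label": "http://www.w3.org/2000/01/rdf-schema#label",
        "en": "http://meta.wikimedia.org/language#en",
        "de": "http://meta.wikimedia.org/language#de",
        "value": "@value",
        "entity": "@id",
    },
    "entity": "q7",
    "titles": {
        "en": {"@language": "en", "value": "Georgia_(country)"},
        "de": {"@language": "de", "value": "Georgien"},
    },
    "label": {
        "en": {"@language": "en", "value": "Georgia"},
        "de": {"@language": "de", "value": "Georgien"},
    },
}


def to_triples(doc):
    """Yield (subject, predicate, object) tuples for this simplified shape.

    Objects are either blank node labels ("_:b0", ...) or
    (lexical_value, language) pairs for language-tagged strings.
    """
    ctx = doc["@context"]
    # Keyword aliases declared in the context, e.g. "entity" -> "@id".
    aliases = {term: kw for term, kw in ctx.items() if kw.startswith("@")}
    bnodes = count()

    def keyword(key):
        return aliases.get(key, key)

    def node(obj, subject):
        for key, val in obj.items():
            if keyword(key) in ("@context", "@id"):
                continue
            predicate = ctx[key]  # term -> absolute IRI
            fields = {keyword(k): v for k, v in val.items()}
            if "@value" in fields:
                # A value object: emit a language-tagged literal.
                yield (subject, predicate, (fields["@value"], fields.get("@language")))
            else:
                # Any other nested object becomes a fresh blank node.
                bnode = f"_:b{next(bnodes)}"
                yield (subject, predicate, bnode)
                yield from node(val, bnode)

    id_key = next(k for k in doc if keyword(k) == "@id")
    yield from node(doc, doc[id_key])


for triple in to_triples(doc):
    print(triple)
```

The first tuple produced is `('q7', 'http://purl.org/dc/terms/title', '_:b0')`, which is the `<q7> dc:title [ ... ]` line of the Turtle; the `_:bN` labels stand in for the anonymous `[ ]` brackets.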

There is also a proposed extension to JSON-LD which would allow the "en" and "de" keys to be treated directly as @language specifiers. Since yours is an important use case, we could consider the following markup:


How would that be expressed in JSON-LD?

{
  "entity" : "q8",
  "label" : {
    "en" : {
      "language" : "en",
      "value" : "Elisabeth II.",
      "audio" : "http://example/elisabeth2.aiff"
    }
  },
  "alias" : {
    "en" : [
      {
        "language" : "en",
        "value" : "The Queen",
        "audio" : "http://example/thequeen.aiff"
      },
      {
        "language" : "en",
        "value" : "Elisabeth Windsor",
        "audio" : "http://example/elisabethwindsor.aiff"
      }
    ],
    "de" : [
      {
        "language" : "de",
        "value" : "Königin Elisabeth II."
      }
    ]
  }
}

So, one way to express this in JSON-LD might be the following:

{
  "entity" : "q8",
  "alias" : [
    {
      "@context": {"@language": "en"},
      "value" : "The Queen",
      "audio" : "http://example/thequeen.aiff"
    },
    {
      "@context": {"@language": "de"},
      "value" : "Königin Elisabeth II."
    }
  ]
}

This omits the second English alias, "Elisabeth Windsor". It could be added either as a third object with its own context, or by using a nested @set notation, which ends up collapsing down anyway but may be more useful for some API purposes. Basically, "alias" is the predicate, and each of these objects defines an unnamed node with "value" and "audio" properties.

To express multiple entities sharing the same context (language), you can use the @set notation:

{
  "entity" : "q8",
  "alias" : [
    {
      "@context": {"@language": "en"},
      "@set": [
        {
          "value" : "The Queen",
          "audio" : "http://example/thequeen.aiff"
        },
        {
          "value" : "Elisabeth Windsor",
          "audio" : "http://example/elisabethwindsor.aiff"
        }
      ]
    },
    {
      "@context": {"@language": "de"},
      "value" : "Königin Elisabeth II."
    }
  ]
}

This has multiple entities sharing the same language. Each entity has an implicit BNode defined for it.
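A minimal Python sketch of how the per-object contexts in the @set example could be resolved, assuming the draft semantics described here: a local "@context": {"@language": ...} sets the default language for that object and everything nested in it, an "@set" wrapper is flattened away, and each remaining object gets its own blank node. The helper name `flatten_aliases` is hypothetical, and the sketch treats "audio" as an IRI (as if the context coerced it with "@type": "@id", which the example does not show).

```python
from itertools import count

# The "alias" array from the @set example above.
aliases = [
    {
        "@context": {"@language": "en"},
        "@set": [
            {"value": "The Queen", "audio": "http://example/thequeen.aiff"},
            {"value": "Elisabeth Windsor", "audio": "http://example/elisabethwindsor.aiff"},
        ],
    },
    {"@context": {"@language": "de"}, "value": "Königin Elisabeth II."},
]


def flatten_aliases(items, language=None, bnodes=None):
    """Return one dict per alias entity, each with its own blank node id.

    A local {"@language": ...} context overrides the inherited default, and
    an "@set" wrapper is flattened so its members inherit that default.
    """
    bnodes = count() if bnodes is None else bnodes
    out = []
    for item in items:
        lang = item.get("@context", {}).get("@language", language)
        if "@set" in item:
            out.extend(flatten_aliases(item["@set"], lang, bnodes))
            continue
        entity = {"@id": f"_:b{next(bnodes)}"}  # the implicit blank node
        for key, val in item.items():
            if key == "@context":
                continue
            # Assumption: "audio" is an IRI (as if coerced with "@type": "@id"),
            # so only "value" picks up the default language.
            entity[key] = (val, lang) if key == "value" else val
        out.append(entity)
    return out


for entity in flatten_aliases(aliases):
    print(entity)
```

Both English aliases come out tagged "en" from the single shared context, which is the point of the @set variant: the language is stated once per group rather than once per value.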

Yet another way of doing it would be to define separate properties for each language, which would lead to something like:

{
  "entity" : "q8",
  "label_en" : {
    "value" : "Elisabeth II.",
    "audio" : "http://example/elisabeth2.aiff"
  },
  "alias_en" : [
    {
      "value" : "The Queen",
      "audio" : "http://example/thequeen.aiff"
    },
    {
      "value" : "Elisabeth Windsor",
      "audio" : "http://example/elisabethwindsor.aiff"
    }
  ],
  "alias_de" : [
    {
      "value" : "Königin Elisabeth II."
    }
  ]
}
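This layout can be converted mechanically back into a language-keyed one, provided property names follow a `<property>_<lang>` pattern. That naming rule is an assumption; nothing in the discussion fixes it. A minimal Python sketch, with the hypothetical helper name `regroup`, applied to data adapted from the example above:

```python
# The per-language property layout, adapted from the example above.
doc = {
    "entity": "q8",
    "label_en": {"value": "Elisabeth II.", "audio": "http://example/elisabeth2.aiff"},
    "alias_en": [
        {"value": "The Queen", "audio": "http://example/thequeen.aiff"},
        {"value": "Elisabeth Windsor", "audio": "http://example/elisabethwindsor.aiff"},
    ],
    "alias_de": [{"value": "Königin Elisabeth II."}],
}


def regroup(doc):
    """Turn keys like "alias_en" into {"alias": {"en": ...}}.

    Assumes the language code is whatever follows the last underscore;
    keys without an underscore (such as "entity") are copied unchanged.
    """
    out = {}
    for key, value in doc.items():
        prop, sep, lang = key.rpartition("_")
        if not sep:
            out[key] = value
        else:
            out.setdefault(prop, {})[lang] = value
    return out


print(regroup(doc)["alias"]["de"])
```

The trade-off is that the naming convention has to be agreed on out of band, whereas the language-keyed layout makes the language an explicit map key.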

Why are language codes repeated?


In the example, "de" and "en" appear both as a key and as a value, which seems redundant to me. Why is this useful? Or could you write another example where the difference becomes obvious? Bináris tell me 08:25, 6 April 2012 (UTC)

Hm, you are right. I was thinking about having it more consistent as a translation from the data model, but I guess there is no use for that, really. --Denny Vrandečić (WMDE) (talk) 11:25, 6 April 2012 (UTC)
The basic conundrum is that you either place language specific information at the leaves, or you have different branches of data, with some repetition for each language variation. If language variations have different provenance, this might be a good idea in any case. --Gregg Kellogg
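The redundancy can be checked mechanically: as long as every inner "language" field agrees with the key it sits under, the field can be dropped and reconstructed without loss. A minimal Python sketch of that round trip on the q8 "alias" map; the helper names `strip_language` and `restore_language` are hypothetical, and the strip step raises an error if a key and its inner field ever disagree.

```python
# The "alias" map from the q8 example, where each entry repeats its key.
alias = {
    "en": [
        {"language": "en", "value": "The Queen", "audio": "http://example/thequeen.aiff"},
        {"language": "en", "value": "Elisabeth Windsor",
         "audio": "http://example/elisabethwindsor.aiff"},
    ],
    "de": [{"language": "de", "value": "Königin Elisabeth II."}],
}


def _as_list(entries):
    # Labels hold a single object, aliases hold a list; treat both uniformly.
    return entries if isinstance(entries, list) else [entries]


def strip_language(lang_map):
    """Drop each entry's "language" field, checking that it matches its key."""
    out = {}
    for lang, entries in lang_map.items():
        stripped = []
        for entry in _as_list(entries):
            if entry.get("language", lang) != lang:
                raise ValueError(f"entry under {lang!r} says {entry['language']!r}")
            stripped.append({k: v for k, v in entry.items() if k != "language"})
        out[lang] = stripped if isinstance(entries, list) else stripped[0]
    return out


def restore_language(lang_map):
    """Re-derive each entry's "language" field from its map key."""
    out = {}
    for lang, entries in lang_map.items():
        restored = [{"language": lang, **entry} for entry in _as_list(entries)]
        out[lang] = restored if isinstance(entries, list) else restored[0]
    return out


# Round trip: nothing is lost by dropping the repeated code.
assert restore_language(strip_language(alias)) == alias
```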

Expressing provenance information in JSON-LD


JSON-LD also supports named graphs, similar to TriG. An example, based on a use case submitted to the W3C RDF Working Group, is shown here and is repeated as a JSON-LD Test Case:

 {
   "@context": {
     "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
     "ex": "http://example.org/",
     "xsd": "http://www.w3.org/2001/XMLSchema#",
     "ex:locatedIn": {"@type": "@id"},
     "ex:hasPopulation": {"@type": "xsd:integer"},
     "ex:hasReference": {"@type": "@id"}
   },
   "@graph": [
     {
       "@id": "http://example.org/ParisFact1",
       "@type": "rdf:Graph",
       "@graph": {
         "@id": "http://example.org/location/Paris#this",
         "ex:locatedIn": "http://example.org/location/France#this"
       },
       "ex:hasReference": ["http://www.britannica.com/", "http://www.wikipedia.org/", "http://www.brockhaus.de/"]
     },
     {
       "@id": "http://example.org/ParisFact2",
       "@type": "rdf:Graph",
       "@graph": {
         "@id": "http://example.org/location/Paris#this",
         "ex:hasPopulation": 7000000
       },
       "ex:hasReference": "http://www.wikipedia.org/"
     }
   ]
 }

This basically defines two resources, each of which asserts a named graph. (Of course, being linked data, the information could also be split among separate documents.) (Also, the top-level @graph is only necessary to represent two unrelated objects; for a single assertion, it is not needed.)

The equivalent TriG would be the following:

 @prefix ex: <http://example.org/> .
 @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
 @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
 {
   ex:ParisFact1 a rdf:Graph;
      ex:hasReference <http://www.britannica.com/>,
        <http://www.wikipedia.org/>,
        <http://www.brockhaus.de/> .
   ex:ParisFact2 a rdf:Graph;
      ex:hasReference <http://www.wikipedia.org/> .
 }
 ex:ParisFact1 {
   <http://example.org/location/Paris#this> ex:locatedIn <http://example.org/location/France#this> .
 }
 ex:ParisFact2 {
   <http://example.org/location/Paris#this> ex:hasPopulation 7000000 .
 }
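As a rough illustration of how the nested @graph structure maps onto quads, here is a minimal Python sketch that walks the Paris example and yields (graph, subject, predicate, object) tuples, with `None` standing for the default graph. It handles only the keys used in this example and is not a JSON-LD processor; `expand`, `quads` and `PREFIXES` are hypothetical names.

```python
# The Paris example above, with the prefixes from its @context.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ex": "http://example.org/",
}
RDF_TYPE = PREFIXES["rdf"] + "type"

doc = {
    "@graph": [
        {
            "@id": "http://example.org/ParisFact1",
            "@type": "rdf:Graph",
            "@graph": {
                "@id": "http://example.org/location/Paris#this",
                "ex:locatedIn": "http://example.org/location/France#this",
            },
            "ex:hasReference": ["http://www.britannica.com/", "http://www.wikipedia.org/",
                                "http://www.brockhaus.de/"],
        },
        {
            "@id": "http://example.org/ParisFact2",
            "@type": "rdf:Graph",
            "@graph": {
                "@id": "http://example.org/location/Paris#this",
                "ex:hasPopulation": 7000000,
            },
            "ex:hasReference": "http://www.wikipedia.org/",
        },
    ]
}


def expand(name):
    """Expand a compact IRI like "ex:locatedIn"; absolute IRIs pass through."""
    prefix, sep, local = name.partition(":")
    return PREFIXES[prefix] + local if sep and prefix in PREFIXES else name


def quads(nodes, graph=None):
    """Yield (graph, subject, predicate, object); graph None = default graph."""
    for node in nodes if isinstance(nodes, list) else [nodes]:
        subject = node["@id"]
        for key, value in node.items():
            if key == "@id":
                continue
            if key == "@graph":
                # The node's own @id names the graph holding the nested triples.
                yield from quads(value, graph=subject)
                continue
            predicate = RDF_TYPE if key == "@type" else expand(key)
            for obj in value if isinstance(value, list) else [value]:
                # Assumption: every string object is an IRI. @type values always
                # are, and the context coerces the other properties to "@id".
                yield (graph, subject, predicate, expand(obj) if isinstance(obj, str) else obj)


for quad in quads(doc["@graph"]):
    print(quad)
```

The eight quads correspond one-to-one with the TriG above: six triples in the default graph and one in each of the two named graphs.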