Talk:Wikidata/Notes/Inclusion syntax v0.2

From Meta, a Wikimedia project coordination wiki

"id"

Isn't it problematic to use "id" in {{#property:foo|item=id/bar}}, considering there is id.wikipedia.org? The syntax is ambiguous. Helder 13:04, 29 May 2012 (UTC)

Yes, we will probably have to use something else, like "item" or "qid" (which is in the range reserved for "local use" by ISO 639). We haven't decided yet on the exact form that (inter)wiki links to item pages should have, but this parameter should use the same syntax. -- Daniel Kinzler (WMDE) (talk) 13:25, 29 May 2012 (UTC)

My first User Story: The Population of a City

Deutsch (translated into English)

Since our meeting in Dresden I have had a user story in mind that I would like to see become reality at a very early stage of the project. My basic idea is that we convert, for example, the municipality infobox, which is transcluded thousands of times, piece by piece: only one parameter at a time, starting with those whose sourcing is undisputed. That would also be my first user story:

From the readers' point of view
Suppose the reader is interested in the population of the city of Dresden. He scans the article for it and finds the number in the infobox. Behind the number there will be a small link or an info icon, similar to the current footnote or the imagemap icon. If the reader clicks it, he gets to a Wikidata page with everything that is known about the property "Population" of the item "Dresden" (this can also be a subsection of a larger page for the item "Dresden"). There he finds figures from different sources (for example the figure given by the city, which, unlike the statistical offices, counts secondary residences) as well as the change of each of these figures over time. If, while looking at the article (which contains only a single number), the reader had the impression that the figure was wrong, his need for information is satisfied after a single click: he sees that there are different views of the same fact. And he sees that the number he considers correct has already been entered. An important detail is that this info page must be entirely in German (keyword: acceptance).
From the authors' point of view
We have agreed to allow only the population figures of the statistical offices in all city and municipality articles, because only this ensures comparability across all cities and municipalities. This means we have to make sure somehow that all infobox transclusions show only numbers from this one source, regardless of what may have been set as preferred for the individual Wikidata item. For this we need a syntax like the one already proposed, with a unique identifier for the source. Furthermore, it must under no circumstances be possible for readers to edit the population figures inline in the city and municipality articles. This may only happen via the Wikidata page described above, and even there only with review (Sichtung). The preferred way will be an annual import (via API, CSV or similar) of the lists published by the statistical offices.

After this user story, things will surely continue with areas, then elevations. A click on the info icon behind the mean elevation in the infobox will then, for example, also reveal minimum and maximum elevations that are not included in the infobox. In principle everything stays as described above, except that for data with less rigid sourcing we have to become increasingly open regarding editability, up to free inline editing directly in the infobox. --TMg 18:05, 23 June 2012 (UTC)

English

My first user story is about the German municipality infobox template:

From a reader's point of view
Let's assume a reader is looking for the population of the city of Dresden. What he will see in future versions of the infobox template is a single number followed by a link (like a ref) or an icon (like the blue imagemap icon). This will open a Wikidata page, either for the property "Population" of the item "Dresden" or an even bigger page for the item Dresden (I'm not sure which is better; it depends on the size of the page). On this page the reader will find not only the source and date of the number shown in the actual article but also more numbers from different sources and different times. If he thinks the number in the article is "wrong" and tries to change it, what he will find instead is that there is no "right" or "wrong". That said, it's important that these Wikidata pages are made for the reader, easy to understand and written in the reader's language.
From an editor's point of view
In cases like this we won't allow easy editing of the numbers. All articles about German municipalities must always show the population from a single source and a single date. To do this we need an explicit selection with an explicit source and an explicit date, e.g. {{#property:population|source=Destatis|timestamp=2011-12-31}} (with "Destatis" being the identifier for the Federal Statistical Office of Germany) or maybe {{#property:population|source=Destatis2011}} (but I prefer the first). If new numbers are published by the government, we will import these numbers and either change the date in the template or use something like {{#time:Y|-3 MONTHS}}-12-31 so it will change automatically every April 1st. Of course it should always be possible to add more sources, but none of these should change the actual article.

I wish this could become reality very soon. Currently some users have created so-called "meta data templates", and I would like to get rid of this hack. --TMg 16:19, 18 July 2012 (UTC)
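To make the date arithmetic behind {{#time:Y|-3 MONTHS}}-12-31 easier to check, here is a minimal Python sketch. It only shifts whole months and ignores the day of the month, so it is a simplification of what the parser function does, and the function name `reference_date` is made up for this illustration.

```python
from datetime import date


def reference_date(today: date, months_back: int = 3) -> str:
    """Return "<year>-12-31" for the year that `today` minus `months_back` months falls in.

    Simplification: only whole months are shifted; the day of the month is ignored.
    """
    # Count months since year 0, shift, then take the year part again.
    total_months = today.year * 12 + (today.month - 1) - months_back
    return f"{total_months // 12}-12-31"


print(reference_date(date(2013, 3, 31)))  # last day before the switch
print(reference_date(date(2013, 4, 1)))   # first day after the switch
```

Under this simplification a 3-month offset switches to 31 December of the current year on April 1st. If the newest imported figures refer to the end of the previous year, a larger offset such as 15 months (an assumption, not something stated in this discussion) would be needed.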

I agree with you about the editor's point of view: Wikidata can become a powerful tool only by providing reliable and, if possible, official references. This implies limiting the ability of users to enter their own data. The best solution is to separate data input from final data storage by including a temporary storage. Exporting data to the final storage would be possible only for authorized users. Snipre (talk) 18:54, 10 August 2012 (UTC)

Make this as easy as possible

Just a reminder: It's very important to make this syntax and everything easy to understand for everybody. This is the interface between the Wikidata world and the Wikipedia world. This is the language both sides need to talk and to understand. Create as few new elements as possible. Give every element a clear, distinct behavior. No redundancy. No side effects. No hidden dependencies. Everything should provide a clearly visible advantage.

Why I write this (again): In the last few weeks I came across more and more negative discussions about templates. Most users don't understand templates, which is not a problem, but some hate them simply because they create a dependency these users can neither understand nor control.

No criticism, just a reminder. You are all doing a very good job so far! --TMg 16:38, 18 July 2012 (UTC)

Interesting but far from matching data structure requirements

Nice presentation, but I think the design is too simple to handle complex data. I want to present a particular case I am interested in to show the limitations of the present solution. To put it simply before giving the example, I note one missing concept: properties have properties. We need to integrate subproperties into the data structure.

My example is very simple: the speed of sound in a chemical. As is often the case for physical properties, the conditions of the measurement have to be cited in order to give enough information for comparing or using the data. If I take the speed of sound in sulfur dioxide, I have to provide the temperature and the pressure of the measurement.

So for the item sulfur dioxide we have the property "speed sound", which has a value and a unit. Then, as for every speed of sound, we need to give the temperature (one value and one unit) and the pressure (one value and one unit). And finally we have to give the reference.
In that example the temperature and the pressure are subproperties of the property sound speed. In my opinion the reference is a subproperty too, but in the case of the reference we can easily refer to it as another item in the database.

Finally, here is the present wikicode for the sound speed in the chemical infobox on WP:fr. I would be very interested to see how the present data call could be introduced into that code:

| vitesseSon = {{Unité/2|213|m||s|-1}} ({{tmp|0|°C}},{{Unité/2|1|atm|}})<ref name="HBCP91">{{ouvrage | langue = en | auteur = W. M Haynes | titre = Handbook of chemistry and physics | numéro d'édition = 91 | éditeur = CRC | lieu = | année = 2010-2011 | volume = | pages totales = 2610 | isbn = 9781439820773 | passage = 14-40 | consulté le = 28 juillet 2011 }}</ref>

The important information is 213 m/s at 0 °C and 1 atmosphere. Unité/2 is a template which formats the output and converts the value to the correct unit if necessary.

From the user's point of view the simplest way would be this:

| vitesseSon = {{#property:SoundSpeed|source=HBCP91}}

but that implies a huge amount of coding work to translate that call into the code above.

Another solution is to include the raw data without any formatting:

| vitesseSon = {{Unité/2|{{#property:SoundSpeed|source=HBCP91}}|{{#property:SoundSpeed|source=HBCP91|part=unit}}}} ({{tmp|{{#property:SoundSpeed|source=HBCP91|part=Tmeas}}|{{#property:SoundSpeed|source=HBCP91|part=TmeasUnit}}}},{{Unité/2|{{#property:SoundSpeed|source=HBCP91|part=Pmeas}}|{{#property:SoundSpeed|source=HBCP91|part=PmeasUnit}}}})<ref name="HBCP91">{{#property:SoundSpeed|source=HBCP91|part=source}}</ref>

Snipre (talk) 14:07, 11 August 2012 (UTC)

Hi Snipre! Thanks for the comment. Do you mean something like this:

Sulfur dioxide
Speed of sound: 213 m/s [1 source]
    Temperature: 0 °C
    Pressure: 1 atm

That does not answer how the data will be included in the Wikipedias (the clients), but it should demonstrate that the data model deals with it and that our plan encompasses these problems. --93.220.93.47 15:35, 13 August 2012 (UTC)

Yes, it seems a good layout, but you have to increase the number of parameters:

Sulfur dioxide
Speed of sound: 213 [1 source]
    Temperature: 0
    TemperatureUnit: °C
    Pressure: 1
    PressureUnit: atm

{{Wikidata statement|item=Sulfur dioxide|property=Speed of sound|value=213 |unit=m/s|qualifier1=Temperature|value1=0|qualifier2=TemperatureUnit|value2=°C|qualifier3=Pressure|value3=1|qualifier4=PressureUnit|value4=atm|numberofsources=1}}
A "unit" parameter is necessary for the property as well as for each qualifier. We can solve the problem by increasing the number of qualifiers, but the mistake is to consider the unit of the pressure a qualifier of the property "Speed of sound". I agree that this is a conceptual problem, not a real one, but the risk is to mix too much information by putting everything in as a qualifier instead of trying to create a hierarchy in the data structure.
But as already said, it's more a question of how the data is organized and not a technical problem. Snipre (talk) 19:29, 14 August 2012 (UTC)

Hi Snipre! Thanks for elaborating. I mostly agree with you, except for one thing: according to the Wikidata data model the unit is part of the value, and that is why we do not need to increase the number of qualifiers in this case, as you suggest. This makes dealing with the values far easier internally, and it is how we handle it in Semantic MediaWiki's data model. Cheers! --Denny Vrandečić (WMDE) (talk) 09:58, 15 August 2012 (UTC)

I told you your "hard coded" units will be a problem. ;-) Sorry, I know there are good reasons to do it that way. --TMg 16:44, 15 August 2012 (UTC)
Mixing string and numeric values is easier to handle??? Your answer gives me the impression that you are mixing too many different objectives. Snipre (talk) 22:07, 15 August 2012 (UTC)
Sorry, but I just want to complete my previous remark: mixing the numeric value and the unit in the same data container is a big problem for final formatting. Wikidata is not responsible for the final format, and because of the different formats in the different Wikipedias it is necessary to separate the information. Just an example: in the French Wikipedia we have a template which converts Fahrenheit to Celsius for final display. Putting the unit together with the value will force us to add more code to templates to pick the different pieces of information apart. The same problem applies to number formats: the English format is normally 1,000.23, the French one 1000,23 or 1 000,23. Keeping the number separate from the unit will simplify the use of the data in templates that handle formatting. Snipre (talk) 14:48, 25 August 2012 (UTC)
As far as I understand, the Wikidata people want to do this number formatting for you. I agree with you. There are many possible problems. For example, in the German Wikipedia numbers are formatted like 76.543,21. But 4-digit numbers like 6543,21 should not contain the dot. Except for numbers in tables. Yes, that's an exception to an exception. Another problem is how numbers are rounded. Sometimes we want to keep at least three significant digits. In such cases the number 76543.21 should be rounded to 76543 but the number 43.21 should be rounded to 43.2. How do we tell Wikidata what we want? --TMg 13:46, 28 August 2012 (UTC)
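The German formatting rules described in the comment above can be written down as a short Python sketch. This is only an illustration of the rules as stated in this discussion (grouping dots from five integer digits on, always in tables, and "at least three significant digits" rounding). The function names `round_min_sig` and `format_de` are made up and are not part of MediaWiki or Wikidata.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_min_sig(value: str, sig: int = 3) -> Decimal:
    """Round so that at least `sig` significant digits remain; integer digits are never cut."""
    d = Decimal(value)
    if d == 0:
        return d
    # adjusted() is the exponent of the most significant digit (4 for 76543.21).
    decimals = max(sig - 1 - d.adjusted(), 0)
    return d.quantize(Decimal(1).scaleb(-decimals), rounding=ROUND_HALF_UP)


def format_de(d: Decimal, in_table: bool = False) -> str:
    """Decimal comma; grouping dots only from 5 integer digits on, or always in tables."""
    sign = "-" if d < 0 else ""
    int_part, _, frac = format(abs(d), "f").partition(".")
    if in_table or len(int_part) >= 5:
        groups = []
        while len(int_part) > 3:
            groups.insert(0, int_part[-3:])
            int_part = int_part[:-3]
        groups.insert(0, int_part)
        int_part = ".".join(groups)
    return sign + int_part + ("," + frac if frac else "")


print(format_de(Decimal("76543.21")))                # 76.543,21
print(format_de(Decimal("6543.21")))                 # 6543,21
print(format_de(Decimal("6543.21"), in_table=True))  # 6.543,21
print(format_de(round_min_sig("43.21")))             # 43,2
```

The `in_table` flag stands in for the "exception to the exception". How a template would tell Wikidata which variant it wants is exactly the open question raised above.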

A few clarifications:

  • Additional information like temperature and pressure for the speed of sound (or the point in time of a population figure) is expressed as qualifiers. Qualifiers are properties of property-value assignments (not of properties).
  • For the data type "scalar measurement", units (and the precision!) are part of the value, but values are not strings; they are complex data structures. No problem with parsing there.
  • Internally, all numbers are stored as numbers, not as strings, so there is no issue whatsoever with formatting. The same is true for dates, etc.
  • When displaying (transcluding) a number (as a property value), the author can specify the format and precision. Sensible defaults will be provided for each locale (this is already implemented in MediaWiki; we just need to apply it to property values).
  • When transcluding a scalar value, the author can also request unit conversion. We will support at least conversion between metric and imperial units, and perhaps a few more.

I don't see anything above that isn't covered by our approach. -- Daniel Kinzler (WMDE) (talk) 10:24, 29 August 2012 (UTC)
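As a rough illustration of the clarifications above, the sulfur dioxide example could be modelled like this in Python. The class and field names (`Quantity`, `Statement`, `qualifiers`, `sources`) are invented for this sketch and are not the actual Wikidata data model. The precision values are placeholders.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass(frozen=True)
class Quantity:
    """A scalar measurement: number, unit and precision travel together as one value."""
    amount: Decimal      # stored as a number, never as a formatted string
    unit: str
    precision: Decimal   # placeholder values below, not taken from any source


@dataclass
class Statement:
    """A property-value assignment; qualifiers belong to the assignment, not to the property."""
    item: str
    prop: str
    value: Quantity
    qualifiers: dict[str, Quantity] = field(default_factory=dict)
    sources: list[str] = field(default_factory=list)


sound_speed = Statement(
    item="Sulfur dioxide",
    prop="Speed of sound",
    value=Quantity(Decimal("213"), "m/s", Decimal("1")),
    qualifiers={
        "Temperature": Quantity(Decimal("0"), "°C", Decimal("1")),
        "Pressure": Quantity(Decimal("1"), "atm", Decimal("1")),
    },
    sources=["HBCP91"],
)

print(sound_speed.value.amount, sound_speed.value.unit)
print(sound_speed.qualifiers["Temperature"])
```

In this shape the unit of the pressure is part of the pressure value, so no separate "PressureUnit" qualifier is needed, while the number itself stays available for client-side formatting.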

Thank you for the information, but with so little documentation on this subject it is difficult to be optimistic. The question of the main objectives of Wikidata still remains open: is it the task of Wikidata to provide a formatting tool? I don't know whether you have listed all format templates in the different WPs, or whether you can ensure that a maintenance team will support the different demands from "simple" users without any programming skills in the future.
Wikidata is a tool which will be handled by a small group of people who know the tool, and there is a high risk of ending up with an interesting tool that is too complex and not used by people (I have had some experience of this with templates). As long as no working prototype can be tested, it is hard to believe promises that everything is under control and scheduled (sorry, that's just the result of my experience with community development).
I admit that this attitude is not very helpful for the people working on Wikidata, but a good product is not enough if you cannot convince people to use it. In the past I have encountered a lot of inertia when introducing new things, especially when people don't have a complete understanding or the possibility of modifying things as they want. I'm convinced of the need for a Wikidata tool, but I'm afraid that only a few people will use it in the future if you cannot provide either a very simple system or a complete program with help and a GUI. Snipre (talk) 20:55, 29 August 2012 (UTC)
@Daniel: What I wrote about "5-digit numbers as 76.543,21 but 4-digit numbers as 6543,21" is not already implemented in MediaWiki. The same goes for significant figures, where larger numbers become "1234" but small numbers become "1,2" or even "0,12". We use templates to do this. How does this work with Wikidata? We can't use templates if Wikidata returns "213 °C" as a string. --TMg 00:54, 5 September 2012 (UTC)
There are two ways (note that the spec for the inclusion syntax is still in flux, so I'm trying to give you an idea; it may look different in the end):
  1. let Wikidata handle the formatting: {{#property:temperature|precision=0.1|unit=Kelvin}} renders as 486,2 K (if the content language is German; if it's English, it would be 486.2 K)
  2. do your own formatting: {{#number_format:{{#property:temperature|raw|unit=Kelvin}}|2|,|.}} K
So Wikidata can give you the raw value, with optional unit conversion, so you can do your own formatting. Or you can tell it how you want the number to be formatted, and it will do it and return the default representation. -- Daniel Kinzler (WMDE) (talk) 17:21, 5 September 2012 (UTC)
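The difference between the two ways can be sketched in Python. This is an illustration only: `raw_value` and `rendered_value` are hypothetical names, the value is assumed to be stored in Kelvin, and locale handling is reduced to swapping the decimal separator for German.

```python
from decimal import Decimal

# Hypothetical unit table; only what the example needs.
SYMBOLS = {"Kelvin": "K", "Celsius": "°C"}


def raw_value(kelvin: Decimal, unit: str = "Kelvin") -> Decimal:
    """Way 2: hand back a plain number (optionally converted) for client-side formatting."""
    if unit == "Kelvin":
        return kelvin
    if unit == "Celsius":
        return kelvin - Decimal("273.15")
    raise ValueError(f"unsupported unit: {unit}")


def rendered_value(kelvin: Decimal, unit: str = "Kelvin",
                   decimals: int = 1, lang: str = "en") -> str:
    """Way 1: convert, round and format for the content language in one step."""
    text = f"{raw_value(kelvin, unit):.{decimals}f}"
    if lang == "de":
        text = text.replace(".", ",")  # German uses a decimal comma
    return f"{text} {SYMBOLS[unit]}"


stored = Decimal("486.2")
print(rendered_value(stored, lang="de"))  # 486,2 K
print(rendered_value(stored, lang="en"))  # 486.2 K
print(raw_value(stored, "Celsius"))       # a number a template can format itself
```

The raw variant returns a number rather than a string like "213 °C", which is what a formatting template needs as input.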
There is a third possibility:
Present usage of the formatting template called Temperature: {{Temperature|25|K}}
Third possibility: {{Temperature|{{#property:temperature|Value}}|{{#property:temperature|Unit}}}}
This requires being able to call each part of a data value separately in order to fill the templates with inclusion syntax instead of numbers or strings. I admit this will lead to very complex wikicode, but it can be hidden in a template, as users will have no need to edit the wikicode of an infobox. Snipre (talk) 19:26, 5 September 2012 (UTC)
This is pretty much the same as the second possibility I mentioned. "temperature|Value" from your example would be "temperature|raw" from my example. "temperature|Unit" would also work and return K, but only coincidentally: Wikidata will be using SI units for everything internally. By default, a conversion appropriate for the wiki's locale would be applied. So you'd have something like this: {{Temperature|{{#property:temperature|raw|unit=Celsius}}|C}}. -- Daniel Kinzler (WMDE) (talk) 13:59, 6 September 2012 (UTC)
Thank you for the reply. I admit that I'm still not totally convinced, but I agree that you have taken care of a lot of details. Perhaps, once you have a demo, it would be a good idea to test it with a chemical infobox and its more than 30 parameters. Snipre (talk) 22:10, 18 September 2012 (UTC)

Usage of data-item statement

For example, in the article France we might want to give a list of the main cities with their population and location, using code like:

* {{#Data-item:Paris}}[[Paris]], {{#property:Population}}, {{location-dec|{{#property:longitude}}|{{#property:latitude}}}}
* {{#Data-item:Lyon}}[[Lyon]], {{#property:Population}}, {{location-dec|{{#property:longitude}}|{{#property:latitude}}}}
* {{#Data-item:Marseille}}[[Marseille]], {{#property:Population}}, {{location-dec|{{#property:longitude}}|{{#property:latitude}}}}
{{#Data-item}}

The last line makes sure that any following property is about the main article.

I guess that templates like location could pick their parameters from the calling article or the specified item when none are given, so {{location}} would suffice and replace {{location-dec|{{#property:longitude}}|{{#property:latitude}}}} in the example above.

Am I getting it right? --Cqui (talk) 13:39, 19 September 2012 (UTC)

As far as I understand, your first line would be more like this:
* [[Paris]], {{#property:Population|item=Paris}}, {{location-dec|{{#property:longitude|item=Paris}}|{{#property:latitude|item=Paris}}}}
With "location-dec" being a regular template. It will be possible to move the last part into a template called "location":
{{location|item=Paris}}
Even better, I think it will be possible to get nicely formatted coordinates from Wikidata:
{{#property:coordinates|item=Paris|format=DMS}}
I hope I'm right. Please note that the "item" parameter and especially the last part about the format are not yet specified. --TMg 17:08, 19 September 2012 (UTC)
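The idea of an explicit "item" parameter that falls back to the current page, plus a DMS output format, can be sketched in Python. Neither is specified yet, as noted above. The names `get_property` and `to_dms`, the dictionary layout, and the approximate coordinates for Paris are all assumptions made for this illustration.

```python
from typing import Optional

# Illustrative data only; the coordinates are approximate.
STORE = {
    "France": {"capital": "Paris"},
    "Paris": {"latitude": 48.8566, "longitude": 2.3522},
}


def get_property(store: dict, page_item: str, prop: str, item: Optional[str] = None):
    """Look up `prop` on the explicitly named item, or on the page's own item if none is given."""
    return store[item or page_item][prop]


def to_dms(value: float, positive: str, negative: str) -> str:
    """Format decimal degrees as degrees, minutes and whole seconds."""
    total_seconds = round(abs(value) * 3600)  # rounding first avoids results like 60 seconds
    degrees, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    hemisphere = positive if value >= 0 else negative
    return f"{degrees}° {minutes}′ {seconds}″ {hemisphere}"


# On the page "France": one lookup without an item, two with item=Paris.
print(get_property(STORE, "France", "capital"))
lat = get_property(STORE, "France", "latitude", item="Paris")
lon = get_property(STORE, "France", "longitude", item="Paris")
print(to_dms(lat, "N", "S") + ", " + to_dms(lon, "E", "W"))
```

Passing the item with every call avoids the hidden state of a {{#Data-item:...}} switch, where the meaning of a later {{#property:...}} depends on what came before it on the page.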