Talk:Wikidata/Notes/Inclusion syntax v0.1

From Meta, a Wikimedia project coordination wiki

Localization, syntax and more questions

  1. Currently we are using the hash syntax for parser functions like {{#if: and {{#expr:. I'm not sure if it's a good idea to use the same syntax for the Wikidata stuff. As you said the output of e.g. {{#data-value: can not be used as an input for e.g. {{#if:. Isn't this confusing?
  2. Maybe it would be more confusing to invent a new syntax?
  3. Why does data-template use a dash but data_item and data_param are using underscores? Please use dashes everywhere. Be consistent with the HTML5 data-* attributes.
  4. Why not use the same syntax inside the templates? For example, {{{data-color}}} instead of {{{data.color}}}?
  5. Are we free to use localized template names and parameter names for the new infobox syntax? I consider this very, very important. Here is an example why this is so important. We need a clearly defined point where the parameter names can be translated to create a localized version of the same template.
  6. Overall, I'm not sure if the new infobox syntax is meant to be used in articles or in other templates? Can we keep our existing localized templates and use the new syntax in these templates? The new syntax should allow this.

--TMg 14:23, 22 May 2012 (UTC)

Hi TMg, thanks for your input!
  1. These are parser functions, and you can use their output as input for other parser functions; the output is just riddled with a lot of HTML and so pretty useless as a condition, etc. But I will rephrase the relevant sentence: the output of #data-value etc. can be used as the second or third parameter to #if just fine, it just doesn't make sense to use it as the first parameter (the condition).
  2. yes :) Also much harder to implement.
  3. ok, will use slashes instead of underscores in parameter names.
  4. {{{data.color}}} is a structured identifier, meaning the property "color" of the object "data". Dots (and colons) are commonly used in programming languages to denote sub-entities (parts, properties, members, etc.). In contrast, data-template or data-param are not structured; they are just compound phrases. They do not denote sub-entities.
  5. Template names are completely custom. It's not that you can localize them - you will have to provide your own. As to the parameters supported by the parser functions... they will use whatever mechanism exists for localizing the parameter names of parser functions. I don't know if MediaWiki supports it. If MediaWiki supports it, Wikidata/Wikibase supports it.
  6. The intention is to use {{#data-template:whatever}} in the article and {{#data-value:data.foo}} in the template. If you want to hide the {{#data-template}} stuff, you can wrap another template around it: {{Whatever}} would do {{#data-template:whatever-format}}, and whatever-format would contain the actual formatting logic.
HTH -- Duesentrieb (talk) 14:44, 22 May 2012 (UTC)
  1. OK.
  2. I'm a software developer, I know the dot syntax. However, I'm not sure if it's appropriate here. Wikitext is not a programming language, not even with all the parser functions we have. It does not even look like JavaScript or C++. Currently in template parameters like {{{Min.-max. height}}} neither dashes nor dots nor spaces have a meaning. All I say is: If you choose a character why not choose the dash? Again, this would be consistent with the HTML5 data-* attributes.
  3. I want to translate {{{data.color}}} to {{{Daten.Farbe}}} (or to {{{Daten-Farbe}}} as argued above). Maybe I'm wrong and this is not important. The question is: What part of the new syntax will be visible in articles? All these parts must be translated.
  4. OK. Similar to the /doc, /sandbox and /testcases subpages we will create a lot of /data-template subpages then. I think this is a good idea.
--TMg 15:30, 22 May 2012 (UTC)
  1. You asked "If you choose a character why not choose the dash?" Well, dashes and underscores are often used in property identifiers. If we used them as our structuring element, they could occur neither in the name of the parameter that references the item nor in the name of any property of the item. So the item couldn't have, e.g., a population-density parameter and, according to your original point, shouldn't be using population_density either (well, we could use dashes as a structuring element and underscores in the names of parameters and properties, but you didn't like that, and it's visually far more confusing than using dots). Anyway, I'm not desperate to use dots. I just think dashes are worse. We can use slashes, how about it :)
  2. Ah, you want to translate the names of the item's properties. We are considering making this possible in the property definition in the Wikidata repository. We'll have to think about restrictions for those names (allow dashes? dots? spaces?), and whether and how they can be changed later (changing the localized name would break a lot of things...).
It may save us a lot of trouble to require the use of unchanging unique identifiers for the parameters, so nothing breaks when the translation is changed. We'll have to maintain a localized "visible" name anyway, so we can automatically provide labels for properties (oops, forgot to mention that in the draft).
--Duesentrieb (talk) 15:55, 22 May 2012 (UTC)
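
The separator argument above can be made concrete with a small sketch. It is written in Python rather than wikitext and is not part of the draft; split_path is a hypothetical helper, and the nested reference form (parameter, property, sub-part) is assumed from the examples used on this page. It only shows that a separator which may also occur inside a property name makes a reference ambiguous.

```python
def split_path(reference, separator):
    """Split a structured reference into its parts at every separator."""
    return reference.split(separator)


# With dots as the structuring element, a dash inside a property name is harmless.
print(split_path("data.population-density.timestamp", "."))

# With dashes as the structuring element, the same property name falls apart.
print(split_path("data-population-density-timestamp", "-"))
```

The first call yields three parts with population-density intact; the second yields four parts, so the parser can no longer tell where the property name ends.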

Btw: perhaps join the discussion on wikidata-l, I have relayed our conversation there. -- Duesentrieb (talk) 16:17, 22 May 2012 (UTC)

  • Dots are also "used in property identifiers". It's basically a matter of taste. Everything is possible, including something like data[color]. Please choose the syntax carefully. Whatever you choose, it will introduce a new restriction and will therefore cause problems in some existing templates.
  • By the way, it's a nice idea but maybe confusing that {{{data}}} expands to "the label and description of the item". How do we specify what this string looks like? Again, I think this needs to be localized.
  • Having unique identifiers is not a problem if things like {{{data.color}}} don't need to be included in thousands of articles. If it's enough to put this in a template (and localize the template) it's fine.
--TMg 17:26, 22 May 2012 (UTC)
Using slashes might become a problem, since they're used in the syntax of the German Wikipedia coordinate template, which is included in some way in virtually every infobox depicting georeferenceable objects, e.g. something like |lat_deg= 2/30/52/N |lon_deg=13/21/0/E. I also know of at least one template that uses slashes for string operations: de:Vorlage:Infobox Fluss (some kind of EBNF syntax), whose parameter ABFLUSSWEG helps with the automatic categorization of German WP river articles. --Matthiasb (talk) 19:33, 23 May 2012 (UTC)
I'm afraid you are mixing something up. It's not a problem to have slashes in a value. It may be a problem to have slashes in a parameter name (e.g. lat/deg). By the way, I think it's a lot better to always store coordinates as numbers (e.g. "+13.35" instead of "13/21/0/E"). --TMg 18:04, 24 May 2012 (UTC)

Formatting

I don't really understand what all the formatting stuff is about. Why should we use

{{#data-value:data.color|form=span}}

instead of

<span>{{#data-value:data.color}}</span>

Or worse, why should we use

{{#data-value:data.color|form=td|style=text-align: right;}}

instead of the wiki table syntax?

| style="text-align: right;" | {{#data-value:data.color}}

I think you should drop "form", "style" and "class". Simpler is better. --TMg 17:26, 22 May 2012 (UTC)

For the simple access you suggest above, the plain template parameter syntax is provided, e.g. {{{data.color}}}, as in
<span>{{{data.color}}}</span>
but there are many aspects of rendering a property value that cannot be readily expressed in this syntax, for example which language, precision and unit to use for the output. Also, values often have qualifiers, such as the source, accuracy, timestamp, etc. Lastly, there needs to be a place for things like indicators for disputes, edit links, etc.
The {{#data-value}} function lets you output all of these "parts" at once (or separately, if you like), and lets you control aspects like the format of the output using parameters. If you choose to output multiple parts at once, {{#data-value}} can use its knowledge about the desired HTML form to do this nicely. E.g.
 {{#data-value:data.population|show=label,value,timestamp,source,indicators,edit|form=tr}}
would be rendered as
  <tr>
    <td>Population</td>
    <td>523,411</td>
    <td>2010</td>
    <td><a href="#src23">[1]</a>,<a href="#src23">[2]</a></td>
    <td><a title="disputed" href="..."><img src="..."/></a></td>
    <td>[<a title="edit" href="...">edit</a>]</td>
  </tr>
It's hard to imagine how to achieve this nicely without using parser functions. -- Duesentrieb (talk) 19:37, 22 May 2012 (UTC)
I'm very sorry but I think this is way too complicated. I'm a software developer and I think I should be able to understand all this in seconds when I look at it. I think you should create a toolkit that is very tiny and very easy to understand. Extremely powerful tools like the one in your example, with all the complicated parameters (even comma-separated, which I think is horrible), are way too restricted in the end and can be used only in very, very few cases. Here is how your example should work in my opinion:
|-
| Population
| {{#formatnum: {{{data.population}}} }}
| {{#time: {{{data.population.timestamp}}} }}
| {{{data.population.source}}}
| {{{data.population.indicators}}}
| [{{{data.population.edit}}} edit]
This belongs in a template. In an article we will never write {{#data-value:data.population|show=label,value,timestamp,source,indicators,edit|form=tr}}. We will write {{Population table row}} instead. As said before, I don't understand why we should use HTML table syntax in a wiki. There is a table syntax. We know how it works. Don't force us to use another syntax, please. We have tools to format numbers and timestamps and to create references and links. We have powerful tools to create templates. We are able to use styles and classes and HTML. We don't need a new syntax to do things we can already do. This is not only confusing, it is highly counterproductive. Don't work against the template syntax, work with it. data.population should output the unformatted population. We have tools to format numbers. data.population.timestamp should output an unformatted timestamp. We have tools to format timestamps. data.population.edit should output a URL. We have tools to create links. data.population.source should output <ref> tags. --TMg 09:38, 23 May 2012 (UTC)
Ok, so you want to handle all parts of each value by hand. Fine. It is possible for most things, as you said. But it's very tricky to do in other cases, and very redundant to have to do it over and over. Here are a few things that I can't think of a good way to do using templates:
  • Unit conversion. Even if you have templates to do this, you would need a plain number as input. But data.population may not be a single value; it may be (e.g. in case of a dispute) a list or range of values.
  • Indicators are generally complex HTML.
  • The edit link would normally contain JavaScript that invokes the on-site editing interface. Only as a fallback would it actually link somewhere. And it should not be formatted as an external link, nor should it end up in the externallinks table.
  • data.population.source is actually a list of sources, each of which needs a template for rendering. You are already generating complex HTML at this point.
  • In the case of #data-values (plural), each value (actually, each statement, see the data model spec) for a property would be listed separately. You would need a foreach loop to do this in a template. With Lua, this will be possible in the future, but right now it isn't.
While I agree that it's not very nice to be outputting entire table rows from a parser function, I think it would be very hard to cover the above with a simpler approach. If you can think of a cleaner, nicer, yet workable way, let me know.
-- Duesentrieb (talk) 10:19, 23 May 2012 (UTC)
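
The "foreach" point above can be sketched outside wikitext. The following Python is only an illustration of what listing each statement separately involves; the function render_rows and the dictionary shape of a statement are assumptions made for this sketch, not the data model spec, and the second statement's figures are made up.

```python
def render_rows(label, statements):
    """Emit one wiki table row per statement of a property."""
    lines = []
    for statement in statements:
        lines.append("|-")
        lines.append(f"| {label}")
        lines.append(f"| {statement['value']:,}")  # thousands separator
        lines.append(f"| {statement['timestamp']}")
    return "\n".join(lines)


statements = [
    {"value": 523411, "timestamp": 2010},  # figures from the example above
    {"value": 500000, "timestamp": 2000},  # made-up second statement
]
print(render_rows("Population", statements))
```

The loop is the part that plain templates cannot express; everything inside it is ordinary formatting.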

Don't let Wikidata "generate complex HTML"

I'm able to understand all this from a developer's point of view. But most Wikipedia users aren't developers, not even close. They neither need nor want a powerful tool. What they want are simple solutions for their problems.

All I can do at the moment is to warn you: Most of the details described in this documentation are way too complicated and will never be used in the Wikipedia projects. Very few people will understand this stuff. This will create another playground for an elite of very few Wikipedia users and lock out most people. This will cause a lot of bad discussions. Wikipedia is full of such "elitist playgrounds". An example is categories. Categories are almost completely useless for the readers. They are even ignored by many longtime Wikipedia authors (like me) because we have the feeling categories are controlled by this "elite" of very few users. They don't want us to make categories simple and useful for the everyday user. They fight us. Please don't let this happen again.

We don't need the Wikidata project to solve problems that are already solved, like formatting dates and numbers, creating tables or appending styles and classes to HTML tags. We need you to create an easy-to-understand set of simple tools to fill a few gaps.

To answer some of your points: We don't need to do this "over and over". We will do this once, in a template. We don't need you to do unit conversion. Unit conversion is a solved problem. We have very good templates to do this. Code like {{{data.population}}} must return a number. What's the point in having a Wikidata project in the first place if it does not solve the problem we have: people entering random stuff in the template parameters? If we don't know the exact population this must be solved with other attributes like {{{data.population.precision}}} or {{{data.population.min}}} and {{{data.population.max}}}. All these attributes must be atomic (see database normalization, first normal form). I know this may be too simple in some cases. But we don't need a complicated solution to solve a few complicated cases. We need a simple solution to solve thousands and thousands of simple cases.

Inline editing of a population is dangerous and should never be possible, especially for populations that are imported from a reliable source. "source" should output a string of concatenated <ref> tags. I don't see a problem there. About "indicators" and "dispute": I'm not even sure what this means or why we need this. Again, I have the feeling this is not a problem that needs to be solved. Overall, I think Wikidata should be about plain data and not about "complex HTML formatting". I'm very sorry but this feels wrong for so many reasons (from the view of a Wikipedia author but also from a developer's view).

I don't write this to stop you. I write this to help you move forward in the right direction. I know you collected a lot of problems and you want to solve them all. You want to create "the next big thing". Please, don't do this. Focus on very few but important problems.

We don't need a "powerful tool". We need to solve the problems we have. That's a difference. A tool is just a tool. If it's too complicated and can't be understood by a wide range of users, it's useless. If only a few developers can master a tool and everybody else needs to ask these developers first, the tool has failed. It will just create more problems. It will ban more people from participating in a previously "open" project.

(If I have expressed myself unclearly, I am happy to follow up with an explanation in German.) --TMg 18:55, 23 May 2012 (UTC)

TL;DR: I think this should be done according to the model–view–controller (MVC) pattern. We have a view (wiki markup, HTML, CSS). We have a controller (templates, parser functions, some extensions). What's missing is a model. Wikidata should be the model only. If the controller is not powerful enough, create the required tools (for example {{#data-foreach}} to iterate a set of values) but don't mix this controller stuff with the model stuff. --TMg 19:10, 23 May 2012 (UTC)
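
A rough sketch of the model/view split argued for here, in Python for brevity. The class Quantity, its field names and the function format_quantity are hypothetical; they stand in for atomic attributes such as {{{data.population}}}, {{{data.population.min}}} and {{{data.population.max}}}, and the bounds used below are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Quantity:
    """Model: atomic, unformatted values only."""
    value: int
    minimum: Optional[int] = None
    maximum: Optional[int] = None


def format_quantity(quantity):
    """View: all presentation decisions live outside the model."""
    text = f"{quantity.value:,}"
    if quantity.minimum is not None and quantity.maximum is not None:
        text += f" ({quantity.minimum:,} to {quantity.maximum:,})"
    return text


print(format_quantity(Quantity(523411)))
print(format_quantity(Quantity(523411, minimum=520000, maximum=530000)))
```

The model never returns markup, so the same value can feed a table row, a sentence or a unit-conversion template.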

Various notes

In general, I am surprised that on the one hand you seem to be deeply modifying the template parameter calls (by providing structured parameters and methods to resolve the structure), while on the other hand you don't allow the data items to be called from within a template. Your present approach seems to force either each page that calls an infobox template to be modified (replacing {{infoboxZZZ |foo=some value}} with a {{#data-template:infoboxZZZ |foo=some value }} call) or, which is more likely, each infoboxZZZ to be renamed to infoboxZZZ-Inner and a new infoboxZZZ to be created that calls infoboxZZZ-Inner wrapped in #data-template.

While this is possible to do, it seems to add some overhead in the (very likely) scenario that an infobox displays a mixture of Wikidata-stored information and page-injected information, i.e. both the wrapper and the inner, real infobox template need to pass the right parameters.

I guess that approach is taken because of caching concerns. However, given that the template parameter calls have to be overloaded for Wikidata anyway: is it possible, whenever item.color is called as a parameter, to silently cache the entire item, so that it is already in memory for the next call, e.g. for item.size?

Some random notes on the text, which may or may not be useful:

  • The explanation is somewhat hard to follow, because the section "Including Items in an Article" requires an understanding of what the object is that is passed to a template. Normally templates are not passed structured parameters, so this was surprising to me. You are introducing both this mechanism and a new syntax for it. Perhaps the explanation of this general mechanism could come first.
  • Like other commentators, I am sceptical about using the dot for this. Both dots and hyphens are legal in the grammar for RDF property names (http://www.w3.org/TR/REC-xml/#NT-Name). Slashes or hashes are not and would be a better choice in my opinion.
  • "This implies that the client wiki tracks": please define "client wiki". Also I cannot follow the rest of the sentence, perhaps elaborate.

--G.Hagedorn (talk) 21:15, 22 May 2012 (UTC)


It was indeed intentional to always do item formatting via a template, since that seems to be the way people usually handle the formatting of uniform data objects. This can easily be amended by introducing a parser function that makes a data object available in the present scope, instead of passing it to a template. As to forcing all pages using the infobox templates to be modified: technically, you don't have to do that, because you can easily wrap the call to #data-template in the original template and use some other template to do the actual formatting. But in practice, the page will have to be edited anyway. It's pointless to use data from Wikidata if we don't remove all the infobox parameters from the article pages.
re caching: all data items are cached. Twice, actually: once persistently in a local database table, and once per request, in memory.
re your text notes: thanks for the input, I'll improve that. I think I'm going to rewrite the entire proposal, now that I have gotten some feedback. -- Duesentrieb (talk) 15:55, 23 May 2012 (UTC)
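
The two cache levels mentioned above (persistent local table, per-request memory) can be sketched as follows. This is an illustration in Python, not the actual implementation: ItemCache, the fetch callback and the plain dict standing in for the local database table are all assumptions of the sketch.

```python
class ItemCache:
    """Look up whole items: per-request memory first, then the persistent store."""

    def __init__(self, persistent, fetch):
        self.persistent = persistent  # dict standing in for the local database table
        self.fetch = fetch            # callback that loads an item from the repository
        self.memory = {}              # lives only as long as one request

    def get(self, item_id):
        if item_id in self.memory:
            return self.memory[item_id]
        if item_id in self.persistent:
            item = self.persistent[item_id]
        else:
            item = self.fetch(item_id)
            self.persistent[item_id] = item
        self.memory[item_id] = item
        return item
```

Because the whole item is cached, a lookup of item.size after item.color is served from memory, and a later request that shares the persistent store does not need to fetch the item again.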

{{#data-template}}

#data-template -- a two word parser function representing a thing like an "infobox" on a page is practically unprecedented because it's unnecessary
data_item=q332211 -- major departure from wiki simplicity and its strong orientation to naming topics and subtopics. Its description suggests this is the name of an MVC controller.
foo=some value -- where is foo defined in the wiki
data_param=stuff -- sounds like an MVC data module name
stuff.color=green -- an MVC view notation that should be an MVC scheme
--Hypergrove (talk) 22:07, 22 May 2012 (UTC)
The idea is to send a data item to a template for formatting, making its properties available as template parameters.
Using numeric ids to reference items allows for stable, unique, language-neutral identifiers. They will, however, rarely be used, because the default is "the data item connected to this page via the language links".
foo is just a normal template parameter, passed to the template.
data_param controls which template parameter will be used to represent the item (i.e. it assigns a local name to the item).
stuff.color=green overrides the value of a property of the item. Color was perhaps a poor choice of example, because this is unrelated to display logic. It may just as well be data.weight=12kg or thingy.population=1234567.
-- Duesentrieb (talk) 10:02, 23 May 2012 (UTC)
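
How these pieces combine can be sketched in a few lines of Python. This is a reading of the explanation above, not the implementation: build_parameters is a hypothetical function, the default local name "data" is an assumption, and the item's original color is made up.

```python
def build_parameters(item, data_param="data", overrides=None, extra=None):
    """Assemble the parameters a template would see for one data item."""
    parameters = dict(extra or {})                    # ordinary parameters such as foo
    for prop, value in item.items():
        parameters[f"{data_param}.{prop}"] = value    # item properties under the local name
    parameters.update(overrides or {})                # e.g. stuff.color=green wins
    return parameters


item = {"color": "red", "weight": "12kg"}  # made-up item content
print(build_parameters(
    item,
    data_param="stuff",
    overrides={"stuff.color": "green"},
    extra={"foo": "some value"},
))
```

The override is applied last, so stuff.color resolves to green while stuff.weight still comes from the item.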

{{#data-value}}

This equates to the ask/show parser functions. Comments:

form -- should be "format"
style -- unconventional; use surrounding div
class -- unconventional
source -- is this an rdf resource identifier? I prefer names not identifiers.
show -- dotted names corresponding to an RDF graph traversal path (with axes) is needed here
precision -- this selector should be part of the name of the data item being asked
unit -- this selector should be part of the name of the data item being asked
language -- this selector should be part of the name of the data item being asked
--Hypergrove (talk) 22:07, 22 May 2012 (UTC)
#data-value generates potentially complex HTML structures. Supporting class and style directly is just a convenience, that's true. It seems handy to allow this, for the same reasons the MediaWiki table syntax allows this.
"source" selects a specific statement using its source id. Please refer to the data model spec.
"show" selects the parts of the statement to show. This is related to the "snaks" (see the data model spec), but unrelated to graph traversal. We do NOT use an RDF data model. Intentionally. See all the nice discussion about this on wikidata-l.
The precision with which a scalar value is shown needs to be controlled by the user. It's a formatting decision, not a property of the data (as opposed to accuracy or margin of error, which are in fact parts of the data, modeled as value qualifiers). The same is true for the unit: it's up to the local wiki to decide which unit to use for display; Wikidata provides (linear) unit conversion.
Using the name (as in show?) to select the language means inventing more syntax. Until now, the discussion has been pointing in the opposite direction: do not use structured identifiers at all, get rid of all the dots, etc., and use separate parser function parameters for everything. I tend to agree.
-- Duesentrieb (talk) 10:09, 23 May 2012 (UTC)
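
The distinction between stored data and display decisions can be illustrated with a short Python sketch. The function display_value and its parameters are hypothetical; the conversion is a plain multiplication by a factor, which is what "linear" is taken to mean here, and the kilometre-to-mile factor is only an example.

```python
def display_value(value, factor=1.0, precision=0):
    """Convert a stored scalar linearly and round it for display only."""
    converted = value * factor
    return f"{converted:.{precision}f}"


# The stored value stays untouched; unit and precision are chosen by the wiki.
print(display_value(100, factor=0.621371, precision=1))  # kilometres shown as miles
print(display_value(12))                                 # same unit, no decimals
```

Accuracy or margin of error would be carried alongside the value as qualifiers and is deliberately not an argument of this function.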

SKOS {{#topic}}

I don't understand yet why SKOS' logical model is not being seriously considered for reuse - it is an obvious fit with the wiki model. If SKOS were the basis for the design, then you'd likely end up REPLACING {{#data-template}} with something like a {{#topic:{{FULLPAGENAME}}}} parser function.

--Hypergrove (talk) 22:07, 22 May 2012 (UTC)
This page is only about the syntax used for inclusion. If you want to discuss the data model, please see Wikidata/Data model. Also have a look at the other pages linked from Wikidata/Notes.
I agree that some parts of the Wikidata model map nicely to SKOS. A lot of things don't map to SKOS at all. And all that is completely beside the point for specifying the inclusion mechanism. -- Duesentrieb (talk) 09:55, 23 May 2012 (UTC)
I'm posting Wikitopics to discuss this approach in more detail. Thanks for the replies --Hypergrove (talk) 17:33, 23 May 2012 (UTC)

Language independent Wikidata inclusion specification

MediaWiki template syntax is a nightmare and Wikidata won't improve it. The only solution is to

  • get rid of the cumbersome {{...}} syntax

Having said this, it would help to at least define a language independent Wikidata inclusion specification, which could be implemented in different syntax forms (MediaWiki template syntax, template syntax, API bindings to several programming languages etc). For instance:

Method: property

Parameter   Description
property    which property to return
item        which item's property to return
format      output format (based on the property's data type)
...

From the current specification it is less visible which parameters and which parameter values are supported.

Wikidata inclusion syntax for MediaWiki is just one application of this language independent Wikidata inclusion specification. With a formal specification you could also automatically generate API bindings for multiple programming languages. See Tinkerforge API bindings for an example. -- JakobVoss (talk) 10:37, 29 May 2012 (UTC)
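
As a rough illustration of generating one binding from such a specification, here is a Python sketch. The dictionary layout, the function to_wikitext and the resulting {{#property:...}} call are all hypothetical; only the method name and the three parameters are taken from the table above.

```python
PROPERTY_METHOD = {
    "name": "property",
    "parameters": {
        "property": "which property to return",
        "item": "which item's property to return",
        "format": "output format (based on the property's data type)",
    },
}


def to_wikitext(method, arguments):
    """Render a call to a specified method as a parser-function-style string."""
    unknown = set(arguments) - set(method["parameters"])
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    parts = [f"{name}={value}" for name, value in arguments.items()]
    return "{{#" + method["name"] + ":" + "|".join(parts) + "}}"


print(to_wikitext(PROPERTY_METHOD, {"property": "color", "item": "q332211"}))
```

The same dictionary could drive a Lua or API binding, and it makes the supported parameters visible in one place.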

A very good point, thanks! We'll need something like this, especially wrt Lua. -- Daniel Kinzler (WMDE) (talk) 13:27, 29 May 2012 (UTC)

Responses to the May 29 Draft

My first thought when looking at the new draft: Wow. Much, much better. Thanks a lot. Good work. I will write a detailed response later. --TMg 11:12, 29 May 2012 (UTC)