Field-value pairs

From Meta, a Wikimedia project coordination wiki

This is a proposal to allow field-value pairs in MediaWiki articles.

See also: Flexible Fields for MediaWiki, RDF

Rationale

There are a number of reasons we may want to assign fields to an article. Examples:

  • categorization -- saying that particle physics is in the physics category, or that Lord of the Rings is in the fantasy books category
  • relationships between articles -- break up a single page into multiple chapters or sections, and note that they're all part of the same article
  • synopses -- providing a synopsis or description of an article
  • geography -- marking up pages to specify that they cover a particular geographical location
  • customizable per-installation metadata -- metadata that may make sense for different installations.

Design

Markup

Editors can put markup into a page like this:

[[name=value]]

Here,

  • "name" can be any string that won't interfere with other wikitext parsing and doesn't contain a ":"; it is the field name.
  • "value" is any string that won't interfere with other wikitext parsing, and doesn't contain "]]". It is the field value.

The field tag can go anywhere in the page text. There can be multiple field tags with the same name.

The markup used is intended to be fairly obvious, and not interfere with namespaces. The big disadvantage is that it prevents having articles with the equal sign ("=") in the name.

Some examples follow. (Note that there are no predefined meanings for any of these fields; this is just to illustrate the markup.)

[[category=physics]]
[[parent=Canada]]
[[related=Wikipedia:Neutral point of view]]
[[status=stub]]
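As a rough illustration of the rules above, the field tags could be pulled out of page text with a regular expression. This is only a sketch: it assumes the link-style [[name=value]] form of the markup, and the function name extractFields and the exact pattern are hypothetical, not part of the proposal.

```php
<?php
// Hypothetical helper: extract [[name=value]] field tags from page text.
// Returns a list of array('name' => ..., 'value' => ...) in page order.
function extractFields(string $text): array {
    // Name: no "=", ":" or square brackets. Value: everything up to the first "]]".
    $pattern = '/\[\[([^=:\[\]]+)=(.*?)\]\]/';
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $fields = array();
    foreach ($matches as $m) {
        $fields[] = array('name' => trim($m[1]), 'value' => trim($m[2]));
    }
    return $fields;
}

$text = "Quarks are studied in [[particle physics]]. "
      . "[[category=physics]] [[related=Wikipedia:Neutral point of view]]";
print_r(extractFields($text));
```

Ordinary links such as [[particle physics]] contain no "=" and are skipped, while a value may contain ":" because only the name is restricted.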

Database

When a page is saved, the fields are extracted and put into a database table called "fields". The table has the following columns.

  • field_page_id: the cur_id of the cur row that contains this field.
  • field_order: the zero-based order of the field marker within the page text.
  • field_name: the contents of the name part.
  • field_value: the contents of the value part.
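To make the column layout concrete, here is a minimal sketch of how the rows for one page might be assembled at save time. The function buildFieldRows and its input format (a list of name/value pairs already extracted from the text) are assumptions for illustration; actual storage would go through whatever database layer the installation uses.

```php
<?php
// Hypothetical helper: turn extracted (name, value) pairs into rows for the
// proposed "fields" table. $pageId stands in for the page's cur_id.
function buildFieldRows(int $pageId, array $pairs): array {
    $rows = array();
    foreach (array_values($pairs) as $order => $pair) {
        $rows[] = array(
            'field_page_id' => $pageId,
            'field_order'   => $order,   // zero-based position in the page
            'field_name'    => $pair[0],
            'field_value'   => $pair[1],
        );
    }
    return $rows;
}

$rows = buildFieldRows(42, array(
    array('category', 'physics'),
    array('category', 'science'),   // repeated names are allowed
    array('status', 'stub'),
));
print_r($rows);
```

Because field_order is stored, two fields with the same name on one page remain distinguishable and keep their original order.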

Rendering

When a page is rendered, the associated fields are retrieved, and depending on field name and the installation configuration, they can be:

  • ignored. Just left out of the page entirely.
  • displayed as a link in-page.
  • displayed as text in-page.
  • displayed as a link out-of-page (like interlanguage links)
  • displayed as text out-of-page
  • displayed as a <meta> field-value pair
  • displayed as a <link> field-value pair

Configuration

Any application of this feature is up to the installation. Auto-indexed pages, bread-crumb navigation, field-based search, field display in search results, directories, etc., could be implemented if needed.

Configuration is done by means of an array $wgFields in the LanguageXX.php file. The array maps field names to rendering options, specified by a code. For example,

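$wgFields = array(
    'category' => FIELD_LINK,   // value shown as an in-page link
    'ICBM'     => FIELD_META,   // emitted as a <meta> tag
    'status'   => FIELD_OTEXT   // "Name: Value" text outside the page body
);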

The codes for fields are:

  • FIELD_IGNORE: ignored. Just left out of the page entirely.
  • FIELD_LINK: the value is displayed as a link in-page.
  • FIELD_TEXT: value is displayed as text in-page.
  • FIELD_OLINK: name is displayed as a link out-of-page to value
  • FIELD_OTEXT: name and value displayed as text out-of-page (Name: Value)
  • FIELD_META: displayed as a <meta> name-value pair
  • FIELD_META_LINK: displayed as a <link> name-value pair

The default for any field not listed in $wgFields would be FIELD_IGNORE.
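One way the codes above might be dispatched at render time is sketched below. The integer values of the constants, the function renderField, the /wiki/ URL scheme, and the mapping of FIELD_META_LINK onto rel and href attributes are all assumptions made for the example; the proposal does not specify them.

```php
<?php
// Rendering codes named in the proposal; the integer values are arbitrary.
define('FIELD_IGNORE', 0);
define('FIELD_LINK', 1);
define('FIELD_TEXT', 2);
define('FIELD_OLINK', 3);
define('FIELD_OTEXT', 4);
define('FIELD_META', 5);
define('FIELD_META_LINK', 6);

// Hypothetical renderer: returns an HTML fragment for one field, or '' if
// the field is ignored. Placing the fragment (in-page, out-of-page, <head>)
// is left to the caller.
function renderField(array $config, string $name, string $value): string {
    $code = isset($config[$name]) ? $config[$name] : FIELD_IGNORE;
    $n = htmlspecialchars($name);
    $v = htmlspecialchars($value);
    $href = '/wiki/' . rawurlencode($value);   // assumed URL scheme
    switch ($code) {
        case FIELD_LINK:      return "<a href=\"$href\">$v</a>";
        case FIELD_TEXT:      return $v;
        case FIELD_OLINK:     return "<a href=\"$href\">$n</a>";
        case FIELD_OTEXT:     return "$n: $v";
        case FIELD_META:      return "<meta name=\"$n\" content=\"$v\">";
        case FIELD_META_LINK: return "<link rel=\"$n\" href=\"$href\">";
        default:              return '';
    }
}

$wgFields = array('category' => FIELD_LINK, 'status' => FIELD_OTEXT);
echo renderField($wgFields, 'status', 'stub'), "\n";   // status: stub
echo renderField($wgFields, 'author', 'anon'), "\n";   // unlisted, so empty
```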

Functions

A possible enhancement might be to allow mapping a field to a function with three parameters: an OutputPage object, the field name, and the value. For example:

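function blueText($out, $name, $value) {
    // Escape the value so field text cannot inject markup.
    $out->addHTML('<font color="blue">' . htmlspecialchars($value) . '</font>');
}

$wgFields = array(
    // Map the field name to the name of the rendering function.
    'birthdate' => 'blueText'
);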

(Note that this is just an example showing one way to give a meaningful field such as "birthdate" its own custom rendering. Doing wikitext-style markup is not a goal of this proposal.)

This would allow some per-site customization of field rendering without deeply mucking with the code for outputting pages.
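A minimal sketch of how such function mappings might be invoked follows. StubOutput is a stand-in written only so the example runs on its own (it is not MediaWiki's OutputPage), and renderWithCallback is a hypothetical name; the only thing taken from the proposal is the three-parameter call with the output object, the field name, and the value.

```php
<?php
// Minimal stand-in for an OutputPage object, only to make the sketch runnable.
class StubOutput {
    public $html = '';
    public function addHTML($fragment) {
        $this->html .= $fragment;
    }
}

// Hypothetical dispatcher: if the configured entry for a field is callable,
// call it with the output object, the field name, and the value.
// Returns true when a custom renderer handled the field.
function renderWithCallback(array $config, $out, $name, $value) {
    if (isset($config[$name]) && is_callable($config[$name])) {
        call_user_func($config[$name], $out, $name, $value);
        return true;
    }
    return false;
}

$wgFields = array(
    'birthdate' => function ($out, $name, $value) {
        $out->addHTML('<em>' . htmlspecialchars($value) . '</em>');
    },
);

$out = new StubOutput();
renderWithCallback($wgFields, $out, 'birthdate', '1 January 1900');
echo $out->html, "\n";
```

Fields whose entry is not callable fall through with a false return value, so the caller can apply the ordinary rendering codes instead.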

Advantages

Some advantages of allowing field-value pair markup in articles:

  • Simple syntax.
  • Allows expanding the functionality of a MediaWiki installation in a consistent way.
  • Allows automated creation of a static index, and therefore quicker responses for the user.

Disadvantages

  • Having wikitext (the markup) that shows up in unexpected parts of the page, or not at all, is "unwiki". This can be mitigated by a "links from" section on the page.

See also: categorization with field-value pairs

Semantic web

How to describe what each keyword means?

This can be used to build the basis for adding semantic, machine-readable information to pages. If that is what is intended, this requires the use of namespaces for the metadata.

The basic idea of metadata is that you declare a schema for the metadata you are going to add to the document, and then you use that schema. We want flexibility, but at the same time we want all users to agree on the keywords they are going to assign; otherwise the metadata is useless.

Let's see an example of the proposed namespaces for keywords:

[[Book/Title=The title]]
[[Book/ISBN=99999999]]
[[Book/Author=John Doe]]

In this case, Book must be an existing page in the database, and Title can be either a sub-page of Book or a section heading of Book.

The use of namespaces for keywords (keywordspaces?) has several advantages:

  • Whenever a user creates a new keyword, they can explain to other users what the keyword is about.
  • Multiple keyword markups can co-exist in the same page, such as multiple classification schemas.
  • It avoids the need to reorganize the keywords later.
  • We can create specialized search engines (e.g. search books by ISBN) in the future.
  • Inference engines can use this data in the future.
  • It is not mandatory to define the page with the schema (e.g. Book) before adding keywords, but it is recommended to do so, so that others can contribute to the classification schema.
  • Keyword spaces can support arbitrary nesting (e.g. Book/Subjects/Major).
  • This is a clean solution in line with Technical categories in Wikipedia.

Whatever solution is chosen, it should include both keywords and categorization as two instances of the same problem.
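The namespaced keyword names described in this section, including nested ones such as Book/Subjects/Major, could be split into a schema page and a keyword path along these lines. The function splitKeyword and its return shape are hypothetical; the sketch only assumes that "/" separates the levels, as in the examples above.

```php
<?php
// Hypothetical helper: split a namespaced keyword into its schema page
// (the first segment) and the path of nested keywords beneath it.
function splitKeyword(string $keyword): array {
    $parts = explode('/', $keyword);
    $schema = array_shift($parts);
    return array('schema' => $schema, 'path' => $parts);
}

print_r(splitKeyword('Book/Title'));
print_r(splitKeyword('Book/Subjects/Major'));
```

A plain, un-namespaced name such as category comes back with an empty path, so categorization and keywords can pass through the same code.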

Rendering

The metadata could be shown in a table, with links to the definition of each keyword.

Book
Title: The title
ISBN: 999999999
Author: John Doe

If the page is rendered using XHTML, this is straightforward, since browsers act only on the namespaces they recognize:

<Book:metadata xmlns:Book="/wiki/Book?mode=schema">
  <Book:Title>The title</Book:Title>
  <Book:ISBN>9999999</Book:ISBN>
  <Book:Author>John Doe</Book:Author>
</Book:metadata>

If the page is rendered in HTML the metadata can be added as:

<meta name="Book:Title" content="The title">
...

The mode=schema page could represent the content of the node as a simple XML Schema, in which each sub-page (or heading) is an element defined by the schema. This is not necessary immediately, but will be useful in the future.
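For the plain HTML case, generating the meta tags from a page's keywords might look like the sketch below. The function keywordMetaTags is hypothetical, and turning every "/" into ":" (so that nested keywords become names like Book:Subjects:Major) is an assumption extrapolated from the single Book:Title example.

```php
<?php
// Hypothetical helper: emit one <meta> tag per namespaced keyword, using
// "Schema:Keyword" names. Input is a list of (keyword, value) pairs so that
// repeated keywords are preserved.
function keywordMetaTags(array $fields): string {
    $lines = array();
    foreach ($fields as $pair) {
        list($keyword, $value) = $pair;
        $name = str_replace('/', ':', $keyword);
        $lines[] = '<meta name="' . htmlspecialchars($name)
                 . '" content="' . htmlspecialchars($value) . '">';
    }
    return implode("\n", $lines);
}

echo keywordMetaTags(array(
    array('Book/Title', 'The title'),
    array('Book/Author', 'John Doe'),
)), "\n";
```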

Alternative syntax

An alternative syntax could be:

[[schema:Book b]]
[[b/Title:The title]]
[[b/ISBN:999999]]
[[b/Author:John Doe]]

This alternative syntax is more similar to the usual XML approach of declaring a namespace and then using it, but it may be a bit slower to parse.
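The extra parsing step can be seen in a small sketch: each schema declaration has to be recorded before the aliased keywords that follow it can be resolved. The function parseAliasedFields and its output format are hypothetical, and the sketch assumes that a declaration always appears before its alias is used.

```php
<?php
// Hypothetical parser for the alternative syntax: [[schema:Book b]] declares
// alias "b" for the schema page "Book"; [[b/Title:The title]] then uses it.
// Returns a list of (full keyword, value) pairs such as ("Book/Title", "The title").
function parseAliasedFields(string $text): array {
    $aliases = array();
    $fields = array();
    preg_match_all('/\[\[(.*?)\]\]/', $text, $matches);
    foreach ($matches[1] as $body) {
        if (preg_match('/^schema:(.+)\s+(\S+)$/', $body, $m)) {
            $aliases[$m[2]] = $m[1];   // alias => schema page
        } elseif (preg_match('/^([^\/:]+)\/([^:]+):(.*)$/', $body, $m)
                  && isset($aliases[$m[1]])) {
            $fields[] = array($aliases[$m[1]] . '/' . $m[2], $m[3]);
        }
    }
    return $fields;
}

$text = "[[schema:Book b]]\n[[b/Title:The title]]\n[[b/ISBN:999999]]";
print_r(parseAliasedFields($text));
```

Links whose prefix is not a declared alias are left alone, so ordinary wiki links are not mistaken for keywords.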

References

Antecedents

  • The interlanguage links already in MediaWiki are a kind of field-value pair, except with a different syntax.
  • Technical categories in Wikipedia is another proposal along the same lines as this one, with a slightly different syntax ([[field:value]] rather than [[field=value]]).
  • Series of articles discusses how to implement part-whole relationships with a syntax much like field-value pairs. Again, it uses a colon ":" rather than an equal sign "=".