Wikidata/Archive/Describing wikidata structures

From Meta, a Wikimedia project coordination wiki

Depending on how the Wikidata proposal is implemented, there may be a need for a format to describe the structure of wikidata tables. This page proposes a format based on pipe-syntax wikicode. A wikidata implementation would ignore attributes (such as border="1"), table captions (lines beginning with |+), and row and column headings (lines beginning with !).
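
As a rough illustration of what "ignoring" these parts could mean, the sketch below filters them out of a pipe-syntax table before any further processing. The function name wikidata_sketch_strip_ignored and the sample markup for part of the Star structure are assumptions made for this example; the proposal defines neither. The sketch only handles attributes on the table opener, not on rows or cells.

```php
<?php
// Hypothetical helper (not part of the proposal): removes the pieces of
// pipe-syntax wikicode that a structure description is meant to ignore.
function wikidata_sketch_strip_ignored($code) {
    $kept = array();
    foreach (preg_split('/\r?\n/', $code) as $line) {
        if (preg_match('/^\|\+/', $line) || preg_match('/^!/', $line)) {
            // Table captions (|+) and row/column headings (!) carry no data.
            continue;
        }
        if (preg_match('/^\{\|/', $line)) {
            // Keep the table opener but discard attributes such as border="1".
            $line = '{|';
        }
        $kept[] = $line;
    }
    return implode("\n", $kept);
}

// Assumed sample markup; the real page source may differ.
$sample = implode("\n", array(
    '{| border="1"',
    '|+ Star',
    '! Field !! Type !! Units',
    '|-',
    '| Mass || number || kg',
    '|-',
    '| Radius || number || km',
    '|}',
));

echo wikidata_sketch_strip_ignored($sample), "\n";
```

Only the table opener, the row separators, and the two data rows survive the filter.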

Examples

Movie

[Figure: a mock-up of what editing an instance of this structure might look like]
Year | year
Tagline | line
Plot summary | multiline
Cast | substructure-array
    Actor | line | autolink
    Character | line | autolink
Runtime | number | minute
Country | line | autolink
Color | enumeration
    Technicolor
    B&W
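
The Cast field is this page's example of a substructure-array. According to the keyword table further down, such a nested description may not contain multiline or substructure-array fields and may not have more than 10 fields. The sketch below shows one way those restrictions might be checked. The nested-array representation of the Movie structure and the function name wikidata_sketch_check_substructures are assumptions for illustration, not part of the proposal.

```php
<?php
// Hypothetical in-memory form of a structure description: each field is
// array(name, type, extra), where extra is a unit, 'autolink', a list of
// enumeration values, or a nested list of fields.
$movie = array(
    array('Year', 'year', ''),
    array('Tagline', 'line', ''),
    array('Plot summary', 'multiline', ''),
    array('Cast', 'substructure-array', array(
        array('Actor', 'line', 'autolink'),
        array('Character', 'line', 'autolink'),
    )),
    array('Runtime', 'number', 'minute'),
    array('Country', 'line', 'autolink'),
    array('Color', 'enumeration', array('Technicolor', 'B&W')),
);

// Returns a list of problems found in substructure-array fields; an empty
// list means the description respects the restrictions.
function wikidata_sketch_check_substructures($fields) {
    $problems = array();
    foreach ($fields as $field) {
        list($name, $type, $extra) = $field;
        if ($type !== 'substructure-array') {
            continue;
        }
        if (count($extra) > 10) {
            $problems[] = "$name: more than 10 fields";
        }
        foreach ($extra as $sub) {
            if ($sub[1] === 'multiline' || $sub[1] === 'substructure-array') {
                $problems[] = "$name: field {$sub[0]} has forbidden type {$sub[1]}";
            }
        }
    }
    return $problems;
}

print_r(wikidata_sketch_check_substructures($movie));
```

For the Movie description the returned list is empty, since Cast holds only two fields of type line.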

Element

Symbol | line
Atomic number | number
Chemical series | enumeration
    Alkali metals
    Alkaline earth metals
    Lanthanides
    Actinides
    Transition metals
    Poor metals
    Metalloids
    Nonmetals
    Halogens
    Noble gases
Atomic mass | number | g/mol
Electron configuration | line
Electrons per shell | number
Phase | enumeration
    Solid
    Liquid
    Gas
Density at 0 °C | number | g/L
Melting point (at 2.5 MPa) | number | K
Boiling point (at 2.5 MPa) | number | K
Heat of fusion | number | kJ/mol
Heat of vaporization | number | kJ/mol
Heat capacity (at 25 °C) | number | J/(mol·K)
Crystal structure | enumeration
    triclinic
    monoclinic
    orthorhombic
    hexagonal
    rhombohedral
    tetragonal
    cubic
Atomic radius | number | pm
Covalent radius | number | pm
Van der Waals radius | number | pm
Unstable isotopes | substructure-array
    Atomic mass | number | g/mol
    Natural abundance | number
    Half life | number | y
    Decay mode | enumeration
        alpha decay
        β− decay
        β+ decay
        electron capture
        proton emission
        spontaneous fission
    Decay energy | number | MeV
    Decay product | line

Software

Developer | line
Latest version | line
Release date of latest version | date
OS | enumeration-other
    Cross-platform
    Linux
    FreeBSD
    NetBSD
    OpenBSD
    Mac OS 8/9
    Mac OS X
    Windows
Genre | enumeration-other
    Antivirus
    Audio editor
    Automation
    CD copying software
    Database management system
    Emulation
License | enumeration-other
    BSD License
    GPL
    LGPL
    Sun Public License
    Zope Public License
Website | url

Star

Mass | number | kg
Radius | number | km
Luminosity | number | L☉
Surface temperature | number | K
Age | number | y
Notable features | multiline
Spectral type | enumeration
    O
    B
    A
    F
    G
    K
    M

Type keywords

The middle column in the examples above contains a keyword that describes the datatype of the field.

Table of keywords
Keyword | Meaning
number | The third column of the structure description contains the units of this number, if applicable.
year | Like a number, except that it is autolinked and has no unit information.
line | The field is edited with an <input type="text">. If the third column contains autolink, the field is automatically linked; otherwise the third column should be empty.
url | Like the line type, but automatically linked as an external link.
multiline | The field is edited with a <textarea>...</textarea>.
enumeration | The third column of the structure description is a sub-table in which each row is a possible value for the enumeration. When editing an instance of this structure, the user is presented with a combo box of the possible values.
enumeration-other | Like an enumeration, except that the combo box has an Other option and a text field next to it for entering a value that is not in the enumeration.
substructure-array | The third column is a nested structure description. The nested description cannot contain fields of type multiline or substructure-array, and it cannot have more than 10 fields. For example, the Movie data structure has a field called Cast of type substructure-array; its substructure has two fields, Actor and Character, both of type line, so it can store a mapping of actors to characters.
boolean | The field is edited with a check box.
date | A day, month, and year, without a time.
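
The keyword table ties several types to specific editing controls. A minimal sketch of that mapping follows. The function name wikidata_sketch_widget, the form field naming, and the choice of a plain text input for number, year, url, and date are assumptions; the table only specifies the controls for line, multiline, boolean, and the two enumeration types.

```php
<?php
// Hypothetical mapping from a type keyword to an HTML editing control.
// $name is the form field name; $options lists the enumeration values.
function wikidata_sketch_widget($name, $type, $options = array()) {
    $n = htmlspecialchars($name);
    switch ($type) {
        case 'multiline':
            return "<textarea name=\"$n\"></textarea>";
        case 'boolean':
            return "<input type=\"checkbox\" name=\"$n\">";
        case 'enumeration':
        case 'enumeration-other':
            if ($type === 'enumeration-other') {
                $options[] = 'Other';
            }
            $html = "<select name=\"$n\">";
            foreach ($options as $option) {
                $html .= '<option>' . htmlspecialchars($option) . '</option>';
            }
            $html .= '</select>';
            if ($type === 'enumeration-other') {
                // Free-text field for values missing from the enumeration.
                $html .= "<input type=\"text\" name=\"{$n}_other\">";
            }
            return $html;
        case 'substructure-array':
            // Would need one group of controls per array entry; left out
            // of this sketch.
            return '';
        default:
            // line, url, number, year and date: a plain text input is an
            // assumption here; the table only specifies it for line.
            return "<input type=\"text\" name=\"$n\">";
    }
}

echo wikidata_sketch_widget('Color', 'enumeration', array('Technicolor', 'B&W')), "\n";
```

Option labels are passed through htmlspecialchars, so a value such as B&W is emitted as B&amp;W in the generated markup.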

PHP Functions

Table Parser

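This function takes a string containing wikicode as an argument and returns a 2-dimensional array of strings representing the first table found in the wikicode. Nested tables are kept as raw wikicode inside the cell that contains them. If no complete table is found, the function returns NULL.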

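function wikidata_parse_table($code) {
    $lines = preg_split('/\\r?\\n/', $code);
    $level = 0;   // table nesting depth; 0 means outside any table
    $array = array();
    $row = 0;
    $col = -1;    // index of the current cell; -1 means the row has no cell yet
    foreach ($lines as $line) {
        if (preg_match('/^\\{\\|/', $line)) {
            // Table start. A nested table is stored verbatim in the current cell.
            $level++;
            if ($level > 1 && $col > -1) {
                $array[$row][$col] .= "\n" . $line;
            }
        } elseif (preg_match('/^\\|-/', $line)) {
            // Row separator: starts a new row of the outermost table.
            if ($level == 1) {
                $row++;
                $col = -1;
            } elseif ($level > 1 && $col > -1) {
                $array[$row][$col] .= "\n" . $line;
            }
        } elseif ($level > 0 && preg_match('/^\\|\\}/', $line)) {
            // Table end. Closing the outermost table ends the parse.
            if ($level > 1 && $col > -1) {
                $array[$row][$col] .= "\n" . $line;
            }
            $level--;
            if ($level == 0) {
                return $array;
            }
        } elseif ($level == 1 && preg_match('/^\\|\\+/', $line)) {
            // Caption of the outermost table: ignored, as described above.
        } elseif ($level == 1
            && preg_match('/^\\|(.*)$/', $line, $matches)) {
            // Cell line: "||" separates several cells on one line.
            $columns = explode("||", $matches[1]);
            foreach ($columns as $column) {
                $col++;
                $array[$row][$col] = $column;
            }
        } elseif ($level > 1 && $col > -1) {
            // Any other line inside a nested table belongs to the current cell.
            $array[$row][$col] .= "\n" . $line;
        }
    }
    // No table was opened and closed.
    return NULL;
}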
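
The parser returns raw cell strings, so a caller still has to interpret each row as a field name, a type keyword, and an optional third column. The sketch below shows one way that step might look. The function name wikidata_sketch_rows_to_fields and the key names in its result are assumptions for illustration. The input is a hand-written array shaped like the parser's output, so the block runs on its own.

```php
<?php
// Hypothetical follow-up step: turns rows shaped like the output of
// wikidata_parse_table() into name/type/extra field descriptions.
function wikidata_sketch_rows_to_fields($rows) {
    $fields = array();
    foreach ($rows as $cells) {
        if (count($cells) < 2) {
            continue; // not a field row: a name and a type are required
        }
        $fields[] = array(
            'name'  => trim($cells[0]),
            'type'  => trim($cells[1]),
            // Units, "autolink", or the raw wikicode of a nested table.
            'extra' => isset($cells[2]) ? trim($cells[2]) : '',
        );
    }
    return $fields;
}

// Hand-written stand-in for what the parser might return for two Star rows.
$rows = array(
    array(' Mass ', ' number ', ' kg'),
    array(' Radius ', ' number ', ' km'),
);
print_r(wikidata_sketch_rows_to_fields($rows));
```

For enumeration and substructure-array fields, the extra value would hold the nested table's wikicode, which could presumably be passed back to wikidata_parse_table to read the nested rows.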