Depending on how the Wikidata proposal is implemented, a format may be needed to describe the structure of Wikidata tables. This page proposes one that uses pipe-syntax wikicode. The Wikidata implementation would ignore attributes (such as border="1"), table captions (lines beginning with |+), and row and column headings (lines beginning with !).
Examples
This mock-up shows what editing an instance of the structure on the left might look like
| Year | year | |
| Tagline | line | |
| Plot summary | multiline | |
| Cast | substructure-array |
    | Actor | line | autolink |
    | Character | line | autolink |
| Runtime | number | minute |
| Country | line | autolink |
| Color | enumeration | |
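As a rough illustration of the proposal, the following sketch shows how part of the structure above might be written in pipe-syntax wikicode, held in a PHP string. The column order (field name, type keyword, third column) follows the table above; the exact markup, the variable name $movie_structure, and the caption and heading rows are assumptions made for this example, not a confirmed Wikidata format.

```php
<?php
// Hypothetical pipe-syntax source for part of the Movie structure.
// The markup is an assumption for illustration only.
$movie_structure = <<<'WIKI'
{| border="1"
|+ Movie
|-
! Field !! Type !! Extra
|-
| Year || year ||
|-
| Plot summary || multiline ||
|-
| Cast || substructure-array ||
{|
|-
| Actor || line || autolink
|-
| Character || line || autolink
|}
|-
| Runtime || number || minute
|}
WIKI;

// Print the source line by line. The attribute on the first line, the
// caption line ("|+") and the heading line ("!") are the parts that a
// structure parser would be expected to skip.
foreach (explode("\n", $movie_structure) as $number => $line) {
    printf("%2d: %s\n", $number + 1, $line);
}
```

The nested table for Cast starts on its own line inside the third cell, which is how pipe syntax normally expresses a table within a table.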
Element
| Symbol | line | |
| Atomic number | number | |
| Chemical series | enumeration |
    | Alkali metals |
    | Alkaline earth metals |
    | Lanthanides |
    | Actinides |
    | Transition metals |
    | Poor metals |
    | Metalloids |
    | Nonmetals |
    | Halogens |
    | Noble gases |
| Atomic mass | number | g/mol |
| Electron configuration | line | |
| Electrons per shell | number | |
| Phase | enumeration | |
| Density at 0 °C | number | g/L |
| Melting point (at 2.5 MPa) | number | K |
| Boiling point (at 2.5 MPa) | number | K |
| Heat of fusion | number | kJ/mol |
| Heat of vaporization | number | kJ/mol |
| Heat capacity (at 25 °C) | number | J/(mol·K) |
| Crystal structure | enumeration |
    | triclinic |
    | monoclinic |
    | orthorhombic |
    | hexagonal |
    | rhombohedral |
    | tetragonal |
    | cubic |
| Atomic radius | number | pm |
| Covalent radius | number | pm |
| Van der Waals radius | number | pm |
| Unstable isotopes | substructure-array |
    | Atomic mass | number | g/mol |
    | Natural abundance | number | |
    | Half life | number | y |
    | Decay mode | enumeration |
        | alpha decay |
        | β− decay |
        | β+ decay |
        | electron capture |
        | proton emission |
        | spontaneous fission |
    | Decay energy | number | MeV |
    | Decay product | line | |
Software
| Developer | line | |
| Latest version | line | |
| Release date of latest version | date | |
| OS | enumeration-other |
    | Cross-platform |
    | Linux |
    | FreeBSD |
    | NetBSD |
    | OpenBSD |
    | Mac OS 8/9 |
    | Mac OS X |
    | Windows |
| Genre | enumeration-other |
    | Antivirus |
    | Audio editor |
    | Automation |
    | CD copying software |
    | Database management system |
    | Emulation |
| License | enumeration-other |
    | BSD License |
    | GPL |
    | LGPL |
    | Sun Public License |
    | Zope Public License |
| Website | url | |

| Mass | number | kg |
| Radius | number | km |
| Luminosity | number | L☉ |
| Surface temperature | number | K |
| Age | number | y |
| Notable features | multiline | |
| Spectral type | enumeration | |
Type keywords
The middle column in the examples above contains a keyword that describes the data type of the field.
Table of keywords
| Keyword | Meaning |
| number | The third column of the structure description contains the units of this number, if applicable. |
| year | Like a number, except that it is autolinked and has no unit information. |
| line | The field should be edited with an <input type="text">. If the third column contains autolink, the field is automatically linked; otherwise the third column should be empty. |
| url | Like the line type, but it is automatically linked as an external link. |
| multiline | The field should be edited with a <textarea>...</textarea>. |
| enumeration | The third column of the structure description is a sub-table where each row is a possible value for the enumeration. When editing an instance of this structure, the user is presented with a combo box of the possible values. |
| enumeration-other | Like an enumeration, except that there is an Other option in the combo box, and a text field next to the combo box for entering something that isn't in the enumeration. |
| substructure-array | The third column is a nested structure description. The only restrictions on the nested description are that it can't have fields of type multiline or substructure-array, and it can't have more than 10 fields. For example, the Movie data structure has a field called Cast that is of type substructure-array. Its substructure has two fields, Actor and Character, both of type line, so it can store a mapping of actors to characters. |
| boolean | A check box is used to edit this field. |
| date | A day, month, and year, but not a time. |
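The keyword table can be read as a mapping from type keywords to form controls. The sketch below illustrates that mapping under several assumptions: the function name wikidata_render_field, the shape of the $field array, and the "_other" suffix for the free-text input are all hypothetical, and the plain text input used for year, url and date is a guess, since the table does not say how those types are edited. The substructure-array type is left out of the sketch.

```php
<?php
/**
 * Hypothetical helper: returns the HTML used to edit one field.
 * $field is ['name' => ..., 'type' => ..., 'extra' => ...], where 'extra'
 * holds the third column: a unit string, 'autolink', or an array of
 * enumeration values. Returns null for types this sketch does not cover.
 */
function wikidata_render_field(array $field, $value = '')
{
    $name = htmlspecialchars($field['name']);
    $value_html = htmlspecialchars((string) $value);
    $extra = isset($field['extra']) ? $field['extra'] : '';

    switch ($field['type']) {
        case 'number':
            // Text input followed by the unit, if one is given.
            $unit = (is_string($extra) && $extra !== '')
                ? ' ' . htmlspecialchars($extra)
                : '';
            return "<input type=\"text\" name=\"$name\" value=\"$value_html\">$unit";
        case 'line':
        case 'year':  // assumption: edited like a line
        case 'url':   // assumption: edited like a line
        case 'date':  // assumption: edited like a line
            return "<input type=\"text\" name=\"$name\" value=\"$value_html\">";
        case 'multiline':
            return "<textarea name=\"$name\">$value_html</textarea>";
        case 'boolean':
            $checked = $value ? ' checked' : '';
            return "<input type=\"checkbox\" name=\"$name\"$checked>";
        case 'enumeration':
        case 'enumeration-other':
            $options = '';
            foreach (is_array($extra) ? $extra : [] as $option) {
                $selected = ((string) $value === $option) ? ' selected' : '';
                $options .= "<option$selected>" . htmlspecialchars($option) . '</option>';
            }
            if ($field['type'] === 'enumeration-other') {
                // Extra "Other" entry plus a free-text field next to the combo box.
                $options .= '<option>Other</option>';
                return "<select name=\"$name\">$options</select>"
                    . " <input type=\"text\" name=\"{$name}_other\">";
            }
            return "<select name=\"$name\">$options</select>";
        default:
            return null;
    }
}

// Example: the Runtime field from the Movie structure.
echo wikidata_render_field(
    ['name' => 'Runtime', 'type' => 'number', 'extra' => 'minute'],
    120
), "\n";
```

Keeping the unit outside the input means the stored value stays a bare number, which matches the description of the number type.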
PHP Functions
Table Parser
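This function takes a string containing wikicode as an argument and returns a 2-dimensional array of strings representing the first table found in the wikicode, or NULL if no complete table is found. Nested tables are kept verbatim inside the cell that contains them.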
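function wikidata_parse_table($code) {
    $lines = preg_split('/\\r?\\n/', $code);
    $level = 0;   // table nesting depth; 1 means the outermost table
    $array = array();
    $row = 0;
    $col = -1;    // index of the current cell in the current row; -1 means none yet
    foreach ($lines as $line) {
        if (preg_match('/^{\\|/', $line)) {
            // Table start. A nested table is appended verbatim to the current cell.
            $level++;
            if ($level > 1 && $col > -1) {
                $array[$row][$col] .= "\n" . $line;
            }
        } else if (preg_match('/^\\|-/', $line)) {
            // Row separator.
            if ($level == 1) {
                $row++;
                $col = -1;
            } else if ($level > 1 && $col > -1) {
                $array[$row][$col] .= "\n" . $line;
            }
        } else if (preg_match('/^\\|}/', $line)) {
            // Table end.
            if ($level > 1 && $col > -1) {
                $array[$row][$col] .= "\n" . $line;
            }
            $level--;
            if ($level == 0) {
                return $array;
            }
        } else if ($level == 1 && preg_match('/^\\|\\+/', $line)) {
            // Caption of the outermost table: ignored.
        } else if ($level == 1
                && preg_match('/^\\|(.*)$/', $line, $matches)) {
            // Cell line; "||" separates several cells on the same line.
            $columns = explode("||", $matches[1]);
            foreach ($columns as $column) {
                $col++;
                $array[$row][$col] = $column;
            }
        } else if ($level > 1 && $col > -1) {
            // Any other line inside a nested table.
            $array[$row][$col] .= "\n" . $line;
        }
    }
    // No complete table was found.
    return NULL;
}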
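A possible next step after parsing is to turn the rows into field definitions and to check the substructure-array restrictions listed under Type keywords. The sketch below is one way this might look. The names WIKIDATA_TYPES, wikidata_rows_to_fields and wikidata_substructure_is_valid are hypothetical, and the sample $rows array is only shaped like what wikidata_parse_table might return for the Cast substructure (cells untrimmed, row indexes starting after the first row separator).

```php
<?php
// Type keywords taken from the table of keywords.
const WIKIDATA_TYPES = [
    'number', 'year', 'line', 'url', 'multiline', 'enumeration',
    'enumeration-other', 'substructure-array', 'boolean', 'date',
];

/**
 * Hypothetical helper: converts parsed rows into field definitions.
 * Rows with fewer than two cells or an unknown type keyword are skipped.
 */
function wikidata_rows_to_fields(array $rows)
{
    $fields = [];
    foreach ($rows as $cells) {
        // Cells come back untrimmed from the parser, so trim them here.
        $cells = array_map('trim', array_values($cells));
        if (count($cells) < 2 || !in_array($cells[1], WIKIDATA_TYPES, true)) {
            continue;
        }
        $fields[] = [
            'name'  => $cells[0],
            'type'  => $cells[1],
            'extra' => isset($cells[2]) ? $cells[2] : '',
        ];
    }
    return $fields;
}

/**
 * Checks the substructure-array restrictions: no multiline or
 * substructure-array fields, and at most 10 fields.
 */
function wikidata_substructure_is_valid(array $fields)
{
    if (count($fields) > 10) {
        return false;
    }
    foreach ($fields as $field) {
        if ($field['type'] === 'multiline' || $field['type'] === 'substructure-array') {
            return false;
        }
    }
    return true;
}

// Sample rows for the Cast substructure of the Movie example.
$rows = [
    1 => [' Actor ', ' line ', ' autolink'],
    2 => [' Character ', ' line ', ' autolink'],
];
$fields = wikidata_rows_to_fields($rows);
var_dump(wikidata_substructure_is_valid($fields));
```

Skipping rows with unknown keywords is a design choice made for this sketch; an implementation might prefer to report them as errors instead.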