User talk:HappyDog/WikiDB

From Meta, a Wikimedia project coordination wiki

Editing table definitions

The basic idea is what will be brought to you by Wikidata. There is, however, one thing in your proposal that is just not feasible: "Should follow the wiki principal of on-the-fly editability. This must apply at all levels, including table definitions". When you change the data design, you stand to lose information in the process. It would be really nice to have from a philosophical point of view, but we do have people who vandalise. Allowing on-the-fly editing is of less benefit than the certainty that the data entered will be there, complete and useful. GerardM 06:54, 30 August 2005 (UTC)

Actually, the point is that changing the design will not lose any data! At worst it will change how data is displayed (it will rarely change what data is displayed). The table structure provides a way of validating and formatting data only, but leaves it for humans to fix errors that have crept into the data - similarly to the way that broken links or incorrect information are handled in Wikipedia. Additionally, you can easily protect a table if it is at risk of vandalism, in the same way that you can protect any other page. --HappyDog 11:43, 30 August 2005 (UTC)
When someone "decides" to remove a field, a whole column of data will be gone. So I do not understand how you can say that changing a design will not have us lose data. Changing a table structure does explicitly change the data that you store. GerardM 14:16, 30 August 2005 (UTC)
There are three places where someone can 'remove' a field:
  1. From the table definition - There will no longer be any validation or formatting constraints on that field. All records will still contain the data for that field, because they are simply wiki-text held in the body of an article. All views that display that field will still display data from the field, but the formatting of the data may have changed.
  2. From a data record - The field will be blank for that record.
  3. From a view - The data is no longer displayed in that view, but it is still part of the record.
In my proposal, data and table definitions are completely decoupled, which means that the Wiki paradigm can be easily maintained. Try re-reading the intro to the Table Definitions section. Perhaps it is not clear enough - let me know if it needs further clarification. --HappyDog 14:27, 30 August 2005 (UTC)
How would you then treat the changing of the definition of a field? GerardM 16:09, 30 August 2005 (UTC)
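The three cases above can be illustrated with a minimal Python sketch. It is not part of the proposal: the dict-based records, the `definition` mapping and the `render_view` helper are hypothetical, chosen only to show that a record keeps its data whatever happens to the definition or to a view.

```python
# Hypothetical model: each record is free-form field/value data held in an article.
records = [
    {"Company name": "Microsoft", "Founded": "1492", "Location": "Seattle"},
]

# The table definition only maps field names to types; it stores no data itself.
definition = {"Company name": "string", "Founded": "date", "Location": "string"}


def render_view(records, view_fields):
    """Return one row per record, showing only the fields the view asks for."""
    return [[rec.get(field, "") for field in view_fields] for rec in records]


# Case 1: remove 'Location' from the table definition. The records are untouched,
# so a view that asks for the field still shows its value (just unvalidated).
del definition["Location"]
rows_after_definition_change = render_view(records, ["Company name", "Location"])

# Case 3: remove 'Location' from the view. The record still holds the value.
rows_after_view_change = render_view(records, ["Company name"])
```

Case 2 (removing the field from a record) is the only one that touches stored data, and it is an ordinary article edit that shows up in the page history like any other.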

The key thing with this idea is that, to a certain extent, you need to forget what you currently know about databases. Instead of having a set of tables and fields, with properties that dictate the type of data you would find there, in this model you have sets of data, over which you can define a structure. Let me try an example.

Scattered about the wiki, you have the following (simple and made up) data sets:

(Table:Companies) Company name:Microsoft, Founded:1492, Location:Seattle, Annual revenue:$8
(Table:Companies) Founded:April 2005, Revenue: $7.42, Company name:Apple, Logo:Image:Apple logo.png
(Table:Companies) Name:Intel, Founded:USA, Logo:Image:Intel_logo.jpg

That data exists. It is in the wiki. You edit the data by editing the article that contains the data. That article may be protected, in which case the data it contains is locked. The data is just wiki text (though the above syntax is used for clarity, it is not the syntax that would be used).

You can display data from this table without creating a table definition. Here are two examples:

 <repeat from=Table:Companies>
     '''{{{$Company name}}}''' - Founded: {{{$Founded}}}, Revenue: {{{$Annual revenue}}}
 </repeat>

 <repeat from=Table:Companies></repeat>

which, with the above datasets defined, might display:

Microsoft - Founded: 1492, Revenue: $8
Apple - Founded: April 2005, Revenue: ??
?? - Founded: USA, Revenue: ??
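
{| class="wikitable"
! Company Name !! Founded !! Location !! Annual revenue !! Revenue !! Logo !! Name
|-
| Microsoft || 1492 || Seattle || $8 || || ||
|-
| Apple || April 2005 || || || $7.42 || Image:Apple logo.png ||
|-
| || USA || || || || Image:Intel_logo.jpg || Intel
|}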

Note that if no content is given for the repeat tag, a default view of the whole table is generated. Also, note that the order in which the records are displayed is undefined (although an order may be specified in the repeat tag).
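As a rough illustration of how these two views could be produced with no table definition at all, here is a minimal Python sketch. The function names `render_line` and `default_view`, the dict representation of records, and the use of first-appearance order for columns are assumptions made for the example, not something the proposal specifies.

```python
# The three example records, written as plain dicts (field order as in the articles).
companies = [
    {"Company name": "Microsoft", "Founded": "1492", "Location": "Seattle",
     "Annual revenue": "$8"},
    {"Founded": "April 2005", "Revenue": "$7.42", "Company name": "Apple",
     "Logo": "Image:Apple logo.png"},
    {"Name": "Intel", "Founded": "USA", "Logo": "Image:Intel_logo.jpg"},
]


def render_line(rec):
    """First view: fill the template, showing '??' for any field the record lacks."""
    name = rec.get("Company name", "??")
    founded = rec.get("Founded", "??")
    revenue = rec.get("Annual revenue", "??")
    return f"{name} - Founded: {founded}, Revenue: {revenue}"


def default_view(records):
    """Second view: one column per field name seen in any record."""
    columns = []
    for rec in records:
        for field in rec:
            if field not in columns:
                columns.append(field)  # assumed: columns in order of first appearance
    rows = [[rec.get(col, "") for col in columns] for rec in records]
    return columns, rows


lines = [render_line(rec) for rec in companies]
columns, rows = default_view(companies)
```

Because nothing is validated at this stage, the misnamed fields simply become extra columns rather than errors.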

A companies table could be defined in the following way:

Table:Companies
:Company name:string
:Logo:image
:Founded:date
:Location:string
:Revenue:currency

If you then displayed the above views, you would get something like the following:

Apple - Founded: April 2005, Revenue: $7.42
Microsoft - Founded: 1492, Revenue: ??
?? - Founded: (invalid), Revenue: ??
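
{| class="wikitable"
! Company Name !! Logo !! Founded !! Location !! Revenue !! Annual revenue !! Name
|-
| Apple || Image:Apple logo.png || April 2005 || || $7.42 || ||
|-
| Microsoft || || 1492 || Seattle || || $8 ||
|-
| || Image:Intel_logo.jpg || (invalid) || || || || Intel
|}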

Note that the records are now sorted by field value, taking the fields in the order in which they are defined, and that the columns follow the same definition order (fields that are not in the definition come last).
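The effect of adding the definition can be sketched in Python as follows. This is only an illustration under stated assumptions: the `apply_definition` and `format_value` names are hypothetical, the date check is a crude stand-in (a value counts as a date if it contains a four-digit year), and records missing a sort field are assumed to sort last, which is what the example output suggests.

```python
import re

# The table definition as an ordered list of (field name, type) pairs.
definition = [
    ("Company name", "string"),
    ("Logo", "image"),
    ("Founded", "date"),
    ("Location", "string"),
    ("Revenue", "currency"),
]

companies = [
    {"Company name": "Microsoft", "Founded": "1492", "Location": "Seattle",
     "Annual revenue": "$8"},
    {"Founded": "April 2005", "Revenue": "$7.42", "Company name": "Apple",
     "Logo": "Image:Apple logo.png"},
    {"Name": "Intel", "Founded": "USA", "Logo": "Image:Intel_logo.jpg"},
]


def format_value(value, field_type):
    """Validate for display only; the stored value is never modified."""
    if field_type == "date" and not re.search(r"\b\d{4}\b", value):
        return "(invalid)"
    return value


def apply_definition(records, definition):
    defined = [name for name, _ in definition]
    types = dict(definition)
    # Fields that no definition mentions are kept, after the defined columns.
    extra = []
    for rec in records:
        for field in rec:
            if field not in defined and field not in extra:
                extra.append(field)
    columns = defined + extra

    def sort_key(rec):
        # Sort by the defined fields in definition order; missing values go last.
        return [(name not in rec, rec.get(name, "")) for name in defined]

    rows = []
    for rec in sorted(records, key=sort_key):
        rows.append([
            format_value(rec[col], types.get(col, "string")) if col in rec else ""
            for col in columns
        ])
    return columns, rows


columns, rows = apply_definition(companies, definition)
```

The definition changes ordering and formatting only: `companies` still holds "USA" in the Founded field of the Intel record, ready for someone to correct.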

You can see that there are several records that have badly named fields. You can either go into the wiki and fix these (like you would fix broken wiki-links) or create field aliases (like creating redirects). If you add the following to the table definition:

:Name [[#Company name]]
:Annual revenue [[#Revenue]]

The output of the views will now be:

Apple - Founded: April 2005, Revenue: $7.42
Intel - Founded: (invalid), Revenue: ??
Microsoft - Founded: 1492, Revenue: $8

{| class="wikitable"
! Company Name !! Logo !! Founded !! Location !! Revenue
|-
| Apple || Image:Apple logo.png || April 2005 || || $7.42
|-
| Intel || Image:Intel_logo.jpg || (invalid) || ||
|-
| Microsoft || || 1492 || Seattle || $8
|}
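
Field aliases behave much like redirects, and one way to picture them is as a rename applied when a record is read, not when it is stored. The Python below is a minimal sketch under that assumption; the `aliases` mapping and the `resolve_aliases` helper are hypothetical names, not part of the proposed syntax.

```python
# Aliases from the table definition: alias field name -> canonical field name.
aliases = {"Name": "Company name", "Annual revenue": "Revenue"}


def resolve_aliases(record, aliases):
    """Return a copy of the record with aliased field names replaced by their targets."""
    return {aliases.get(field, field): value for field, value in record.items()}


intel = {"Name": "Intel", "Founded": "USA", "Logo": "Image:Intel_logo.jpg"}
microsoft = {"Company name": "Microsoft", "Founded": "1492", "Location": "Seattle",
             "Annual revenue": "$8"}

intel_resolved = resolve_aliases(intel, aliases)
microsoft_resolved = resolve_aliases(microsoft, aliases)
```

The original records are left as they were written, so removing an alias later would simply bring the extra columns back.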

I hope this clarifies things a bit - we retain the wiki concept by decoupling data and structure.

--HappyDog 12:14, 31 August 2005 (UTC)


Localisation

One thing that is missing in your proposal is the localisation of information; we do not have the data only for en.wikipedia, we can have it for all of Wikipedia. GerardM 06:54, 30 August 2005 (UTC)

My thoughts were not particularly focussed on the WikiData concept of a single centralised database being used by various projects (I didn't find out about that page until yesterday). However, various bits of localisation are already possible within the structure proposed. As far as I can see, there are two parts of the DB that will need to be localisable. Firstly the field names themselves (and any other meta-data, e.g. enums and field comments shown when filling in the data via a form), and secondly the actual data stored in the table, some of which is international (e.g. dates) and some of which is language-specific (e.g. Film Title).
The first issue can be solved by using aliases (e.g. defining the field 'nom' as an alias for 'name'). The syntax would need to allow for a language to be specified for this to be properly useful.
The second issue can be solved by using sub-classing, as described at the end of the proposal. The main table definition will define all core data, whilst locale-specific data can be stored in a table that sub-classes the core table and adds some extra fields of its own.
Ideally tables would be inter-wiki linkable, and data shareable between wikis. E.g. the English Wikipedia could have a 'films' table defined containing:
Release date:date
Title:string
and the French Wikipedia could contain:
Date de sortie:date [[en:Table:Films#Release date]]
Titre:string
Titre Anglais:string [[en:Table:Films#Title]]
This is not a trivial thing to implement though! --HappyDog 11:43, 30 August 2005 (UTC)
I would have appreciated your example more if the primacy of the data had been with the French Wikipedia. I think that it is indeed non-trivial. GerardM 16:09, 30 August 2005 (UTC)
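To make the inter-wiki idea a little more concrete, here is a small Python sketch of how linked field definitions might let one wiki present another wiki's data under local field names. Everything in it is an assumption for illustration: the in-memory `definitions` structure, the `localise_record` helper, and the made-up film record. It ignores the hard parts (fetching data across wikis, caching, conflicting edits).

```python
# Hypothetical stand-in for two wikis' table definitions.
# Each field maps to (type, link), where link is (wiki, table, field) or None.
definitions = {
    ("en", "Films"): {
        "Release date": ("date", None),
        "Title": ("string", None),
    },
    ("fr", "Films"): {
        "Date de sortie": ("date", ("en", "Films", "Release date")),
        "Titre": ("string", None),
        "Titre Anglais": ("string", ("en", "Films", "Title")),
    },
}


def localise_record(record, source, target):
    """Re-label a record from the source table using the target table's field names.

    Only fields that the target definition links to the source are carried over;
    purely local fields (such as 'Titre') must be filled in on the target wiki.
    """
    localised = {}
    for local_name, (_field_type, link) in definitions[target].items():
        if link and link[:2] == source and link[2] in record:
            localised[local_name] = record[link[2]]
    return localised


# A made-up English record, used only to exercise the mapping.
english = {"Release date": "1 January 2000", "Title": "Example Film"}
french = localise_record(english, ("en", "Films"), ("fr", "Films"))
```

The same structure would work with the links pointing the other way, so that the French table holds the primary data and the English one refers to it.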