Talk:Wikidata/Archive/Wikidata/historical

From Meta, a Wikimedia project coordination wiki

XML

For many database projects, you also want to provide XML definitions for the data. Such a definition could then allow both data import and data export. This would really open up the data content.

For Wiktionary there are many people currently outside Wiktionary who will really welcome a better structured dataset. There are many resources on the web that we could integrate with if we have a mechanism. A database (structuring the data) and mechanisms like XML are the ticket. :) GerardM 12:41, 17 Sep 2004 (UTC)

Wow - this is a really great idea! --Daniel Mayer 17:36, 17 Sep 2004 (UTC)

I concur with GerardM. Wiktionary is a mess in large part because the necessarily and rightly loose structure of Wikipedia is incompatible with the way a dictionary works. With something like Wikidata, we could begin to do real lexicography. I have a half-baked structure to use, one flexible enough for minimalist entries and rich enough to do things no print dictionary does. Please go forward with this. I can't write PHP and I don't know databases very well, so I can't contribute much on the technical side, but I can contribute applications if the code is in place.

Diderot 13:52, 23 Sep 2004 (UTC)

Do you have a Wiktionary structure written down somewhere already? That would help in mapping out the requirements for Wikidata.--Eloquence
Sort of. I'm part of a team writing a commercial terminology application for a translation firm. What we've done is to adapt a structure that supports a much richer set of lexicographic needs while maintaining a lot more flexibility. Alas, that particular schema is not GFDL. However, I have an alternative, but similar, approach which has few IP encumbrances and some very different priorities. Ever since we started on this project, I keep having the feeling that this could be implemented on Wiktionary. I'll see what I can do for you in the next couple of days. Failing that, I'm leaving on vacation on Sunday and will write something on the plane and post it from Canada.
In the meantime, and as an example of a quite extensive feature set for lexicography, take a look at TBX. This was our starting point. TBX is freely usable; it has no IP issues.
Diderot 18:55, 23 Sep 2004 (UTC)

Lisa

For use with Ultimate Wiktionary, the specifications from Lisa are relevant to dictionaries. As this is the standard used in the localisation industry it makes absolute sense to invest in it. TBX is an open XML-based standard format for terminological data. This standard provides a number of benefits so long as TBX files can be imported into and exported from most software packages that include a terminological database.

Maybe it is possible to support several standards, but I think it makes sense to use the standard that applies to a particular dataset. GerardM 21:59, 5 Apr 2005 (UTC)

Meta-object Facility (MOF)

The Object Management Group (OMG) has defined a specification called the Meta-Object Facility, a layered architecture for defining object meta-data, which I think will be useful to follow when implementing Wikidata. Basically, a Wikidata user must first define the structure of the data he will be creating in his wiki; this is the model, or the meta-data. MOF already has a specification for a meta-model/meta-metadata language, which Wikidata can implement instead of having to create its own.

OMG also has an XML specification called XML Metadata Interchange (XMI) for exchanging meta-data. I'm not sure if it has any use in plain data exchange; however, eventually supporting XMI would allow users of Wikidata to create their models using graphical UML tools.

BTW, it is indicated below that development has already begun on Wikidata. Is there a mailing list, IRC channel, etc. to coordinate help from volunteers? Jleybov 19:16, 5 Apr 2005 (UTC)

Cooperation vs. Integration

Seems very interesting, useful and complicated. I doubt a You-Can-Manage-Every-Data-With-MediaWiki-Software is the right choice for all. There are also free databases we can collaborate with. For instance, I'd like something like Wikibibliography or Wikicatalogue where I can correct bibliographical data of libraries, but I do not want to copy all the data of millions of books into MediaWiki. See Linking to databases for a simple strategy that could lead to more cooperation with already existing databases (more detailed in German here). Better to cooperate with an already existing database than to try to reinvent the wheel once more. -- Nichtich 09:49, 19 Sep 2004 (UTC)

The thing with Wikidata is that it allows for creating a database that integrates with the MediaWiki software. It will have a UI that will not require changing the skins every time.
The point is that this is technical functionality, not what content will be used. Your point is correct though: Wikimedia will not have a database for everything, and I expect that it will not be for everyone to create a new MediaWiki data project. I expect that for each new project we will have prior discussions. GerardM 10:42, 19 Sep 2004 (UTC)
For integration I prefer not creating a database (there already are databases) but a simple protocol for integrating databases into MediaWiki. The databases can be wikis themselves, but in many cases there are experts creating databases that cannot be created by everyone. They have to make their data free so we can integrate the data, not the entire database. -- Nichtich 13:32, 20 Sep 2004 (UTC)
What are you trying to say; I do not understand. There are "experts" in our crowd. One of the problems with many databases is that they are fragmented or hard to reach or in a proprietary format. With a Wikidata, we will be able to host databases. I do expect that we will not host everything, but first define a need. When we cannot add value by hosting the data, I do not think we should. There is also a difference between defining the database and filling the database with content. The definitions will be done by the "experts" but filling the content takes another kind of expertise. Not all the content/databases will be interesting to everyone.
Really I do not really see what your point is. GerardM 17:27, 20 Sep 2004 (UTC)

Semantic Web

Sounds like connecting Wikimedia to the Semantic Web (I hope so). The idea of the Semantic Web only works if everybody gives away his information for free, anyway. -- Nichtich 09:49, 19 Sep 2004 (UTC)

I sure hope so. It would be very interesting for the Semantic Web community to leverage the community of Wikipedia to create ... wow, I'm at a loss for words on what we would create... --denny 11:53, 1 Dec 2004 (UTC)

Software changes

Brion wrote: Please note that the previous statement [that software changes are required] is completely false. This would work similarly to templates and plugins such as TeX math bits already; it's supplementary to the main text editing work.

That depends on what exactly you are trying to accomplish. If you are talking about something like Magnus' Special:Data, with its own namespace, you are correct. However, I prefer a view where everything, including regular article pages, is wikidata and can be easily complemented with new fields, relations, etc. I also believe that wikidata fields must be easily indexable for performance reasons.--Eloquence
Everything is Wikidata... is that actually accomplishable with MediaWiki or would we need a new implementation? --denny 11:53, 1 Dec 2004 (UTC)
Mediawiki proper and Wikidata projects are apart. Yes, fields can be added to the Mediawiki data but this is already the case. Wikidata is to enable data that we want to host that cannot be served properly by Mediawiki alone. GerardM 18:14, 5 Apr 2005 (UTC)

Wikidatabase

Hi, it would be great if this Wikidata project had open access ports. All the information within Wikidata could be accessed via open and defined MySQL queries by any user. Thus, you could easily add Wikidata access to any software or website:

For example, imagine there were a database within Wikidata that contains yearly precipitation statistics (that is, rain) for several regions of the world. This database is openly accessible. Thus, you can access this database either through the Wikidata website or with self-written software that queries the MySQL database directly. This way, the data within Wikidata would not be "locked away" behind the web interface.

Please tell me, what you think of my idea. Thanks, --Abdull 09:03, 24 Mar 2005 (UTC)

If this is going to happen, it will be VERY costly in hardware resources. At this moment I do not envision this in the foreseeable future. First we have to make it work. GerardM 07:19, 27 Mar 2005 (UTC)
This would be a nice feature, but it requires careful analysis of its security implications. For example, allowing arbitrary SQL queries to be run against a dataset creates an easy way to launch denial-of-service attacks, say, by doing Cartesian products on the largest tables with no filter criteria. Until a way to manage this is better understood, it is probably best to just make data dumps more conveniently available, so that people interested in data mining can do it on their own hardware. Jleybov 20:07, 27 Mar 2005 (UTC)

Timeline

Just a thought on a way to expand this proposal (which I like very much, BTW):

Create pages with content one would find on a timeline (e.g. George Washington marries Martha Washington) and tag it with the date it occurred as well as subjects it concerns
Create a software feature that could automatically create timelines from these pages, so that one could easily create a timeline of David Bowie's career or French history in the 1640s or whatever.

TUF-KAT 06:24, 27 Mar 2005 (UTC)

With all respect, I do not see what this has to do with embedding DATABASE functionality in Wikidata. Given also that each DATABASE project needs its own approval to start off with, I think this is not relevant to THIS subject. GerardM 07:17, 27 Mar 2005 (UTC)


Wiktionary and the status of Wikidata

Hello. The Spanish Wiktionary hasn't quite begun to take off because there has been a year-long discussion about the format of entries. Even though so far there's no agreement as to what new format to choose, most people there agree that the current structure used in other Wiktionaries has failed and doesn't meet the requirements to build good dictionary entries, so many (myself among them) don't see the point of adding thousands of entries in a format that sooner or later will have to be changed because it simply doesn't work well for a dictionary. I myself have tried to develop a new structure to better organize the information within the dictionary entries, but I clearly see that the root of the problem lies in the limitations of software made and designed for encyclopedia entries (consisting of rather lengthy free-form texts divided into thematic sections and subsections) instead of for dictionary entries (consisting of rather short strings of data that fit well into pre-defined fields with a complex set of interrelationships, i.e. a typical database structure). I saw this issue had already been raised in the early days of the English Wiktionary, but since it hasn't been implemented there so far, I thought no one was really interested in developing the necessary software. Today I stumbled upon this Wikidata project and it's exactly the kind of thing that I believe Wiktionaries are in very bad need of. So I'd like to know if it is merely another proposal that might never come true (so that we'll have no choice but to go on building our Wiktionary upon the poorly fitting free-form text format), or if the new software is already on its way to becoming real (so that we might start thinking of a new dictionary-friendly structure for Wiktionary, basing it on typical database capabilities like fields and relationships between fields). Thanks. Uaxuctum 05:51, 4 Apr 2005 (UTC)

Uaxuctum, I'm a semi-professional lexicographer, and I haven't contributed much to Wiktionary precisely because I think its free-form format isn't viable, and I haven't really had the time to wade in over the last year and argue for a better one. Much of the problem is with MediaWiki itself and its free-form entry structure, and yes, Wikidata might fill the gap. But I don't know what the status of this project is.
But you've identified the right problem: Entries have a complex link structure which is not supported by Mediawiki, and require a quite fixed structure for much of the data in each entry.
Diderot 13:04, 5 Apr 2005 (UTC)
The Wikidata project is a required building block that will enable the Ultimate Wiktionary. This will enable an any-to-any dictionary and it will create a fixed structure for Wiktionary content.
There is a budget for the programming of both Wikidata and Ultimate Wiktionary and the programming has started. GerardM 18:10, 5 Apr 2005 (UTC)
There are several competing ideas on what the UW database might look like. I have held back my ideas on META (ERD) as I really want to see what kind of thing people would like to see. GerardM 22:02, 5 Apr 2005 (UTC)

Great Project

Wikidata and the Ultimate Wiktionary deal with the most important restrictions that I have seen in Wikimedia projects. Good luck and let's hope it is scalable. Some remarks to keep in mind while programming:

Greetings, --Mononoke 12:00, 12 Apr 2005 (UTC)

The examples that you give are all separate Wikidata projects. Wikidata and Ultimate Wiktionary are specific. One is additional functionality on top of MediaWiki, the other is a Wikidata project specific to the Wiktionary project. GerardM 13:46, 12 Apr 2005 (UTC)

wikispecies?

Maybe wikispecies would be one very good candidate to think how it could use wikidata? --194.100.190.241 12:23, 27 Apr 2005 (UTC)

It is an ideal candidate. I have a database design for taxonomic data that I would love to implement in Wikidata for Wikispecies. GerardM 05:22, 28 Apr 2005 (UTC)
Wouldn't taxonomic data rather be implemented as an ontology? I think so. Therefore there's no need to use special databases for it. So go and check out SemaWiki first. MovGP0 11:34, 18 November 2005 (UTC)

dealing with incomplete/nonstandard data

There should also be a defined way to mark that some datum in the form is missing, and a way to add a footnote to it, e.g. if there is some controversy about an exact date. 84.42.132.48 20:35, 18 May 2005 (UTC)

Previously proposed projects

These were on Talk:Proposals for new projects, but the discussion indicated they should be merged into this project. I've moved the discussion here in case there's anything still relevant to Wikidata design and implementation. -- Beland 20:40, 30 May 2005 (UTC)

Slotipedia

WikiCatalog

(The WikiCatalog page is now a redirect here, to Talk:Wikidata.)

Millions of collectors worldwide use catalogs for their collections of coins, stamps, sports memorabilia, or whatever. And usually, they also maintain a collection list, which lists and describes the items of their collection. If there was a standard catalog format, it would be easy to combine all the collection lists into a huge catalog.

Of course, some types of collectibles are hard or impossible to catalogue, but e.g. for modern coins it's quite simple, because there is a known and manageable number of issues.

A free, standardized electronic catalog could not only be used for reference; it would also be easy to write collection management software which uses this data. With the click of a mouse, the user could produce a list of his/her missing items or doublets. And this software needn't be restricted to one collector on one computer. Send your list of doublets to your fellow collector, who can immediately match this list with his/her missing items list. Or combine the lists of all collectors into a global electronic trading platform... --Zumbo 22:27, 22 Feb 2004 (UTC)

As a coin collector, I think this is a fabulous idea. --Mero 03:07, 23 Feb 2004 (UTC)
I like the idea too. --Mikez 17:36, 2 Apr 2004 (UTC) (stamps)
Interesting, but I am not sure a wiki is the best means to store this kind of information. Anyway, I already have a whole set of info about French stamps under a free licence (GPL), if needed. http://savannah.gnu.org/projects/stamps-sql/ Yann 21:44, 8 Jul 2004 (UTC)
I'd be behind this. There are a lot of collectables out there, and it would be nice to have wikipedia-style collaboration on cataloging them all. - DrakeCaiman 14:38, 4 Aug 2004 (UTC)
I'm a casual coin collector, and I think this is a cool idea. Maybe not the best name for it though - how about Wikicollection or something similar? Andrevan 14:20, 21 Aug 2004 (UTC)
Is it going to embrace the concept of Wikistamp?
sounds useful :) --217.228.149.40 03:55, 11 Dec 2004 (UTC)

I haven't worked on the project for a while, since I don't see how it could be implemented with the current MediaWiki software. A catalog is mainly table-based and not hypertext-based, therefore, the interface for entering data should consist of forms with fields that correspond to fields in a relational database. With the proposed Wikidata project, this might become possible. --Zumbo 19:26, 27 Feb 2005 (UTC)

I don't think this would be sensible before Wikidata exists. Angela 02:23, 11 Mar 2005 (UTC)

Not much will happen here before we have Wikidata, but meanwhile, I started an ambitious non-wikimedia-project which eventually will include a free and open coin catalog. Of course the catalog project could come back here as soon Wikidata exists. For details, see my page at http://moneta.zumbo.ch/. --Zumbo 01:15, 29 Mar 2005 (UTC)

One Step Further

First of all, this sounds like a great idea. We use MediaWiki at my work and it is a great tool. Database integration would really take it that last mile.

We were going to do some mods to MediaWiki very similar to what has been proposed. Here is the difference:

  • We store numerous types of documents in our wiki. So on each edit page, there would be a document class drop-down box along with the usual edit box for data. Depending on which document class is selected via the drop-down, another tab would appear next to the edit tab, called "edit data". This is where additional data could be edited based on the document class. The database fields would appear under this separate tab.

So for example, if we have a knowledge base document in our wiki, there may be a category and sub-category drop-down box under the "edit data" tab. In some ways the extra step of having to click on a new tab may not be desirable, but in other ways it simplifies the selection of the document class.

I am not asking anyone to run with this, but to just think a little more out of the box by allowing multiple "document types" within one wiki.

Masilver 14:37, 31 May 2005 (UTC)

We need it now

I'm in the Unification Encyclopedia Project, and we already need Wikidata for our employee database.

Or what about the list of (at the risk of being US-centric) the 50 states? State capital, state bird, year of entrance into the Union, license plate blurb / state motto. All these things would be easier to manage via a database. Okay, they don't change very often, so maybe that's not a great example. But how about the status of each applicant to the European Union, or the various countries voting for the Kyoto Protocol? Or sports team rosters, top ten movie lists, countries that need Tsunami relief (that's a rapidly changing one, 24 hours after the wave hits).

Our programmers have used tags, like <fetch>sort:lastname</fetch>, to access MySQL queries, but each has to be programmed on the back end, of course. Ed Poor 17:20, 21 Jun 2005 (UTC)

Huge searching capabilities

I'm pretty new to the Wiktionary/Wikipedia system, but I really agree with all these ideas to structure information in a more organized way. Not only would editing be easier (less time fixing format, more time making good articles), but the search engine could become really, really powerful. For Wiktionary, for example, you could look up a word, or you could do a multi-search of the different Wiktionary fields. The search box on the left would be expanded with search boxes for each data field (etymology, verb/noun/etc, definition, all the Tables for Wiktionary ideas). You could search for a noun, starting with Ch*, that has 'latin' in the etymology field, that rhymes with this, that has a French translation starting with whatever... A search would return a bunch of words, maybe in a box on the right, showing all the possible entries in alphabetical or relevance order. I'm not a linguist, but I'm sure this would have huge possibilities. It would "easily" make Wiktionary the most useful of all the online dictionaries.

It would probably take a lot of server resources, though, since it would need to handle several times as many searches. I'm not sure how exactly all the searches would work, either. Would it just check if the search query was in the appropriate field? Would it have to read every single article in the Wiktionary and check if it matched the search criteria? That would take forever, and if the Wiktionary was this useful, it would probably receive a lot of searches in a day.

Still, think of the possibilities! As soon as Wikidata is implemented, it would be possible to update the search function as well as the editing page. This could apply to any of the proposed Wikidata projects (movies, astronomy, whatnot) and seems to me to be a logical expansion of MediaWiki's capabilities. Good luck to the developers! --Sboots 16:47, 28 Jun 2005 (UTC)

As far as I can see, every Wikidata project that is to be implemented needs approval before it is started. It is easy to start a database but it is hell to change an ill-conceived one into something sensible. The proposed Wikidata projects are exactly that: proposed. It does not make sense to start a database when the usability across the different projects has not been considered. That is exactly what will happen if we let Wikidata projects start without a prior assessment of the requirements and the usability. GerardM 17:36, 28 Jun 2005 (UTC)
That's a good point. Also, someone mentioned above that Wikidata could be used for things like EU countries, states and provinces, etc. How would this fit in (or be accessed) with the existing Wikipedia? The single-text-box-editor seems adequate for Wikipedia, while a more segmented thing would be better for Wiktionary. Would these (Wiktionary-Wikidata versus Wikipedia) have to become separate programs? How could you put them side-by-side as one site?
On a separate note, I made a mock-up of what the search box could look like, based on Eloquence's mock-up on the content page. Mock-up with search box Any ideas? --Sboots 17:56, 28 Jun 2005 (UTC)
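
The question above (would a field search have to read every article?) can be illustrated with a small sketch. It assumes the entry fields live in an indexed database table rather than in page text; the table name `entry`, its columns, the sample rows and the helper `search` are all hypothetical and only serve to show the idea, using Python's built-in sqlite3 module.

```python
import sqlite3

# In-memory database standing in for a hypothetical Wiktionary field store.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entry (
    word           TEXT PRIMARY KEY,
    part_of_speech TEXT NOT NULL,
    etymology      TEXT NOT NULL
);
-- Index so that a search on this field does not scan every entry.
CREATE INDEX entry_pos ON entry (part_of_speech);
""")

# Illustrative sample rows only, not checked against any Wiktionary.
conn.executemany(
    "INSERT INTO entry VALUES (?, ?, ?)",
    [
        ("chapel", "noun", "Latin"),
        ("chase", "verb", "Old French"),
        ("table", "noun", "Latin"),
    ],
)

def search(prefix, part_of_speech, etymology_contains):
    """Return words matching all three field criteria, alphabetically."""
    rows = conn.execute(
        "SELECT word FROM entry "
        "WHERE word LIKE ? AND part_of_speech = ? AND etymology LIKE ? "
        "ORDER BY word",
        (prefix + "%", part_of_speech, "%" + etymology_contains + "%"),
    )
    return [word for (word,) in rows]
```

With fields stored this way, a combined query such as "a noun starting with Ch with 'latin' in the etymology" becomes a single database query, e.g. `search("Ch", "noun", "latin")`. Note that SQLite's LIKE is case-insensitive for ASCII by default; other databases may differ.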

Relations namespace

Quoting from the current proposal:

A relations table:
    source_page_id   destination_page_id   relation_type
    ----------------------------------------------------
    301              302                   2
    => Germany is a neighbour of Poland
relation_types: 0=parent, 1=brothers/neighbours, 3=aunt ... whatever is useful

Instead of just a relation_type, I believe you'd actually want to have a "Relation:" namespace, for which each relation gets its own id, and data about the relation is stored (for example, at least the name). You could prepopulate this space with standard relations, such as w:Dublin Core or w:FOAF relationships. Of course, that's probably only an addition, since "relation_type" could just as easily be "relation_id". -- RobLa 17:20, 13 July 2005 (UTC)
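
A minimal sketch of the suggestion above, assuming SQLite via Python's sqlite3 module: each relation gets its own row (standing in for a page in a "Relation:" namespace) and the relations table refers to it by `relation_id`. The table name `relation`, the ids, the names and the helper `relations_of` are illustrative assumptions, not part of the proposal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per relation; a stand-in for a page in a "Relation:" namespace.
CREATE TABLE relation (
    relation_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
-- The relations table from the proposal, with relation_type
-- replaced by a reference to the relation itself.
CREATE TABLE relations (
    source_page_id      INTEGER NOT NULL,
    destination_page_id INTEGER NOT NULL,
    relation_id         INTEGER NOT NULL REFERENCES relation (relation_id)
);
""")

# Hypothetical ids and names, chosen only for this example.
conn.executemany(
    "INSERT INTO relation VALUES (?, ?)",
    [(0, "parent"), (1, "neighbour")],
)
conn.execute("INSERT INTO relations VALUES (301, 302, 1)")

def relations_of(page_id):
    """Return (destination_page_id, relation name) pairs for a page."""
    rows = conn.execute(
        "SELECT r.destination_page_id, n.name "
        "FROM relations r JOIN relation n ON n.relation_id = r.relation_id "
        "WHERE r.source_page_id = ? "
        "ORDER BY r.destination_page_id",
        (page_id,),
    )
    return rows.fetchall()
```

Because the relation is a row of its own, further data about it (a description, a Dublin Core or FOAF equivalent) could be added as extra columns without touching the relations table.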

I perfectly agree with this idea (and definitely with the namespace), but this seems to be closer to the concept of typed links (see e.g. this paper). This can be implemented without too much effort and should prove to be independent of Wikidata. Such links should also not be restricted to a particular area, but rather be used like "categories for links" which are introduced as the community needs them. (Together with some nice searching, typed links can already replace many very peculiar categories that exist today.) For more details, check out the newly created portal of the project Semantic MediaWiki. --Markus Krötzsch 19:12, 11 August 2005 (UTC)

A Real-World Implementation

I have implemented something similar to the project discussed on this article on the Case Wiki. Details about the implementation can be found at CaseWiki:XML Embedding Extension. I would love to hear feedback. --IndyGreg 18:50, 27 July 2005 (UTC)

'Professional' help?

I really like this database idea, but it would be bad to put a lot of programming effort into coding the wrong thing. Should we get a professional, like a Ph.D. in database design, to make suggestions too? When a conversation has the words 'semantic web' in it then you know it's a complicated idea ^-^. P.S. I'd want a database of all kinds of music and books; this sounds like an interactive list article. --AmoVictor 19:50, 1 August 2005 (UTC)

A professional is someone who gets paid. A Ph.D. is somebody who successfully finished a study. Rest assured Erik has some qualifications and so do I. This does not, however, imply that we will produce the "ultimate" design, but hey, we will get all points for effort :) GerardM 21:16, 1 August 2005 (UTC)
One tip: design for things which exist. We can all come up with grand designs which sound great but this has to work and be efficient. This means avoiding things like putting a hundred different data collections or languages into a single database, for example. Jamesday 04:16, 8 February 2006 (UTC)

PS this other kind of professional help ... no thanks :)

Units of measurement

How is Wikidata going to address the problem of different units of measurement? If you just store a number in a database table, then you do not know whether it is "km" or "miles". Moreover, the user will need to see the expected unit when editing content, and she might even want to select a unit of her choice. Can this be achieved without creating too much dedicated difficult-to-maintain code (built-in conversion tables for all kinds of units do not seem to be an option; there are just too many units)? --Markus Krötzsch 19:29, 11 August 2005 (UTC)

The problem you describe may be out of scope for this project, and the same goes for units in semantic wikis.
To me it seems that correct unit handling is a problem for a future wiki generation, after Datawiki and even after the introduction of semantic wikis. I also think that this needs another project for a mathematical wiki.
MovGP0 11:33, 14 November 2005 (UTC)

Coming back from the discussion about units in SemaWiki, it seems that we need a way to store parsed attributes in a database. As stated in [1], we need to be able to parse strings and store them in a data table specialised to the datatype, so that they are searchable where a plain string is not. I feel that this is a point where the developers of Wikidata and SemaWiki will need to work together. Otherwise we are likely to end up with incompatible standards when trying to define new datatypes.
MovGP0 12:09, 18 November 2005 (UTC)
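
One way to picture the idea discussed in this thread is to store the unit next to the number and to keep conversion factors as editable data rather than as dedicated code. The following is only a sketch under that assumption; the names `Quantity` and `TO_BASE` and the three listed units are hypothetical and cover a single dimension (length).

```python
from dataclasses import dataclass

# Hypothetical conversion data: unit -> (base unit, factor to base unit).
# In a wiki setting this could itself be community-edited data.
TO_BASE = {
    "m": ("m", 1.0),
    "km": ("m", 1000.0),
    "mile": ("m", 1609.344),
}

@dataclass(frozen=True)
class Quantity:
    value: float
    unit: str

    def to_base(self):
        """Normalise to the base unit, e.g. for sorting and searching."""
        base, factor = TO_BASE[self.unit]
        return Quantity(self.value * factor, base)

    def to(self, unit):
        """Convert to another unit of the same dimension."""
        base, factor = TO_BASE[unit]
        normalised = self.to_base()
        if normalised.unit != base:
            raise ValueError(f"cannot convert {self.unit} to {unit}")
        return Quantity(normalised.value / factor, unit)
```

Storing the normalised base value in the typed data table would make values entered as "km" and "miles" comparable, while the original unit can still be shown to the editor.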

Knowledge gaps and imprecise data

How could such a system deal with incomplete, imprecise, or ambiguous data? Some examples:

  1. Most birthdates are precise, but in historic contexts, the known birthdate might be incomplete (usually just the year). Yet, creating three data fields for year, month, and day is not very convenient (for the interface) and neglects the relationship between the values (so ordering dates by their time would require special processing to work).
  2. Some data is only known approximately, and often one has to deal with ranges that give lower and upper estimates for a real value. How could this be dealt with?
  3. Sometimes "the" value of some property for a given subject is not unambiguously clear, and depends on the precise interpretation of the subject. An example are the three sizes of France. Can this be dealt with? One possibility is of course to have multiple separate articles in such cases (e.g. three articles on the different "Frances"). Is this possible in every situation?

--Markus Krötzsch 19:29, 11 August 2005 (UTC)

Imprecise data may be modelled by giving all possible values; we then take the min and max values as the bounds. But the interpretation should be left to the user. This is, e.g., the case when asking for the color of apples (red, or green, or yellow, or all at once?).
In the case of France, the 3 different sizes rely on historical changes; this means we need some time-based semantic markup. But that's your project, right?
See also the statements above about units and mathematics.
MovGP0 11:36, 14 November 2005 (UTC)
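
The first example in this thread (a birthdate where only the year is known) and the min/max suggestion in the reply can be combined in a small sketch. It assumes a single date field whose month and day are optional, and derives lower and upper bounds from whatever is known; the class name `PartialDate` and its method `bounds` are hypothetical.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class PartialDate:
    year: int
    month: Optional[int] = None
    day: Optional[int] = None

    def __post_init__(self):
        # A day without a month cannot be placed on the calendar.
        if self.day is not None and self.month is None:
            raise ValueError("day given without month")

    def bounds(self):
        """Earliest and latest dates consistent with what is known."""
        first_month = self.month if self.month is not None else 1
        last_month = self.month if self.month is not None else 12
        first_day = self.day if self.day is not None else 1
        if self.day is not None:
            last_day = self.day
        else:
            # Last day of the (possibly assumed) final month.
            last_day = calendar.monthrange(self.year, last_month)[1]
        return (
            date(self.year, first_month, first_day),
            date(self.year, last_month, last_day),
        )
```

Ordering by `bounds()[0]` keeps one field in the interface while still allowing dates of different precision to be sorted together; a fully known date simply has equal lower and upper bounds.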

Database structure in 3rd NF

Classes and Attributes

We should define the object-types (classes) in the database not just as "namespaces". Each class has a name and a bunch of attribute-types.

A class table:

   class_id    name
   --------------------------------------------------------------------------------
   0           article
   402         country
   403         film
   404         person


An attribute table:

   attribute_id  class_id   name          type
   --------------------------------------------------------------------------------
   1              0         'text'        'L'          (longtext)
   2              402       'flag'        'S'          (shorttext)
   3              402       'population'  'N'          (number)

(These two tables need a revision-safe way to record changes)

changed tables

A pages table:

    page_id    page_name    page_namespace  top_revision
    ----------------------------------------------------
    300        Monkey       0               2043
    301        Germany      402             2044
    302        Poland       402             4893
    => an article on Monkeys, two sets of country data


A data-longtext table:

    page_id  revision_id   attribute_id   value                   
    -------------------------------------------------------------------------------
    300      2042          1              A monkey is an animal...


A data-shorttext table:

    page_id   revision_id  attribute_id   value
    ---------------------------------------------------------------
    301       2044         2              [[Image:Germany-flag.png]]

A data-numbers table:

    page_id  revision_id   attribute_id   value
    ------------------------------------------------------
    301      2044          3              80000000
    301      2040          3              75000000
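
The tables above can be tried out with a minimal sketch like the following. It assumes SQLite via Python's sqlite3 module rather than the MediaWiki database, covers only the numbers table, and uses illustrative names (`classes`, `attributes`, `pages`, `data_numbers`, `current_number`) that are not part of the proposal. The rows are the sample rows shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE classes (
    class_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE attributes (
    attribute_id INTEGER PRIMARY KEY,
    class_id     INTEGER NOT NULL REFERENCES classes (class_id),
    name         TEXT NOT NULL,
    type         TEXT NOT NULL      -- 'L' longtext, 'S' shorttext, 'N' number
);
CREATE TABLE pages (
    page_id        INTEGER PRIMARY KEY,
    page_name      TEXT NOT NULL,
    page_namespace INTEGER NOT NULL REFERENCES classes (class_id),
    top_revision   INTEGER NOT NULL
);
CREATE TABLE data_numbers (
    page_id      INTEGER NOT NULL REFERENCES pages (page_id),
    revision_id  INTEGER NOT NULL,
    attribute_id INTEGER NOT NULL REFERENCES attributes (attribute_id),
    value        NUMERIC NOT NULL,
    PRIMARY KEY (page_id, revision_id, attribute_id)
);
""")
conn.executemany("INSERT INTO classes VALUES (?, ?)",
                 [(0, "article"), (402, "country")])
conn.executemany("INSERT INTO attributes VALUES (?, ?, ?, ?)",
                 [(1, 0, "text", "L"), (2, 402, "flag", "S"),
                  (3, 402, "population", "N")])
conn.executemany("INSERT INTO pages VALUES (?, ?, ?, ?)",
                 [(300, "Monkey", 0, 2043), (301, "Germany", 402, 2044),
                  (302, "Poland", 402, 4893)])
conn.executemany("INSERT INTO data_numbers VALUES (?, ?, ?, ?)",
                 [(301, 2044, 3, 80000000), (301, 2040, 3, 75000000)])

def current_number(page_name, attribute_name):
    """Value of a numeric attribute at the page's top revision, or None."""
    row = conn.execute(
        "SELECT d.value FROM pages p "
        "JOIN attributes a ON a.class_id = p.page_namespace "
        "JOIN data_numbers d ON d.page_id = p.page_id "
        " AND d.attribute_id = a.attribute_id "
        " AND d.revision_id = p.top_revision "
        "WHERE p.page_name = ? AND a.name = ?",
        (page_name, attribute_name),
    ).fetchone()
    return row[0] if row else None
```

Older values stay in `data_numbers` under their own `revision_id`, so a page history can be reconstructed per attribute; only the row matching `top_revision` counts as current.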

A minimalist approach

Not knowing when a Wikidata prototype will be ready for testing, here is a minimalist approach. Assuming that cross-indexing of information in the infoboxes is a real need, and assuming all infoboxes are built with templates, every invocation of a template could be indexed in a simple database table. This would be done when a page is saved and the parser finds a template with parameters, e.g. page pagename contains {{tempname|field1 = value1 | field2 = value2}} . These would be stored in one big template-page-field-value table (fully indexed) where they can later be searched:

template   page      field     value
--------   --------  ------    ------
tempname   pagename  field1    value1
tempname   pagename  field2    value2

This minimalist approach doesn't support editing, which would still be made in the individual pages. --LA2 21:23, 15 August 2005 (UTC)

I'd like to see this approach taken; it gets us that first step now, by basically attaching meta-data to the pages. You'd have to have a "here's what these types of pages should have for meta-data" type of document, but established wikis would have that. Newer wikis would suffer from nonconformity, and the best answer to that would be to simply NOT use the meta-data functionality until a minimal set of pages existed and a common set could be formalized... and hopefully by then, we'll have this WikiDB ;-) --Funkdubious 18:27, 10 November 2005 (UTC)
Have a look at User:HappyDog/WikiDB. I've been looking at a slightly more sophisticated version of this 'embedded data' approach (i.e. data is simply tagged page content) --HappyDog 02:20, 14 November 2005 (UTC)

Hmm... if the same (infobox) template is called twice from the same page, there also needs to be a column to identify the calls. I propose a simple sequential number. This number could be unique for the tempname+pagename combination or for the pagename alone. The important thing is that value2 (in the example below) is not associated with value3 because they belong to different calls. --LA2 02:43, 30 August 2006 (UTC)

template   page      seqno   field     value
--------   --------  -----   ------    ------
tempname   pagename    1     field1    value1
tempname   pagename    1     field2    value2
tempname   pagename    7     field1    value3
tempname   pagename    7     field2    value4
This is now implemented as user:LA2/Extraktor. --LA2 11:30, 30 August 2006 (UTC)
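
For readers who want to experiment with the idea, here is a minimal sketch of the indexing step. It is not the Extraktor code: the regular expression, the function name `index_templates`, and the choice to number calls per page are all assumptions. It only handles flat template calls with named parameters, so nested templates and values containing a pipe (such as piped links) are not parsed correctly.

```python
import re

# Matches a flat template call such as {{name|a = 1 | b = 2}}.
# Assumption: no nested templates and no "|" inside parameter values.
TEMPLATE_CALL = re.compile(r"\{\{([^{}|]+)\|([^{}]*)\}\}")


def index_templates(pagename, wikitext):
    """Return (template, page, seqno, field, value) rows for one page."""
    rows = []
    # seqno counts template calls within this page, starting at 1.
    for seqno, match in enumerate(TEMPLATE_CALL.finditer(wikitext), start=1):
        template = match.group(1).strip()
        for param in match.group(2).split("|"):
            if "=" not in param:
                continue  # positional parameters are skipped in this sketch
            field, value = param.split("=", 1)
            rows.append((template, pagename, seqno, field.strip(), value.strip()))
    return rows


text = (
    "{{tempname|field1 = value1 | field2 = value2}} some text "
    "{{tempname|field1 = value3 | field2 = value4}}"
)
rows = index_templates("pagename", text)
```

Because each row carries the call's `seqno`, `value2` and `value3` above end up in different groups even though they share a template and a page.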

My thoughts on a WikiDB[edit]

Hi! I've been thinking quite a bit about how a WikiDB might be implemented, and made a bunch of notes on my private wiki. I just found this page and thought I'd open them up for comments. I've copied where I'm up to into User:HappyDog/WikiDB. Please take a look at it and let me know what you think. Either leave comments on the talk page, or just plunge in and edit the text. I would be interested to hear what people think - my approach is a little different to what has been talked about here (I think). --HappyDog 01:27, 30 August 2005 (UTC)

External programs enhanced by access to the database (Like Gracenote's music database)[edit]

There is an increasing need for programs to access external databases for data. For example, with gracenote you can access their database to find out information about songs you have such as the length, release date, etc. Some other collection programs also have a database which users can access (if you pay) to tell them more about their books / games etc.

Could Wikidata be expanded, if it isn't in the scope now, to include the ability for people to access the database to enhance database/organization programs they use? If the interface is built in then anyone can contribute to the database and make their own allowing programs to access it. --ShaunMacPherson 15:27, 10 September 2005 (UTC)

GIS & Wikidata[edit]

Since version 4.1, MySQL supports geographical data according to the OpenGIS standards. Of course there are other (free) databases such as PostGIS/PostgreSQL. I think these are basics that Wikidata is bound to. Given these developments in spatial extensions for databases, a GIS database for Wikipedia/Wikidata should be developed on its own.

134.106.146.46 08:02, 27 September 2005 (UTC) =de:Benutzer:Arcy

Poor-man's approach[edit]

An alternative approach to storing semistructured data in separate tables is to store the data as XML in the document itself. This approach is much less invasive, and can be implemented in MediaWiki in a fairly straightforward manner. The documents in each namespace can conform to a specific DTD, and namespace-specific editors and viewers can be used to edit/display the data. You can use Lucene to index the XML elements on the pages as separate fields. We're currently nearing completion of a basic implementation. I'm planning on making it available in about a month when it's completed; I could share it earlier if you're interested.

Dallan 04:53, 2 November 2005 (UTC)
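
To make the "XML elements as separate fields" idea concrete, here is a small sketch using only the Python standard library. It is not Dallan's implementation: a plain dictionary stands in for Lucene, no DTD validation is performed, and the page contents, element names, and function names are invented for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def extract_fields(xml_text):
    """Map each element tag to the list of its non-empty text values."""
    root = ET.fromstring(xml_text)
    fields = defaultdict(list)
    for element in root.iter():
        text = (element.text or "").strip()
        if text:
            fields[element.tag].append(text)
    return dict(fields)


def build_index(pages):
    """Build a (field, lowercased value) -> set of page titles lookup.

    This dictionary is only a stand-in for a real full-text index.
    """
    index = defaultdict(set)
    for title, xml_text in pages.items():
        for field, values in extract_fields(xml_text).items():
            for value in values:
                index[(field, value.lower())].add(title)
    return index


# Made-up sample pages whose whole content is one XML document.
pages = {
    "Germany": "<country><name>Germany</name><capital>Berlin</capital></country>",
    "France": "<country><name>France</name><capital>Paris</capital></country>",
}
index = build_index(pages)
matches = index[("capital", "berlin")]
```

A search restricted to the `capital` field then only touches values stored under that tag, which is the property the per-field indexing is meant to provide.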

I don't think so. From a programmer's architectural standpoint it's better to have all data in a central database. At most you can cache the pages as long as they haven't changed. For accessing the data as XML I recommend using web services instead. This would allow access to the data in raw XML format without the article as overhead. Furthermore it would allow people to develop their own applications that can access this data. MovGP0 00:45, 18 November 2005 (UTC)

Wikidata:What for?[edit]

Let me say that I think Wikidata could be a great project. But I feel that there is too much legend and not enough documentation.

So let me first ask: What is the scope of this Project?

As far as I have seen until now, there are two big answers to this question:

  1. Create a separate database and interface standard for proper translations in Wiktionaries, so that there is only one instead of hundreds of redundant ones.
  2. Create a way to build data tables which should replace the current infoboxes.

I'm excited about both ideas, but I'm missing some documentation describing how they should work. I'm not really interested in the current structure of the database, which is changing anyway during implementation (I think most Wikipedians will think like me on that point). But I'm very interested in the how-will-it-work.

First of all Wikidata does not have its own structure, it allows for the creation of a structure.

So let me ask:

  • how can a user add a table?
  • how to use it as infobox?
  • can I create Navigation-Lists with Wikidata? And if so: how?
  • and how will the mapping between Article and Datatable/Datarow work? Can the Infobox query the datarow from the Title of the Article?

MovGP0 11:59, 18 November 2005 (UTC)

Can any programmer in this project please give some answers to my questions? Simple ones are OK too. I currently feel like this is a dead project.
MovGP0 19:50, 8 December 2005 (UTC)

The question of who can add a table is very much still to be decided. Creating a relational database is not really straightforward. It does not make sense that the same data is created in every language version of a project. There may also be a need for custom code to make a particular design possible. When you want taxoboxes, it would make sense to have them originate from WikiSpecies. When you want Popeboxes or Royalty boxes, or Infoboxes on countries, cities etc., you only need to host the data once and you want this data localised, but there is not one obvious place for them.

In my personal opinion I do not have a clue how this will be resolved. Now as to the practical how, Erik will produce a paper that will explain many of these things. GerardM 23:46, 8 December 2005 (UTC)

Hmm - I have waited a long time, but I can't see anything of a paper yet. Maybe I'm impatient, but I would like to see it. I would even be happy to get just a draft of this paper. MovGP0 19:08, 20 December 2005 (UTC)
Indeed waiting is hard. There is some Wikidata stuff live here. Documentation here and an RFC here. Things are certainly moving; there are however many steps to be taken. GerardM 21:52, 4 February 2006 (UTC)
That looks like something Erik and I discussed more than a year ago. Here's roughly what I said then and what I just put on the talk page of that RFC:
Erik, is this your idea of storing all the data in one huge single database for all languages again? Don't even think of doing it - it'll fail disastrously because of contention issues within the database server. It's already bad enough with things split by language and splitting by namespace may be necessary for the biggest wikis.
Storing everything in one database is completely impractical and shows a major lack of experience with the load properties of the database server. The current structure is the one which scales well and can handle the loads we need. All in one database is pretty, it just has terrible load properties. Trying to support our distributed structure with master databases on different continents close to their main article authors would also be "interesting" if there was only one database, adding significantly to response times compared to local placement. Instead, it's possible to consolidate content pulled from different distributed databases and display or search them in a combined form. See the "Friends Page" at LiveJournal for one example - each entry can be pulled from a journal on a different database server, then combined and it does this routinely and efficiently at loads similar to ours. A typical friends page is likely to involve five to ten different database servers, each sharing the work and spreading the load. Jamesday 04:43, 8 February 2006 (UTC)
As I have replied there:

This should really be an issue of using the correct abstraction. Whether the data is held in a single database or split across several should be transparent to clients of the "database services" layer. IMHO we should be striving for further abstraction so that alterations in one part of Mediawiki have minimal impact on other parts.

HTH HAND —Phil | Talk 10:46, 8 February 2006 (UTC)
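
The abstraction argued for above, combined with the consolidation approach Jamesday describes, could look roughly like the sketch below. Everything in it is hypothetical: `DatabaseServices`, `Entry`, and `recent` are invented names, and in-memory lists stand in for separate database servers. The point is only that the caller asks one object for merged results and never learns how many shards were involved.

```python
import heapq
from dataclasses import dataclass
from itertools import islice


@dataclass(frozen=True)
class Entry:
    timestamp: int
    shard: str
    title: str


class DatabaseServices:
    """Hypothetical facade: clients never see which shard holds the data."""

    def __init__(self, shards):
        # shards maps a shard name to a list of entries sorted newest-first;
        # each list stands in for a query against a separate database server.
        self._shards = shards

    def recent(self, limit):
        """Return the newest `limit` entries across all shards."""
        streams = [iter(entries) for entries in self._shards.values()]
        # heapq.merge lazily combines inputs that are already sorted.
        merged = heapq.merge(*streams, key=lambda e: e.timestamp, reverse=True)
        return list(islice(merged, limit))


services = DatabaseServices(
    {
        "en": [Entry(30, "en", "A"), Entry(10, "en", "B")],
        "de": [Entry(20, "de", "C")],
    }
)
newest = services.recent(2)
```

Swapping the dictionary of lists for one combined store would change the constructor but not the `recent` call, which is the kind of transparency the "database services" layer is meant to give.
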
Hello James. No, this is not a proposal to merge existing databases; it is a proposal to support multiple languages in a single database in addition to multiple languages in split databases. The most likely use scenarios in the Wikimedia context would be Meta and Commons, both multilingual wikis using a single database which lack multi-language support.--Eloquence 09:09, 9 February 2006 (UTC)

Syntax Proposal[edit]

Because I've found nothing about the syntax in Wikidata, I've decided to begin with a simple proposal how Wikidata should work for Infoboxes. Comments are welcome! MovGP0 19:08, 20 December 2005 (UTC)

To talk about a possible syntax, we might want to look at Kendra Base, which has some good ideas. The use in Wikidata might look similar. To make an Example we first define a new Template within the Commons Space to make sure it's available to all the other wikis.

Example:

First we define a new Table at commons called Template:Example

{|
| @Country || @@Country
|}

Note that the two attributes have the same Name, which indicates that they are related. This might be represented in a Database-Table like the following:

WikiDataDB_TableTemplates

TemplateID                 Field     Article                 Title   Content
------------------------   -------   ---------------------   -----   ------------------
commons:Template:Example   Country   en:United States        Name    United States
commons:Template:Example   Country   de:Vereinigte Staaten   Name    Vereinigte Staaten

Now we insert the Template into the Article en:United States and de:Vereinigte Staaten.

{{commons:Template:Example}}

For this example we think about how this might work for the Article en:United States. The Parser filters the Table for the name of the Article:

SELECT Title, Content
FROM WikiDataDB_TableTemplates AS db
WHERE db.Article = 'en:United States' -- The name of the current Article
  AND db.TemplateID = 'commons:Template:Example' -- The name of the Template to insert
  AND db.Field = 'Country' -- The name referred to with the @-sign

This gives the following response:

Title | Content
------+--------------
Name  | United States

Now the Parser can substitute all Occurrences of @Country with the Value of the Title Column ("Name") and all Occurrences of @@Country with the Value of the Content Column ("United States").

Therefore the Article en:United States will now contain the following Table:

{|
| Name || United States
|}

Note:

The given Example doesn't solve the Translation-Problem. It just ensures that there is only one Infobox-Template per Purpose in the whole Wikipedia. MovGP0 19:08, 20 December 2005 (UTC)
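
The lookup-and-substitute step of this proposal can be tried out with the sketch below, which uses an in-memory SQLite database. The table and column names come from the proposal; the function `expand`, the constant `TEMPLATE_SOURCE`, and the decision to fetch every field of the template in one query (rather than one query per field, as in the SELECT shown earlier) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE WikiDataDB_TableTemplates ("
    " TemplateID TEXT, Field TEXT, Article TEXT, Title TEXT, Content TEXT)"
)
conn.executemany(
    "INSERT INTO WikiDataDB_TableTemplates VALUES (?, ?, ?, ?, ?)",
    [
        ("commons:Template:Example", "Country",
         "en:United States", "Name", "United States"),
        ("commons:Template:Example", "Country",
         "de:Vereinigte Staaten", "Name", "Vereinigte Staaten"),
    ],
)

# The wikitext of Template:Example from the proposal.
TEMPLATE_SOURCE = "{|\n| @Country || @@Country\n|}"


def expand(template_id, template_source, article):
    """Fill @Field and @@Field placeholders for one article (hypothetical)."""
    rows = conn.execute(
        "SELECT Field, Title, Content FROM WikiDataDB_TableTemplates"
        " WHERE TemplateID = ? AND Article = ?",
        (template_id, article),
    ).fetchall()
    output = template_source
    for field, title, content in rows:
        # Replace @@Field first so that @Field cannot match inside it.
        output = output.replace("@@" + field, content)
        output = output.replace("@" + field, title)
    return output


expanded = expand("commons:Template:Example", TEMPLATE_SOURCE, "en:United States")
```

Plain string replacement would misfire if one field name were a prefix of another (for example `Country` and `CountryCode`), so a real parser would need a stricter placeholder syntax.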

machine-readable citations, & "resource lists"[edit]

I'd like to suggest 2 things here: machine-readable citations -- and "resource lists" instead of "bibliographies" -- they're related.

Digital info users increasingly use bibliographic software, or other databases, for "book" citations. These work from tagged formats: one field containing recognizable data tags such as "subject" or "author" or "publisher" etc., each associated with a second field containing the content -- examples can be found in any library online catalog, such as the University of California's Melvyl, at http://melvyl.cdlib.org, or (slightly-different format, see below) the Bibliothèque nationale de France's OPALE, at http://catalogue.bnf.fr, and many others.

That problem of the "slightly-different format", though, is a major one. I believe the best thing to do, to avoid that difficulty and its endless controversies on Wikipedia, would be to Keep It Simple and just provide taggedfield/contentfield, and for just a few very basic tags.

People download those -- copy & paste -- for later use at the library or bookstore or online or wherever. Sure beats re-keying all of the data into their pc-resident software.
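
A taggedfield/contentfield citation of this kind could be as simple as one "tag: content" line per field, as in the sketch below. The tags `author`, `publisher` and `subject` are mentioned above; `title` and `year`, the line format itself, the function names, and the placeholder citation are assumptions for illustration, not an existing library or catalogue format.

```python
# Hypothetical minimal tag set; only a few very basic tags are allowed.
BASIC_TAGS = ("author", "title", "publisher", "year", "subject")


def format_citation(fields):
    """Serialise a citation as one 'tag: content' line per known tag."""
    lines = []
    for tag in BASIC_TAGS:
        if tag in fields:
            lines.append(f"{tag}: {fields[tag]}")
    return "\n".join(lines)


def parse_citation(text):
    """Read 'tag: content' lines back into a dict, ignoring unknown tags."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so content may contain colons.
        tag, sep, content = line.partition(":")
        tag = tag.strip().lower()
        if sep and tag in BASIC_TAGS:
            fields[tag] = content.strip()
    return fields


# Placeholder data, not a real reference.
citation = {
    "author": "Example Author",
    "title": "An Example Title: With a Subtitle",
    "year": "2005",
}
text = format_citation(citation)
round_trip = parse_citation(text)
```

Text in this form survives copy and paste, and a reader's bibliographic software only needs to understand the handful of tags in `BASIC_TAGS`.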

This said, there's more now than just "books". These "bibliographies", which get thrown in now unformatted at the ends of all sorts of Wikipedia articles, need to be expanded in concept & format to include multimedia: printed paper & cardboard books still, yes, but also journals and newspapers and audio and video and other -- nowadays, and increasingly, it is pretty easy to find a lot, online, by and about some "author", including her books & journal articles & newspapers & audio presentations & video presentations, and the great beauty & advantage of digital info is that all of this may very simply be "linked", and it should be. But it's a "resource list"... See the Saskia Sassen "resource list" I've done on en.Wikipedia as an example, at [2]: Wikipedia can offer better access to the author & her work than any print library ever could have dreamed of... "universal bibliography", or nearly... and it's dynamic instead of static...

If discussion of all this already has or now is going on elsewhere here, somewhere, I'd be grateful for pointers as to exactly where?

--Kessler 19:42, 4 February 2006 (UTC)

Wikidata - current state of existence?[edit]

Does anyone know where Wikidata is being discussed currently? All those links appear to be untouched since June 2006, and the wikidata-l mailing list is utterly silent. Omegawiki is not an official Wikimedia project (? so why does OmegaWiki list all our sister projects at the bottom of its mainpage?), but Wikidata isn't even mentioned at the foundation site... Thanks for any help; and updates to relevant project pages would be even better :) Quiddity 19:25, 16 June 2007 (UTC)