Talk:Wikidata/Notes/ContentHandler

From Meta, a Wikimedia project coordination wiki

Extensions

Some extensions might assume wikitext, so it could be necessary for them to explicitly declare which native formats they can handle. That is, if nothing is declared, it is assumed the extension only applies to wikitext. — Jeblad 15:55, 26 March 2012 (UTC)

As I tried to explain in the Backward Compatibility section, extensions that use the old text-based method to access revision text will get null by default on non-wikitext pages. They will simply not see any non-wikitext content. The same goes for the web API used by bots. If an extension accesses the text directly in the database, that will break, but extensions like that are broken by design anyway. -- Daniel Kinzler (WMDE) (talk) 18:57, 26 March 2012 (UTC)
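The fallback behaviour described above can be sketched as follows. This is a minimal Python model, not MediaWiki's PHP code. The class `Revision`, its methods `get_text` and `get_content`, and the model name `"json"` are all hypothetical. A legacy accessor returns the text only for wikitext and `None` for every other model. Model-aware code uses a separate accessor.

```python
from dataclasses import dataclass
from typing import Optional

WIKITEXT = "wikitext"


@dataclass
class Revision:
    # Hypothetical model: a revision records its content model and serialized blob.
    content_model: str
    blob: str

    def get_text(self) -> Optional[str]:
        """Legacy text accessor: only wikitext is exposed, other models yield None."""
        if self.content_model == WIKITEXT:
            return self.blob
        return None

    def get_content(self) -> tuple[str, str]:
        """Model-aware accessor for code that understands non-wikitext content."""
        return self.content_model, self.blob


page = Revision("wikitext", "Hello [[world]]")
item = Revision("json", '{"label": "Berlin"}')
print(page.get_text())  # the wikitext is returned unchanged
print(item.get_text())  # None: old-style callers simply do not see this content
```

Because old callers get `None` instead of a JSON string, they cannot accidentally parse or rewrite structured data as if it were wikitext.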

Protection

Some native formats might imply protection; for example, JavaScript and stylesheet pages should imply full protection, except for the owning user when they are subpages in the User namespace. Perhaps some native formats should also be restricted in where they are allowed? — Jeblad 15:59, 26 March 2012 (UTC)

The need for full protection doesn't arise from the fact that there's JS or CSS on the pages, but from the fact that this JS or CSS is used in the MediaWiki user interface. That is why it's protected by default, just like anything else used in the MediaWiki UI (i.e. the MediaWiki namespace). The same applies to user scripts. This mechanism wouldn't change with the new ability to store non-wikitext content. -- Daniel Kinzler (WMDE) (talk) 19:00, 26 March 2012 (UTC)
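The reply's point is that protection depends on where a page is used, not on its content type. A rough Python sketch of such a rule follows. The function `can_edit`, its parameters, and the "admins may also edit user scripts" clause are illustrative assumptions, not MediaWiki's actual permission logic.

```python
def can_edit(title: str, user: str, is_admin: bool) -> bool:
    """Hypothetical rule: protection follows where a page is used, not its content model."""
    namespace, _, rest = title.partition(":")
    if namespace == "MediaWiki":
        # Everything here feeds the user interface (messages, JS and CSS alike).
        return is_admin
    if namespace == "User" and rest.endswith((".js", ".css")):
        # User scripts and styles run in that user's session: owner (or admin) only.
        owner = rest.split("/", 1)[0]
        return is_admin or owner == user
    # Ordinary pages are not protected by this rule, whatever their content model.
    return True


print(can_edit("User:Alice/common.js", "Alice", False))  # True
print(can_edit("User:Alice/common.js", "Bob", False))    # False
```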

Serialization

Is it serialization to a human-readable form or to a machine-readable form? The former is most likely editable, but the latter is most likely not editable without some kind of special editor. — Jeblad 16:05, 26 March 2012 (UTC)

The serialization is whatever the respective handler implements. For wikitext, the serialization is just wikitext. For Wikidata, it will be JSON. Other extensions may use something else. I would personally suggest always using something human-readable, but in theory even a binary format could be used. That depends on how the data gets stored in the database, though: binary data may get corrupted in a TEXT field. -- Daniel Kinzler (WMDE) (talk) 19:03, 26 March 2012 (UTC)
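Per-model serialization can be sketched as an interface that each handler implements. This is a minimal Python sketch. The names `ContentHandler`, `WikitextHandler`, `JsonHandler` and the `HANDLERS` registry are hypothetical and do not mirror the real PHP class signatures. Wikitext serializes to itself, and structured data serializes to JSON, which stays human-readable and safe in a TEXT column.

```python
import json
from abc import ABC, abstractmethod
from typing import Any


class ContentHandler(ABC):
    """Hypothetical interface: each content model supplies its own (de)serialization."""

    @abstractmethod
    def serialize(self, content: Any) -> str: ...

    @abstractmethod
    def unserialize(self, blob: str) -> Any: ...


class WikitextHandler(ContentHandler):
    # For wikitext, the serialization is just the wikitext itself.
    def serialize(self, content: Any) -> str:
        return content

    def unserialize(self, blob: str) -> Any:
        return blob


class JsonHandler(ContentHandler):
    # Structured data becomes JSON text: readable, and no binary bytes in a TEXT field.
    def serialize(self, content: Any) -> str:
        return json.dumps(content, sort_keys=True, ensure_ascii=False)

    def unserialize(self, blob: str) -> Any:
        return json.loads(blob)


# Hypothetical registry mapping a content model name to its handler.
HANDLERS: dict[str, ContentHandler] = {
    "wikitext": WikitextHandler(),
    "json": JsonHandler(),
}

print(HANDLERS["json"].serialize({"label": "Berlin"}))  # {"label": "Berlin"}
```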

Versioning of categories

On bugzilla:7148#c1 it was mentioned that implementing some kind of watchlist for changes to category content would be "practically impossible to implement unless we start versioning category links". Wikidata/Notes/ContentHandler#Rationale suggests a "transition to a system where categories etc are not maintained in the wikitext itself, while still being stored and versioned in the usual way". Would the versioning mentioned there help to solve bug 7148? Helder 18:09, 26 March 2012 (UTC)

Not by itself, because it will still be stored and versioned in the usual way. That is, the same limitations with respect to database queries apply, whether the categories are in wikitext or stored as JSON along with the wikitext.
I think the original comment you were referring to is actually incorrect. What would be required is to generate an entry in the recentchanges table every time a category changes. That shouldn't even be hard to do. But this isn't the right place to discuss that :) -- Daniel Kinzler (WMDE) (talk) 19:09, 26 March 2012 (UTC)
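The suggestion of one recentchanges entry per category change can be sketched in Python. The idea is to compare the category sets of the old and new revision on save and emit one record per addition or removal. `CategoryChange` and `category_changes` are hypothetical names, and the record does not reflect the real recentchanges schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryChange:
    # Hypothetical stand-in for a row that would go into the recentchanges table.
    category: str
    page: str
    action: str  # "added" or "removed"


def category_changes(page: str, old: set[str], new: set[str]) -> list[CategoryChange]:
    """Compare two revisions' category sets and emit one change per difference."""
    changes = [CategoryChange(c, page, "added") for c in sorted(new - old)]
    changes += [CategoryChange(c, page, "removed") for c in sorted(old - new)]
    return changes


for change in category_changes("Berlin", {"Cities", "Capitals"}, {"Cities", "Capitals in Europe"}):
    print(change)
```

A category watchlist could then filter these records by `category`, without storing any history of category links beyond what each revision already holds.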

Why have wikitext at all?

I thought the whole point of Wikidata is that it only stores machine-readable structured data. If you let people edit the raw page, the data will get messed up.

I assumed we would edit using a table form with strict data type controls:

  • 'access date' fields will only accept dates
  • End dates should be later than start dates,
  • death dates should be later than birth dates,
  • for old dates users would be prompted to confirm which calendar is used, with 'unknown' an option so no one is tempted to guess.
  • for all fields the interface would give suggestions as soon as you start typing.
  • if you type 'foo ' then suggestions would include 'association footballer, American footballer, Australian rules footballer, rugby player, soccer player, Gaelic footballer' etc. and when you pick one it would adjust the data entry table form to suit.
  • if you insist on entering "footballer" it would tell you that is a forbidden value.
  • if you entered "foozbar" then it would ask if you wanted to create a new profession and bring up a new table form for you to confirm which language the term is in and whether it has a Wikipedia page, with blank fields for what it is called in other languages and any other data we want for pages of category "profession", before returning you to the first page.
  • For references, once you choose a type (book, newspaper, academic paper, blog) a data entry table form for that type would pop up (maybe after asking if it's 'online' or 'offline'). After a while, for many sources, once you provide the URL the table would be able to fill in most of the remaining fields automatically, either because it would know how the BBC, CNN and the New York Times structure their pages or by using the provider's API to fetch that data.
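Some of the proposed form checks can be sketched in Python: chronological date pairs and a forbidden-value list. `PersonEntry`, `validate` and `FORBIDDEN_PROFESSIONS` are hypothetical names. The sketch treats "later than" loosely and accepts same-day dates. It does not model the calendar prompt or the typeahead suggestions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical set of values the form would reject as too ambiguous.
FORBIDDEN_PROFESSIONS = {"footballer"}


@dataclass
class PersonEntry:
    profession: str
    birth: Optional[date] = None
    death: Optional[date] = None
    start: Optional[date] = None
    end: Optional[date] = None


def validate(entry: PersonEntry) -> list[str]:
    """Return human-readable problems; an empty list means the entry passes."""
    errors = []
    # Only compare dates when both are present; same-day values are allowed.
    if entry.birth and entry.death and entry.death < entry.birth:
        errors.append("death date is earlier than birth date")
    if entry.start and entry.end and entry.end < entry.start:
        errors.append("end date is earlier than start date")
    if entry.profession.strip().lower() in FORBIDDEN_PROFESSIONS:
        errors.append(f"'{entry.profession}' is a forbidden value; pick a more specific profession")
    return errors


print(validate(PersonEntry("footballer", birth=date(1950, 1, 1), death=date(1900, 1, 1))))
```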

Most categories would be redundant since that info would come from the data. Even editor helper categories like "incomplete entries", "dates with 'unknown' calendar" and "data without a reference" would be generated from the data. If someone wants to add a category, that is a prompt that either an additional field is needed on the page or the category isn't needed. The exception is the most important category: the one that decides which table form to use for this page.

At least that is what I thought. Filceolaire (talk) 20:08, 7 May 2012 (UTC) 07:19, 8 May 2012 (UTC)

The entire point of this proposal is to not use wikitext for data at all (except for discussion pages, user pages, policy pages, ...), and to provide a clean interface to hook in custom views and editors for other types of data. For Wikidata, all data entry will be form-based.
Using separate data tables for different kinds of data is impractical: there are too many types, and they change too quickly. Allowing user interactions to change the database structure also adds a security risk and makes replication tricky. Because of this (and other issues like value qualifiers and multilingual values), Wikidata will use structured data in JSON blobs as the primary storage, and only use database tables for indexed access to specific well-known fields. -- Daniel Kinzler (WMDE) (talk) 20:16, 7 May 2012 (UTC)
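The storage split described above is sketched here with Python's built-in `sqlite3`. Each entity's full data lives in one JSON blob, and a separate index table holds one well-known field, labels per language, for lookup. The table names, the `labels` key and the functions are hypothetical and are not Wikidata's actual schema. The schema stays fixed no matter which properties users add, because new properties only ever go into the blob.

```python
import json
import sqlite3


def make_store() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    # Primary storage: one JSON blob per entity; the schema never depends on user data.
    db.execute("CREATE TABLE entity (id TEXT PRIMARY KEY, blob TEXT NOT NULL)")
    # Secondary index for a single well-known field: labels per language.
    db.execute("CREATE TABLE label_index (entity_id TEXT, lang TEXT, label TEXT)")
    db.execute("CREATE INDEX label_lookup ON label_index (lang, label)")
    return db


def save(db: sqlite3.Connection, entity_id: str, data: dict) -> None:
    with db:  # one transaction, so the blob and its index rows stay consistent
        db.execute("INSERT OR REPLACE INTO entity (id, blob) VALUES (?, ?)",
                   (entity_id, json.dumps(data)))
        db.execute("DELETE FROM label_index WHERE entity_id = ?", (entity_id,))
        db.executemany("INSERT INTO label_index VALUES (?, ?, ?)",
                       [(entity_id, lang, label) for lang, label in data.get("labels", {}).items()])


def find_by_label(db: sqlite3.Connection, lang: str, label: str) -> list[dict]:
    """Use the index to find entities, then decode their full JSON blobs."""
    rows = db.execute(
        "SELECT e.blob FROM label_index i JOIN entity e ON e.id = i.entity_id "
        "WHERE i.lang = ? AND i.label = ? ORDER BY e.id", (lang, label))
    return [json.loads(blob) for (blob,) in rows]


db = make_store()
save(db, "q1", {"labels": {"en": "Berlin", "de": "Berlin"}, "population": 3600000})
print(find_by_label(db, "de", "Berlin"))
```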
Thanks Daniel. I didn't mean separate data tables for different kinds of data; I meant different data entry forms or forms which adapt automatically, with extra fields appearing as needed for different kinds of data. Sorry if I misspoke. I've gone back and changed it now. Filceolaire (talk) 07:09, 8 May 2012 (UTC)