Talk:OmegaWiki/Archive

From Meta, a Wikimedia project coordination wiki

MFAQ (Most frequently asked question)

Clearly, one of the MFAQ, if not THE MFAQ, is "When will it be finished?", a question not even mentioned in the F.A.Q. ... --Henri de Solages 09:47, 20 October 2006 (UTC)

Insertion by Sj

String translation: "Noun" translated into each language, "Danish" translated into each language, etc. It shouldn't be possible to change the display of 'noun' in a description without also changing the underlying language-independent data.

Noun (indicating that the word is a noun), Danish (indicating that the translation is into Danish): that kind of stuff is interface text; it has a fixed meaning and will be part of the user interface / the software. What language it is in is something you select in your user preferences. GerardM 21:14, 17 Sep 2004 (UTC)
Yum. Now I'm hungry for a Cheese danish! Quinobi 20:14, 12 July 2005 (UTC)

Interesting, and ultimate, but overkill.

The ideas are interesting, but are complicated and might take some time to implement. But if we look at the actual functionality you propose, most of it can be quite easily implemented without any changes at all.

Simply limit each Wiktionary to words in THAT language, have one entry for each word in a language, with grammar information and everything, and then let the translations link to the other languages' Wiktionaries.

So, if you are interested in the English word machine you will get to know all about the English word machine by going to the English Wiktionary's machine entry. You will also get a list of translations on that page. Now, if you want to know more about the Swedish translation, like the grammar and so on for that word, you click on the link which will take you to the Swedish Wiktionary's entry for maskin. Obviously, everything there will be in Swedish, but if you want to know the grammar for a Swedish word you can be assumed to know a bit of Swedish, as the proposal already implies.

A useful future feature the proposal mentioned is the possibility of getting the user interface + some fixed strings in the text translated to the language of your choice, so that you actually get the translation prefixed with "English" and not "Ensku" in the Icelandic dictionary, since that could be a bit tricky to find. ;) That is actually a feature that seems reasonable to add into MediaWiki as a whole, maybe for 1.5 or so? It is however not necessary.

I can surely see the use of a large database that actually contains the grammatical information in a structured manner, and not as hypertext. But that is a huge (although interesting) undertaking, and it would possibly require serious research on how to create a database structure that not only can contain all grammatical information for all languages, but also map it between languages. If it doesn't do that, we will risk imposing an unnecessary limitation that reduces the usability of the Wiktionary. Regebro 23:24, 21 Sep 2004 (UTC)

  1. The problem is: what about someone (swedish, say) who doesn't really know very much english, and tries to find a Swedish translation of machine on the english page? This page gives four different meanings. Now if these 4 meanings would have 4 different Swedish translations, and a user has the slightest trouble interpreting the english used in the definitions - which translation should this user trust? The use of such a dictionary would be no better than babelfish. It can give some kind of translations as well - why compete with them?
  2. It might be a good idea to change the language in the translation lists, since it would be easier for someone translating the odd e.g. english word. Let's discuss this further...
  3. I agree that something needs to be done on the question of databases/.... and so on, but that is way beyond what I'm able to participate in :) \Mike 00:39, 22 Sep 2004 (UTC)


The user should of course trust the translation that makes sense in the translated context. Just as in any dictionary. This is not inventing anything new. Dictionaries have existed for a long time, and we know pretty well what works. What Wiktionary is trying to do is to be both a dictionary that explains words and grammar, and a translating dictionary. This may be a great idea, but I still fail to see the purpose of having 176*176 explaining dictionaries when 176 work just fine. When you get into the depth and detail of language that you want here, you firstly DO understand a bit of the language and secondly, you would have to go to the original language Wiktionary anyway, because 176*176 dictionaries will never be full of detailed information.
What do these pages have that cannot simply be found on their original pages? The translations are already there.
http://sv.wiktionary.org/wiki/Nine vs http://en.wiktionary.org/wiki/Nine
http://sv.wiktionary.org/wiki/Libro vs http://eo.wiktionary.org/wiki/Libro
Sure, it is not perfect. Nothing is perfect. But it would be a major improvement.
Another possibility would be to *only* have the translations to Swedish on the Swedish Wiktionary, and instead of having a translation on the English Wiktionary, have links to all the English words on all other Wiktionaries. But that is backwards, and you still get the problem of words colliding between languages. http://sv.wiktionary.org/wiki/I already has three definitions, and I'm sure there are more languages that have a word i. :) But if we also create a namespace for each language, that would be a workable solution. Although backwards. :)
My aim with this is to get rid of this fantastic amount of double effort that is currently done, and try to get the usefulness up. Nobody has any use of the link to en:Kärlek since that page does not exist. Linking to sv:Love or sv:Kärlek would be MUCH more useful. And there is very little information that could be deduced from a en:Kärlek that is not already in sv:Love or sv:Kärlek.
As for point #2, I agree that the language of the languages should be the language. ;) This is already how the cross-language links work on Wikipedia, and that works fine. If you are interested in English, you know it's called English, while if you are English, you would have no idea that English is "Ensku" in Icelandic.
Regebro 08:23, 22 Sep 2004 (UTC)
<Jun-Dai 16:21, 5 Jun 2005 (UTC)> I'm jumping in on this thread after 6 months, but I just wanted to say that I don't see the "fantastic" amount of double effort that you're talking about. An explanation in Swedish of what the English word nine means has very little duplication of an explanation of that word in English. Or, to put it another way, much of this argument seems to be based on the dubious notion that a word can effectively be translated with another word from another language. There are very few cases (though nine is pretty close) where this can be done--there are almost always differences in usage both significant and subtle between a word in one language, and a word with a similar meaning in another language. Unless there is a way for this "ultimate Wiktionary" to provide extensive explanations in English (as well as Spanish, French, German, Klingon, etc.) on what a particular Japanese word means and how it is used. But it doesn't seem like the problem is being approached that way.
The whole point of the Wiktionary the way it is set up now, is that the English Wiktionary has explanations in English of all words in all languages. It also provides simple translations of English words into other languages, which in turn are links to those words on the English Wiktionary (which in turn have English-language explanations of their meaning). This has very little overlap with, say, the Japanese Wiktionary, whose explanations are all in Japanese. There is very little that the Japanese Wiktionary would offer an English speaker, unless that speaker were at least capable of an advanced understanding of Japanese. </Jun-Dai>
When a word is a noun, it has a particular gender, it has translations .. they are all things that are language independent. You base what you say on Wiktionary, it is exactly this effort that we do not want to duplicate. Even without language specific stuff it is good to know certain facts. The sheer fact that a word exists in a language is better information than no information at all. GerardM 12:17, 6 Jun 2005 (UTC)
<Jun-Dai 15:29, 6 Jun 2005 (UTC)> Okay, I see your point. Add to this things like conjugation tables, translation tables, example sentences, and pronunciation. On the other hand, this information is only a portion of the useful information that can be provided in an entry on a particular word--the rest of it is language-specific, most notably the definitions, the example sentence translations, and the usage information, and I'm not seeing any information here (perhaps I haven't looked deep enough?) on how this would be implemented in all languages. Etymology, see alsos, etc. all have some language-specific information and some non-language-specific information. If I were looking at a Chinese entry, for example, and I saw a list of related words, the usefulness would be greatly reduced if I had to click each word in order to begin to understand what it means. In the current wiktionary setup, we can add a stub definition of the sense of the word being used, and this makes the information much more useful, as we can click on only the ones likely to be relevant to our research. I'm a little confused on how, or if, this information is to be retained in the ultimate wiktionary. </Jun-Dai>
Things that are language specific will be saved in a language specific manner; definitions and etymology come to mind. Idiom is not language specific. There certainly will be issues with a first version of the UW, this is to be expected, issues are there to be solved. They are not excuses not to move forward. GerardM 07:05, 7 Jun 2005 (UTC)
<Jun-Dai 20:09, 7 Jun 2005 (UTC)> I guess I'll just have to wait and see (it's hard to imagine how the UI for this is going to work). I just hope that there aren't any plans to eliminate the current Wiktionaries until/unless this thing has been up and running and successful for a couple of years. Also, I think that the problem of effort duplication is being vastly overstated here. </Jun-Dai>
When the UW is live I expect that several Wiktionaries will be joyously discontinued fairly soon, as in within months rather than years. I am interested in learning why you think this duplication of effort is overstated; if anything it is the sharing of efforts that will prove the biggest boon of the UW. GerardM 06:07, 8 Jun 2005 (UTC)
Of course that is for the communities to decide and not for GerardM. Waerth 00:13, 26 Jun 2005 (UTC)
BTW, I answered something similar as in 1. on wikt:sv:wiktionary:bybrunnen. I assume you, Regebro, were the anon there as well. \Mike
Yup. And as you see, your answers have already been answered here ;) Regebro 08:23, 22 Sep 2004 (UTC)
Regebro says that the structure needs to be correct for all languages, that research is needed. To some extent he is correct. However, it is like with so many things: when you make a reasonable stab at a database and have it function properly for a majority of languages, you have made a lot of progress. The features will work for those languages, and for some languages it will not work or it will not work for details. This is where you revise the database to accommodate the new knowledge, test it, and convert the data to the new database.
In the mean time there are the benefits of sharing data between wiktionaries, the export/import of data, a user interface individualised through the preferences.. GerardM 06:22, 22 Sep 2004 (UTC)

Grammar

Regebro suggests above that it should be about grammar. That is beside the point. The point is that we are talking dictionary here and translation dictionary. I fail to see how grammar comes into this. Grammar is as far as I can see outside of the scope of wiktionary. GerardM 22:36, 22 Sep 2004 (UTC)

I actually thought the translations were seen as an added bonus, and that the grammar was more important. :) But you are right that the grammar doesn't need to be in a structured form. But then again, I'm not sure the translations need to be either. Regebro 07:32, 23 Sep 2004 (UTC)
Translations ARE structured. When one word translates to two meanings in another language, it indicates that the word has at least two meanings. When from the second language point of view the foreign word is given two distinct meanings, it means that this is to be reflected in the first language part of the database as well.
As both the first and second language words are in BOTH dictionaries, it makes sense to integrate the dictionaries into one database. GerardM 12:46, 23 Sep 2004 (UTC)
I was more thinking about unstructured vs structured content. Basically, if the translations should be wikitext or something else. Since translations often are not just a list of words, it seems to me it should be wikitext, that is unstructured, as opposed to a list of records in a database.
But all this discussion is pointless if people do not agree that something needs to be done... Regebro 18:10, 23 Sep 2004 (UTC)
Talking about translation always also means talking about grammar. No translation would be possible without some grammatical knowledge of either language, without grammatical analysis of the source text, and without grammatical construction on the destination text side. Even relatively "simple" grammars, e.g. Chinese or pidgin ones, cannot be ignored when translating.

When Wiktionary entries are to be usable in automated translation - which they imho ultimately should be - we need to prepare the ground for it now. -- Purodha Blissenbach 17:32, 29 June 2006 (UTC)

computer readable

Regardless of whether we should make an 'ultimate' Wiktionary, I like the idea of making the Wiktionary content more usable by computers, meaning instead of just typing 'English' then 'Noun' we would indicate that we want to define an English noun. I'm not sure how the UI for this would look. It would make possible things not yet thought of. --12.216.254.3 20:44, 11 Nov 2004 (UTC)

XML / XSLT

The content could be in Wiki Format but be transformed and delivered to the browser as XML. An XSLT stylesheet would then format the content according to the desired layout. What XML needs to be generated for what template could be controlled by additional language features restricted to templates (Allowing XML generation outside templates would be too chaotic). --Fasten 11:43, 1 May 2006 (UTC)
See also: #Connotations
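
As a rough illustration of the proposal above, a structured entry could be serialized to XML and left to a stylesheet for layout. The sketch below is only that: the element and attribute names (`entry`, `headword`, `pos`, `senses`, `sense`) are invented for the example and are not part of any existing MediaWiki or Wiktionary schema.

```python
import xml.etree.ElementTree as ET

def entry_to_xml(word, language, part_of_speech, definitions):
    """Serialize one dictionary entry to an XML string (hypothetical schema)."""
    entry = ET.Element("entry", {"lang": language})
    ET.SubElement(entry, "headword").text = word
    ET.SubElement(entry, "pos").text = part_of_speech
    senses = ET.SubElement(entry, "senses")
    # Number the senses so a stylesheet can render "1.", "2.", ...
    for number, text in enumerate(definitions, start=1):
        sense = ET.SubElement(senses, "sense", {"n": str(number)})
        sense.text = text
    return ET.tostring(entry, encoding="unicode")

xml_text = entry_to_xml("machine", "en", "noun", ["A mechanical device."])
print(xml_text)
```

An XSLT stylesheet delivered alongside this XML would then decide how headword, part of speech and senses are displayed.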

Results of some experimenting

I've been working on a couple of examples which illustrate how a next-generation Wiktionary would work. I'm not thinking about the technical side for now, just about the different "layers". Where different parts of the text come from to form the finished article seen by the viewer.

Basically, I'm not sure the 3-level model will really work but I might have to do some more work and with some more "test words".

The 3-level model is "source language", "target language", "interface language". The 2-level model has only "source language" and "target language".

So far I've been using the "test words" "Hindi"/"hindi" and "Turkey"/"turkey" as they have different meanings for the same spelling in different languages and are inter-related.

Of course we probably ought to have multiple source and target languages.

Source languages are the databases that will be searched when the user looks for a word. Target languages are the languages which will be included in the Translations section. But...

Now I had been assuming just one target language, usually but not necessarily the same as the interface language. In a 1-target model, The user will see definitions (or at least glosses) in the target language, and examples in the source language.

But what happens when there are 2 target languages? Should we show the defs in each of them? Is there any point in doing that just because it's possible? Or, should we have just 1 "true" target language and the subset of languages which are shown in the Translations section are something else?

Now ignoring the multi-target models, there are also some tricky things to think about in the 3-level model: Each definition can have several "attributes". I have only looked at two kinds of attribute so far: "disambiguators" and "modifiers". "Realms" are another example. A disambiguator would be "country" or "bird" in the case of "Turkey/turkey". A modifier would be "obsolete" or "archaic" in the case of "Hindustani" as a synonym of "Hindi". A realm would be "linguistics", "computing", etc.

The question is whether such attributes ought to be displayed in the target language or the interface language. Because they will be part of the def they will in the first place come from the database of the def - the source language. We could treat all such words as interface words, but I'm not sure that will "feel" right. The term might not even fit the def properly but it's hard to think of an example.

I have a document in RTF format. I can't wiki-format it right now, sorry, but let me know if I can upload it or email it anywhere.

Comments? — Hippietrail 06:14, 20 Nov 2004 (UTC)
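
One way to picture the option of treating attributes as interface words: the definition stores language-neutral attribute codes, and the labels shown to the reader are looked up in the interface language. This is a minimal sketch under that assumption only; the `LABELS` table, the codes and the `render_sense` function are hypothetical and do not describe any actual design.

```python
# Hypothetical label table: attribute code -> label per interface language.
LABELS = {
    "obsolete": {"en": "obsolete", "nl": "verouderd"},
    "linguistics": {"en": "linguistics", "nl": "taalkunde"},
}

def render_sense(gloss, attributes, interface_lang):
    """Prefix a gloss with its attribute labels in the interface language."""
    # Fall back to the raw code when no label exists for that language.
    labels = [LABELS[code].get(interface_lang, code) for code in attributes]
    prefix = "(" + ", ".join(labels) + ") " if labels else ""
    return prefix + gloss

print(render_sense("synonym of Hindi", ["obsolete", "linguistics"], "nl"))
```

The gloss itself still comes from the target-language layer; only the attribute labels follow the interface language here, which is exactly the choice the question above leaves open.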

conjugations

My guess is that it is good to have conjugations in the same entry as the verb itself. If there is a small table of the conjugations in one cell, for each language it would be possible to derive words from that.

For example:

en: eat, ate, eaten
nl: eten, aten, gegeten

Could automatically generate:

eaten definition: "past participle of eat"
trans nl: "gegeten"

Maybe this is planned already, but this seems the logical approach to me. It will probably become a bit more extensive because every language has its peculiarities: there can be male or female forms of a word, there are cases in German and Latin, and probably there is much more in other languages. But still it seems the way to go to me.--146.50.205.252 16:23, 17 Dec 2004 (UTC)

(you can reach me at johnny at the nl.wiktionary.)
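
The derivation suggested in this section could look roughly like the sketch below. It assumes, purely for illustration, that every language stores the same three principal parts in the same order; the table layout, the form names and `derive_entries` are hypothetical, and the third form is labelled as the past participle.

```python
# Principal parts per language, in the order:
# infinitive, past tense, past participle (data from the example above).
CONJUGATIONS = {
    "en": ("eat", "ate", "eaten"),
    "nl": ("eten", "aten", "gegeten"),
}
FORM_NAMES = ("infinitive", "past tense", "past participle")

def derive_entries(conjugations, source_lang):
    """Generate stub entries for the inflected forms of one verb."""
    source = conjugations[source_lang]
    lemma = source[0]
    entries = {}
    for index, form in enumerate(source[1:], start=1):
        # The translation of a form is the same slot in the other tables.
        translations = {
            lang: forms[index]
            for lang, forms in conjugations.items()
            if lang != source_lang
        }
        entries[form] = {
            "definition": f"{FORM_NAMES[index]} of {lemma}",
            "translations": translations,
        }
    return entries

entries = derive_entries(CONJUGATIONS, "en")
print(entries["eaten"])
```

Real conjugation tables differ in size and shape per language, so a fixed slot-to-slot mapping like this would only cover the simplest cases.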

Reta Vortaro

What do you think about this project: http://www.uni-leipzig.de/esperanto/voko/revo/

This is a DTD file that defines the dictionary and the translations: http://www.uni-leipzig.de/esperanto/voko/revo/dok/dtd.html

A lot of words have already been translated: http://www.uni-leipzig.de/esperanto/voko/revo/inx/statistiko.html

It looks like a promising project with a lot of cool data. When we receive permission to use it, it would be really cool to make use of it in the Ultimate Wiktionary. (GPL and GFDL are to some extent incompatible) GerardM 19:09, 7 Feb 2005 (UTC)

Interface Preferences are not fully ready

I'm not involved in Wiktionary, but the Ultimate idea is very cool. Good luck!

People working on this should be aware of the fact that the Interface preferences currently do *not* work well in all languages. Specifically, they don't work properly in right-to-left languages (Arabic, Hebrew, Farsi, etc.). To use a right-to-left interface on a left-to-right Wiki currently simply doesn't work.

Furthermore, there may be very big technical problems editing the Ultimate project for use in right-to-left languages. Currently, it is very hard to edit LTR material on a RTL edit page, and vice versa.

Maybe something good will come out of this, if the Ultimate Wiktionary will also spur software solutions to make LTR and RTL environments more compatible with each other! But in the meantime, people working on it should be aware that these problems exist.

This template will point from the discussion pages of all the different proposals for a single Wiktionary DataBase to the one page where all discussion on the subject of a single Wiktionary Database is conducted, to create a discussion of that purpose, rather than of each proposal separately. User:Aliter

Complete lack of communication with the current Wiktionary community

Whoever is running this project / idea is completely failing to communicate with the current user community in Wiktionary. I've been a Wiktionary administrator for a few months now, and only just found out about this "project".

From 28 years in the computer industry (from technical programmer to Business Analyst, Project Manager, User Advocate), this project shows all the hallmarks of a technically elegant solution looking for a reason to exist. And all the hallmarks of a disaster waiting to happen if it ever gets anywhere near to implementation. If you don't manage to involve/engage the current users, then you can't really be taken seriously.

For any project, there needs to be a "document" that basically outlines the reason for the project (we're talking benefits, not features), who is running the project, who the contributors are etc. None of that is apparent here.

And what about the costs? If one of the costs is to put off 99% of general users because of the level of complexity, is that cost something we can afford? How usable will this be by general users as a plain text editor? Can it be used that way, with a bot or willing style editor following along to clean up the simple user's contribution? Or must the potential contributor do a 10 day training course before they can even start making contributions? --Richardb 13:12, 30 May 2005 (UTC)

Oops! Sorry for being so abrasive, and ethno-centric. I was referring to a lack of communication with the en:Wiktionary community, forgetting of course all the other language communities who might be far more aware of this project, especially the nl: community.--Richardb 14:34, 30 May 2005 (UTC)
Please, I'm begging *groveling* on my knees in the dirt etc... Take a Look at w:Wikipedia:WikiProject_Keywords. It's an effort at building a framework for moving forward as a family of projects... ---Quinobi 20:01, 12 July 2005 (UTC)
No wonder no one responded. All Wikiprojects at Wikipedia are in the w:Wikipedia:WikiProject namespace, so the full url is http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Keywords
Sorry for taking so long to fix this. Quinobi 20:00, 18 July 2006 (UTC)

Link everything?

I have seen a French dictionary site (now offline) that created links for every word in a definition. For example, the 'Latin' page would have looked like this:

Latin ( countable and uncountable ), plural Latins

1. Language of the ancient Romans (proper noun, uncountable) (Classical Latin)

2. A person native to ancient Rome or its Empire, etc...

A lot of the words returned blank pages, since you could click on everything from words to hyphens to commas, but it was a useful and quick way to learn. If you didn't understand the definition, you clicked on a few more words until you did. Do you think this would be useful in the "Ultimate Wiktionary"? Linking is done by users to some extent now, but it could be done automatically. The server could also ignore common words (like, and, the, etc.) and non-words so that it looks nicer. As well, we might want to distinguish between "important" links, added by users, and server-generated links, so that when a user edits the page, there aren't a bunch of square brackets everywhere. Automatic links could show up differently in the article page, too, maybe black like the regular text.

I don't know if this is worth the effort, but it's worth thinking about. Any ideas? --Sboots 16:46, 28 Jun 2005 (UTC)
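
For what it is worth, the automatic part is not hard to prototype. The sketch below is only an illustration: the stop-word list is an arbitrary assumption, links are written in ordinary `[[...]]` wiki syntax, and links already added by users are left untouched so the two kinds stay distinguishable.

```python
import re

# Arbitrary sample of common words that should not be linked.
STOP_WORDS = {"a", "an", "and", "of", "or", "the", "to", "its"}

def auto_link(definition, stop_words=STOP_WORDS):
    """Wrap every word in [[...]], skipping stop words and existing links."""
    def link(match):
        token = match.group(0)
        if token.startswith("[["):
            return token  # user-added link: keep as is
        if token.lower() in stop_words:
            return token
        return "[[" + token + "]]"
    # First alternative: an existing link; second: a run of letters.
    return re.sub(r"\[\[.*?\]\]|[^\W\d_]+", link, definition)

print(auto_link("Language of the ancient Romans"))
# [[Language]] of the [[ancient]] [[Romans]]
```

Punctuation and digits never match the pattern, so hyphens and commas are not turned into links as they were on the French site.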

I would find it really distracting. If anything I would only do it for some key words in a definition. GerardM 10:15, 18 August 2005 (UTC)
The Polish Wiktionary also links everything in the definitions and examples and we find it useful. I don't believe, Gerard, you'd be able to determine what are the key words in all possible languages, it's safe to link everything. Also, it's way easier for the end user. tsca 10:24, 18 August 2005 (UTC)
If the link-everything feature is implemented, I'd like it to stay as an option for the user; I find it distracting too. (we abandoned this on fr:) --Kipmaster 13:14, 18 August 2005 (UTC)
I would not write definitions in Polish; if at all I would write them in Dutch or English. Your assertion that it is "easy" for the end user I do not share; I find it immensely distracting. GerardM 15:00, 20 August 2005 (UTC)
Now, linking aside, what do you mean you wouldn't write definitions in Polish? "If at all"? I ask in the context of the UW. tsca 08:20, 21 August 2005 (UTC)
I mean it as in I personally would not write a Polish definition, it is obvious as I do not speak any Polish. I am really eager to have as many Polish definitions as possible; sadly it will not be me who can contribute to that :) GerardM 11:33, 22 August 2005 (UTC)
My vote and agree for this. Definitely not distracting, it's a must (have) function!!! --172.178.123.158 01:05, 3 January 2006 (UTC) (de)
At this moment there is nothing to vote on. At this moment it is functionality that we do not have. It is also true that I do not really consider anonymous "votes" as they do not have a voice and I therefore cannot ask questions / discuss things. At this time for this project anonymity is not helpful. GerardM 09:07, 3 January 2006 (UTC)
I think it might definitely be useful for some people, and not harmful if the automatically generated links are displayed in a different color (as anybody can ignore automatically generated links). So, I support this idea. --Alex, 10.5.06
I support the idea - probably as a user-deselectable option, if some insist on disliking it. I think it would be useful only if links that do not offer additional information (such as ones to nonexistent pages) can either be switched off, too, or can easily be told from informational ones, i.e. are displayed differently. I for one know that I would much prefer links over retyping different words in my day-to-day work. It is not unlikely for me to have 80 to 120 windows open simultaneously when collecting information on a subject matter, some in languages which I only partially comprehend, so more links tend to be a real time saver for me … -- Purodha Blissenbach 15:06, 30 June 2006 (UTC)

Link to 1 meaning

I couldn't find precise features of the UltWik. Will it be possible to make a link to a certain meaning of a word (not to the whole article) ? --130.120.105.237 15:51, 11 July 2005 (UTC)

That is most definitely the intention. GerardM 10:22, 18 August 2005 (UTC)
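
No details are given here, so the following is only a guess at what linking to a single meaning might involve: each meaning gets its own identifier, and a link names both the word and the meaning. The `lang:word#meaning` link syntax, the class names and the sample glosses are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Meaning:
    meaning_id: str  # hypothetical identifier, unique within one entry
    gloss: str

@dataclass
class Entry:
    word: str
    language: str
    meanings: list

    def find(self, meaning_id):
        """Return the meaning with the given identifier."""
        for meaning in self.meanings:
            if meaning.meaning_id == meaning_id:
                return meaning
        raise KeyError(meaning_id)

def resolve(entries, link):
    """Resolve a link of the assumed form 'lang:word#meaning_id'."""
    target, meaning_id = link.split("#", 1)
    language, word = target.split(":", 1)
    return entries[(language, word)].find(meaning_id)

entries = {
    ("en", "turkey"): Entry("turkey", "en", [
        Meaning("bird", "a large bird"),
        Meaning("country", "a country (spelled Turkey)"),
    ]),
}
print(resolve(entries, "en:turkey#bird").gloss)
```

With identifiers like these, a translation could point at `en:turkey#bird` instead of at the whole article.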

Which language for discussions ?

Hi,

I was wondering how will we be able to discuss on an article in the UW, as everybody doesn't speak the same language. English could be a solution, but some people (like me) won't be able to deal with some subjects they find too hard to understand, and some people don't speak English at all. I also thought that we could use the language of the concerned article (we will discuss chinese on a chinese article), but whereas I can add chinese translations into french or english, I'm not able to discuss it in chinese. Any other solution ? (If I had to choose, I would take the second one) --Kipmaster 09:41, 18 August 2005 (UTC)

There are several possibilities. When a word is discussed in Cherokee, it makes sense for people who speak that language to discuss the finer points in Cherokee. When you, like me, do not understand Cherokee, English is the lingua franca of this era. Typically a contributor for a particular language should be able to communicate in that language.

As to contributing Chinese, I have contributed many Chinese words; HOWEVER, I am not able to identify them as either simplified or traditional Chinese and as such they have limited value. :( GerardM 10:14, 18 August 2005 (UTC)

I can read both traditional and simplified Chinese. Maybe I can prepare to log in there.--Jusjih 17:03, 11 February 2007 (UTC)

Any news

Hoi,

What's new about UW? We have had no news for one or two months now. So? My ultimatum is: I'll be in vacation at the end of the week, so, if no news, I open a project on SourceForge and code the Ultimate Kiptionary :p. --Kipmaster 09:50, 17 October 2005 (UTC)

I second that question. I really like this project, it sounds great and extremely promising. But I don't like the way information about this project is dealt with. Nobody really knows what's going on and when something is going to happen. --84.163.87.205 23:56, 19 October 2005 (UTC)
Well, actually it is not doomed. I am going to Germany to work on the User Interface next week. GerardM 22:23, 11 November 2005 (UTC)

Demo

Wow, this is cool... http://epov.org/wd-gemet/index.php/Main_Page -- user:zanimum

Collation by a certain locale

(or Collation by a specific language?)

search MediaZilla for collation
→→ bugzilla:00164 – "Support collation by a certain locale (sorting order of characters)"
→→ duplicates of 00164: 00353, 00608, 01304, 02489, 02602, 02818, 03343, 04622, 04963

  • Halló! I was thinking about a syntax to handle issues related to "collation". There are many issues which should be covered:
  • some characters exist only as upper or as lower case; example "ß" in German
  • ordering for "ÄÖÜäöüß" in German
  • Slavic languages collation is also out of order, letters ŠČĆ are at the end of alphabet, and they should be at their proper places...(like A B C Č Ć D... S Š T U)
  • Swedish similarly with its accented letters, and probably many others
  • some languages are using entities coded with more than one character; example hu:template:CategoryTOC which contains
"A Á B C Cs D Dz Dzs E É F G Gy H I Í J K L Ly M N Ny O Ó Ö Ő P Q R S Sz T Ty U Ú Ü Ű V W X Y Z Zs"
as a consequence "separators" are required in the syntax between the "entities"
in addition definition of "category character headers" will be required at some point in time
  • some languages have multiple lowercase characters; for example, Greek, Yiddish, etc
    • for titles consisting of single characters "lower case character foo" should probably be "before" "lower case final character foo" but "~lower case final character foo" should probably be "before" "~lower case character foo"
  • some alphabets do not contain a subset of English characters; however, foreign words containing these characters have an order in the "extended" alphabet; example: the Esperanto alphabet does not contain "QWXY"
  • contemporary and traditional / historical alphabets; example "Z" in Icelandic; old Cyrillic letters in Slavic languages etc.
  • see eo:Vikipedio:Laŭalfabetigo for a sort order of "Combining Diacritical Marks" U0300.pdf
  • ... (please add more topics)
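
To make the point about multi-character "entities" concrete, here is a minimal sketch of a sort key built from an ordered alphabet that may contain digraphs and trigraphs, using the Hungarian list quoted above. It is an illustration only: the greedy longest-match tokenizer and the function names are assumptions, accented vowels are simply ranked as listed, and real Hungarian collation has further rules (for example in compound words) that this ignores.

```python
HUNGARIAN = ("A Á B C Cs D Dz Dzs E É F G Gy H I Í J K L Ly M N Ny "
             "O Ó Ö Ő P Q R S Sz T Ty U Ú Ü Ű V W X Y Z Zs").split()

def make_sort_key(alphabet):
    """Build a key function from an ordered list of (multi-char) letters."""
    rank = {letter.lower(): i for i, letter in enumerate(alphabet)}
    longest = max(len(letter) for letter in rank)

    def sort_key(word):
        text = word.lower()
        key, pos = [], 0
        while pos < len(text):
            # Greedy: try the longest possible letter first ("dzs" before "dz").
            for size in range(longest, 0, -1):
                chunk = text[pos:pos + size]
                if chunk in rank:
                    key.append(rank[chunk])
                    pos += len(chunk)
                    break
            else:
                # Unknown character: sort it after the whole alphabet.
                key.append(len(rank) + ord(text[pos]))
                pos += 1
        return key

    return sort_key

hu_key = make_sort_key(HUNGARIAN)
words = ["cukor", "csak", "cél", "dzsem", "dob"]
print(sorted(words, key=hu_key))
# ['cél', 'cukor', 'csak', 'dob', 'dzsem']
```

A plain code-point sort would put "csak" before "cukor" and "cél" last among the C words, which is the kind of misordering the bug reports describe.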


  • The duplicate of 00164, bugzilla:04622 – "Provide a setup to order pages at special:Allpages completely case insensitive regardless of the value of $wgCapitalLinks", is a simpler requirement than "full" collation support and is also requested at 01304 (mentioned above) and 02628. It should be easier to implement this with an advanced MySQL query.
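
The ordering asked for in that request is easy to state in code. The sketch below only illustrates the desired result in Python and says nothing about how MediaWiki or the database query would do it; `case_insensitive_order` is a name made up for the example.

```python
def case_insensitive_order(titles):
    """Return titles sorted without regard to letter case."""
    # str.casefold is a more aggressive lower(); e.g. it maps "ß" to "ss".
    return sorted(titles, key=str.casefold)

titles = ["banana", "Apple", "cherry", "apple", "Banana"]
print(sorted(titles))                  # code-point order: capitals first
print(case_insensitive_order(titles))  # case-insensitive order
```

Titles that differ only in case keep their original relative order, because Python's sort is stable.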

related topics / links

  1. search unicode.org for "collation" collations
  2. search MediaZilla for {{lc:}}
  3. search MediaZilla for {{lcfirst:}}
  4. search MediaZilla for {{uc:}}
  5. search MediaZilla for {{ucfirst:}}
  6. search MediaZilla for linktrail
    see: wikt:sl:template:wikivar#linktrail – configuration issue (?) at wikt:sl:MediaWiki#Linktrail contains (only?) "‎/^([a-z]+)(.*)$/sD"

comments

  • It is not self-explanatory how to sort "Ä" depending on the context. "Ä" followed by a lower case letter should (probably) be expanded to "Ae" while "Ä" followed by an upper case letter should (probably) be expanded to "AE". It might be a useful solution to first convert all characters of a "title", "sort key" etc. to upper / lower case letters ("first key") and decide in a following step about further sorting. Best regards Gangleri | Th | T 19:17, 7 March 2006 (UTC)
  • It is even much worse - since e.g. in German alone, there exist several collation sequences, all being used in different fields, we have almost no way to know, or predict, where someone might be expecting a certain entry. Let alone collating foreign characters with them.
    • Example one: Someone in Switzerland watches a video of some now historic Olympics, finds "СССР" on someone's wear, and tries to look "CCCP" up. He would be best served with a collation that puts Cyrillic characters together with their Latin look-alikes.
    • Example two: a person whose surname happens to be "Bäß" will be collated identically to 'BAS', 'BAESS', 'BAES', or right after these, depending on whether you look him up in a phone book, encyclopedia, library catalog, in Switzerland, in a historic directory of enterprises & inhabitants, etc., etc. Thus we're best off serving a large community by inserting those at all possible positions?
    • Example three & four: some languages have two or three letter systems, such as some Balkan and west-middle-Asian ones; most Germanic Platt and plain languages, such as nds, als, ksh, hess, bay, sux, etc., do not have a strict, single, or generally accepted orthographic system; some of both groups as well as several Indic languages have a broad variety of dialect variant spellings. Here any collation system will easily be found next to useless, while a soundex system performs pretty fine. In English that would mean, with an entry "cent", to also list "scent", and vice versa, or "knight" with "night" and "nite", etc., i.e. collect potential "sound-alikes" together.
    • Last not least: A good dictionary should know of historic spelling variants and be able to date them correctly. E.g. German "Thür", "Thüre" officially used up to 1901, factually out of print since about 1921, current spelling → "Tür", "Türe", …
    • --Purodha Blissenbach 16:04, 30 June 2006 (UTC)
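
The "Bäß" example and the suggestion of a case-folded "first key" can be illustrated with a minimal sketch. It assumes two simplified folding tables (a "dictionary" style that drops the umlaut and a "phone book" style that expands it); the table contents and the names sort_key and all_sort_keys are hypothetical and only meant to show how one entry could be filed under several keys, not how any existing software does it.

```python
import unicodedata

# Simplified, assumed folding tables; real collation rules are more detailed.
DICTIONARY_STYLE = {"ä": "a", "ö": "o", "ü": "u", "ß": "ss"}
PHONE_BOOK_STYLE = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def sort_key(title, table):
    """Case-fold the title first, then expand special letters using one table."""
    normalized = unicodedata.normalize("NFC", title).lower()
    return "".join(table.get(ch, ch) for ch in normalized).upper()


def all_sort_keys(title):
    """Return every distinct key under which the title could be filed."""
    tables = (DICTIONARY_STYLE, PHONE_BOOK_STYLE)
    return sorted({sort_key(title, table) for table in tables})


if __name__ == "__main__":
    print(all_sort_keys("Bäß"))
```

Filing an entry under every key returned by all_sort_keys corresponds to "inserting those at all possible positions"; whether that is desirable is exactly the open question raised above.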

Collation on the right level[edit]

We use the MySQL database. Here they describe how you can sort by a specific collation. This means that sorting of words needs to be done per language. With Multilingual MediaWiki, MediaWiki becomes language-aware, so it would make sense to associate locales with a language when a language offers several options.

One "problem" is that when people get words sorted for a particular language, they may be unfamiliar with the sorting order. Then again what is the point of a resource when the language-specific stuff is hidden?
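
To make the "per language" point concrete, here is a small sketch that is independent of MySQL: the same three words are ordered with a Swedish-style key (å, ä, ö follow z) and with a German dictionary-style key (ö is filed with o). The key functions and the word list are assumptions chosen for illustration only.

```python
# Swedish alphabet order: a-z followed by å, ä, ö.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}

# Simplified German dictionary-style folding (assumed for this sketch).
GERMAN_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def swedish_key(word):
    """Rank each letter by its position in the Swedish alphabet."""
    return [SWEDISH_RANK.get(ch, len(SWEDISH_ALPHABET)) for ch in word.lower()]


def german_key(word):
    """File umlauts together with their base letters."""
    return word.lower().translate(GERMAN_FOLD)


if __name__ == "__main__":
    words = ["öl", "ost", "zon"]
    print(sorted(words, key=swedish_key))
    print(sorted(words, key=german_key))
```

A reader used to one of these orders will look for "öl" in the wrong place under the other, which is the "problem" described above.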

new magic words {{number:foo}} and {{value:bar}}[edit]

  • Hallo! Please take a look at ne:template:wikivar#NUMBEROFARTICLES, :fa:, :gu: etc. (at :gu: you will find more links to Projects in Indic / South Asian scripts).
  • I assume that at some point in time, when embedding of metadata becomes more and more common, "function formatNum" (defined in most LanguageXX.php files) should be available for users / contributors / visitors. {{number:foo}} would be a magic word that is easy to remember.
  • The magic word {{value:foo}} would allow an "input" in the content language and return something useful.
  • There are some nondecimal numbering systems such as Latin, Hebrew (etc.). Nevertheless, for the first implementation the syntax should be KISS. It would be a "nice to have" to be able to specify the numerical separators (thousands separator and decimal separator), at least for {{value:foo}}. Some articles such as New York City could use {{number:8,168,338|thousands_in=,}} and this could be used in other wikis as well. Alternatively (for mixed LTR / RTL wikis) this should also work:
{{number:8,168,338
|thousands_in=,
}}
  • actual benefits: If some users want to update the number of people living in a country / region / city etc. in the available wikis, they would be able to use either {{number:foo}} (then the representation would be corrected as soon as "function formatNum" is implemented) or {{subst:number:foo}}.

questions:

  1. What should be the names for the optional parameters?
  2. Should "uselang=xx" (selected user interface or url parameter) influence {{number:foo}} and {{value:bar}}?
  • Best regards Gangleri | Th | T 20:51, 5 March 2006 (UTC)
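
The intended behaviour of the two proposed magic words could be sketched roughly as follows. This is not MediaWiki's formatNum; parse_value, format_number, thousands_out and the digits parameter are hypothetical names, only thousands_in is taken from the proposal above, and the sketch handles integers only.

```python
WESTERN_DIGITS = "0123456789"
DEVANAGARI_DIGITS = "०१२३४५६७८९"  # U+0966 to U+096F


def parse_value(text, thousands_in=","):
    """Rough analogue of {{value:...}}: strip the input separator, return an int."""
    return int(text.replace(thousands_in, "").strip())


def format_number(value, thousands_out=",", digits=WESTERN_DIGITS):
    """Rough analogue of {{number:...}}: group by thousands, then swap digits."""
    grouped = f"{value:,}".replace(",", thousands_out)
    return grouped.translate(str.maketrans(WESTERN_DIGITS, digits))


if __name__ == "__main__":
    population = parse_value("8,168,338", thousands_in=",")
    print(format_number(population, thousands_out="."))
    print(format_number(population, digits=DEVANAGARI_DIGITS))
```

Keeping the input separator (thousands_in) apart from the output conventions is what would let the same wikitext be copied between wikis, as suggested for the New York City example. Grouping conventions other than groups of three are not covered by this sketch.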

Connotations[edit]

An interesting feature would be to link connotations of a word A to specific connotations of another word B in the same language or in another language. An #XML / XSLT format could then include content for the specific connotations of word B selected by connotations of word A. The ISO 2788 relations BT (broader term) and NT (narrower term) could also be used to mark synonyms and translations. A more complex relationship could be the intersection of other terms; a language may be lacking a specific connotation and the exact connotation may only be constructable by intersecting the meaning of two or more words (or #Terms or phrases) from that language. Many dictionaries leave the user with a superset of the connotations of all possible translations instead, which has to be reduced by ruling out false translations. To be able to link to a specific connotation by name, connotations would have to be named and (preferably) allow markup for existing and non-existing links, like internal Wikipedia links. --Fasten 13:04, 1 May 2006 (UTC)

<uw word="gehen" type="verb" lang="DE">
<connotation name="funktionsfähig" circumscription="funktionsfähig sein">
  <synonyms>
    <intersection>
      <uw/>
      <uw/>
    </intersection>
  </synonyms>
  <examples/>
  <translation lang="EN">
    <uw word="work" type="verb" rel="BT" lang="EN">
      <!-- A subset (e.g. a specific connotation) of the dictionary entry for "work" could be included here, especially the German retranslations. -->
    </uw>
  </translation>
  <translation lang="EN">
    <uw term="be operable" word="operable" type="adjective" rel="EQ" lang="EN">
      <!-- A subset (e.g. a specific connotation) of the dictionary entry for "operable" could be included here, especially the German retranslations. -->
    </uw>
  </translation>
</connotation>
<connotation name="durchführbar" circumscription="machbar sein; durchführbar sein">
  <synonyms/>
  <examples/>
  <translation lang="EN">
    <uw term="be practicable" word="practicable" type="adjective" rel="EQ" lang="EN">
      <!-- A subset (e.g. a specific connotation) of the dictionary entry for "practicable" could be included here, especially the German retranslations. -->
    </uw>
    <intersection>
      <uw/>
      <uw/>
    </intersection>
  </translation>
</connotation>
</uw>

The content provided by nested words (work, operable, practicable) could initially be mostly hidden by dynamic HTML folding (depending on the style sheet selected). --Fasten 13:17, 1 May 2006 (UTC)
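
As a rough illustration of how a well-formed entry of this kind could be read by software, the sketch below parses a trimmed copy of the "gehen" example with Python's standard xml.etree.ElementTree and lists the translations per connotation, optionally filtered by the rel attribute. The trimmed ENTRY string and the function name translations are assumptions made for this sketch; the element and attribute names follow the proposal above.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the proposed entry (comments and empty
# placeholder elements are left out for brevity).
ENTRY = """
<uw word="gehen" type="verb" lang="DE">
  <connotation name="funktionsfähig" circumscription="funktionsfähig sein">
    <translation lang="EN">
      <uw word="work" type="verb" rel="BT" lang="EN"/>
    </translation>
    <translation lang="EN">
      <uw term="be operable" word="operable" type="adjective" rel="EQ" lang="EN"/>
    </translation>
  </connotation>
  <connotation name="durchführbar" circumscription="machbar sein; durchführbar sein">
    <translation lang="EN">
      <uw term="be practicable" word="practicable" type="adjective" rel="EQ" lang="EN"/>
    </translation>
  </connotation>
</uw>
"""


def translations(entry_xml, lang, rel=None):
    """Map each connotation name to its translations into `lang`.

    If `rel` is given (e.g. "EQ" or "BT"), only translations with that
    relation are kept.
    """
    root = ET.fromstring(entry_xml)
    result = {}
    for connotation in root.findall("connotation"):
        hits = []
        for translation in connotation.findall("translation"):
            if translation.get("lang") != lang:
                continue
            for uw in translation.findall("uw"):
                if rel is None or uw.get("rel") == rel:
                    # Prefer the full term ("be operable") over the bare word.
                    hits.append(uw.get("term") or uw.get("word"))
        result[connotation.get("name")] = hits
    return result


if __name__ == "__main__":
    print(translations(ENTRY, "EN"))
    print(translations(ENTRY, "EN", rel="EQ"))
```

Filtering on rel="EQ" drops the broader term "work" and keeps only the translations marked as equivalent, which is the kind of selection by connotation the proposal aims at.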

Terms or phrases[edit]

Some words may only be available as circumlocutory terms or phrases in other languages. To be able to refer to those terms or phrases as translations it might be desirable to allow compound terms (e.g. "youth language") and complete phrases (e.g. "as it were") as dictionary entries. --Fasten 13:09, 1 May 2006 (UTC)

In WiktionaryZ they are. GerardM 11:17, 3 May 2006 (UTC)

Other new features[edit]

I would suggest that the new development of an ultimate WiktionaryZ is a good chance to integrate some enhancements, such as:

  • At the moment, an entry lets you find synonyms, antonyms, derived terms, related terms, and in some languages other features such as more general or more specific words.
I still miss
  • similar words, i. e. not synonyms, but words that have a meaning somewhat similar (e. g. for 'walk', include 'drive', 'fly', 'carry')
  • words that are colloquial terms, or slang terms
Is this already included in some of the existing categories, or planned as an extension?
--Alex May 10th, 2006
More general or more specific words are one type of relation that is already supported.
When you can categorise words, you can provide for all kinds of similarity. The Wordnet+ project is working on related concepts. :)
When you want to add colloquial terms or slang, you have to identify them as such. Entering them as a synonym is already possible.

If the Wiktionary/ies do not, in the long run (i.e. decades), significantly support automated translation projects, they're imho a waste of effort. As I learned from a machine-translation research project in the 1970s that I was lucky to contribute to, for modern European languages we have to anticipate several dozen less obvious grammatical classifications per word+meaning, summing up to hundreds of word classes per language.

An example, simplified for clarity: English "… in/inside/into the …" translates to German "… in den/dem/die/der/das/dem …". Here we lose information: "in/inside/into" → "in", which is easy in this direction, and we have to gain information in "the" → "den/dem/die/der/das/dem", resolving possible ambiguities. This is done (a) by grammatical gender: m → "den/dem", f → "die/der", n → "das/dem", and (b) by determining the object case as either accusative or dative. This choice is a bit complex, and one needs to know (c) verb properties: Is it a verb of movement? (d) object noun properties: Is it in a container class? (e) if a container object is involved, the location/relocation property of the movement, if any: does the movement leave or enter the container object? All these questions can be answered using grammar alone. For (e) you have, for most verb classes, to take into consideration which of "in/inside/into" is in the original English text. (a), (c), (d) must be taken from properties stored in the dictionary. Currently we have (a) at best.

We need to make simple and clear tests as to which such classes a word ought to be put into, since the average (native) speaker is not aware of their existence, at least not usually on a conscious level. -- Purodha Blissenbach 16:42, 25 June 2006 (UTC)

The idea is that when we start to develop the lexicological parts of WiktionaryZ the attributes will be language-dependent. These Meta-data will be maintained by one person per language. One of the crucial aspects of the Meta-data will be the mapping to the Meta-data of other languages. This will allow, for instance, to indicate where the translations of an inflection can be found in another language. GerardM 20:27, 25 June 2006 (UTC)
While it may appear to be safe, and likely keeps data consistent, I doubt that the burden of entering such info should rest on one person. (Configuration is quite a distinct task, though.) I assume that people who are unable to complete a new entry would eventually be frustrated. My (though not very thorough) experience is that such less obvious grammatical categories can quite easily be attributed correctly by native speakers through simple questions. Using the container example from above, two checkmark-answerable questions of the type "can something be [ ] in/inside word?" would suffice to determine the two grammatical container properties of a German word.
I do not expect each of those grammatical properties or attributes of words to be documented in grammar books for the majority of all languages. Certainly not in conventional ones. Thus finding them will sometimes be research work, and will have to rely on feedback from the machine-translation and artificial intelligence communities working on natural language grammars. That may mean providing a proper place for reports and discussions, which needs to be widely accepted.
-- Purodha Blissenbach 17:12, 29 June 2006 (UTC)
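
The "in/inside/into the" example from this thread can be turned into a deliberately small sketch. It assumes that property (a), the grammatical gender, is stored per noun, and it reduces questions (c) to (e) to a single rule (English "into" selects the accusative, "in" and "inside" the dative). The tiny lexicon and the names GENDER, ARTICLE and translate_in_the are hypothetical; a real system would need the stored verb and container properties discussed above.

```python
# Property (a): grammatical gender stored with each dictionary entry.
GENDER = {"Haus": "n", "Stadt": "f", "Wald": "m"}

# Definite articles by (gender, case).
ARTICLE = {
    ("m", "accusative"): "den", ("m", "dative"): "dem",
    ("f", "accusative"): "die", ("f", "dative"): "der",
    ("n", "accusative"): "das", ("n", "dative"): "dem",
}


def translate_in_the(english_preposition, german_noun):
    """Translate English "in/inside/into the <noun>" to German "in <article> <noun>".

    Simplifying assumption: "into" implies movement entering the container
    (accusative); "in" and "inside" imply location (dative).
    """
    case = "accusative" if english_preposition == "into" else "dative"
    article = ARTICLE[(GENDER[german_noun], case)]
    return f"in {article} {german_noun}"


if __name__ == "__main__":
    print(translate_in_the("into", "Haus"))
    print(translate_in_the("inside", "Stadt"))
```

Even this toy version cannot work without the stored gender, and it stops being correct as soon as the verb matters, which illustrates why properties (c) and (d) would also have to be kept in the dictionary.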


Deleted Wikipedia entry[edit]

In the English Wikipedia, the article about WiktionaryZ was deleted. I would like to preserve it here:

WiktionaryZ is a translating multilingual dictionary based on a relational database. It is commonly abbreviated as WZ (originally UW.)

The first steps towards "WiktionaryZ" date back to August 30, 2004 when on the Italian Wiktionary the first templates were inserted. These templates then served to permit a fast exchange of complete lists of words from one Wiktionary to the other.

At that moment two Wiktionarians from the Italian and the Dutch Wiktionary started to talk about methods of utilizing Wiktionarians' time more efficiently. It was clear that copying and pasting these lists from one language Wiktionary to another was inefficient and error-prone, while increasing the amount of time needed to make corrections. With separate Wiktionaries, the probability that a particular translation list would be out of sync with another version of the same list increased exponentially with each language Wiktionary added.

The logical consequence was to think about inserting this data into a common place. A project with the work name "Ultimate Wiktionary" took its first steps. By December 2004 the basic functionalities were clear: an extension for the Mediawiki software, building on the Wikidata project.

During the following year, when the programming of the software had already started, still many changes to the database design were made and more peculiarities of languages like Chinese, Korean, Arabic, Russian and also sign language were considered, some of these thanks to various contacts during Wikimania 2005 in Frankfurt.

Unlike the original Wiktionaries, WiktionaryZ will allow for the download of data, particularly for reuse with other software, for example CAT (Computer Assisted Translation) software, dictionary software on a local computer, and spellcheckers.

Currently, a prototype including only the data of the GEMET multilingual thesaurus is online. This resource is considered to be an optimal collection of terminology to test functions with.

Important dates during the life of the project

   * August 30, 2004 first discussions
   * March 2005 first database design
   * December 26, 2005 first read-only prototype
   * January 2006 the name of Ultimate Wiktionary is changed to WiktionaryZ
   * January 2006 the WiktionaryZ Committee starts working
   * end of February 2006 a second read-only prototype is online
   * April 30, 2006 a first editable prototype (for a restricted number of people) came online
   * approx. August 2006 version 1.0 

Andres 04:27, 21 November 2006 (UTC)

The deletion page is here: en:w:Articles_for_deletion/WiktionaryZ, the reason given was "non-notable websites. alexa 418,401 rank". Kipmaster 09:07, 21 November 2006 (UTC)