User:TadejM/Inflections

From Meta, a Wikimedia project coordination wiki

First of all, I hope you can make at least some sense out of the following. Sorry, my native language is not English.

It would be very useful for inflected languages (or at least for the Slovenian Wikipedia ;) if links to page titles could be written without repeating the whole word for each inflected form, by entering only the part of the word that changes.

For example, the Slovenian word for electricity is "elektrika". Slovenian has six different cases, which means that more often than not users have to use one of the inflected forms:

  • "elektrike" for the genitive
  • "elektriki" for the dative
  • "elektriko" for the accusative, etc.

It would be impractical to create redirects for all forms of all verbs and nouns in Slovenian, as this would mean a huge number of them (6 cases, 3 persons, 5 number forms etc.). Moreover, many of these forms do not only mean e.g. "elektrika in accusative", but have other meanings as well (this is especially true for shorter words). Redirects also do not work for generic links (used in templates) that consist of predefined variables. E.g. "Template talk" would be "Pogovor o predlogi", while the coded translation for the namespace Template is "predloga". Besides that, these so-called generic links are often composed not of namespace names but of other page names, so we cannot use this function.

So at present we have to use piped links for the majority of text, and in many cases even that does not work. This means a lot of trouble for editors: it clutters up the wiki source, demands a lot of disk space and, most importantly, burdens contributors unreasonably.

I would like to propose a feature that simplifies this work. With it, an editor would enter only the final letters of a word that change when the word is inflected.

For example, an editor would not have to write "elektrika|elektrike" anymore, but would only enter "elektrika||e" (separated with two pipes and put inside [[ ]]) and the parser would know by itself to replace the last letter "a" with "e" when displaying text to the reader. For the word "lestev" (meaning a ladder), where the genitive form is "lestve", the editor would write "lestev||ve" and the parser would replace the last two letters when displaying text to the reader. And so on, for any given number of letters written after the two pipes.
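
To make the rule concrete, here is a minimal Python sketch of the replacement described above. It assumes that the number of letters removed from the end of the title always equals the number of letters typed after the two pipes, as in both examples; the function name `inflect` is made up for illustration and is not part of MediaWiki.

```python
def inflect(title: str, suffix: str) -> str:
    """Replace the last len(suffix) letters of title with suffix.

    Assumption (not stated as a general rule in the proposal): the number
    of letters dropped equals the length of the suffix that was typed.
    """
    if not suffix:
        # Nothing typed after the two pipes: show the title unchanged.
        return title
    if len(suffix) > len(title):
        raise ValueError("suffix is longer than the title")
    return title[: len(title) - len(suffix)] + suffix


print(inflect("elektrika", "e"))  # elektrike
print(inflect("lestev", "ve"))    # lestve
```

The link target stays "elektrika" or "lestev"; only the displayed text is computed.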

For terms composed of several words, e.g. "bela lestev" (meaning "white ladder"), where the genitive form is "bele lestve", the editor would write "bela lestev||1e, 2ve" and the parser would replace one letter of the first word and two letters of the second.
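
The multi-word form could be handled along the following lines. This is only a sketch under two assumptions that the example above does not settle: the leading digit is read as the 1-based position of the word, and the number of letters replaced is again the length of the suffix. The name `inflect_phrase` is hypothetical.

```python
import re


def inflect_phrase(title: str, spec: str) -> str:
    """Apply a spec such as "1e, 2ve" to a multi-word title.

    Assumptions: the digit is the 1-based word position, and each suffix
    replaces as many trailing letters as it is long.
    """
    words = title.split(" ")
    for item in spec.split(","):
        match = re.fullmatch(r"(\d+)(\D+)", item.strip())
        if match is None:
            raise ValueError(f"cannot parse {item!r}")
        index = int(match.group(1)) - 1
        suffix = match.group(2)
        if not 0 <= index < len(words):
            raise ValueError(f"no word number {index + 1} in {title!r}")
        word = words[index]
        if len(suffix) > len(word):
            raise ValueError(f"suffix {suffix!r} is longer than {word!r}")
        words[index] = word[: len(word) - len(suffix)] + suffix
    return " ".join(words)


print(inflect_phrase("bela lestev", "1e, 2ve"))  # bele lestve
```

Words that are not mentioned in the spec are left as they are, so "bela lestev||2ve" would change only the second word.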

This should also work with namespace names and other words displayed by predefined variables (e.g. "Predloga|e", meaning "Template" in the genitive), with words displayed by $x parameters in MediaWiki messages, and with words displayed by templates.

If there is a more practical solution for this problem (e.g. using "|*|" instead of "||" so that table syntax is not disrupted, but not only "|*"), I would be glad to hear it.

I'm sure this is important for some other languages too (Finnish?), not only for Slovenian. So I think this should be discussed before being proposed on Bugzilla.

To me a syntax such as "[[elektrik|a|e]]" might be easier. The two word example would look like "[[bel|a|e lestev||e]]". To make it work in tables, perhaps replace the "|" with ";" or "*". Obviously, you would then not be able to have this character in page names (and maybe also in other piped links?), and you would need to use an equivalent of the English Wikipedia's Template:Wrong title. Thryduulf (en,commons) 17:21, 22 December 2006 (UTC)
Thanks for your input. The problem with the solution you propose is that it cannot be used with predefined variables as they don't always have the same ending in nominative singular. So it is not always possible to determine the form of the lemma (the canonical form) in advance. I proposed to use |*| instead of || so that it won't be necessary to use the "wrong title" template. Well, probably it would be most practical to be able to use both ways. --Eleassar my talk 19:02, 22 December 2006 (UTC)

Maybe a nice idea, BUT there are disadvantages. It is less readable and editable than [[link|link text]], which is a disadvantage for both bots and humans who intend to fix the link. A confused newbie editor who does not know the new syntax might easily break both the link and the text. Link-fixing bots would have to completely rewrite their parsers, as a simple regexp probably won't do anymore (for example, my bot relies on regular expressions for searching and replacing links, and other bots on cs.wiki that replace links would need their parsers redone too). So if this is meant to save time typing the long link, it may, but it will require more thinking to do so. I don't think it is a good idea to implement it, as in the long term it will require more work (bots, fixing links broken by newbies) than it saves. Also, on cs.wiki, for example, there is a policy of no redirects from other cases, so to use your example, if there is an article "elektrika", there are no redirects from elektriki/elektrike/elektriky to it. -- Singularita 09:33, 24 December 2006 (UTC)

I think this enhancement is absolutely needed. And I like your proposal. I agree with Singularita, however, that it would add an additional learning curve to page editing. I think that a better solution might be the following:

  • When editing or creating a page, there should be a metadata table which would contain all the inflections of the word.
  • When linking to this page, an editor would put whatever inflection in [[ ]], so [[elektrika]] or [[elektrike]].
  • When the page is sent to the parser, it would maintain whatever title the editor linked to, but it would internally (and in the outputted HTML) link it to the page that has the inflection in its metadata table.
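
The lookup in the steps above might be sketched as follows. Everything here is an assumption made for illustration: the `INFLECTION_TABLES` dictionary stands in for the per-page metadata table, the listed forms are sample data, and `build_index` and `resolve` are hypothetical names, not MediaWiki functions. A form that belongs to more than one page is left unresolved, since inflected forms can have other meanings as well.

```python
from typing import Optional

# Hypothetical per-page metadata: page title -> inflected forms of the title.
# The forms listed here are sample data for illustration only.
INFLECTION_TABLES = {
    "elektrika": ["elektrike"],
    "lestev": ["lestve"],
}


def build_index(tables: dict) -> dict:
    """Map every known form (including the title itself) to its page titles."""
    index: dict = {}
    for title, forms in tables.items():
        for form in [title, *forms]:
            index.setdefault(form, set()).add(title)
    return index


def resolve(link_text: str, index: dict) -> Optional[str]:
    """Return the target page for a link, or None if unknown or ambiguous."""
    targets = index.get(link_text, set())
    if len(targets) == 1:
        return next(iter(targets))
    return None


index = build_index(INFLECTION_TABLES)
print(resolve("elektrike", index))  # elektrika
```

The displayed text remains whatever the editor typed; only the link target comes from the index.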

What would be difficult about this, I assume, is that each page would have its title separated from its content as metadata. Of course, both the inflected forms of the title and the page content would have to be linked, probably by some numerical ID. As far as I know, this would be a major change to MediaWiki.
I hope I even made any sense. Ask any questions and I'll be back to clarify. --Iamunknown 21:52, 29 December 2006 (UTC)

  • Since others put their comments here instead of on the talk page, I will do so, too.
  • There are already [[Something (clarification)|]] links, which can be typed like this but are stored in expanded form as [[Something (clarification)|Something]] in the database. Several of the arguments against the proposed [[Somethingonesuffix ||newsuffix]] links vanish when they are treated similarly and stored in expanded form. Admittedly, one argument in favour (saving disk space) vanishes too, as does the, imho arguable, advantage of having something more readable for later editors. Yet I see that all the complicated counter-arguments, such as having to change the rendering engine, bots, and editors' personal knowledge, fall away as well. So my suggestion is to amend the wiki code input processor only, and to store links in expanded form.
  • I'd also suggest discussing making this input engine behaviour an option, which can be selected or deselected at wiki setup time or via the user's chosen language. Similarly, Esperanto input has a special treatment of the (non-Esperanto) character 'X', which other languages do not have. The Amharic language has an additional switch between Latin and native character set support, that effectively switches keyboard layout, without requiring editors to install anything locally. Other tweaks in the wikicode input treatment may prove useful for other languages and instances of MediaWiki. --Purodha Blissenbach 03:53, 31 December 2006 (UTC)
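
The "expand on save" idea could look roughly like this in Python. It is a sketch only: the regular expression, the function name `expand_on_save` and the rule that the suffix replaces as many trailing letters as it is long are assumptions, and MediaWiki's real pre-save transform is not written this way. Ordinary piped links and the existing [[Something (clarification)|]] form are left untouched.

```python
import re

# Matches [[title||suffix]] where neither part contains brackets or pipes.
INFLECTED_LINK = re.compile(r"\[\[([^\[\]|]+)\|\|([^\[\]|]+)\]\]")


def expand_on_save(wikitext: str) -> str:
    """Rewrite [[title||suffix]] into an ordinary [[title|display]] link."""

    def expand(match: re.Match) -> str:
        title, suffix = match.group(1), match.group(2)
        if len(suffix) > len(title):
            # Cannot be applied; keep what the editor typed.
            return match.group(0)
        display = title[: len(title) - len(suffix)] + suffix
        return f"[[{title}|{display}]]"

    return INFLECTED_LINK.sub(expand, wikitext)


print(expand_on_save("Cena [[elektrika||e]] raste."))
# Cena [[elektrika|elektrike]] raste.
```

Because the stored text contains only ordinary piped links, bots and later editors would see nothing new in the page source.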