Community Wishlist Survey 2023/Wiktionary

From Meta, a Wikimedia project coordination wiki
Wiktionary
6 proposals, 125 contributors, 223 support votes
The survey has closed. Thanks for your participation :)



Add LUA function to read out previous section heading

  • Problem: About 99% of templates on Wiktionary need a language code. The same code must be passed separately to every single template. This is busywork, and it brings a risk of wrong codes. Also, templates intended to be used only in certain sections cannot check whether they are used correctly.
  • Proposed solution: Add a Lua function that reads out the previous section heading of a specified level. The most interesting levels are 2 and 3. Requesting level 2 below ==Swedish== (at any distance, until a different level-2 heading appears) would return "Swedish"; requesting level 3 below === Subordinator === would return "Subordinator". If no heading of the given level has been passed, it would return an empty string or nil.
  • Who would benefit: All contributors to Wiktionary and template editors on Wiktionary; to a lesser degree, all other wikis too.
  • More comments: Probably relatively easy to implement, but there may be ideological obstacles.
  • Phabricator tickets: There is at least one but I can't find it anymore.
  • Proposer: Taylor 49 (talk) 19:02, 3 February 2023 (UTC)
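A minimal sketch of the lookup described above follows. It is written in Python for illustration; an actual implementation would be a Lua (Scribunto) function. The sketch assumes the function receives the page wikitext as a list of lines, together with the index of the line that holds the calling template. The name `previous_heading` and that calling convention are hypothetical. When nothing is found, the sketch returns None rather than an empty string. A heading of higher rank (fewer "=") ends the search, which is one possible reading of the proposal.

```python
import re

# A wikitext heading line such as "==Swedish==" or "=== Subordinator ===".
# Assumption: the same number of "=" on both sides marks the heading level.
HEADING = re.compile(r"^(={1,6})\s*(.*?)\s*\1\s*$")


def previous_heading(lines, position, level):
    """Return the nearest heading of `level` above line `position`, or None.

    A heading of higher rank (fewer "=") ends the search, so a level-3
    lookup does not leak into the previous language section.
    """
    for line in reversed(lines[:position]):
        match = HEADING.match(line)
        if not match:
            continue
        found = len(match.group(1))
        if found == level:
            return match.group(2)
        if found < level:
            return None
    return None
```

For example, with the template call on the line after "===Subordinator===" inside a "==Swedish==" section, level 3 gives "Subordinator" and level 2 gives "Swedish".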

Discussion

  • @Taylor 49: Question. Is this a problem because Wiktionary is multilingual, yet repeats the same language within a certain section? And maybe your proposal would be helped by linking to a very good example page; not everyone voting is likely to be familiar enough with Wiktionary to understand the problem otherwise. —TheDJ (talk • contribs) 15:23, 5 February 2023 (UTC)

==Engelska==
===Substantiv===
{{subst|en}}
'''derma'''
#{{tagg|anatomi|språk=en}} [[läderhud]]
#:{{etymologi|Av ny{{härledning|en|la|derma}}, av {{härledning|en|grc|δέρμα|tr=derma|hud}}, av ''[[δέρω]]'' (''dero'', “att flå skinn”).}}
#:{{synonymer|[[dermis]]}}
#{{tagg|kat=mat|språk=en}} en [[judisk]] [[maträtt]] av kött och mjöl med mera

It is because Wiktionary supports many languages. In the example above, the code "en" must be passed in 5 times, and this is redundant because the relevant information is already available in the word "Engelska" above. A missing or wrong language code is a common problem. Big Wiktionaries work around this with hacky JavaScript and bots; the small ones are out of luck. If a module could read out the word "Engelska", the need to pass the same code again and again would disappear. This would be a much more elegant and efficient solution. Furthermore, the templates could complain if an invalid language or word class, or none at all, has been specified above. Taylor 49 (talk) 15:43, 5 February 2023 (UTC)
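The Python sketch below shows the conversion Taylor 49 describes. It takes the visible heading, maps it back to a code, and includes a check that lets a template complain. The table and the function name are hypothetical, and a real Wiktionary would use its own, much larger language table. The heading would come from a lookup such as the one proposed above.

```python
# Illustrative excerpt only: a real Wiktionary keeps a much larger
# language table (usually in a module) instead of this dictionary.
NAME_TO_CODE = {
    "Engelska": "en",
    "Svenska": "sv",
    "Indonesiska": "id",
}


def language_code(heading):
    """Map a language heading such as "Engelska" back to its code.

    Raises ValueError when the heading is missing or unknown, which is the
    point where a template could complain instead of using a wrong code.
    """
    if not heading or not heading.strip():
        raise ValueError("no language heading above this template")
    code = NAME_TO_CODE.get(heading.strip())
    if code is None:
        raise ValueError(f"unknown language heading: {heading!r}")
    return code
```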
==Indonesian==

===Etymology===
From {{inh|id|ms|derma}}, from {{inh|id|ms-cla|derma|t=alms}}, from {{der|id|sa|धर्म||morality, religion, duty, law}}, from {{der|id|inc-pro|*dʰármas}}, from {{der|id|iir-pro|*dʰármas}}, from {{der|id|ine-pro||*dʰér-mos}}, from {{m|ine-pro|*dʰer-||to hold, support}}.  {{doublet|id|darma|firma}}.

===Pronunciation===
* {{IPA|id|[dərˈma]}}
* {{hyphenation|id|dêr|ma}}

===Noun===
{{head|id|noun|head2=dêrma}}

# {{l|en|alms}}, something given to the poor as charity, such as money, clothing or food.
#: {{syn|id|amal|bantuan|donasi|infak|pertolongan|sedekah|sokongan|subsidi|sumbangan}}

====Affixed terms====
{{der4|id|
| bederma
| mendermakan
| penderma
| pendermaan
|}}
The code "id" has to be fed into 17'000 templates again and again despite the heading "==Indonesian==" above. Here, additionally, the parameter "noun" could be dropped below "===Noun===". Taylor 49 (talk) 15:51, 5 February 2023 (UTC)
The issue is similar on French Wiktionary, and the suggested solution is interesting but perhaps not the best workaround. Another way may be to have a subpage for each language. Each template would then use part of the page name to identify the language. Content from the subpages would be transcluded into a single page with a blended view, displaying all languages concerned by this unique sequence of characters. So, I think a dev team should explore your proposal in depth and solve this ongoing critical issue that overcomplicates wikicode in Wiktionaries! Noé (talk) 10:51, 6 February 2023 (UTC)
Thank you for that comment. While the hack with subpages presumably is feasible (with existing technology, or with changes making the transclusion easier), it is frequently preferable to see all languages at the same time. For example, if I want to move the word "proton" from "Category:Physics" to "Category:Nuclear physics", it is much easier to do so manually or with a bot if all languages are on the same page, instead of editing hundreds of subpages. Taylor 49 (talk) 12:34, 6 February 2023 (UTC)
  • This is a very interesting use case, but I'm not convinced that "accessing the heading tree" is the best way to solve it. Fundamentally, both the French Wiktionary and the proposed wish here are stashing some relevant context inside someplace that /happens/ to contain it (the article title and "some section heading somewhere above the current point"), but both of those places are user-visible and so not a great place to stash arbitrary information. The French Wiktionary approach is slightly "better" in that the title suffix is "hidden" and so can actually be the language code directly, whereas the suggested wish here would need to parse "English" -> "en" multiple times, unless you were going to put machine-readable language codes directly in the heading. And I suspect if we were to continue down this path we'd end up hacking in other invisible things, like <h2>Some heading <span style='display:none'><span class="langcode">en</span><span>...other stuff...</span></span></h2> or [[Foo/en-otherstuff-morestuff]]. That seems like a disaster waiting to happen. I'd rather see a first-class "context" extension, similar to mw:Extension:ArrayFunctions, with syntax like <context variable1=value1 variable2=value2>....</context>, which would yield Wiktionary markup like:
==English ==
<context lang=en>

... templates can access the "current context"...
{{#context|lang}}


</context>
This is slightly more involved, but would be scalable to many more uses and wouldn't lead to embedding arbitrary information inside page titles and section headings. Cscott (talk) 15:42, 6 February 2023 (UTC)
That's another interesting approach, and I agree, it could meet other needs too! The option I was suggesting is not actually how French Wiktionary works; it was just another option to explore :) Noé (talk) 15:58, 6 February 2023 (UTC)
I of course meant taking the visible text "English" and converting it back to the code "en" (a feature readily available on all decent Wiktionaries), not putting invisible stuff into the headers. Taylor 49 (talk) 11:09, 7 February 2023 (UTC)
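The Python model below makes the scoping in Cscott's suggestion concrete. Each <context ...> pushes a frame of variables and each </context> pops it. A {{#context|lang}} call looks the name up from the innermost frame outwards. This sketches the proposed semantics; it is not an existing extension API. `ContextStack` and its methods are hypothetical, and returning an empty string for an unknown name is an assumption.

```python
class ContextStack:
    """Toy model of nested <context ...>...</context> scopes."""

    def __init__(self):
        self._frames = []

    def enter(self, **variables):
        # <context lang=en> pushes a new frame of variables.
        self._frames.append(dict(variables))

    def leave(self):
        # </context> pops the innermost frame.
        if not self._frames:
            raise RuntimeError("</context> without a matching <context>")
        self._frames.pop()

    def lookup(self, name, default=""):
        # {{#context|name}}: the innermost definition wins.
        for frame in reversed(self._frames):
            if name in frame:
                return frame[name]
        return default
```

With `<context lang=en>` wrapped around a language section, every {{#context|lang}} inside it resolves to "en". The value disappears again at the closing tag.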
I have long wished that something like this were feasible, to make programming templates a little less frustrating, but maybe there are some language design issues. In this application I think the variable should have a more specific name, perhaps lemma-lang. Obviously we need this variable to be available within templates and modules invoked in their scope, but I wonder if there are cases where that is undesirable, as it could produce unintended interactions. Should templates (¿+modules?) specify how much of this environment they inherit? Should there be namespaces (formally or informally)? PJTraill (talk) 00:21, 15 February 2023 (UTC)
My idea is easier to use and access and more generic than variables like lemma-lang, and hopefully more difficult to abuse too. No need for variables, no need for namespace considerations. Taylor 49 (talk) 17:56, 15 February 2023 (UTC)
Right, generic, as in accessing the previous H4 heading, previous H3 heading or previous H2 heading, whatever they may be. If it's "Indonesian", then it would not be this particular function's concern to determine that the corresponding language code is "id". Something to consider is a broken heading like "==Indonesian=", which might cause the previous H2 heading to "spill over" into the next section. One way to get more control is to be able to access an array of all headings on the page, plus the index of the template's or module's position in it. Then the necessary precautions can be taken programmatically, based on the needs of different use cases. ~ Dodde (talk) 18:28, 15 February 2023 (UTC)
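Dodde's idea of an array of all headings, plus the template's index in it, can be sketched in Python as follows. The sketch records each heading's level as the smaller of its two sides and flags unbalanced headings. The lookup then refuses to answer across a malformed heading, rather than letting the previous section spill over. That policy and the names `heading_index` and `enclosing` are assumptions for illustration.

```python
import re
from bisect import bisect_left

# "=" signs, some text, "=" signs; both sides are captured so that
# unbalanced headings such as "==Indonesian=" can be flagged.
HEADING = re.compile(r"^(=+)(.*?)(=+)\s*$")


def heading_index(lines):
    """Return every heading as (line number, level, text, balanced)."""
    headings = []
    for number, line in enumerate(lines):
        match = HEADING.match(line)
        if not match or not match.group(2).strip():
            continue
        left, right = len(match.group(1)), len(match.group(3))
        headings.append((number, min(left, right), match.group(2).strip(), left == right))
    return headings


def enclosing(headings, position, level):
    """Return the nearest heading of `level` above line `position`, or None.

    Stops at a heading of higher rank and at any unbalanced heading, so a
    broken "==Indonesian=" cannot let the previous section spill over.
    """
    # Index of the template's position within the heading array.
    index = bisect_left([heading[0] for heading in headings], position)
    for _, found, text, balanced in reversed(headings[:index]):
        if not balanced or found < level:
            return None
        if found == level:
            return text
    return None
```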

Voting

Insert attestation using Wikisource as a corpus

  • Problem: Wiktionary definitions rely on attestations: sentences from corpora illustrating the usages and meanings of words. Wikisource is an excellent corpus for Wiktionaries, especially for classic usages, but it is not easy to search its texts for a specific word. Currently, the sentence and its reference have to be copied and pasted by hand, which is a slow and tedious way to contribute. As a result, few quotations come from Wikisource (less than 3% on French Wiktionary).
  • Proposed solution: This feature is inspired by Insert media, but targets Wikisource instead of Wikimedia Commons. Instead of a snippet search offering pictures, Insert attestation would display a list of sentences that include the targeted sequence of characters. These would come from a targeted Wikisource, in the same language as the source project or another one. There is no semantic or proximity matching: only exact results, to keep it simple. In the displayed snippet of results, an editor would just grab a sentence with a single click. It would be added with the appropriate source details, picked from the Wikidata item associated with the Wikisource page. The feature would copy the sentence (no transclusion) and its source, ideally including the page number in the original manuscript (e.g. "page 35."). This feature may need a specific parser to identify sentence boundaries and to bold the targeted sequence of characters.
  • Who would benefit: Readers of Wiktionaries would find more usage examples and a way to access the whole source directly on Wikisource. Contributors to Wiktionaries would have a fancy and enjoyable way to add attestations, similar to the Insert media tool that digs into Wikimedia Commons. The community may also grow with new people who like to add sentences from their reading. Editors of Wikisource would have a new way to shed light on their Sisyphean work. Both projects' visibility in search engines would increase with more links between them. The global audience of both projects may grow with more connectivity. Other projects may also benefit from this feature, such as Wikipedia, to add quotations to articles about authors.
  • More comments: This feature/tool/functionality should be accessible through the wikitext editor and VisualEditor. It may be interesting to track reuses of Wikisource content in other projects with a dedicated "What links here" from Wiktionary to Wikisource. This would be similar to the way Wikimedia Commons indicates reuse in other projects, but it could be part of another development. This idea ranked #5 in 2020 with 57 votes but was not done, with this long explanation: "We unfortunately ran out of time and were unable to work on this. It can be re-proposed in a future survey." It was proposed again in 2022 and had 24 supports. Previously, it was suggested in 2018 with 36 supports and in 2017, supported by 32 people. A draft was suggested in 2016 with 19 supports, and the idea was first coined in a MediaWiki discussion.
  • Phabricator tickets: T139152, T157802
  • Proposer: Noé (talk) 07:33, 22 October 2019 (UTC)
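The core of the proposed search could look like the Python sketch below: exact matches only, text split into sentences, and the target put in bold. The sentence splitter is deliberately naive, and the function name is hypothetical. As the proposal notes, a real tool would need a proper sentence parser. It would also take the source details from Wikidata, which this sketch leaves out.

```python
import re

# Naive sentence boundary: whitespace preceded by ".", "!" or "?".
# Abbreviations, ellipses and quotations would need a real segmenter.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def find_attestations(text, target):
    """Return each sentence of `text` containing `target`, with it in bold.

    Matching is an exact character-sequence match; bold uses wikitext
    ''' markup.
    """
    return [
        sentence.replace(target, f"'''{target}'''")
        for sentence in SENTENCE_BOUNDARY.split(text.strip())
        if target in sentence
    ]
```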

Discussion

  • This would be very useful indeed. I have added some Wikisource quotations to Wiktionary entries, but prompted from the opposite side (i.e. as I bumped into obscure words while transcribing books in Wikisource), because starting from the Wiktionary side is so cumbersome that I don't even think about it.

    I eventually had to keep a reference cheatsheet in my userpage to be able to make edits like this, this or this.

    And even with that, I have to do quite a bit of manual work e.g. translating the index page number to the actual page number printed on the page, formatting the title, marking the specific expression as bold, linking to the author's page on Wikipedia, etc. This is all work that should be automated.

    --Waldyrious (talk) 10:30, 11 February 2023 (UTC)

    Thanks for your comment and support. This is exactly the workflow we are trying to get rid of here. Picking an example from a corpus is an important task when building a dictionary, and it should be improved to drastically reduce the manual work. Noé (talk) 09:02, 13 February 2023 (UTC)

Voting

Allow users to emphasise languages when looking up words.

  • Problem: Sometimes a word is shown in many languages that do not interest a given user (much, at a given time). This applies both to the words shown on a page (both lemmas and translations) and to those suggested in the various search boxes, and can result in inconvenience and confusion.
  • Proposed solution:
    • Interface: Allow users to pick the languages they are interested in from a list, and to turn on or off the option to restrict the words shown on pages and during searches. A more luxurious version would show the user words in other languages as well (which can be helpful), but avoid them cluttering the view. Ideally, this feature would also apply to lists of translations, but I fear that would need a drastic change to the way these work.
    • Implementation: I am not sure how hard this would be, but here are a few ideas. Perhaps searching for pages limited to a category would help, but I fear that is not currently possible. Perhaps the Tabbed languages gadget also uses relevant techniques. If it is possible in CSS, that could be used to hide unwanted sections, so that cached pages require no extra processing on the server and no script in clients. Filtering translations might become feasible if they were generated from Wikidata.
  • Who would benefit: Particularly, anyone using Wiktionary (in a language they know, especially English) to support them while learning a specific other language. More generally, anyone with a strong focus on a few languages, which probably covers most users most of the time (though casual users would probably not use such a feature).
  • More comments: The Tabbed languages gadget goes a long way towards solving this as far as page display is concerned, but (a) does not help with searching, (b) appears not to work in Firefox on Android.
  • Phabricator tickets:
  • Proposer: PJTraill (talk) 23:41, 23 January 2023 (UTC)
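The Python sketch below roughly illustrates which parts of a page the filter would keep. It drops every level-2 language section whose heading is not in the user's chosen set. It works on raw wikitext and assumes headings are plain language names, as on English Wiktionary. The proposal itself leans towards CSS or a gadget acting on the rendered page, so this only shows the selection logic. `keep_languages` is a hypothetical name.

```python
import re

# Exactly two "=" on the left: level-3+ headings ("===Noun===") do not match,
# so they stay inside the language section they belong to.
LANGUAGE_HEADING = re.compile(r"^==([^=].*?)==\s*$")


def keep_languages(wikitext, wanted):
    """Keep only the level-2 sections whose heading is in `wanted`.

    Anything before the first language heading (e.g. a "see also" line)
    is kept.
    """
    kept, keep = [], True
    for line in wikitext.splitlines():
        match = LANGUAGE_HEADING.match(line)
        if match:
            keep = match.group(1).strip() in wanted
        if keep:
            kept.append(line)
    return "\n".join(kept)
```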

Discussion

  • Thanks for this proposal; it is a real issue. Wiktionaries tend to have too much content, and it is not really convenient for readers. The gadget you mention is in use on English Wiktionary but, to my knowledge, not in many other versions. This issue has a UX design part, which could be explored through testing in order to calibrate the options. Then, there is a structural component. Since the beginning, the descriptions of different entries (words or multi-word expressions) from different languages have been displayed on a single page. That page is based on the sequence of signs (mostly letters) used to write them. The content is also stored in those pages. Another option to explore could be the way Wikisource deals with content: separate subpages for each language, and a simple page with several automatic transclusions. This option could help the filtering of information, but may complicate the process of adding information. So, it is a great challenge, and an important one for the future of Wiktionaries 🙂 Noé (talk) 11:30, 24 January 2023 (UTC)
    Thanks for the positive reaction; I am glad you sympathise. While there is some superfluous content, I think the greater problem is presenting too much content. Perhaps we could put lemmas ¿for non-native languages? in separate namespaces, though that would probably need to be backed up by good editing tools and checks. The transition for a single Wiktionary could be fairly easy to automate, but only if we could take that wiki off-line for the duration. PJTraill (talk) 19:15, 26 January 2023 (UTC)
  • Very much needed. Even though I'm an experienced editor, I often use https://ninjawords.com as a frontend to Wiktionary when I just want to quickly look up a word, because the contents of a Wiktionary page are typically cluttered and full of information that should IMO be progressively disclosed. For beginning editors or readers, the current way information is displayed in Wiktionary pages is very likely to be overwhelming. --Waldyrious (talk) 10:36, 11 February 2023 (UTC)
  • This, or something similar to this, would be very much needed on mobile pages, for performance reasons. Pages are served with all language sections initially collapsed, and they are then all expanded by some JavaScript once the page has loaded. (IIRC there's a checkbox for auto-expanding in the user settings, but I may be wrong, and in any case it doesn't help logged-out users.) On a not-top-of-the-line smartphone, this expansion is slow. On long pages with more than 20 or so entries, it takes a good 5–10 seconds (an eternity in UX terms) for the browser to re-render this section expansion; pages such as wikt:a or other single-letter pages are basically un-openable. (An additional frustration: the user (me) taps on a collapsed section's heading before section expansion has completed. Once it does complete, the tapped section immediately collapses again, hiding the desired content.) Oatco (talk) 16:23, 14 February 2023 (UTC)

Voting

Something like Extension:Variables to simplify template calls

  • Problem: As I wrote in 2020:

    Some templates used on Polish Wiktionary (e.g., wikt:pl:Szablon:imię, wikt:pl:Szablon:imię odojcowskie, wikt:pl:Szablon:forma rzeczownika, wikt:pl:Szablon:forma przymiotnika) put entries into language-dependent categories. To do that they need to know what language the entry is about. As many dictionary entries are stored in a single article (one section per language), those templates cannot determine the language with the standard MW tools. Thus, the language needs to be provided in the template call (like {{imię odojcowskie|ukraiński|Абаку́м|m}} or {{forma rzeczownika|pl}}).

    This could be done with the Variables extension, but it is unavailable on Wikimedia and the description page of the extension says that it “[…] is incompatible with plans to parallelize the parsing, as is intended by the use of Parsoid. Therefore, the future of this extension is uncertain, and it is expected to become incompatible with the standard MediaWiki parser within a few years.”
  • Proposed solution: Re-evaluate the “plans to parallelize the parsing” in the context of Wiktionaries and develop something like the mentioned extension or a completely new way to allow co-operation between templates.
  • Who would benefit: Wiktionary users who struggle with creating a dictionary using software meant to edit an encyclopedia; especially new ones, for whom the complexity of the code and the differences between it and the final result are a barrier to entry.
  • More comments: The workarounds are not good enough.
  • Phabricator tickets:
  • Proposer: DrogosławTALK 12:54, 28 January 2023 (UTC)

Discussion

@KSiebert (WMF): The issue is that doing it many times would make the page code a lot messier, a lot less user-friendly and require additional effort. Read the discussion you've linked, please. DrogosławTALK 11:02, 2 February 2023 (UTC)
  • As Peter Bowman pointed out,

    Expanding on PiotrekD's problem description, entry-based projects (such as Wiktionaries) may expect significant gains in enabling this feature, especially regarding stuff that can perform semantic categorization of entries - but currently doesn't, or at least not in the way categories are meant to work, rather by periodically inspecting page contents and maintaining large lists such as wikt:pl:Indeks:Francuski - Medycyna. This list collects all French entries related to medicine based on their transclusion of wikt:pl:Template:med, which doesn't accept a language parameter (precisely this would be nice for categorization purposes) and it will probably never do: we have tons of such templates used across the entire site, potentially making it quite tedious to update hundreds of thousands of transclusions, also accounting for the process of making our veteran editors aware of this change. In contrast, we could easily upgrade {{med}} and similar to fetch the corresponding language code, conveniently exposed in a variable that relates to the language section this template is placed in, and use it to categorize the page - no need to alter the page contents at all. Peter Bowman (talk) 11:16, 18 November 2020 (UTC)
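The mechanism Peter Bowman describes can be modelled in a few lines of Python. A header template defines the section language, the way #vardefine would, and {{med}} reads it back, the way #var would. Template names, the argument format and the category pattern are illustrative assumptions. The model also shows the order dependence: what {{med}} produces depends on what was expanded before it on the page. That dependence is presumably why the extension page quoted above describes it as incompatible with parallel parsing.

```python
def expand(calls):
    """Expand (template, argument) calls in page order; return categories.

    "language header" is a hypothetical stand-in for the level-2 header
    template and stores the language like #vardefine; "med" reads it back
    like #var.
    """
    variables = {}
    categories = []
    for template, argument in calls:
        if template == "language header":
            variables["lang"] = argument
        elif template == "med":
            lang = variables.get("lang")
            if lang is not None:
                categories.append(f"Category:{lang} medicine terms")
    return categories
```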

@User:Peter Bowman, User:PiotrekD, @User:Drogosław, @User:KSiebert (WMF): A different, possibly better way to address this problem is just below: Community_Wishlist_Survey_2023/Wiktionary#Add_LUA_function_to_read_out_previous_section_heading. Taylor 49 (talk) 19:07, 3 February 2023 (UTC)

@Taylor 49: 1. You have just written it; it did not exist when I was writing this proposal. 2. It's not necessarily “better”; it is definitely less versatile, though easier to implement. DrogosławTALK 19:44, 3 February 2023 (UTC)
@Taylor 49: I am convinced that your proposal is not easier to achieve than just flipping a configuration switch, as is asked for in this request. There is a prejudice we are trying to fight: previous requests were denied solely because a dev back in 2006 said it was unfeasible. We need attention from technical staff to re-evaluate the problem and tell us why something that works on non-WMF wikis would not work for us (as of today). As far as I can tell, the main blocker right now is phab:T250963 due to a Parsoid-related deprecation, but some work has already been started to target this: gerrit:721950. I'm afraid that adding more alternatives will drive said attention away from this issue and ultimately achieve nothing. Your proposal entails something close to wikitext parsing (compare how succinct Extension:Variables looks under the hood). It would also require us, extension users, to additionally parse the contents of a section header. Peter Bowman (talk) 20:36, 3 February 2023 (UTC)
I do NOT fully understand this proposal. Is it just enabling mw:Extension:Variables? Please elaborate how you would use it.
> additionally parse the contents of a section header
You do not have to parse anything. You just have to convert lngcode <--> lngname, but this is a feature available on every intact Wiktionary. My proposal is probably easier to use in Wiktionary wikitext. Taylor 49 (talk) 20:45, 3 February 2023 (UTC)
Yes, this is about enabling Extension:Variables. Every language entry in plwiktionary is placed within a level-2 section and its header looks like this: == title ({{language_name}}) ==. Since it includes more elements than a simple language name or code, parsing would be unavoidable if we followed your proposal. We want the header template to define (#vardefine) some kind of variable referring to the language each section is about. Thus, this information would be made accessible (through #var) to a wide range of templates aimed at putting the current page into a language-specific category (e.g. Category:English medicine terms). In other words, we need to contextualize template calls. Peter Bowman (talk) 00:14, 4 February 2023 (UTC)
It is indeed extra work, but isolating the part between brackets is a trivial task, no rocket-science parsing. You still have about 9'000 separate templates for about 9'000 separate languages, i.e. a horrible design and a relic from the pre-module era. Taylor 49 (talk) 09:12, 4 February 2023 (UTC)
We are already doing such extra parsing in some similar contexts and from my experience it is not clean (regexes just aren't), not scalable (usage of expensive functions to retrieve page text) and not safe (relying on potentially untrusted input text). I am also well aware that we could just edit hundreds of thousands of pages on our wiki by adding the missing language parameter. There is a nice tool that seems to suit our needs and this proposal focuses on it. Peter Bowman (talk) 10:52, 4 February 2023 (UTC)
Taking the title ({{language_name}}) read out from == title ({{language_name}}) == according to my proposal, and isolating the language_name from it requires neither regex nor expensive functions. Taylor 49 (talk) 09:54, 5 February 2023 (UTC)
Yet it does indeed require parsing/isolating wikitext, which is way uglier and more unreliable than reading a variable, and also conveys limited information. Please discuss your proposal separately and focus here on Extension:Variables instead, thank you. Peter Bowman (talk) 11:36, 5 February 2023 (UTC)
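For comparison, here is the approach Taylor 49 suggests: isolating the bracketed part of an already expanded plwiktionary header. It needs no regular expression, as this Python sketch using `str.rpartition` shows. It assumes the header has been expanded to something like "derma (język angielski)". It returns None only when there is no complete pair of brackets. Headers in other shapes, or still-unexpanded templates, are the kind of input Peter Bowman's objection concerns. The function name is hypothetical.

```python
def language_from_heading(heading):
    """Return the text inside the last (...) of a heading, or None.

    Example input (assumed already expanded): "derma (język angielski)".
    """
    _, opening, rest = heading.rpartition("(")
    if not opening:
        return None
    inside, closing, _ = rest.partition(")")
    return inside.strip() if closing else None
```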

Voting

Display definitions from Wikisource dictionaries

  • Problem: Wiktionaries aim to offer one definition for each meaning, but there are many ways to describe a meaning, in many words. These include local usages (e.g. an American-centered definition and an Indian-centered one for the same word), and very technical terms, sometimes with more or less popularized explanations. A synthetic definition is one solution, but more than one is better. Some alternative definitions from other dictionaries could be mentioned in the references section, but they are not accessible in Wiktionary and do not add any value to the entries.
  • Proposed solution: Wikisources contain a lot of dictionaries, and we should use them to display more definitions. A dedicated transclusion of paragraphs from Wikisource into Wiktionaries could be a solution. It could be done by hand or by bot, or through automatic harvesting of entries that carry a specific tagging in the dictionaries hosted on Wikisource. They could come from several Wikisources and be displayed in several Wiktionaries. It could be a new tab next to "Article" and "Talk", named "Dictionaries", with definitions for the same sequence of letters from dictionaries published on Wikisource. For French, I can imagine at least a dozen definitions from as many dictionaries. For underdescribed languages with at least one source on Wikisource, it could be an interesting way to compare the source with how it evolves after its inclusion in Wiktionary.
  • Who would benefit: Readers wanting more than one definition, with different perspectives.
  • More comments: Some dictionaries are already properly tagged; for the others, this could be a good opportunity to tag them according to the TEI Lex-0 guidelines, so that they can more easily be reused in open-source projects. Also, to counter a common reflex whenever someone talks about Wiktionary: no, Wikidata Lexemes could not be of any help here. This issue is about content, not data or relations. Definitions are under CC BY-SA 3.0 in Wiktionary and in Wikisource dictionaries. This proposal is the same as this proposal in 2022 (24 supports), this proposal in 2021 (supported by 40 people), and this one in 2020 posted by DaraDaraDara (32 supports).
  • Phabricator tickets: T240191
  • Proposer: Noé (talk) 11:43, 29 November 2020 (UTC)
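Suppose dictionaries on Wikisource were tagged along TEI Lex-0 lines. Harvesting the definitions for one headword could then be as simple as the following Python sketch, which uses the standard library's ElementTree. It expects a particular element layout: entry, form type="lemma", orth, sense and def, all in the TEI namespace. That layout is an assumed minimal subset of TEI Lex-0; real encodings vary, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}


def definitions_for(xml_text, headword):
    """Collect the <def> texts of every entry whose lemma is `headword`."""
    root = ET.fromstring(xml_text)
    found = []
    for entry in root.iter(f"{{{TEI_NS}}}entry"):
        orth = entry.find("tei:form[@type='lemma']/tei:orth", NS)
        if orth is None or (orth.text or "").strip() != headword:
            continue
        for definition in entry.findall(".//tei:sense/tei:def", NS):
            # itertext() also picks up text inside nested markup.
            found.append("".join(definition.itertext()).strip())
    return found
```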

Discussion

Voting

Different alphabetical sortings according to languages

  • Problem: For the time being, there is only one alphabetical order for a given writing system (used in Indexes, Categories, etc), even though alphabetical order can vary depending on a language. For example, in German o and ö are treated as one letter, but in Turkish first comes o and only then ö. Therefore the order: offensiv – öffentlich – offiziell – Offizier – öffnen – Öffnung – oft – öfter – oh is absolutely fine for German, but in Turkish: ocak – oda – ödemek – odun – ödünç – oğlan – öğle – öğrenci – oğul is completely wrong (it should be: ocak – oda – odun – oğlan – oğul – … – ödemek – ödünç – öğle – öğrenci), yet it is the only order that is found in all automatically prepared alphabetical lists. This way all Wiktionaries make fools of themselves, showing that they do not even know the correct alphabetical order. This problem affects each and every Wiktionary.
  • Proposed solution: There should be a way to sort lists alphabetically in different ways, depending on the language in question.
  • Who would benefit: All users of Wiktionaries, both editors and – first of all – their readers.
  • More comments: As I was informed by Peter Bowman, the problem has already been raised several times (including in 2019 and again in 2020), but never solved. There seems to be a ready solution already available (see Phabricator tickets below).
  • Phabricator tickets: phab:T30397#1039468
  • Proposer: Maitake (talk) 17:32, 27 January 2023 (UTC)
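A minimal Python sketch of language-specific sort keys follows. It uses hand-written tables rather than a collation library. Turkish letters are ranked by the Turkish alphabet. The German key folds umlauts onto their base vowels, the dictionary convention also known as DIN 5007 variant 1. The function names are hypothetical; a production solution would more likely rely on tailored Unicode collation, for example from ICU.

```python
# Turkish alphabet: ç, ğ, ı, ö, ş and ü are letters of their own.
TURKISH = "abcçdefgğhıijklmnoöprsştuüvyz"
TURKISH_RANK = {letter: rank for rank, letter in enumerate(TURKISH)}
# German dictionary order: umlauts sort with their base vowel.
GERMAN_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def turkish_key(word):
    """Sort key following the Turkish alphabet."""
    # Handle dotless/dotted capital I before generic lowercasing.
    lowered = word.replace("I", "ı").replace("İ", "i").lower()
    # Characters outside the alphabet sort after it, by code point.
    return [TURKISH_RANK.get(ch, len(TURKISH) + ord(ch)) for ch in lowered]


def german_key(word):
    """Sort key treating ä/ö/ü like a/o/u; ties fall back to the word."""
    return (word.lower().translate(GERMAN_FOLD), word)
```

With these keys, sorting the Turkish words from the problem statement gives ocak, oda, odun, oğlan, oğul, ödemek, ödünç, öğle, öğrenci. The German list comes out in the order shown above.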

Discussion

Voting