Community Wishlist Survey 2019/Wiktionary

From Meta, a Wikimedia project coordination wiki
Wiktionary
7 proposals, 77 contributors, 167 support votes
The survey has closed. Thanks for your participation :)



Multiple collations per site

  • Problem: It is extremely common on Wiktionary projects to display entries in multiple languages on the same page. However, only one collation can be used on a given Wikimedia project. This means that if a wiki uses a language-aware collation, e.g. uca-default, which suits English and Portuguese, all categories of e.g. Swedish words will sort words starting with Å under A. This is because in English Å is considered the same letter as A with a diacritic, whereas in Swedish it is a separate letter (sorted near the end of the alphabet). Category headers are therefore incorrect for many languages with the current setup on Wiktionary projects.
    Currently, one way to work around the problem is to use the default MediaWiki collation (namely uppercase). But with it, every letter with a diacritic, such as Å or É, gets its own first-letter header in categories, so sort keys have to be added to all English/French/etc. entries with such a letter in the title (e.g. to sort Å under A for English). This means a huge number of sort keys in pages, which makes Wiktionary projects harder for newcomers to read and edit.
  • Who would benefit: users of Wiktionary categories, and new editors to all Wiktionary projects
  • Proposed solution: allow multiple collations per site, so that the collation can be specified per category: e.g. uca-sv for Swedish-related categories, uca-es for Spanish categories, uca-default for English (and similar languages).
  • More comments: Liangent and Bawolff have worked on this in the past, but feasibility also seems to depend on sysadmins (because of the increased system load).
  • Phabricator tickets: phab:T30397
  • Proposer: Automatik (talk) 12:18, 11 November 2018 (UTC)
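To illustrate the sorting problem described above, here is a minimal Python sketch. It does not use the real uca-default or uca-sv collations (which are provided by ICU); the two key functions below are hypothetical approximations. One folds diacritics so that Å sorts, and is headed, under A; the other treats å, ä and ö as separate letters after z, as Swedish does. The Swedish sample words are illustrative only.

```python
import unicodedata

# Hypothetical stand-ins for uca-default and uca-sv. Real UCA collations
# (implemented by ICU) also handle case, punctuation, contractions, etc.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"


def english_like_key(word: str) -> str:
    """Fold diacritics away, so that å sorts (and is headed) as a."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def swedish_like_key(word: str) -> list[int]:
    """Treat å, ä and ö as separate letters placed after z."""
    size = len(SWEDISH_ALPHABET)
    return [
        SWEDISH_ALPHABET.index(ch) if ch in SWEDISH_ALPHABET else size + ord(ch)
        for ch in word.lower()
    ]


def category_header(word: str, swedish: bool) -> str:
    """First-letter header under which a category would list the word."""
    first = word[0] if swedish else english_like_key(word)[0]
    return first.upper()


words = ["åsna", "apa", "zebra", "örn", "björn"]
print(sorted(words, key=english_like_key))  # ['apa', 'åsna', 'björn', 'örn', 'zebra']
print(sorted(words, key=swedish_like_key))  # ['apa', 'björn', 'zebra', 'åsna', 'örn']
print(category_header("åsna", swedish=False), category_header("åsna", swedish=True))  # A Å
```

With a single site-wide collation, only one of these two orderings can be applied to every category, which is why the proposal asks for a per-category choice.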

Discussion

Voting

Additional edition interfaces and recording formats on the fly

  • Problem: Editing Wiktionaries is currently difficult for newcomers, and so is parsing them for third parties
  • Who would benefit:
    • new contributors who wish to participate without having to learn many technical skills
  • Proposed solution:
    1. Offer polished forms that require no technical knowledge to edit Wiktionaries, while keeping classic wikitext editing available for more experienced users
    2. Save articles on the fly in multiple formats, especially a LSON tree representing the parsed article structure
    3. Allow modules to manipulate previous entries so that elements of an article can be transcluded programmatically
  • More comments: Most of this can probably be built with the current infrastructure. In particular, tpt pointed me to ContentHandler this weekend, which should be useful for this purpose.
  • Phabricator tickets:
  • Proposer: Psychoslave (talk) 20:11, 11 November 2018 (UTC)
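The proposal does not define the "LSON tree" further. Assuming it means a JSON-like tree of the parsed article, a minimal Python sketch of point 2 could nest sections by heading level and serialize the result. The node fields (title, level, lines, children) are hypothetical, and real wikitext (templates, comments, headings inside nowiki, etc.) would need a proper parser.

```python
import json
import re

# A heading line such as "==English==" or "=== Noun ===" (levels 2 to 6).
HEADING = re.compile(r"^(={2,6})\s*(.*?)\s*\1\s*$")


def parse_sections(wikitext: str) -> dict:
    """Build a nested tree of sections from wikitext headings.

    Hypothetical output format: each node has a title, a heading level,
    the non-empty lines directly under it, and its child sections.
    """
    root = {"title": None, "level": 1, "lines": [], "children": []}
    stack = [root]  # path from the root to the section currently being filled
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if match is None:
            if line.strip():
                stack[-1]["lines"].append(line)
            continue
        node = {"title": match.group(2), "level": len(match.group(1)),
                "lines": [], "children": []}
        # Close sections at the same or a deeper level before attaching the new one.
        while stack[-1]["level"] >= node["level"]:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root


page = """==English==
===Noun===
# An informal conversation.
==French==
===Noun===
# cat
"""
print(json.dumps(parse_sections(page), ensure_ascii=False, indent=2))
```

Storing such a tree alongside the wikitext would let third parties read the structure of an entry without reimplementing wikitext parsing.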

Discussion

  • Comment: I think it is a very good idea, but at least on Spanish Wiktionary I've found at least 3 ways of organizing entries (I don't know if this happens on other Wiktionaries); we must choose one before building an interface. --Giovanni Alfredo Garciliano Diaz (talk) 17:00, 17 November 2018 (UTC)
    • Thank you @Giovanni Alfredo Garciliano Diaz: for your feedback. Links to relevant cases (or even documentation, if this is a deliberate design choice) are welcome. The proposal didn't specify it, but the tool should of course not be imposed on communities. It is up to each community to adopt the resulting project or, even better, to get involved in its design so that its features meet their wishes and needs. --Psychoslave (talk) 03:54, 18 November 2018 (UTC)
  • Comment: Proposition 1 looks very good. What is this LSON thing in proposition 2? And proposition 3 would complicate things for newcomers. This proposal could be explained more clearly (and perhaps not bundle several tasks together) to attract more votes. — Automatik (talk) 14:53, 20 November 2018 (UTC)

Voting

Wikidata module for translations

  • Problem: Currently each Wiktionary maintains a list of translations for each sense (Ex.1, Ex.2, Ex.3). These translations are not connected between language versions, so the effort is repeated in each language edition.
  • Who would benefit: Wiktionary editors.
  • Proposed solution: 1) Create a tool to import existing translation boxes from Wiktionaries into Wikidata. 2) Create a module that can display all the translations in each Wiktionary that chooses to use it. 3) Allow adding more translations to Wikidata from each Wiktionary, so that contributions are not duplicated.
  • More comments:
  • Phabricator tickets:
  • Proposer: Micru (talk) 14:44, 9 November 2018 (UTC)

Discussion

@Micru: Would this be solved by https://www.wikidata.org/wiki/Wikidata:Lexicographical_data and https://www.wikidata.org/wiki/Wikidata:Wiktionary ? --AKlapper (WMF) (talk) 20:58, 9 November 2018 (UTC)

@AKlapper (WMF): That is part of the solution (the data would be stored in Wikidata as lexicographical data); however, as mentioned, we need a way to import the translation lists from any Wiktionary into Wikidata, and then a way to display them on any Wiktionary that chooses to.--Micru (talk) 21:15, 9 November 2018 (UTC)

In translation lists, many links are red, and Wikidata does not accept red links, so this could be problematic. Also, the nomenclature of definitions differs from one language to another; there is no strict mapping between languages. To be clear, a Spanish word will have x definitions in English Wiktionary but y definitions in French Wiktionary, because each definition refers to a culture. So how do you imagine these can be mapped? Noé (talk) 10:10, 10 November 2018 (UTC)

@Noé: To import a translation list into Wikidata we need a Q-item representing the sense, and as many lexeme items (L-items) as there are words connected to that item. When there is a redlink, it can simply be a label in that language on the Q-item, without the need to create a lexeme. The nomenclature of definitions differs from one language to another; however, the translation lists tend to be very similar, and that is what matters. Normally the only difference between translation lists is that some language versions are more complete than others.--Micru (talk) 11:41, 10 November 2018 (UTC)
Importing a translation list from CC BY-SA Wiktionary into CC0 Wikidata implies treating translation lists as not covered by the licence, and thus free to reuse without keeping the same licence (SA means Share Alike). This position does not have consensus, and I personally disagree with it. If I understand the Wikidata lexicographical data model correctly, Q-items are for concepts, not for meanings. I think translations should be connected to S-items rather than Q-items. I am not sure I understand the solution you suggested for redlinks. In my experience, translation lists don't tend to be very similar: the mapping of senses to reality differs from one language to another, so translation lists differ too. You can choose not to deal with complex cases and focus on simple ones as a first step, but I think oversimplifying the complexity of translation is not very effective. -- Noé (talk) 10:25, 13 November 2018 (UTC)
Yes! Let's focus on simple cases as a first step!--Micru (talk) 16:21, 17 November 2018 (UTC)

I agree with @Noé: that translation lists from Wiktionaries can't be imported into a CC0 project. That said, any tool that might help coordinate our word relationship lists between different language versions would be very welcome. So a simple solution that might reconcile both @Micru:'s proposal and Noé's feedback is a Wikibase designed to store these lists while keeping an exact record of licence and origin (which Wiktionary version and page). That is, one should not only import the lists of translations proposed for joy in the English Wiktionary and joie in the French one, but also the lists of translations given for joie in the English version and for joy in the French one. No automatic merge of these lists should be performed, but dedicated tools to compare matching lists and possibly transfer items manually from one list to the other would be warmly welcome. Having a way to query these lists directly from the Wiktionaries would also be a nice plus. Psychoslave (talk) 03:38, 18 November 2018 (UTC)

There are cases where someone seems to have manually used a reciprocal translation to write a translation, and it doesn't really work well. Languages don't always map like that. "word1 in language A" can be the closest translation of "word1 in language B", but "word2 in language B" might be closer to the meaning of "word1 in language A". HLHJ (talk) 06:59, 18 November 2018 (UTC)
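As a minimal sketch of the comparison tool suggested in the discussion (keeping origin and licence, with no automatic merge), here is a hypothetical Python data model. The class, field names and sample translations are illustrative assumptions, not taken from the actual Wiktionary pages or from any Wikibase or Wikidata API.

```python
from dataclasses import dataclass, field


@dataclass
class TranslationList:
    """One translation box as found on one Wiktionary page (hypothetical model).

    The wiki, page and licence fields keep the provenance of the list,
    so that no list is detached from its origin.
    """
    wiki: str
    page: str
    licence: str
    translations: dict[str, set[str]] = field(default_factory=dict)  # language code -> words


def compare(a: TranslationList, b: TranslationList) -> dict[str, dict[str, set[str]]]:
    """Per language, report the words found in only one of the two lists.

    Nothing is merged: the report is meant for an editor to review and
    transfer items manually if they are judged correct.
    """
    report = {}
    for lang in sorted(a.translations.keys() | b.translations.keys()):
        words_a = a.translations.get(lang, set())
        words_b = b.translations.get(lang, set())
        if words_a != words_b:
            report[lang] = {"only_in_a": words_a - words_b,
                            "only_in_b": words_b - words_a}
    return report


# Illustrative data only.
joy = TranslationList("en.wiktionary.org", "joy", "CC BY-SA 3.0",
                      {"de": {"Freude"}, "es": {"alegría", "gozo"}})
joie = TranslationList("fr.wiktionary.org", "joie", "CC BY-SA 3.0",
                       {"de": {"Freude", "Fröhlichkeit"}, "es": {"alegría", "gozo"},
                        "it": {"gioia"}})
print(compare(joy, joie))
```

Languages where both lists agree (here "es") are left out of the report, so editors only see the differences, which may stem from genuine sense mismatches rather than omissions.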

Voting

Custom list for language learner

  • Problem: A student may want to create word lists to learn a new language, and to organize them by theme (animals, plants), by functional category (emotional verbs, action verbs, nouns) or in any other way. Wiktionaries have the content language learners need but do not provide any tool for this, so people use other websites instead.
  • Who would benefit: Students learning a new language; contributors, who could keep a to-do list or a list of favourite words; Wikibooks, which would gain an interface with Wiktionary.
  • Proposed solution: Create a JavaScript tool or a core MediaWiki feature to save a page into a personal space, with an option for subcategorisation that places the link directly under a subsection (e.g. pick a word and add it to the Verbs, Colors, Animals or a new section of the Custom list page).
We can imagine options to easily edit the Custom List such as:
  • a field to add a word in a category
  • an option to delete a word without editing the whole page
  • a rapid way to reorder sections and words
This tool could be included in Wiktionary or kept separate, with a lighter interface, so that the lists created can be shared with everyone.
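A minimal Python sketch of the custom list data structure and the editing options listed above (add a word to a section, delete a word without editing the whole page, reorder sections), rendered as wikitext for a personal page. The class name, methods and page layout are hypothetical.

```python
class CustomList:
    """A learner's personal word list, organised in named sections (hypothetical)."""

    def __init__(self) -> None:
        # Dicts keep insertion order, which gives the section order.
        self.sections: dict[str, list[str]] = {}

    def add_word(self, section: str, word: str) -> None:
        """Add a word to a section, creating the section if needed."""
        words = self.sections.setdefault(section, [])
        if word not in words:
            words.append(word)

    def remove_word(self, section: str, word: str) -> None:
        """Delete one word without touching the rest of the list."""
        self.sections[section].remove(word)

    def move_section(self, section: str, new_index: int) -> None:
        """Reorder sections by moving one of them to a new position."""
        order = [name for name in self.sections if name != section]
        order.insert(new_index, section)
        self.sections = {name: self.sections[name] for name in order}

    def to_wikitext(self) -> str:
        """Render the list as wikitext: one heading per section, one link per word."""
        lines = []
        for name, words in self.sections.items():
            lines.append(f"== {name} ==")
            lines.extend(f"* [[{word}]]" for word in words)
        return "\n".join(lines)


my_list = CustomList()
my_list.add_word("Animals", "chat")
my_list.add_word("Verbs", "manger")
my_list.add_word("Animals", "chien")
my_list.move_section("Verbs", 0)
my_list.remove_word("Animals", "chat")
print(my_list.to_wikitext())
```

Keeping the list as structured data and generating the wikitext from it is what would let each option act on a single word or section rather than on the whole page.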

Discussion

Voting

Context-dependent sort key

  • Problem: In most Wiktionary projects, words of different languages share a page if their spellings are identical. Currently, the magic word DEFAULTSORT applies to an entire page, which means we cannot define a default sort key for each language on the same page. This is an issue especially for Chinese, Japanese and Korean (hanja). They share characters, but their sort keys are totally different (radicals or pinyin for Chinese, kana for Japanese, hangeul for Korean). If a default sort key could be defined for each section, it would be much easier to categorize pages correctly.
  • Who would benefit: Editors of Wiktionary, especially those who edit Chinese and Japanese entries.
  • Proposed solution: Introduction of a new magic word, say, SECTIONSORT, that works for all categories after it up to the next usage of the same magic word. SECTIONSORT should override DEFAULTSORT if both are defined. The use of SECTIONSORT without a sort key should clear the previous sort key (and should not define an empty sort key).
  • More comments: see Community Wishlist Survey 2017/Wiktionary/Context-dependent sort key for a discussion in 2017. It is still a problem.
  • Phabricator tickets: phab:T183747
  • Proposer: TAKASUGI Shinji (talk) 12:19, 11 November 2018 (UTC)
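The proposed SECTIONSORT rules can be simulated with a short Python sketch. SECTIONSORT is only a proposal, not an existing MediaWiki magic word, and the regex handles only simple, well-formed markup. Explicit category sort keys already take priority over DEFAULTSORT; the sketch assumes they would also take priority over SECTIONSORT. The sample uses the page 漢字 with illustrative sort keys (pinyin, kana, hangeul) and hypothetical category names.

```python
import re

# SECTIONSORT does not exist in MediaWiki; this only simulates the proposed rules.
TOKEN = re.compile(
    r"\{\{(DEFAULTSORT|SECTIONSORT):?([^}]*)\}\}"   # sort-key magic words
    r"|\[\[Category:([^|\]]+)(?:\|([^\]]*))?\]\]"   # category links, optional explicit key
)


def sort_keys(title: str, wikitext: str) -> list[tuple[str, str]]:
    """Return (category, effective sort key) pairs in page order."""
    # DEFAULTSORT applies to the whole page wherever it appears;
    # this sketch simply lets the last occurrence win.
    default = title
    for m in TOKEN.finditer(wikitext):
        if m.group(1) == "DEFAULTSORT" and m.group(2).strip():
            default = m.group(2).strip()

    section_key = None  # set by SECTIONSORT, valid until the next SECTIONSORT
    result = []
    for m in TOKEN.finditer(wikitext):
        if m.group(1) == "SECTIONSORT":
            # An empty SECTIONSORT clears the key instead of setting an empty one.
            section_key = m.group(2).strip() or None
        elif m.group(3) is not None:
            # Assumed priority: explicit key > SECTIONSORT > DEFAULTSORT > page title.
            key = m.group(4) or section_key or default
            result.append((m.group(3), key))
    return result


page = """==Chinese==
{{SECTIONSORT:hànzì}}
[[Category:Chinese nouns]]
==Japanese==
{{SECTIONSORT:かんじ}}
[[Category:Japanese nouns]]
==Korean==
{{SECTIONSORT:한자}}
[[Category:Korean nouns]]
{{SECTIONSORT}}
[[Category:Han characters]]
"""
for category, key in sort_keys("漢字", page):
    print(category, key)
```

The last category shows the clearing rule: after an empty SECTIONSORT, the key falls back to DEFAULTSORT or, as here, to the page title.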

Discussion

How will it be visible in categories? Sections can't be added to categories. --Wargo (talk) 21:48, 16 November 2018 (UTC)

Currently, one adds a sort key to an entire page. The goal of this proposal is to allow more than one sort key per page, one per section; e.g. one sort key for the Chinese section of , one sort key for the Japanese section of the same entry, etc. This is because the same word may not be sorted the same way in different languages, and Wiktionaries often have entries from multiple languages on the same page, as a page corresponds to a specific spelling (which may occur in multiple languages). — Automatik (talk) 14:06, 20 November 2018 (UTC)
Notifying Wargo. Automatik (talk) 14:07, 20 November 2018 (UTC)

See also my somewhat related proposal (I keep missing the deadline) Community Wishlist Survey 2017/Archive/Allow multiple entries within each category. Urhixidur (talk) 13:30, 17 November 2018 (UTC)

Voting

Insert attestation exploiting Wikisource as a corpus

  • Problem: Wiktionary definitions rely on attestations: sentences from corpora illustrating the usages and meanings of a word. Wikisource is an excellent corpus for Wiktionaries, but it is not easy to search its texts for a specific word. Currently, the reference for a sentence has to be copied and pasted by hand, which is a slow and tedious way to contribute; as a result, few quotations come from Wikisource (less than 3% on French Wiktionary).
  • Who would benefit: Readers of Wiktionaries would find more examples of usage and a way to access the whole source directly on Wikisource. Wiktionary contributors would have a pleasant and enjoyable way to add attestations, similar to the Insert media tool that searches Wikimedia Commons, and the community may grow with new people who like to add sentences from their reading. Wikisource editors would have a new way to highlight their patient work. Both projects' visibility in search engines would increase with more links between them, and their global audience may grow with more connectivity.
  • Proposed solution: This feature is inspired by Insert media but targets Wikisource instead of Wikimedia Commons. Instead of an instant search offering pictures, Insert attestation would display a list of sentences from Wikisource that include the targeted sequence of characters (no meaning requirement), as result snippets to choose from. An editor would pick a sentence with a single click and it would be added with the appropriate source. The feature would copy the sentence (no transclusion) and its source (ideally including the page in the original manuscript). This feature may need a specific parser to identify sentence boundaries and to bold the targeted sequence of characters.
  • More comments: This feature should be accessible through both the wikitext editor and VisualEditor. It may be interesting to track reuses of Wikisource content in other projects with a dedicated What links here from Wiktionary to Wikisource, similar to how Wikimedia Commons indicates reuses in other projects, but this could be part of a second development step. This idea was proposed last year and supported by 32 people; a draft was proposed the year before with 19 supports; and the idea was first coined in a MediaWiki discussion.
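A minimal Python sketch of the search step behind Insert attestation: split a text into sentences, keep those containing the target sequence of characters, and bold it with wikitext ''' markup. The naive regex splitter is only a placeholder for the specific parser the proposal mentions (it would mis-split abbreviations, for instance); the function name, the result format and the sample source label are hypothetical.

```python
import re

# Naive sentence boundary: whitespace that follows ., !, ? or …
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?…])\s+")


def find_attestations(text: str, target: str, source: str, limit: int = 10) -> list[dict]:
    """Return up to `limit` sentences containing `target`, with the target in bold.

    Matching is on the raw character sequence (no meaning requirement),
    so "chat" would also match inside "chaton".
    """
    results = []
    for sentence in SENTENCE_BOUNDARY.split(text):
        if target in sentence:
            results.append({
                "sentence": sentence.replace(target, f"'''{target}'''"),  # wikitext bold
                "source": source,  # copied with the sentence, no transclusion
            })
            if len(results) == limit:
                break
    return results


text = "Le chat dort. Il rêve de souris ! Le chien regarde le chat."
for hit in find_attestations(text, "chat", "Hypothetical Wikisource page, p. 12"):
    print(hit["sentence"], "|", hit["source"])
```

In a real tool the text would come from a Wikisource search and the source would carry the page of the original edition; here both are placeholders.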

Discussion

The scarcity of quotations may also be due to the archaic nature of the texts: Wikisource texts are mostly old, by authors who died before 1948. The proposed solution should not be limited to Wiktionary↔Wikisource. The same use (a sentence from Wikisource) would be useful for Wikipedia or Wikiquote too. JAn Dudík (talk) 12:30, 30 October 2018 (UTC)

Wiktionary does not only describe the language as it is used nowadays; texts can illustrate archaic meanings. And some recent texts are published directly under a compatible licence. Like Insert media, this tool would be accessible in Wikipedia and other projects as well. I wonder how it could be used elsewhere, as I mostly contribute to Wiktionary and Wikisource, so feel free to share any ideas. Noé (talk) 18:50, 30 October 2018 (UTC)

Voting

Translation editor as an extension

  • Problem: One popular way to contribute to Wiktionary content is adding translations. Since editing Wiktionary code is not really user-friendly, JavaScript gadgets have been developed locally to make it possible to add translations without modifying the wikicode. Currently, Recent Changes mostly shows translations added by IP editors thanks to these gadgets. French Wiktionary uses Gadget-translation editor.js, based on a Swedish Wiktionary gadget, Gadget-translation editor.js, itself based on an English Wiktionary gadget, editor.js. On French Wiktionary, a recent evaluation by Automatik, published in the September issue of Actualités, showed that between July 2014 and September 2018, 234,989 translations were added with this gadget. This gadget is a very important feature for Wiktionaries and could be improved and given a modern design. It should also work on mobile.
  • Who would benefit: Every person adding translations to Wiktionaries.
  • Proposed solution: A module could be developed to support this feature, with an improved design closer to that of the standard editor, and included in all Wiktionaries.
  • More comments: This feature also checks whether a page with the same title exists in the Wiktionary of the added language (for example, if you add a Spanish translation on English Wiktionary, the gadget checks Spanish Wiktionary). The appropriate parameter is then added to produce a blue or a red link. This check could use Cognate rather than a separate request.
  • Phabricator tickets:
  • Proposer: Noé (talk) 10:05, 10 November 2018 (UTC)
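The existence check described above can be sketched in Python with the MediaWiki Action API (action=query, which marks non-existent titles as missing). Formatting the result with English Wiktionary's {{t}} and {{t+}} translation templates is an assumption about how the answer would be used; the function names and the User-Agent string are hypothetical. Only check_translation makes a network request.

```python
import json
import urllib.parse
import urllib.request


def existence_query_url(lang: str, title: str) -> str:
    """Action API URL asking whether `title` exists on the `lang` Wiktionary."""
    params = {"action": "query", "titles": title,
              "format": "json", "formatversion": "2"}
    return f"https://{lang}.wiktionary.org/w/api.php?" + urllib.parse.urlencode(params)


def page_exists(api_response: dict) -> bool:
    """With formatversion=2, absent or invalid titles are flagged on the page object."""
    pages = api_response["query"]["pages"]
    return bool(pages) and not (pages[0].get("missing") or pages[0].get("invalid"))


def translation_template(lang: str, word: str, exists: bool) -> str:
    """{{t+|lang|word}} when the target Wiktionary has the entry, {{t|lang|word}} otherwise."""
    name = "t+" if exists else "t"
    return "{{" + name + "|" + lang + "|" + word + "}}"


def check_translation(lang: str, word: str) -> str:
    """Network call: query the target Wiktionary, then build the template."""
    request = urllib.request.Request(
        existence_query_url(lang, word),
        headers={"User-Agent": "translation-editor-sketch/0.1 (hypothetical example)"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return translation_template(lang, word, page_exists(data))
```

For example, check_translation("es", "perro") would query Spanish Wiktionary and return {{t+|es|perro}} if the entry exists there. Using Cognate, as suggested above, would replace the network request with a lookup of data the wikis already share.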

Discussion

  • Is it possible to improve the tool to allow adding translations while editing the wikicode? When I create articles, I must save the article before I can add translations with the tool. Jpgibert (talk) 19:07, 18 November 2018 (UTC)
    Great suggestion, thank you! Of course, it would be even better if this tool could also work during classic editing. I am not sure it can be done, as the two work very differently, but let's keep this idea as a goal. Noé (talk) 06:51, 19 November 2018 (UTC)

Voting