Community Wishlist Survey 2020/Wiktionary/Context-dependent sort key

From Meta, a Wikimedia project coordination wiki



  • Problem: In most Wiktionary projects, words of different languages share a page if their spellings are identical. Currently, the magic word DEFAULTSORT applies to an entire page, which means we cannot define a default sort key for each language on the same page. This is an issue especially for Chinese, Japanese and Korean (hanja). They share characters, but their sort keys are totally different (radicals or pinyin for Chinese, kana for Japanese, hangeul for Korean). If a default sort key could be defined for each section, it would be much easier to categorize pages correctly.
  • Who would benefit: Editors of Wiktionary, especially those who edit Chinese and Japanese entries.
  • Proposed solution: Introduction of a new magic word, say, SECTIONSORT, that works for all categories after it up to the next usage of the same magic word. SECTIONSORT should override DEFAULTSORT if both are defined. The use of SECTIONSORT without a sort key should clear the previous sort key (and should not define an empty sort key).
  • More comments: see Community Wishlist Survey 2017/Wiktionary/Context-dependent sort key for a discussion in 2017. It is still a problem.
  • Phabricator tickets: phab:T183747
  • Proposer: TAKASUGI Shinji (talk) 12:19, 11 November 2018 (UTC)
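
The proposed behaviour can be illustrated with a small Python sketch. It is only a model of the rules stated above, not MediaWiki code: SECTIONSORT does not exist, the `{{SECTIONSORT:key}}` spelling and the function name `resolve_sort_keys` are hypothetical, and the sketch assumes that an explicit key written as `[[Category:Name|key]]` still takes precedence over both magic words. The key `default-key` and the category names are placeholders.

```python
import re

# Matches {{DEFAULTSORT:key}}, {{SECTIONSORT:key}}, {{SECTIONSORT}} and
# [[Category:Name]] / [[Category:Name|key]]. SECTIONSORT is hypothetical.
TOKEN = re.compile(
    r"\{\{(DEFAULTSORT|SECTIONSORT):?([^}]*)\}\}"
    r"|\[\[Category:([^\]|]+)(?:\|([^\]]*))?\]\]"
)


def resolve_sort_keys(title, wikitext):
    """Return {category name: sort key} under the proposed rules."""
    # DEFAULTSORT applies to the whole page wherever it appears;
    # if it is used several times, this sketch lets the last one win.
    default_key = None
    for match in TOKEN.finditer(wikitext):
        if match.group(1) == "DEFAULTSORT":
            default_key = match.group(2).strip() or None

    section_key = None  # set by the most recent SECTIONSORT, if any
    keys = {}
    for match in TOKEN.finditer(wikitext):
        word, arg, category, explicit = match.groups()
        if word == "SECTIONSORT":
            # An empty argument clears the key instead of defining "".
            section_key = arg.strip() or None
        elif category is not None:
            if explicit is not None:
                keys[category.strip()] = explicit  # assumption: explicit key wins
            else:
                keys[category.strip()] = section_key or default_key or title
    return keys


page = """
{{DEFAULTSORT:default-key}}
==Japanese==
{{SECTIONSORT:にほん}}
[[Category:Japanese proper nouns]]
==Korean==
{{SECTIONSORT:일본}}
[[Category:Korean proper nouns]]
==Translingual==
{{SECTIONSORT}}
[[Category:Translingual terms]]
"""

for name, key in resolve_sort_keys("日本", page).items():
    print(name, "->", key)
```

In this model the Japanese and Korean categories receive their own keys, while the category after the bare `{{SECTIONSORT}}` falls back to the page-wide DEFAULTSORT key, or to the page title if no DEFAULTSORT is present.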

Discussion

How will this be visible in a category? Sections can't be added to a category. --Wargo (talk) 21:48, 16 November 2018 (UTC)

Currently, one adds a sort key to an entire page. The goal of this proposal is to allow more than one sort key per page: one per section; e.g. one sort key for the Chinese section of an entry, one sort key for the Japanese section of the same entry, etc. This is because the same word may not be sorted the same way in different languages, and Wiktionaries often have entries from multiple languages on the same page, as a page corresponds to a specific spelling (which may occur in multiple languages). — Automatik (talk) 14:06, 20 November 2018 (UTC)
Notifying Wargo. --Automatik (talk) 14:07, 20 November 2018 (UTC)

See also my somewhat related proposal (I keep missing the deadline) Community Wishlist Survey 2017/Archive/Allow multiple entries within each category. Urhixidur (talk) 13:30, 17 November 2018 (UTC)

  • I've been thinking a bit about this. The problem here is that you have multiple types (languages) of content inside a single page, with a single title. The page https://en.wiktionary.org/wiki/日本#References, for instance (quoted as an example in the ticket), is English. Therefore, all categorisation of the page is based on the English title of the page (even though the title is not in the English language). This is a fundamental problem (a mismatch with the wiki page concept). It really means that the entire system should be changed to make use of MCR (multi-content revisions) and specialised MW content handlers, so that more semantic info can be extracted from the page (like how Wikidata deals with different types of information in a single page). On top of that, you could then have a Category be in a certain language, and the category could use the correct sort key for a page by referring to the information of the applicable 'language section' inside the page. —TheDJ (talk • contribs) 11:25, 6 November 2019 (UTC)
    • To further clarify, the community has attached meaning (a convention) to some of the content, which MW cannot capture for them. When you want software features that make use of that meaning, the meaning first has to be machine-extractable (at scale) before we can do things with it that go beyond 'a simple wiki page that complies with the assumptions of the original Wikipedia'. —TheDJ (talk • contribs) 11:28, 6 November 2019 (UTC)
      If I got your idea right, you are saying that "Page content language" in Page information should be able to deal with more than one language, through specific tagging in the page or by using a template for language section titles. Then, the ordering for each language could be fixed in MediaWiki. I think this is another way to solve the same issue, and maybe a more MediaWiki-centered one. Noé (talk) 10:05, 9 November 2019 (UTC)
      This is not what I understand. For en.wikt, the "Page content language" is always English (for apple as well as for pomme or Apfel), for fr.wikt, it's always French, etc. Anyway, there is no such issue with the "multiple collations" proposal. Lmaltier (talk) 13:57, 10 November 2019 (UTC)
  • This proposal seems to become useless if the "Multiple collations per site" proposal is adopted (i.e. a magic word stating the language for each category). Or am I missing something? Lmaltier (talk) 20:27, 8 November 2019 (UTC)
    It is mainly for Japanese and optionally for Chinese and Korean (hanja). You cannot generate a correct sortkey for each language in a page of Chinese characters. In the example above, the correct sortkey for 日本 is “にほん” for Japanese and “일본” for Korean. You can have only one default sortkey now. — TAKASUGI Shinji (talk) 23:08, 10 November 2019 (UTC)
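
The effect of per-language sort keys on category listings can be sketched in Python as follows. This is an illustration only: the data structure and the function `category_order` are hypothetical, the two extra entries (山 and 犬) and their readings are added purely for the example, and plain code-point comparison stands in for whatever collation a wiki actually uses.

```python
# Per-language sort keys for three pages that consist of Chinese characters.
# 日本 uses the keys quoted in the discussion; the other entries are
# illustrative additions.
ENTRIES = {
    "山": {"Japanese": "やま", "Korean": "산"},
    "日本": {"Japanese": "にほん", "Korean": "일본"},
    "犬": {"Japanese": "いぬ", "Korean": "견"},
}


def category_order(entries, language=None):
    """Order page titles as a category listing might.

    With language=None every page is sorted by its title, which models a
    single page-wide key. Otherwise the key for that language is used,
    falling back to the title when the page has no key for it.
    """
    def sort_key(title):
        if language is None:
            return title
        return entries[title].get(language, title)

    # Plain string comparison (code points) stands in for real collation.
    return sorted(entries, key=sort_key)


print(category_order(ENTRIES))              # one ordering for every category
print(category_order(ENTRIES, "Japanese"))  # kana order
print(category_order(ENTRIES, "Korean"))    # hangeul order
```

With a single key per page, the Japanese and Korean categories would necessarily list these pages in the same order; with one key per language section, each category can follow its own reading-based order.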