Community Wishlist Survey 2017/Wiktionary

From Meta, a Wikimedia project coordination wiki
Wiktionary
8 proposals, 89 contributors



Swahili Permutations

  • Problem: Swahili permutations (conjugated verb forms) are rare and Sisyphean to create.
  • Who would benefit: Anyone trying to translate Swahili word for word.
  • Proposed solution: One or more bots to create permutations for us, such as 'hatutaogopwa' and 'haikusema'.
  • More comments:
  • Phabricator tickets:

Discussion

Hi. I am not sure I understand the implications. Are permutations planned to be full entries (with translations, attestations, pronunciations, pictures, etc.) or light entries with a link to a main entry and a short grammatical explanation (like "plural of", "when object of a transitive verb", etc.)? Do you have lists of permutations somewhere, to give an idea of how many pages could be concerned? Is there a grammatical explanation for this somewhere, in a Wiktionary or in a Wikibooks? Noé (talk) 19:43, 9 November 2017 (UTC)
This isn't a relevant request here. What Science Bird wants is for someone to run a bot to add Swahili conjugated forms on en.wiktionary, which is a good idea eventually, but would require a bot operator willing to work out all the problems with me and anyone else editing Swahili. Metaknowledge (talk) 08:00, 12 November 2017 (UTC)
And out of scope for the wishlist survey – we can do technical development (e.g. a bot that can generally do things for Wiktionaries, if that'd be a top wish and be realistic), but the WMF doesn't and shouldn't add content. /Johan (WMF) (talk) 19:32, 13 November 2017 (UTC)
Science Bird: Would you like to clarify the proposal? For a number of reasons – legal ones, among them – the Wikimedia Foundation can't add content to the projects. /Johan (WMF) (talk) 15:42, 16 November 2017 (UTC)
Hello. I was just asking if it is possible to make a bot that makes pages, such as hatuendelei and hatuandiki, so that I wouldn't have to create the googolplex permutations of Swahili. Science Bird (talk) 04:30, 17 November 2017 (UTC)
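
To give an idea of what such a bot would have to compute, here is a minimal sketch that builds negative verb forms by concatenating a negative subject prefix, an optional tense marker, and the verb stem. The function name and the tables are hypothetical and deliberately incomplete (a handful of subject prefixes, three tenses); monosyllabic stems, the second-person plural, and other irregularities would need rules that are not modelled here, and any real bot would need review by Swahili editors.

```python
# Hypothetical, simplified tables: not taken from any existing bot or module.
NEGATIVE_SUBJECT_PREFIXES = {
    "1sg": "si",
    "2sg": "hu",
    "3sg": "ha",
    "1pl": "hatu",
    "3pl": "hawa",
    "class9": "hai",
}
NEGATIVE_TENSE_MARKERS = {"past": "ku", "future": "ta"}


def negative_forms(stem):
    """Return {(subject, tense): form} for a regular, non-monosyllabic stem."""
    forms = {}
    # Negative present: no tense marker, and a final -a becomes -i.
    present_stem = stem[:-1] + "i" if stem.endswith("a") else stem
    for subject, prefix in NEGATIVE_SUBJECT_PREFIXES.items():
        forms[(subject, "present")] = prefix + present_stem
        for tense, marker in NEGATIVE_TENSE_MARKERS.items():
            forms[(subject, tense)] = prefix + marker + stem
    return forms


if __name__ == "__main__":
    for stem in ("ogopwa", "sema", "endelea", "andika"):
        print(stem, sorted(set(negative_forms(stem).values())))
```

Under these assumptions the stems 'ogopwa', 'sema', 'endelea' and 'andika' yield, among others, the four forms quoted in this thread. Each stem already produces 18 forms here, which shows how quickly the number of pages grows once affirmative forms, object markers and further tenses are added.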

Voting

Custom list for language learner

  • Problem: A student may want to create word lists to memorize a new language and then organize them by theme (animals, plants), by functional category (emotional verbs, action verbs, nouns), or in any other way. Wiktionaries have the content language learners need but provide no tool for this, so people use other websites instead.
  • Who would benefit: Students learning a new language; contributors who want a to-do list or a list of favorite words; Wikibooks, which would gain an interface with Wiktionary.
  • Proposed solution: Creating a JavaScript gadget or a core MediaWiki feature that records a page in a personal space, with an option for subcategorisation that places the link directly under a subsection (i.e. pick a word and add it to the verbs, colors, animals, or a new category section of the Custom list page).

We can imagine options to easily edit the Custom List such as:

  • a field to add a word in a category
  • an option to delete a word without editing the whole page
  • a rapid way to reorder sections and words

This tool may be included in Wiktionary or stand apart, in a lighter interface, so that the lists created can be shared with everyone.
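
The operations listed above (add a word to a category, delete a word, reorder sections) can be sketched as a small data structure. This is only an illustration of the idea, written in Python rather than as a gadget; the class name `CustomList`, its methods, and the wikitext layout it produces are all assumptions, not an existing MediaWiki feature.

```python
class CustomList:
    """Hypothetical model of a personal word list grouped into sections."""

    def __init__(self):
        # Maps section name -> list of words; dicts keep insertion order.
        self.sections = {}

    def add(self, word, category):
        """Add a word under a category, creating the section if needed."""
        words = self.sections.setdefault(category, [])
        if word not in words:
            words.append(word)

    def remove(self, word, category):
        """Delete one word without touching the rest of the list."""
        self.sections[category].remove(word)

    def move_section(self, category, new_index):
        """Reorder sections by moving one of them to a new position."""
        names = list(self.sections)
        names.remove(category)
        names.insert(new_index, category)
        self.sections = {name: self.sections[name] for name in names}

    def to_wikitext(self):
        """Render the list as a page with one heading per section."""
        lines = []
        for category, words in self.sections.items():
            lines.append(f"== {category} ==")
            lines.extend(f"* [[{word}]]" for word in words)
        return "\n".join(lines)


if __name__ == "__main__":
    my_list = CustomList()
    my_list.add("run", "verbs")
    my_list.add("dog", "animals")
    my_list.move_section("animals", 0)
    print(my_list.to_wikitext())
```

Rendering to plain wikitext is one possible choice: it would let the list live on an ordinary user subpage that can be shared or edited by hand.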

Discussion

Thanks Pamputt for this proposal. I changed some aspects in order to simplify it. I am still not sure about the best way to publicize the result at the end, to make people aware of the tool. But still, I think this has to be developed for Wiktionaries! Noé (talk) 17:05, 19 November 2017 (UTC)

Isn't this sort of what we have Wikiversity for? /Johan (WMF) (talk) 16:22, 20 November 2017 (UTC)
Wikiversity and Wikibooks are good projects for educational purposes, but they are not tied to Wiktionary, and it is currently impossible to create a lesson from scratch starting in Wiktionary. This proposal could result in pages hosted by Wikibooks, which may be interesting, but at first it has to start from and in Wiktionary to allow the gathering process. It is not very efficient to start a list of, say, verbs to describe a picture if you can't explore a dictionary while doing so. I am fond of interproject connectivity, so I am definitely in favor of a rapid connection between projects, and your comment is inspiring! Thank you! Noé (talk) 22:43, 20 November 2017 (UTC)

@Pamputt: The vote is over. 13 votes last year, 24 this year, that's good! It is still not on the podium, but that's not a major issue. I made a Phabricator ticket to keep this idea somewhere. Noé (talk) 07:25, 18 December 2017 (UTC)

Voting

Add "Insert Attestation" to VisualEditor

  • Problem: Wiktionary definitions rely on attestations: sentences from any corpus that illustrate the use of a word. Wikisource could be an excellent corpus for Wiktionaries, but it is hard to exploit.
  • Who would benefit: Wiktionaries and Wikisources mainly. As an example, French Wiktionary offers 330,000+ attestations, but fewer than 3% of those come from Wikisource. Many entries have no attestation and could benefit from an easy way to add one. More generally, both projects would benefit from more connectivity between them, to lead readers from a definition to a text.
  • Proposed solution: This feature is inspired by Insert media but targets Wikisource instead of Wikimedia Commons. So, instead of an instant search offering pictures, it should display sentences from Wikisource that include the targeted sequence of characters (no meaning requirement). The results are shown as snippets that you can choose from. A contributor should be able to grab a sentence with a single click. The feature then copies the sentence (no transclusion) and its source (ideally with the page in the original manuscript) into the Wiktionary page, and that's it!
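
The matching step described in the proposed solution (exact sequence of characters, no meaning requirement, sentence copied together with its source) could look roughly like the sketch below. It works on text that has already been fetched; how the text is retrieved from Wikisource is left out. The function name, the naive sentence splitter, and the sample text are assumptions made for illustration.

```python
import re


def find_attestations(text, target, source):
    """Return sentences of `text` containing `target` as an exact,
    case-sensitive sequence of characters, each paired with its source."""
    # Naive splitter: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [
        {"sentence": sentence, "source": source}
        for sentence in sentences
        if target in sentence
    ]


if __name__ == "__main__":
    # Illustrative text, not a real Wikisource page.
    sample = "The dog barked. A cat slept. Dogs and a dog ran!"
    for hit in find_attestations(sample, "dog", "Sample page, p. 1"):
        print(hit["sentence"], "|", hit["source"])
```

Because the match is purely character-based, "Dogs" alone would not count as a hit for "dog", which is the behaviour asked for in the proposal.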

Discussion

  • I strongly endorse this proposal. --Lyokoï (talk) 09:21, 8 November 2017 (UTC)
    Thanks dude, but as indicated on the light blue notice: Voting phase begins November 27th, once all the proposals are finalized. Votes cast before then won't count. Noé (talk) 11:19, 8 November 2017 (UTC)
  • I also endorse this proposal. --Psychoslave (talk) 09:22, 8 November 2017 (UTC)
    Now, to give more feedback, I think the feature should make it possible to find all occurrences of the current article name (and possibly valid inflections for the lexical item of the current section), but also to record feedback on which definition users suggest as the matching one, or to mark a result as a false positive.
    I don't want this tool to look for inflections, because that means the suggested software would require a stemming tool, and that's a different story. I don't want it to be too complicated for a start. Plus, I think the placement of attestations will become more rigorous as coverage increases, and attestations for inflections will be on the inflection pages. So it is better to display only exact character-based matches. Noé (talk) 11:08, 8 November 2017 (UTC)
  • It sounds like an excellent idea. I will come back on November 27th to endorse the proposal. --Wikinade (talk) 21:48, 11 November 2017 (UTC)
  • Personally, I would rather have this as a gadget or part of the WikiText editor than as part of VisualEditor. Kaldari (talk) 19:41, 21 November 2017 (UTC)
  • I like the idea of parsing Wikisource texts and using extracted sentences for Wiktionary citations. I think that this work will be impossible without a lemmatiser for Russian (and other languages with many inflected forms of a word). It would be interesting to see the search results from Wikisource in KWIC format. -- Andrew Krizhanovsky (talk) 08:20, 30 November 2017 (UTC)
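
For readers unfamiliar with KWIC (keyword in context), here is a minimal sketch of that display: each hit is shown on one line with a fixed-width window of text to its left and right. The function name and the bracket layout are assumptions. It uses exact string matching, so it does not address the lemmatisation concern raised above for heavily inflected languages.

```python
def kwic(text, keyword, width=20):
    """Return one line per occurrence of `keyword`, with up to `width`
    characters of context on each side (left context right-aligned)."""
    lines = []
    start = text.find(keyword)
    while start != -1:
        end = start + len(keyword)
        left = text[max(0, start - width):start]
        right = text[end:end + width]
        lines.append(f"{left:>{width}} [{keyword}] {right}")
        start = text.find(keyword, end)
    return lines


if __name__ == "__main__":
    for line in kwic("the cat sat on the mat while another cat slept", "cat", 12):
        print(line)
```

Aligning the keyword in a column is what makes this format convenient for scanning many candidate sentences quickly.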

Voting

Context-dependent sort key

  • Problem: In most Wiktionary projects, words of different languages share a page if their spellings are identical. Currently, the magic word DEFAULTSORT applies to an entire page, which means we cannot define a default sort key for each language on the same page. That is an issue especially for Chinese and Japanese: they share characters, but their sort keys are totally different (pinyin for Chinese, hiragana for Japanese). If a default sort key could be defined for each section, it would be much easier to categorize pages correctly.
  • Who would benefit: Editors of Wiktionary, especially those who edit Chinese and Japanese entries.
  • Proposed solution: Introduction of a new magic word, say, SECTIONSORT, that works for all categories after it up to the next usage of the same magic word. SECTIONSORT should override DEFAULTSORT if both are defined. The use of SECTIONSORT without a sort key should clear the previous sort key (and should not define an empty sort key).
  • More comments:
  • Phabricator tickets:
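
The semantics proposed for SECTIONSORT can be made concrete with a small simulation. This is not MediaWiki code: the function name, the flat list of directives, and the category names are hypothetical, and the handling of DEFAULTSORT is simplified to "the first one found applies page-wide".

```python
def resolve_sort_keys(page_title, directives):
    """Simulate the proposed rules on a page given as (kind, value) pairs,
    where kind is "DEFAULTSORT", "SECTIONSORT" or "CATEGORY".
    Returns a list of (category, sort key) pairs."""
    # DEFAULTSORT applies to the whole page; fall back to the page title.
    default_key = next(
        (value for kind, value in directives if kind == "DEFAULTSORT"),
        page_title,
    )
    section_key = None
    resolved = []
    for kind, value in directives:
        if kind == "SECTIONSORT":
            # An empty SECTIONSORT clears the key instead of setting "".
            section_key = value or None
        elif kind == "CATEGORY":
            # SECTIONSORT overrides DEFAULTSORT when both are defined.
            resolved.append((value, section_key or default_key))
    return resolved


if __name__ == "__main__":
    page = [
        ("SECTIONSORT", "にほん"),
        ("CATEGORY", "Japanese nouns"),
        ("SECTIONSORT", "일본"),
        ("CATEGORY", "Korean nouns"),
        ("SECTIONSORT", ""),
        ("CATEGORY", "Pages with errors"),
    ]
    for category, key in resolve_sort_keys("日本", page):
        print(category, "->", key)
```

In this sketch each category picks up the key of the most recent SECTIONSORT, so a language section needs one directive instead of one explicit sort key per category.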

Discussion

  • Having entries wrongly classified is a crucial issue when building a dictionary. This problem has been here since the beginning of Wiktionary and could be solved within a year, so I think it is a great suggestion for this survey. Noé (talk) 09:34, 21 November 2017 (UTC)
  • Yes, because the current solution is neither easy nor elegant. PS: it concerns many languages (e.g. Swedish vs Norwegian, Russian vs Ossetian, etc.). JackPotte (talk) 11:06, 21 November 2017 (UTC)
  • To me this just looks completely wrong. We have language-specific versions of a wiki, linking to multiple translations of a term, and then expect to do multilingual sorting in categories... That's not what a wiki page is supposed to do. This is an example of using the wrong hammer to solve your problem, I think. Someone needs to take a deep look at the technology for Wiktionary and draw up a plan to make this more sustainable. —TheDJ (talk • contribs) 15:13, 29 November 2017 (UTC)
    The Latin Wiktionary separates each language section (cf. wikt:la:de (fr), wikt:la:de (nl)) but all other Wiktionary projects combine different language entries in one page if they share the same spelling. — TAKASUGI Shinji (talk) 10:00, 2 December 2017 (UTC)
  • IMHO the sort problem (at least in Wiktionaries) is not a "page" problem but a category problem. We should instead aim to create "category" sort keys (which will probably solve almost all problems) and not a new section key. General categories (e.g. categories for "pages with errors") should have no key, but categories for specific languages (or dialects of a language) should include a (say) CATSORT key. --Xoristzatziki (talk) 20:39, 9 December 2017 (UTC)
    You can specify a sort key for a category now. That is not a problem. In the case of Wiktionary, each entry often has quite a few categories. See wikt:en:日本 for example. Japanese categories should use a sort key of にほん, while Korean categories should use a sort key of 일본. Specifying an appropriate sort key for each category is possible but redundant and very tiresome. — TAKASUGI Shinji (talk) 05:33, 11 December 2017 (UTC)

@TAKASUGI Shinji: Vote is over now. Twenty votes, it's quite good. Should we put forward this proposal in Phabricator? I think a ticket could be a way to keep track of this request and may be appealing for a dev in the future. What do you think? Noé (talk) 07:08, 18 December 2017 (UTC)

Yes, that’s a very good idea. Actually I’m not sure about the procedure on Phabricator. What should I do now? — TAKASUGI Shinji (talk) 08:39, 18 December 2017 (UTC)
It's quite easy. In Phabricator, you can log in with your wiki account, and then in the top-right corner there is a link "Create task" where you can describe your idea. If I am right, there are two fields for names: "subscribers" is for you and me, "assigned to" is for people who take responsibility for coding it. There is also "tag", where I usually put "Wiktionary", but as your proposal is wider, you may leave it blank and someone with a better understanding of Phabricator could fill it in. As for the wording itself, I am still not very confident, so try to be as specific as possible. I am still a beginner on this platform, so I am sorry if my advice is vague. Well, people are nice there anyway. Noé (talk) 09:02, 18 December 2017 (UTC)
Thank you very much. I have created a task: phab:T183747. — TAKASUGI Shinji (talk) 13:47, 28 December 2017 (UTC)

Voting

Share conjugation (among other things) templates on www.wiktionary.org

  • Problem: On Wiktionaries we share only interwikis. We often reinvent templates or improve them separately, but we don't pool this work. Some conjugations (like Russian ones) are really difficult to add on a wiki.
  • Who would benefit: All Wiktionary editions.
  • Proposed solution: We could share data and templates/modules on www.wiktionary.org (www.wikisource.org is editable). www.wiktionary.org would become the world's biggest multilingual website about conjugation (and it could later be extended to other things). Each Wiktionary could choose whether or not to use it.
  • More comments:
  • Phabricator tickets:

Discussion

Hi! Great idea, and we could also share declensions! --Pom445 (talk) 18:36, 8 November 2017 (UTC)

It's quite clear for Wiktionarians; I am not sure it is for outsiders. Well, I want to raise some concerns about this idea, to make clear how challenging it is. A conjugation is a table with forms and labels. Depending on the language, there are different ways to name those labels, and they cannot be translated easily. For example, preterit and passé simple do not cover the same functions. Plus, readers do not always know how a language is described in the language itself, i.e. a learner of French may not know what a plus-que-parfait is. Readers are used to the labels they learned at school, and there are traditions in linguistics of describing other languages in one way or another, based on old analyses. So, in the end, the labels in a conjugation table are complex and need to be adapted for each Wiktionary. Nevertheless, it is possible to share at least the inventory of forms and, if possible, a prototypical table that is easy to reuse and adapt. Best of all would be a magic table with the possibility to choose the name of every label on the fly. That could be a huge improvement for Wiktionaries, and Wikidata developers do not plan to do that at all. Noé (talk) 19:29, 8 November 2017 (UTC)
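
The split suggested in this comment, a shared inventory of forms plus labels chosen by each Wiktionary, could be modelled as in the sketch below. Everything here is hypothetical: the feature codes, the label tables, and the function name are invented for illustration, and a one-to-one mapping from features to labels is a simplification that would not resolve mismatches such as preterit versus passé simple.

```python
# Shared, label-free inventory: grammatical features -> form.
FORMS = {
    "amare": {
        ("indicative", "present", "1sg"): "amo",
        ("indicative", "present", "2sg"): "ami",
        ("indicative", "present", "3sg"): "ama",
        ("indicative", "present", "1pl"): "amiamo",
        ("indicative", "present", "2pl"): "amate",
        ("indicative", "present", "3pl"): "amano",
    }
}

# Each Wiktionary keeps its own labels for the shared feature codes.
LABELS = {
    "en": {"indicative": "indicative", "present": "present", "1sg": "I"},
    "fr": {"indicative": "indicatif", "present": "présent", "1sg": "je"},
}


def render(verb, wiki):
    """Return rows of (labels, form); unknown codes are shown unchanged."""
    labels = LABELS[wiki]
    return [
        (tuple(labels.get(code, code) for code in features), form)
        for features, form in FORMS[verb].items()
    ]


if __name__ == "__main__":
    for row in render("amare", "fr"):
        print(row)
```

Keeping labels as a separate per-wiki table is one way to let each community name, order, or hide columns without editing the shared forms.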

Indeed, this is one of the Wikidata-style ideas that sounds good in theory, but is deeply problematic in practice. Different language Wiktionaries will want to display different forms in different orders with different orthographies and scripts, and making master databases to draw them from will be exceedingly difficult and subject to technical expertise. I doubt many larger Wiktionaries will want to give up their autonomy to those few who can handle and oversee the needfully complex databases that you propose. Metaknowledge (talk) 08:10, 12 November 2017 (UTC)
I join @Noé: and @Metaknowledge: in their concerns. While I do wish to see a way to share more information between Wiktionaries, this cannot be resolved in a framework that assumes a single linguistic theory. I provided the same feedback to the Wikidata team, which went with their Lexeme model even though it obviously doesn't offer enough expressiveness to present both the variety of lexicological theories and what each of them claims about discourse segments. --Psychoslave (talk) 08:44, 12 November 2017 (UTC)
Well, we could try for some similar languages, like the Romance ones. For example, the two pages fr:wikt:Annexe:Conjugaison en italien/amare and en:wikt:Appendice:Coniugazioni/Italiano/amare share almost the same content… Otourly (talk) 16:33, 23 November 2017 (UTC)

Voting

Parse dumps for DICT clients

  • Problem: Wiktionary is a knowledge silo; its content is effectively unavailable to potential users except via the web-based interface. It is rather difficult to search or make additional use of the content via search engines, third-party software, or even as a spell-checker database, despite its wide acclaim in the academic linguistics community as a massive resource without peer.
  • Who would benefit: Readers, writers.
  • Proposed solutions:
    • Standard dictionary dump: Create DICT database output as part of regularly scheduled database dumps.
    • Custom dictionary API: Build a DICT server extension that listens on port 2628. A wide range of clients are already part of many operating systems, such as OmniDictionary on Mac OS X and Kdict/GNOME Dictionary/MATE Dictionary on Linux, and the protocol is even directly implemented in cURL.
  • More comments: Do something small, now. Parsing dumps to produce dict-style jargon files is simple and quick. Building on that to produce DICT databases, expose a DICT server, and eventually produce standard, reliable data in formats consumable for spelling dictionaries, education dumps, translation dictionaries, and more are really just minor investments in a readily expandable pile of value-added products.
    The most important element is to do something, anything, to leverage one of the more valuable WMF assets.
  • Phabricator tickets:
    • T38881 Wiktionary needs usable API
    • T31229 Extension to provide access via the dict protocol
    • T986 Use structured data on Wiktionary
  • Proposer: Initially I think it was brion, back in 2003-ish. Never happened. -- User:Amgine
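
As a rough illustration of the "do something small" step, the sketch below turns already-extracted headwords and definitions into jargon-style lines of the form `:headword:definition`, which, as far as I understand it, is the layout the `dictfmt` tool accepts with its `-j` option. Extracting definitions from a Wiktionary dump is the hard part and is not shown; the function name and the sample entries are hypothetical.

```python
def to_jargon(entries):
    """Format {headword: [definitions]} as jargon-style lines,
    one `:headword:definition` line per headword, sorted by headword."""
    lines = []
    for headword in sorted(entries):
        # A colon inside the headword would be ambiguous in this layout.
        if ":" in headword:
            continue
        numbered = " ".join(
            f"{number}. {definition}"
            for number, definition in enumerate(entries[headword], 1)
        )
        lines.append(f":{headword}:{numbered}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder entries, not taken from a real dump.
    sample = {
        "dog": ["A domesticated canine."],
        "cat": ["A small domesticated feline.", "A spiteful person."],
    }
    print(to_jargon(sample), end="")
```

Once such a file has been converted into a DICT database and served, a client such as cURL could in principle query it with a URL of the form `dict://<host>/d:<word>`, where the host is whatever server ends up exposing port 2628.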

Discussion

The title is too short to be useful, shouldn't you add just 3 or 4 more words to make that "non single" short? --Liuxinyu970226 (talk) 13:55, 15 November 2017 (UTC)

You probably need to generally flesh out this proposal. It's not immediately obvious to everyone what it is, what would happen and how it would benefit readers and editors. For example, not all Wikimedians know what an API is. /Johan (WMF) (talk) 15:18, 16 November 2017 (UTC)
Moving this to Community Wishlist Survey 2017/Wiktionary/Parse dumps for DICT clients failed with "already exists" error; same with a couple other variants. - Amgine 07:36, 19 November 2017 (UTC)
I've moved it without any trouble. I am still not sure I properly understand the direction of this proposal, but I agree on parsing dumps to make the content more exploitable! Noé (talk) 10:01, 20 November 2017 (UTC)

Voting

Wikisource dictionaries for Wiktionary

  • Problem: Wiktionary is a dictionary, but to understand a word it is always better to look at several dictionaries. Luckily for us, Wikisource contains dictionaries! Let's connect them!
  • Who would benefit: Anyone in need of a definition; readers of dictionaries (yes, such people do exist).
  • Proposed solution: Dictionaries in Wikisource have to be properly tagged so that the targeted paragraphs, with entries and definitions, can be transcluded. Then a new tab could be added in Wiktionaries, automatically populated with Wikisource content. So, for an entry "dog" in English Wiktionary, every definition tagged as "dog" in English Wikisource would be displayed automatically; for an entry "chien" in French Wiktionary, every definition tagged as "chien" in French Wikisource would be displayed automatically. The final picture (in my dreams) is a reader who just clicks on an "Other dictionaries" tab and a list of definitions pops up, with links to the whole quoted dictionaries in Wikisource.
  • More comments: Other online dictionaries are multidictionaries in the same way, like CNRTL for some Dictionnaire de l'Académie française or Dico d'Òc for Occitan with 15 dictionaries. This idea first emerged in Noé's and Lyokoï's heads and was then discussed with Wikisourcerers in 2016 and 2017.
  • Phabricator tickets: T183047 (created December 15th)
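
The lookup behind the imagined "Other dictionaries" tab amounts to an index from headword to tagged paragraphs. The sketch below shows that idea only; the record fields, the function names, and the placeholder data are assumptions, and nothing here reflects how Wikisource tagging or transclusion would actually be implemented.

```python
from collections import defaultdict


def build_index(tagged_paragraphs):
    """Group tagged dictionary paragraphs by their headword tag."""
    index = defaultdict(list)
    for paragraph in tagged_paragraphs:
        index[paragraph["headword"]].append(paragraph)
    return index


def other_dictionaries(index, entry):
    """Return (dictionary title, definition text) pairs for one entry."""
    return [(p["dictionary"], p["text"]) for p in index.get(entry, [])]


if __name__ == "__main__":
    # Placeholder records standing in for tagged Wikisource paragraphs.
    paragraphs = [
        {"headword": "dog", "dictionary": "Dictionary A", "text": "(definition from A)"},
        {"headword": "dog", "dictionary": "Dictionary B", "text": "(definition from B)"},
        {"headword": "cat", "dictionary": "Dictionary A", "text": "(definition from A)"},
    ]
    index = build_index(paragraphs)
    for title, text in other_dictionaries(index, "dog"):
        print(title, ":", text)
```

Keeping the dictionary title with each definition is what would allow the tab to link back to the full work on Wikisource.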

Discussion

  • Great idea, it could probably benefit from the Lexeme lexicographical data support in Wikidata (@Lydia Pintscher (WMDE) and Lea Lacroix (WMDE):). Cdlt, VIGNERON * discut. 20:28, 19 November 2017 (UTC)
    I think this idea can be realized with or without support for lexicographical data from an external project. I think the development may matter more on the Wikisource side, to capture the dictionaries properly, and I can't figure out how Wikidata would help with this aspect. In general, I do not want to wait for Wikidata+Wiktionary before the project gets any development. Lexicographical data in Wikidata may never be operational, or may become a fork of Wiktionary and never be connected with it, so please consider this idea as a distinct development. Noé (talk) 21:27, 19 November 2017 (UTC)
    Obviously, we don't need or have to wait for the lexicographical support (which is not the same thing as 'Wikidata+Wiktionary' but another brick of the same wall); I'm just saying it could help a lot. For tagging what is or is not a dictionary on Wikisource, this is clearly something Wikidata can already do: see this query (both the query and the data in Wikidata need improvement, but it works, and way better than current categories). The same goes for dictionary entries: a lot has been done for the s:Dictionary of National Biography, for instance (28k items already created and structured), and this should be easy to replicate for other dictionaries. Cdlt, VIGNERON * discut. 20:07, 20 November 2017 (UTC)
  • The idea of the project (Wiktionary) is also to collect the knowledge included in all these wiktionaries, and not just to create links to them. Apart from that, linking to other dictionaries without warnings about the definitions included there will disorient the reader. Minor example: some definitions are mistakenly included but should still exist, for the sake of "originality", in the linked dictionary. The idea says exactly (sic!): Wikipedia is an encyclopedia, but in order to understand the lemma we should read more encyclopedias. Then why do we create lemmas (both in Wiktionary and in Wikipedia) and not plain links? So the idea (not the proposed better "category or appendix make" in Wikisource) rests on a wrong basis. --Xoristzatziki (talk) 20:23, 9 December 2017 (UTC)

Voting

Flash dialog editing

  • Problem: I would say that editing Wiktionary in its current form, whether in wikicode or with VisualEditor, is boring and far too time-consuming.
  • Who would benefit: Wiktionary itself, by getting more active users.
  • Proposed solution: This should be done in two steps: 1) Wikidata integration, 2) a new editor for Wiktionary. On English Wiktionary there is a JavaScript option that uses dialog windows.
  • More comments: The option mentioned above should be extended. The actual structure of the data is not the same on all Wiktionaries, but you still have to build the unified structure by hand every time, which is time-consuming. This should be automated. Once Wikidata is integrated, the structure will be the same, so it will be much easier to leave as much of the editing load as possible to the software and ask the contributor only for the things that are necessary, with good support (not everybody is a perfect linguist) and some flashy graphical design.
  • Phabricator tickets:

Discussion

I do agree with the problem, but I am not sure about the solution. I am not fond of Wikidata integration, as I feel it is oriented toward Wikidatians' goals only, without looking at potential improvements for Wiktionaries. Plus, improving editing is not part of Wikidata Community Tech plans. I suggest another way to tackle this issue, by adding suggestions into the editor, not only for Wiktionaries but for every project: Community Wishlist Survey 2017/Editing/VisualEditor Template Suggestion. Noé (talk) 21:54, 9 November 2017 (UTC)

Well, I don't insist on Wikidata integration here, as I don't know whether it is a good idea. --Juandev (talk) 22:02, 9 November 2017 (UTC)
Flash?! --Nemo 22:35, 1 December 2017 (UTC)

Hi @Juandev:,

Until Wikidata respects the licenses of the content it massively extracts and re-licenses under CC0, I will vote against any integration of Wikidata within Wiktionaries. More generally, they should allow users to use whatever free licenses they wish, just like Commons, and provide proper credit when performing massive imports of copyleft-licensed material. You can read more in the related proposal I made.

What's more, I read the whole feedback that the community provided and their Lexeme model proposal, and they just don't match.

That said, I'm all for more structured data; that's exactly what I'm working on with the item template on the French Wiktionary. It is a work in progress. If this experiment leads to any degree of successful adoption, I'll consider a proposal that includes more requirements, taking other Wiktionary versions into account. So this proposal, as far as I understand, addresses the same concerns as Improve TemplateData to ease description of lexical items. --Psychoslave (talk) 10:30, 13 November 2017 (UTC)

As it seems I proposed too many things, I have to drop some of them. Luckily, it seems this one covers the same topic as one of my proposals. For the record, here was the proposal:

Archived proposal

Voting