Community Wishlist Survey 2020/Wiktionary/Insert attestation using Wikisource as a corpus

From Meta, a Wikimedia project coordination wiki

The survey has concluded.


  • Problem: Wiktionary definitions rely on attestations: sentences from corpora that illustrate the usages and meanings of words. Wikisource is an excellent corpus for Wiktionaries, especially for classic usages, but it is not easy to search its texts for a specific word. Currently, the sentence and its reference have to be copied and pasted by hand, which is a long and tedious way to contribute. As a result, few quotations come from Wikisource (less than 3% for French Wiktionary).
  • Who would benefit: Readers of Wiktionaries would find more usage examples and a way to access the whole source directly in Wikisource. Contributors to Wiktionaries would have an enjoyable way to add attestations, similar to the Insert media tool that digs into Wikimedia Commons, and the community may grow with new people who like to add sentences from their reading. Editors of Wikisource would have a new way to shed light on their Sisyphean work. Both projects' visibility in search engines would increase with more links between them. The global audience of both projects may increase with more connectivity. Other projects may also benefit from this feature, such as Wikipedia, to add quotations to authors' pages.
  • Proposed solution: This feature is inspired by Insert media but targets Wikisource instead of Wikimedia Commons. Instead of a snippet search offering pictures, Insert attestation would display a list of sentences from a targeted Wikisource (in the same language as the source project or in another one) that include the targeted sequence of characters. There is no requirement on meaning or proximity; results are exact matches only, to keep it simple. In the displayed snippet of results, an editor would grab a sentence with a single click, and it would be added with the appropriate source information picked from the Wikidata item associated with the Wikisource page. The feature would copy the sentence (no transclusion) and the source of the sentence (ideally including the page number in the original manuscript, e.g. "page 35"). This feature may need a specific parser to identify sentence boundaries and to bold the targeted sequence of characters.
  • More comments: This feature should be accessible through both the WikiText editor and VisualEditor. It may be interesting to keep track of reuses of Wikisource content in other projects with a specific What links here from Wiktionary to Wikisource, similar to the way Wikimedia Commons indicates reuses in other projects, but this could be part of another development. This idea was suggested in 2018 with 36 supports and in 2017 with 32 supports; a draft was suggested in 2016 with 19 supports, and the idea was first raised in a MediaWiki discussion.
  • Phabricator tickets: T139152, T157802
  • Proposer: Noé (talk) 07:33, 22 October 2019 (UTC)
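
The parsing step mentioned in the proposed solution (finding sentence boundaries, keeping only exact matches, and bolding the target) could be sketched roughly as follows. This is only an illustration, not the proposed implementation: the function name `find_attestations`, the naive punctuation-based sentence splitter, and the choice of case-sensitive whole-word matching are all assumptions. A real tool would need to handle abbreviations, quotation marks, and language-specific rules.

```python
import re

# Assumption: a sentence ends at ., !, ? or … followed by whitespace.
# Real texts (abbreviations, quotes, verse) need a smarter splitter.
_SENTENCE_END = re.compile(r"(?<=[.!?…])\s+")


def find_attestations(text, target):
    """Return the sentences of `text` that contain `target` exactly,
    with each occurrence wrapped in wiki bold markup (''')."""
    # Exact, case-sensitive match that is not part of a longer word,
    # so searching for "teeth" never returns "tooth".
    pattern = re.compile(r"(?<!\w)" + re.escape(target) + r"(?!\w)")
    results = []
    for sentence in _SENTENCE_END.split(text.strip()):
        if pattern.search(sentence):
            bolded = pattern.sub(lambda m: "'''" + m.group(0) + "'''", sentence)
            results.append(bolded)
    return results


sample = "The tooth hurt. His teeth were white! Toothless dragons exist."
print(find_attestations(sample, "teeth"))
```

No lemmatization is performed here, in line with the "exact results only" requirement; searching all forms of a lemma would need an additional table of word forms.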

Discussion

  • I support the idea. If it becomes possible to add examples from Wikisource to Wiktionary, that will be great! It will be a real integration of two projects: Wiktionary and Wikisource. Solving this problem requires splitting Wikisource text into sentences, lemmatizing words, and creating tables with links from lemmas and word forms to texts. --Andrew Krizhanovsky (talk) 10:34, 24 October 2019 (UTC)
    In my opinion, it is not necessary to lemmatize, as inflected forms can be good examples for inflected-form entries. There is an entry for each form in Wiktionaries, so if you look for teeth, you don't want tooth in the results. Noé (talk) 13:29, 24 October 2019 (UTC)
    Agree with Noé--So9q (talk) 20:51, 24 October 2019 (UTC)
    I want the user to be able to choose: (1) search for sentences with all word forms of the lemma, or (2) search only this word form (strict search). --Andrew Krizhanovsky (talk) 18:22, 1 November 2019 (UTC)
  • I would prefer a tool like [1] for this task. Anyone can adapt or improve it by adding the Wikisource Search API as a backend. Example search for "meaning".--So9q (talk) 20:51, 24 October 2019 (UTC)
    Would you accept making a tool like that official and integrated by default, as suggested in the proposal? That's what the proposal is really about for me—making it easily accessible to all and not just custom JS users (who are a small fraction of Wiktionary editors, let alone readers). Yannis | 13:15, 27 October 2019 (UTC)
  • Strong support. I have done this manually many times, especially for words I first encounter at en.ws. This is a great proposal. —Justin (koavf)TCM 19:41, 2 November 2019 (UTC)
  • My idea: on page Foo (1), clicking on something runs a search in Wikisource for sentences containing the word foo (2). Then the editor must check whether the word is used in the correct context/sense and select the part to copy into an input field (3). Sometimes corrections are needed: (…), shortening a long sentence, adding a missing subject from the previous sentence... Then click OK, and there will be an example (4) with a reference.
    1. I have the word cs:wikt:pitel,
    2. A search on Wikisource gives me some examples
    3. I select one of them: the sentence Od dávna trvající věrný pitel vína dobrého. from Paměti
    4. I got #* {{Příklad|cs|Od dávna trvající věrný pitel vína dobrého.}}<ref>Mikuláš Dačický z Heslova: [[s:Paměti/1601–1605|Paměti]]</ref> for copying to Wiktionary.

JAn Dudík (talk) 20:52, 7 November 2019 (UTC)
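
Step 4 of the workflow above, producing the wikitext line to paste into Wiktionary, could look roughly like this. It is a sketch under assumptions: `format_quotation` and its parameters are hypothetical names, the output shape simply mirrors the Czech example in the comment (template `Příklad`, a `<ref>` with a link to Wikisource), and other Wiktionaries use different templates. In the proposal, the author and title would come from Wikidata rather than being passed in by hand.

```python
def format_quotation(sentence, lang, author, ws_page, title, template="Příklad"):
    """Build a quotation line in the shape of the Czech Wiktionary example.

    All parameter names are illustrative; `ws_page` is the Wikisource
    page name and `title` is the label shown for the link.
    """
    # A raw pipe would be read as a template parameter separator,
    # so replace it with the {{!}} magic word.
    safe_sentence = sentence.replace("|", "{{!}}")
    quote = "{{" + template + "|" + lang + "|" + safe_sentence + "}}"
    reference = "<ref>" + author + ": [[s:" + ws_page + "|" + title + "]]</ref>"
    return "#* " + quote + reference


print(format_quotation(
    "Od dávna trvající věrný pitel vína dobrého.",
    "cs",
    "Mikuláš Dačický z Heslova",
    "Paměti/1601–1605",
    "Paměti",
))
```

With the values from the example, this prints the same line as in step 4.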

  • I agree with your description, but I also think it could be done in VisualEditor without even seeing any wikicode. It could be user-friendly and easily accessible for new users, like "Add translation" in some Wiktionaries, or like "Insert Media", which is very easy to use in Wiktionary. - Noé (talk) 16:47, 8 November 2019 (UTC)
    But because Wiktionary pages are mostly built from various templates, VE is hardly usable in Wiktionary [2]. JAn Dudík (talk) 10:06, 10 November 2019 (UTC)
    French Wiktionary uses it. It requires documenting every template with TemplateData, and it still adds several unnecessary line breaks, but it is possible to use it. Noé (talk) 11:00, 11 November 2019 (UTC)

Voting