Community Wishlist Survey 2020/Wiktionary/What's in the newspaper today?
What's in the newspaper today?
- Problem: Wiktionarians cannot detect every newly used word in real time and add it as soon as it appears, even though examples of its use are accessible online.
- Who would benefit: Contributors and readers
- Proposed solution: Develop a tool that harvests online newspapers and records words that are missing from the Wiktionaries' databases.
- More comments: This tool would have to be adapted for each language and/or resource. Darkdadaah created a similar tool and ran it from 2010 to 2013 for French.
- Phabricator tickets:
- Proposer: DaraDaraDara (talk) 14:54, 8 November 2019 (UTC)
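As a rough illustration of the proposed solution, here is a minimal Python sketch of the core step: comparing the words of one article against a set of words that already have entries, and keeping short sentences as candidate usage examples. It is not Darkdadaah's tool. The function name `find_missing_words`, the `known_words` set, and the 200-character sentence limit are all assumptions made for this example. Fetching the newspapers and loading the list of existing entries are left out, and the naive tokenizer and sentence splitter would need adapting for each language.

```python
import re

# Runs of letters, optionally joined by an apostrophe or hyphen (e.g. "rendez-vous").
WORD_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")
# Naive sentence boundary: whitespace that follows '.', '!' or '?'.
SENTENCE_RE = re.compile(r"(?<=[.!?])\s+")


def find_missing_words(article_text, known_words, max_sentence_chars=200):
    """Map each word absent from known_words to short sentences that use it.

    known_words is a hypothetical set of lowercase words that already have
    an entry; how it is obtained is outside the scope of this sketch.
    """
    candidates = {}
    for sentence in SENTENCE_RE.split(article_text.strip()):
        if not sentence:
            continue
        for match in WORD_RE.finditer(sentence):
            word = match.group(0).lower()
            if word in known_words:
                continue
            examples = candidates.setdefault(word, [])
            # Keep only short sentences so each example stays a brief citation.
            if len(sentence) <= max_sentence_chars and sentence not in examples:
                examples.append(sentence)
    return candidates


if __name__ == "__main__":
    known = {"the", "mayor", "praised", "a", "new", "plan", "it", "was", "bold"}
    text = "The mayor praised a new flexitarian plan. It was bold."
    print(find_missing_words(text, known))
```

The output still needs manual review: proper nouns, typos and inflected forms all show up as candidates in this sketch.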
Discussion
- Harvesting newspapers is a great way to detect new words, and it helps to have selected sentences to add as examples, after some manual selection, as some sentences are correct but too long or depend too heavily on context. Also, thematic labelling may help Wikinewsies and Wikipedians to find more sources. Noé (talk) 11:27, 15 November 2019 (UTC)
- What about licenses of those newspapers? -Theklan (talk) 10:30, 22 November 2019 (UTC)
- @Theklan: it does not matter. The same rule applies as for books. The idea here is just to crawl the newspapers every day and extract only the sentence containing the new word. That is a short citation, which is allowed. See this page with French words for example. Pamputt (talk) 13:40, 23 November 2019 (UTC)
- We already have "Wiktionary:Frequency lists", which is based on TV subtitles and has thousands of missing words in all languages, including English. Newspapers will give you a lot of typos and wordplay. This is not needed as long as we have big lists of missing words. We can also use aspell lists to locate some more missing words. In Wikipedia there is a project named moss (under the typo team) that offers thousands of missing words that are used in Wikipedia. Uziel302 (talk) 21:18, 25 November 2019 (UTC)
- @Uziel302: indeed, we will get typos, but they should be limited because here we parse newspapers (not blogs and forums). And yes, there are already a lot of missing words, but parsing newspapers will help to identify neologisms and then to create the missing entries. This is worthwhile because people are more likely to look up neologisms than rarer words. Pamputt (talk) 06:45, 26 November 2019 (UTC)
Voting
- Oppose Uziel302 (talk) 06:29, 26 November 2019 (UTC)
- Support Noé (talk) 21:44, 20 November 2019 (UTC)
- Support Great idea! Lyokoï (talk) 12:32, 21 November 2019 (UTC)
- Support--Pom445 (talk) 16:10, 21 November 2019 (UTC)
- Support Otourly (talk) 16:12, 21 November 2019 (UTC)
- Support Hildepont (talk) 16:45, 21 November 2019 (UTC)
- Support Exilexi (talk) 20:11, 21 November 2019 (UTC)
- Support Pamputt (talk) 21:31, 21 November 2019 (UTC)
- Support Libcub (talk) 08:55, 22 November 2019 (UTC)
- Support but Theklan makes a point. BEANS X2 (talk) 12:09, 23 November 2019 (UTC)
- Support Shizhao (talk) 03:05, 25 November 2019 (UTC)
- Support Skirienko (talk) 13:12, 25 November 2019 (UTC)
- Support DemonDays64 (talk) 14:43, 25 November 2019 (UTC)
- Support--Yoavd (talk) 15:15, 25 November 2019 (UTC)
- Support Liuxinyu970226 (talk) 15:45, 25 November 2019 (UTC)
- Support A garbage person (talk) 16:13, 25 November 2019 (UTC)
- Support 游魂 17:06, 25 November 2019 (UTC)
- Support Waddie96 (talk) 18:43, 25 November 2019 (UTC)
- Support Eunostos (talk) 20:51, 25 November 2019 (UTC)
- Support Harshil169 (talk) 13:13, 26 November 2019 (UTC)
- Support --Thibaut120094 (talk) 16:42, 26 November 2019 (UTC)
- Support Cinemantique (talk) 22:59, 26 November 2019 (UTC)
- Support The Editor's Apprentice (talk) 18:22, 29 November 2019 (UTC)
- Support Romainbehar (talk) 12:52, 30 November 2019 (UTC)
- Support Novak Watchmen (talk) 17:45, 2 December 2019 (UTC)