Community Wishlist Survey 2021/Search/Maintaining a list of the most common search terms which do not correspond to an article name

From Meta, a Wikimedia project coordination wiki

Maintaining a list of the most common search terms which do not correspond to an article name

  • Problem: Not knowing which search terms are failing in directing users to an article.
  • Who would benefit: All users and editors who like to make redirects and new articles.
  • Proposed solution: Maintain a list which shows the most common search terms that do not successfully direct or redirect to an article.
  • More comments: This list would allow editors to easily see which terms most need a redirect or a new article. A filter for common or offensive words should be applied, and the list should drop terms that have since had articles created, either by refreshing the list or by redlinking its entries.
  • Phabricator tickets:
  • Proposer: HappyMihilist (talk) 16:04, 18 November 2020 (UTC)
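
The proposal above can be illustrated with a minimal sketch. It assumes a plain list of search strings and a set of existing page titles are available; the function name `top_failed_searches` and its parameters are hypothetical and do not correspond to any existing MediaWiki API. Real search logs would also need the anonymisation discussed below.

```python
from collections import Counter


def top_failed_searches(search_log, existing_titles, blocklist=frozenset(), limit=10):
    """Return the most common search terms that match no existing title.

    All inputs are assumptions for illustration: `search_log` is an iterable
    of raw search strings, `existing_titles` holds article and redirect
    titles, and `blocklist` holds common or offensive words to filter out.
    """
    titles = {t.casefold() for t in existing_titles}
    blocked = {b.casefold() for b in blocklist}
    counts = Counter()
    for term in search_log:
        # Normalise whitespace and case so trivial variants count together.
        key = " ".join(term.split()).casefold()
        if not key or key in titles or key in blocked:
            continue
        counts[key] += 1
    return counts.most_common(limit)


log = ["Foo bar", "foo  bar", "Earth", "the", "baz", "foo bar", "baz"]
print(top_failed_searches(log, existing_titles={"Earth"}, blocklist={"the"}))
# [('foo bar', 3), ('baz', 2)]
```

Because the list is recomputed against the current set of titles, a term disappears on the next refresh once an article or redirect with that name exists.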


Also this is about searching, not redlinks, so that page didn't list them, so I don't think there has been a system like this. Dreamy Jazz talk to me | enwiki 00:13, 20 November 2020 (UTC)
  • Really good idea. Although it would require storing search histories, this data can be anonymous. A way to filter out intentionally redlinked search results would be good because, unlike Wanted categories or counted redlinks, an entry cannot be removed from the list by editing. I think a way to filter entries out is needed, especially if an LTA decides to spam. I suggest this data should be time limited (so that entries drop off the list without needing a filter) and that entries should be removed if the page is created. Dreamy Jazz talk to me | enwiki 00:13, 20 November 2020 (UTC)
Yes, I think some kind of management of the list is definitely necessary to keep offensive or overly common words off the list. I'm sure there already exist lists of such words to use. I actually do think, however, that the list could redlink the items by default, as this would allow unnecessary entries to be removed quickly. Or it just gets updated daily, in which case those entries automatically disappear. HappyMihilist (talk) 06:17, 20 November 2020 (UTC)
I caution on “offensive” however. If a word pops up enough to be included on such a list, it’s obviously in use enough to be of interest. Also keep in mind that what offends some doesn’t offend others. The British use of C—- is even heard on the floor of parliament at times. Use on the floor of the US House or Senate would be national news. Using god in any exclamation is extremely problematic in many southern areas of the US, yet commonly used elsewhere. Etc.
  • Please make this happen, it would be so great. Abductive (talk) 18:28, 20 November 2020 (UTC)
  • The data required for such a summary appeared in 2012. I think it rapidly disappeared for privacy reasons. The technology is (or was) available; perhaps a summary could be released again. I promise not to have my PC search continually for The Certes Garage Band before claiming it to be our most wanted missing article. Certes (talk) 22:52, 22 November 2020 (UTC)
  • Love this idea. Should be easy enough from a data perspective and would be high impact, as it would give editors a list of the most important articles or redirects that are yet unwritten. One extension could be some way of handling misspellings of the same query and bundling those into one "search term". —Shrinkydinks (talk) 22:35, 24 November 2020 (UTC)
  • From a Wiktionarian perspective, it sounds very useful too. People may look for neologisms and we may track them with this list! Noé (talk) 12:09, 29 November 2020 (UTC)
  • phab:T8373#1856037 and [1] have some more information about why this hasn't been done before. The core of the issue is that it's difficult from a privacy perspective and it's also not very useful data, which made overcoming the privacy concerns not seem worth it. --Deskana (talk) 23:31, 30 November 2020 (UTC)
  • I use Wikipedia a lot with maths and physics students aged 18 or above. Many of them soon opt out, mainly for two reasons: a) the first lines of an article are unintelligible because they directly address experts, and b) searching for a specific topic often leads to confusing and non-specific search results. Collecting dead-end search words to create redirects has the potential to increase usability for well-educated but not expert users. Rhetos (talk) 08:07, 15 December 2020 (UTC)--Rhetos (talk) 07:13, 15 December 2020 (UTC)
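
Two refinements suggested in the discussion above, time-limiting the data and bundling misspellings of the same query, could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `bundled_recent_counts`, the 30-day window and the 0.85 similarity cutoff are all hypothetical choices, and fuzzy matching with Python's `difflib` merely stands in for whatever spelling logic a real implementation would use.

```python
from collections import Counter
from datetime import datetime, timedelta
from difflib import get_close_matches


def bundled_recent_counts(events, now, max_age=timedelta(days=30), cutoff=0.85):
    """Count recent failed searches, merging near-identical spellings.

    `events` is an iterable of (timestamp, term) pairs for searches that
    found no article (an assumed input format). Entries older than
    `max_age` are ignored, so they drop off the list without manual
    filtering.
    """
    recent = Counter(term.casefold() for ts, term in events if now - ts <= max_age)
    bundled = Counter()
    # Visit the most frequent spellings first so they become the canonical form.
    for term, n in recent.most_common():
        match = get_close_matches(term, list(bundled), n=1, cutoff=cutoff)
        bundled[match[0] if match else term] += n
    return bundled


now = datetime(2020, 11, 30)
events = [
    (now - timedelta(days=1), "neologism"),
    (now - timedelta(days=1), "Neologism"),
    (now - timedelta(days=2), "neologizm"),    # misspelling, bundled with the above
    (now - timedelta(days=3), "garage band"),
    (now - timedelta(days=90), "old term"),    # too old, dropped
]
print(dict(bundled_recent_counts(events, now)))
# {'neologism': 3, 'garage band': 1}
```

A stricter cutoff bundles fewer spellings together, and a shorter window limits how long a burst of spam searches can influence the list.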