Community Wishlist Survey 2019/Search

From Meta, a Wikimedia project coordination wiki
Search
6 proposals, 122 contributors, 166 support votes
The survey has closed. Thanks for your participation :)



CAS numbers of chemical substances

  • Problem: Some chemicals have many aliases, especially organic ones. When we want to create a page for a chemical substance or search for one, it is more convenient if a redirect from its CAS number exists.
  • Who would benefit: Users who create or translate articles about chemistry; users who search for them; editors on small wikis and cross-wiki editors.
  • Proposed solution: A CAS number uniquely identifies a chemical substance. Searching for a CAS number should find the page if it already exists. If it does not, the search results should show the substance on other wikis or on Wikidata. This would help when we want to look up information, create or translate a page, link pages to other wikis, or check whether a corresponding image (category) exists on Commons.
  • More comments:
  • Phabricator tickets:

Discussion

Note that CAS numbers are searchable in Wikidata; for example, a search for '15307-86-5' finds 'diclofenac'. Of course, that only works if the CAS Registry Number property has been set for that substance. ArthurPSmith (talk) 17:25, 1 November 2018 (UTC)

I think this works because redirects from CAS numbers on some wikis were imported as aliases; for all compounds, searching with 'haswbstatement:P231=<CAS number>' works. But I think this proposal is out of scope for the Community Wishlist Survey; it's up to each language version of Wikipedia whether or not to create such redirects. Wostr (talk) 18:39, 1 November 2018 (UTC)
The following tool may be of use here: https://tools.wmflabs.org/wikidata-todo/resolver.php?prop=231&value=<CAS>, e.g. https://tools.wmflabs.org/wikidata-todo/resolver.php?prop=231&value=87-82-1 --Leyo (talk) 09:58, 28 November 2018 (UTC)
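A minimal Python sketch of how a tool might validate a CAS number and then build the `haswbstatement:P231=<CAS number>` search mentioned above. It assumes the standard CAS format (2 to 7 digits, 2 digits, 1 check digit) and the standard CAS check-digit rule. The function names are hypothetical, and the URL is an ordinary Wikidata Special:Search request, not an official API wrapper.

```python
import re
from urllib.parse import urlencode

# Format of a CAS Registry Number: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas(cas: str) -> bool:
    """Return True if `cas` is well formed and its check digit is correct."""
    match = CAS_PATTERN.fullmatch(cas.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    # Check digit = sum of (digit x position counted from the right,
    # starting at 1), modulo 10.
    total = sum(pos * int(d) for pos, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def wikidata_cas_search_url(cas: str) -> str:
    """Build a Wikidata search URL for items whose P231 (CAS Registry Number) equals `cas`."""
    cas = cas.strip()
    if not is_valid_cas(cas):
        raise ValueError(f"not a valid CAS Registry Number: {cas!r}")
    params = {
        "search": f"haswbstatement:P231={cas}",
        "title": "Special:Search",
        "fulltext": "1",  # show the results list rather than jumping to a page
    }
    return "https://www.wikidata.org/w/index.php?" + urlencode(params)


if __name__ == "__main__":
    print(wikidata_cas_search_url("15307-86-5"))  # diclofenac, per the comment above
    print(is_valid_cas("15307-86-4"))  # False: wrong check digit
```

Validating the check digit first means a mistyped number fails fast instead of silently returning no results.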

Voting

Search by suffix

  • Problem: For example, I sometimes want to look up all Wikipedia articles whose titles end with "by elevation" (such a search should return, say, a list of countries by elevation). However, the current search function does not support this, and it seems that no regex would make it work either.
  • Who would benefit: Advanced editors who want to patrol pages whose titles share a fixed ending.
  • Proposed solution: Something similar to Special:PrefixIndex, but matching the end of the title instead of the beginning.
  • More comments: This would be helpful for people who don't know regex well enough (like me). I expect it would mostly be used for maintaining the article, category and file namespaces.
  • Phabricator tickets:
  • Proposer: 燃灯 (talk) 18:34, 4 November 2018 (UTC)

Discussion

@燃灯: Is https://en.wikipedia.org/w/index.php?search=intitle%3A%22*by+elevation%22&title=Special%3ASearch&go=Go what you want? See mw:Help:CirrusSearch. --AKlapper (WMF) (talk) 00:30, 5 November 2018 (UTC)

It works for whole words, but what about searching for words with a given suffix on Wiktionary? A search for *ština finds only one result, but there should be 100+ more. JAn Dudík (talk) 20:56, 5 November 2018 (UTC)
@JAn Dudík: Is intitle:/.*ština/ as per mw:Help:CirrusSearch#Regular expression searches close enough? --AKlapper (WMF) (talk) 11:25, 6 November 2018 (UTC)
Yes, thanks; the only problem is copying these URLs. But it is not very user-friendly for ordinary users. JAn Dudík (talk) 12:54, 6 November 2018 (UTC)
Thanks; I don't think many editors know that. But /.*ština/ does the same as /ština/, only more slowly: it matches "Opština Vrbas" etc. intitle:/ština/ -intitle:/ština./ seems to work. Certes (talk) 00:22, 7 November 2018 (UTC)
  • It is very specialized, and average readers do not usually need it. Advanced users can already do this with a database query such as SELECT page_title FROM enwiki_p.page WHERE page_namespace = 0 AND page_title LIKE "%by_elevation". --Tohaomg (talk) 21:45, 10 November 2018 (UTC)
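To illustrate Certes's point, here is a small Python sketch that uses the `re` module as a stand-in for CirrusSearch's regex matching. This is an assumption: the two engines differ in details. It contrasts three things: an unanchored match, a true suffix match, and the `intitle:/ština/ -intitle:/ština./` workaround. The sample titles are illustrative only. The last one is a constructed edge case where the workaround and a true suffix match disagree.

```python
import re


def contains(title: str, fragment: str) -> bool:
    """Unanchored match, like intitle:/ština/ (or the equivalent, slower /.*ština/)."""
    return re.search(re.escape(fragment), title) is not None


def ends_with(title: str, suffix: str) -> bool:
    """True suffix match: the fragment must sit at the very end of the title."""
    return re.search(re.escape(suffix) + r"\Z", title) is not None


def workaround(title: str, suffix: str) -> bool:
    """Mimics intitle:/ština/ -intitle:/ština./ : the fragment occurs somewhere,
    and no occurrence of it is followed by another character."""
    pattern = re.escape(suffix)
    return (re.search(pattern, title) is not None
            and re.search(pattern + ".", title) is None)


# Illustrative titles only; the last one is a constructed edge case.
titles = ["čeština", "slovenština", "Opština Vrbas", "ština ština"]

print([t for t in titles if contains(t, "ština")])    # all four titles
print([t for t in titles if ends_with(t, "ština")])   # ['čeština', 'slovenština', 'ština ština']
print([t for t in titles if workaround(t, "ština")])  # ['čeština', 'slovenština']
```

The workaround is usually good enough. However, it drops any title where the suffix also appears earlier, followed by another character. A built-in suffix search, like `ends_with` above, would not have that gap.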

Voting

Add "Page Views" as an input filter and as a sort option in the output

  • Problem: Users may use search to choose which pages to edit first, and they may want their edits to be more meaningful in terms of audience. Many external tools provide page statistics such as page views, but this information is not built into the search function.
  • Who would benefit: Say a user searches for pages in a particular category that are missing a specific template and gets 2,000+ results. They want to start adding the template to those pages, and to do it in the most meaningful order. Such users would benefit from this functionality.
  • Proposed solution: Add 'Page Views' as an input filter and as a sort option in the output.
  • More comments:
  • Phabricator tickets:
  • Proposer: Cabeza2000 (talk) 13:11, 8 November 2018 (UTC)

Discussion

Perhaps page views are part of the default formula for sorting results, but I think they carry a low weight. I use search quite frequently, and I believe this may be the case. I can also say that the same search returns results in the same order over time (as expected), but it is not clear how that order is determined, nor is it explained at Help:CirrusSearch.--Cabeza2000 (talk) 11:00, 9 November 2018 (UTC)
I have dug into this, and CirrusSearch already has two rescore profiles that give weight to page views: wsum_inclinks_pv and popular_inclinks_pv, the second having a "very high weight on Page Views". This can be seen here. Sadly, I could not find any indication that end users can change the rescore profile themselves; it seems to be an option only for those configuring the CirrusSearch extension on each wiki. On the other hand, adding support for end users to switch rescoring profiles may not be that hard for Community Tech to develop.--Cabeza2000 (talk) 12:44, 9 November 2018 (UTC)
The default search algorithm on the top 18 wikis by search volume is a machine-learned ranking model. It uses page popularity (derived from page views) as a feature, but it's not quite as simple as "more page views means a higher rank". On wikis outside the top 18, page popularity is not used as a ranking signal, because without ML the way it is combined with the other ranking signals would have to be hand-tuned for each wiki.
As for searching on this field: sure, we can add it to a list somewhere and it becomes one of the sortable properties. IMO, given what is currently implemented, this proposal is asking for someone with UI experience to expose sorting in the web UI. Adding popularity as an option for direct ranking is relatively trivial. EBernhardson (WMF) (talk) 15:26, 9 November 2018 (UTC)
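The proposed feature is simple to state as an operation on a result list. Below is a client-side Python sketch. It assumes each search hit already carries a page-view count: the `SearchHit` type and its `page_views` field are hypothetical, and the search API does not necessarily return such a field. This is a plain filter-and-sort, not a model of how CirrusSearch's rescore profiles blend page views with other signals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SearchHit:
    title: str
    page_views: int  # hypothetical: views over some recent period, e.g. 30 days


def filter_and_sort(hits: List[SearchHit], min_views: int = 0,
                    most_viewed_first: bool = True) -> List[SearchHit]:
    """Drop hits below `min_views`, then order by page views.

    sorted() is stable (also with reverse=True), so hits with equal view
    counts keep the order the search engine returned them in.
    """
    kept = [hit for hit in hits if hit.page_views >= min_views]
    return sorted(kept, key=lambda hit: hit.page_views, reverse=most_viewed_first)


# Hypothetical results of a "pages in category X missing template Y" search.
hits = [
    SearchHit("Page A", 120),
    SearchHit("Page B", 4500),
    SearchHit("Page C", 30),
    SearchHit("Page D", 4500),
]
print([hit.title for hit in filter_and_sort(hits, min_views=100)])
# ['Page B', 'Page D', 'Page A']
```

Relying on a stable sort keeps the engine's relevance order as a tie-breaker, which is roughly what a "sort by page views" option in the web UI would need.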

Voting

Linksearch overhaul

  • Problem:
    • Protocol-specific: Currently, I have to do two separate link searches to cover both secure and non-secure links, e.g. Special:Linksearch/*.example.com and Special:Linksearch/https://*.example.com.
    • No namespace filter: I can't filter by namespace unless I use the API, and even then it is clunky, because the filtering is done in PHP rather than in MariaDB.
    • No complex patterns: I can't perform more complicated searches (e.g. blogspot.*).
    • Limited results: the size of the result set is limited.
  • Who would benefit: Anyone who uses this special page.
  • Proposed solution:
    • Eliminate the technical debt in the externallinks table to make queries faster.
    • Store the domain and protocol in separate database columns, and make these queryable.
    • Add a proper namespace filter.
    • Return results for all protocols when no protocol is specified.
    • Make all of these improvements available through the API.
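The proposed data layout can be illustrated with a minimal in-memory Python sketch. It is an illustrative assumption, not MediaWiki's actual externallinks schema. It stores the protocol in its own field and the host with its labels reversed, so `*.example.com` becomes the prefix `com.example.`. That is the kind of prefix match Anomie notes below that SQL can do efficiently. With this layout, one query can cover all protocols and filter by namespace. IDNs and IP addresses are not handled, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set
from urllib.parse import urlsplit


@dataclass(frozen=True)
class ExternalLink:
    page_namespace: int
    page_title: str
    protocol: str       # e.g. "https"; stored separately from the host
    reversed_host: str  # e.g. "com.example.www." for www.example.com
    rest: str           # path and query string


def reverse_host(host: str) -> str:
    # Reversing the labels turns "*.example.com" into the prefix "com.example.";
    # the trailing dot stops "notexample.com" or "examples.com" from matching.
    return ".".join(reversed(host.lower().split("."))) + "."


def make_link(namespace: int, title: str, url: str) -> ExternalLink:
    parts = urlsplit(url)
    rest = parts.path + ("?" + parts.query if parts.query else "")
    return ExternalLink(namespace, title, parts.scheme.lower(),
                        reverse_host(parts.hostname or ""), rest)


def link_search(links: Iterable[ExternalLink], pattern: str,
                protocol: Optional[str] = None,
                namespaces: Optional[Set[int]] = None) -> List[ExternalLink]:
    """pattern is "example.com" (that host only) or "*.example.com" (host and
    subdomains). protocol=None means any protocol; namespaces=None means all."""
    wildcard = pattern.startswith("*.")
    prefix = reverse_host(pattern[2:] if wildcard else pattern)
    results = []
    for link in links:
        if protocol is not None and link.protocol != protocol:
            continue
        if namespaces is not None and link.page_namespace not in namespaces:
            continue
        if wildcard:
            matched = link.reversed_host.startswith(prefix)
        else:
            matched = link.reversed_host == prefix
        if matched:
            results.append(link)
    return results


links = [
    make_link(0, "Foo", "http://www.example.com/a"),
    make_link(0, "Bar", "https://example.com/b?x=1"),
    make_link(2, "Baz", "https://blog.example.com/"),
    make_link(0, "Qux", "https://notexample.com/"),
]
print([link.page_title for link in link_search(links, "*.example.com")])
print([link.page_title for link in link_search(links, "*.example.com", namespaces={0})])
```

With `protocol=None` as the default, a single search covers both http and https, which addresses the first problem listed above.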

Discussion

Some notes:

  • Filtering by namespace would require creating and populating a column in the database and adding appropriate indexes. gerrit:163470 might be relevant there.
  • Separating the protocol from the rest of the URL would similarly require database changes. It still might not be possible to request "http OR https", just "http", "https", or "any protocol".
  • More complicated searches on the domain/path are rather unlikely, as efficient SQL search of text columns is generally limited to prefixes (or depends on methods that are heavily database-dependent; note MediaWiki actively supports three database engines and sort-of supports two more).
  • There's also the fact that searching for links to internationalized domain names (IDNs) means you have to try both the encoded and the IDN versions. That'll be fixed by gerrit:322729, if it eventually gets merged.
  • There's also the fact that searching for links using IPs doesn't work very well. That too will be fixed by gerrit:322729.
  • If by "eliminate the technical debt [...] to make queries faster" you're referring to the fact that it gets slower as you page through the results, and that there's therefore a limit on the special page, that should be fixed by gerrit:322729 too, although that patch doesn't actually remove the special page's limit.

Anomie (talk) 14:59, 30 October 2018 (UTC)

  • @MER-C: Just a ping to let you know about the comments above. Possibly they could help you clarify your proposal before the voting begins. (With 200+ ideas, clear proposals make everyone's life easier!) Quiddity (WMF) (talk) 01:30, 8 November 2018 (UTC)
  • There is a really powerful workaround for linksearch: just use Special:Search with insource. It handles arbitrary URL schemes, can be filtered by namespace, supports wildcards if you know a little regex, and has no result size limit. I might even advocate eliminating the technical debt by removing the Special:LinkSearch page and pointing people to Special:Search instead. --Izno (talk) 23:57, 8 November 2018 (UTC)
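Izno's workaround can also be run through the search API. Below is a hedged Python sketch that only builds the request URL. The helper name is hypothetical; `list=search`, `srsearch`, `srnamespace` and `srlimit` are standard parameters of the MediaWiki search API. Note that a quoted `insource:` phrase also matches plain-text mentions of the domain, not only links.

```python
from urllib.parse import urlencode
from typing import Iterable


def insource_search_url(domain: str, namespaces: Iterable[int],
                        api: str = "https://en.wikipedia.org/w/api.php") -> str:
    """Build a search API request for pages whose wikitext contains `domain`.

    Hypothetical helper: it uses the standard list=search module with a
    CirrusSearch insource: query and restricts results to `namespaces`.
    """
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f'insource:"{domain}"',
        "srnamespace": "|".join(str(ns) for ns in namespaces),
        "srlimit": "max",
        "format": "json",
    }
    return api + "?" + urlencode(params)


# Articles (0) and user pages (2) whose source mentions example.com, any protocol.
print(insource_search_url("example.com", [0, 2]))
```

For link-only matches, a regex such as `insource:/example\.com/` is more precise than a quoted phrase. Such regex searches are slower and are best combined with a plain search term to narrow the candidates.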

As an alternative to Anomie's suggestions, this can also be done in the Elasticsearch cluster. We already index external_links for every page on every wiki; they are just not analyzed in a way that is useful for this type of search. It is certainly possible to run analysis on the external links to create sub-fields such as domain name and URL pieces, and to search against those. Additionally, we could probably expose regex on external_links if it were a common enough request. EBernhardson (WMF) (talk) 15:33, 9 November 2018 (UTC)

I note that Special:LinkSearch is a core MediaWiki feature, while use of Elasticsearch is optional and requires an extension. We should keep usability for non-Wikimedia wikis in mind. Anomie (talk) 19:04, 9 November 2018 (UTC)
  • Adding a namespace filter to linksearch has been on my top five list ever since it was briefly implemented and then withdrawn. It is massively useful. You can do it from AWB; I have on occasion resorted to using AWB to process a list, then bringing the list back to enWP to fix. PLEASE do this! JzG (talk) 00:31, 16 November 2018 (UTC)

Voting

Sort search results by date

  • Problem: Currently, search results appear in a seemingly random order, and there is no option to sort them by date (i.e. most recent first). This is a problem especially with administrative noticeboards that have hundreds of archive pages. For example, see this search of the English Wikipedia's AN/I: [1]
  • Who would benefit: Everyone.
  • Proposed solution: Enable sorting by date. Alternatively, simply returning the most recent results first would also help.
  • More comments:
  • Phabricator tickets: T18237, T195071, T40403
  • Proposer: Pudeo (talk) 13:50, 3 November 2018 (UTC)

Discussion

I tried the English Wikipedia AN/I search above with mw:Help:CirrusSearch#Prefer-recent as [2], but the order is still confusing. Meh. --AKlapper (WMF) (talk) 22:36, 3 November 2018 (UTC)

A sort by function would be very nice. Toasted Meter (talk) 02:25, 4 November 2018 (UTC)

@Pudeo: The technical groundwork for this is already done. See ANI sorted by last edit ascending. I suggest rewording this proposal to request that this functionality be exposed in the user interface. —TheDJ (talk · contribs) 11:08, 5 November 2018 (UTC)

My wish: the ability to sort search results not only by date, but also alphabetically by filename and/or by uploader/author. JopkeB (talk) 10:04, 4 November 2018 (UTC)

@JopkeB: If you have additional requests, please create separate tickets as well. That is probably the best way to get attention for them. —TheDJ (talk · contribs) 11:08, 5 November 2018 (UTC)
  • Hello, the proposal says "sort them by date (i.e. most recent first)". Actually, there are multiple dates of interest: first edit, last edit, and content's date (like the date of an event). --NaBUru38 (talk) 19:13, 7 November 2018 (UTC)

You can already sort by last edit date, among other options (see the search query in ApiSandbox for the full set of options): https://meta.wikimedia.org/wiki/?search=survey&fulltext=1&sort=last_edit_desc Sorting by first edit is currently being populated in the search indexes and will be available in December or January. EBernhardson (WMF) (talk) 15:30, 9 November 2018 (UTC)
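The sorted search shown above can be generated programmatically. Below is a minimal Python sketch that builds the same kind of Special:Search URL with an explicit `sort` parameter. The helper name is hypothetical. The sketch deliberately accepts only the two values mentioned in this discussion (`last_edit_desc` and `last_edit_asc`); the API equivalent is the `srsort` parameter of `list=search`, whose full set of values can be checked in ApiSandbox.

```python
from urllib.parse import urlencode

# Sort values mentioned in this discussion; others exist (see ApiSandbox).
KNOWN_SORTS = {"last_edit_desc", "last_edit_asc"}


def sorted_search_url(wiki: str, query: str, sort: str = "last_edit_desc") -> str:
    """Build a Special:Search URL with an explicit sort order (hypothetical helper)."""
    if sort not in KNOWN_SORTS:
        raise ValueError(f"unsupported sort in this sketch: {sort!r}")
    params = {"search": query, "fulltext": "1", "sort": sort}
    return f"https://{wiki}/w/index.php?" + urlencode(params)


print(sorted_search_url("meta.wikimedia.org", "survey"))
# https://meta.wikimedia.org/w/index.php?search=survey&fulltext=1&sort=last_edit_desc
```

Because the sort order is already a URL parameter, exposing it in the user interface would mostly be a matter of adding a control that sets this value.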

Voting

Index all labels and aliases on Wikidata

  • Problem: Sometimes the names of entities are transcribed differently in different languages. A subject could, for instance, be spelled in different ways in Russian, German, French and English (not to mention the numerous other languages). Often these variants are already recorded as labels and aliases on Wikidata in, say, German, French and Russian, but a user searching, say, the English Wikipedia will not find the relevant article unless redirects for the other variants have been created on en.wiki.
  • Who would benefit: Everyone searching with the Wikipedia search box. It would also help users search content in Wikipedias other than the one in their home language. Additionally, many users searching on mobile phones work only with an English keyboard, so searching, say, the Hindi Wikipedia could sometimes be easier using the English label (users in that region tend to be bilingual as far as keyboard usage goes).
  • Proposed solution: I believe this would be little enough work to be done fairly easily. In the worst form of the solution (because it involves duplication), the community could write bots that add redirects on all wikis by examining the label and alias fields on Wikidata.
  • More comments:
  • Phabricator tickets:
  • Proposer: Shyamal (talk) 07:33, 4 November 2018 (UTC)

Discussion

  • We are already indexing all labels in all languages, and you can use them when searching Wikidata. However, I am not sure how to use this when searching language wikis: e.g. if you search on enwiki and the query matches an item label on Wikidata, what should the result be? Should it return the enwiki sitelink? What if there's no enwiki sitelink for this item? Smalyshev (WMF) (talk) 02:15, 7 November 2018 (UTC)

Yes: the sitelink for the wiki on which one is searching, if it exists. Shyamal (talk) 20:19, 8 November 2018 (UTC)
Just to clarify, here is a specific example: there is the article en:Władysław Taczanowski, but there is no redirect at en:Ladislaus Taczanowski, even though that name is already listed as the German label for the same entity on Wikidata. A search on the English Wikipedia for "Ladislaus Taczanowski" should ideally take me to the right article even without explicit redirects, based only on the Wikidata aliases ("also known as") and language labels. Maybe some amendments are needed regarding what should not be indexed, but I am afraid I do not fully understand those issues. Shyamal (talk) 10:42, 12 November 2018 (UTC)
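The behaviour Shyamal describes can be shown with a minimal Python sketch, assuming an in-memory copy of the relevant Wikidata data. Every label and alias in every language is indexed. A match returns the sitelink for the wiki being searched, or nothing if there is none, which is one possible answer to Smalyshev's question. `Item`, `build_index`, `resolve` and the placeholder ID `Q_EXAMPLE` are all hypothetical. Exact case-insensitive matching stands in for the real search analysis.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Item:
    item_id: str
    labels: Dict[str, str]                                       # language -> label
    aliases: Dict[str, List[str]] = field(default_factory=dict)  # language -> aliases
    sitelinks: Dict[str, str] = field(default_factory=dict)      # wiki -> page title


def build_index(items: List[Item]) -> Dict[str, Set[str]]:
    """Index every label and alias, in every language, case-insensitively."""
    index: Dict[str, Set[str]] = {}
    for item in items:
        names = list(item.labels.values())
        for alias_list in item.aliases.values():
            names.extend(alias_list)
        for name in names:
            index.setdefault(name.casefold(), set()).add(item.item_id)
    return index


def resolve(query: str, wiki: str, index: Dict[str, Set[str]],
            items_by_id: Dict[str, Item]) -> List[str]:
    """Return page titles on `wiki` for items whose label or alias matches `query`.

    Items without a sitelink for `wiki` are skipped.
    """
    titles = []
    for item_id in sorted(index.get(query.casefold(), set())):
        title = items_by_id[item_id].sitelinks.get(wiki)
        if title is not None:
            titles.append(title)
    return titles


# Hypothetical data mirroring the example above; "Q_EXAMPLE" is a placeholder ID.
items = [
    Item("Q_EXAMPLE",
         labels={"en": "Władysław Taczanowski", "de": "Ladislaus Taczanowski"},
         sitelinks={"enwiki": "Władysław Taczanowski"}),
]
index = build_index(items)
items_by_id = {item.item_id: item for item in items}
print(resolve("Ladislaus Taczanowski", "enwiki", index, items_by_id))
# ['Władysław Taczanowski']
```

Because `resolve` returns a list, a name shared by several items yields several candidate articles rather than a single redirect target. A search page could present these the way it presents ordinary results.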

Voting

Thank you for the comment. Actually, you seem to be voting on a solution to the problem, which I believe should be open to further discussion. The use of bots would indeed be a bad solution, as indicated. I think it would be the second worst, since others have even suggested fixing this manually with redirects. The point is that Wikidata and Wikipedia working as a single system would be beneficial. Shyamal (talk) 12:05, 21 November 2018 (UTC)