
Community Wishlist Survey 2022/Archive/Refine Search Specifications for Search not in this category - to exclude pdf

From Meta, a Wikimedia project coordination wiki

Refine Search Specifications for Search not in this category - to exclude pdf

Issue being resolved in a Commons user script

  • Problem: On Commons, the image search "NOT in this category" returns too many results, because the search term is not quoted and the text of PDF and DjVu files is searched as well

Example current search: Jan van Gool -incategory:"Jan_van_Gool" - delivers over 150 results
Improved search: JPG +"Jan van Gool" -incategory:"Jan_van_Gool" - delivers the right set of 8 results, note the quoted name to get exact results
Improved search trial 2: NOT pdf +"Jan van Gool" -incategory:"Jan_van_Gool" - delivers 10 results

  • Who would benefit: Editors categorizing large art collections, such as drawings from different museum collections, by creator
  • Proposed solution: Would it be possible to add the quotes by default and create an advanced search button to toggle whether PDF and DjVu files are included? Alternatively, allow searching by file type, with checkboxes to include only PNG, JPG and TIFF.
  • More comments:
  • Phabricator tickets:
  • Proposer: Peli (talk) 18:53, 21 January 2022 (UTC)
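The quoting and file-type ideas in the proposal can be sketched in a few lines of Python. This is only an illustration, not the gadget's code: `quote_phrase` and `not_in_category_query` are hypothetical helper names, and using a negated `filemime:` keyword to drop PDF and DjVu files is an assumption about the search syntax available on Commons that should be checked against the search documentation.

```python
def quote_phrase(text: str) -> str:
    """Wrap a search term in double quotes so it is searched as an exact phrase.

    Titles that themselves contain double quotes are not handled here.
    """
    return f'"{text}"'


def not_in_category_query(category: str, exclude_mimes: tuple[str, ...] = ()) -> str:
    """Build a 'not in this category' query for a category title (hypothetical helper)."""
    parts = [
        quote_phrase(category),
        # The examples in the proposal write the category title with underscores.
        "-incategory:" + quote_phrase(category.replace(" ", "_")),
    ]
    # Assumption: the search engine accepts a negated `filemime:` keyword.
    parts.extend(f"-filemime:{mime}" for mime in exclude_mimes)
    return " ".join(parts)


print(not_in_category_query("Jan van Gool"))
# "Jan van Gool" -incategory:"Jan_van_Gool"
print(not_in_category_query("Jan van Gool", exclude_mimes=("pdf", "djvu")))
# "Jan van Gool" -incategory:"Jan_van_Gool" -filemime:pdf -filemime:djvu
```

Keeping the file-type exclusions as an optional argument mirrors the proposed toggle: the default query only adds the quotes, and the stricter filter is opt-in.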

Discussion

  • @Pelikana: It's not quite what you're proposing, but would the new MediaSearch system work for what you're trying to achieve? It draws on the power of structured data on Commons to search by subject and file type; for example, an image search for Jan van Gool finds a good number of results. SWilson (WMF) (talk) 07:55, 26 January 2022 (UTC)
  • @SWilson (WMF): On Commons I open Category:Jan van Gool and, under the "More" button in the top bar, I click "Search not in category". To get just the right set of images I have to restate the search as: [ JPG "Jan van Gool" -incategory:"Jan_van_Gool" ]. Then I get a comprehensive overview of the few exactly right images, with text fragments that often also show me information such as "by ... / after ... / dedicated to ...", so these images can be added to the categories and subcategories where they fit best. Thank you for showing me the special image search, but it neither works with the same surgical precision nor shows me any text info in the overview, and it does not bring up my Cat-a-lot tool. The goal is not to find plenty of loosely related images like Roger+van+Gool, but to find exact results like "Jan van Gool"+jpg -incategory:"Jan_van_Gool". I think the most important improvement would be to add quotes around the search key by default. I would also like to be able to exclude all subcategories of Category:Jan van Gool. That might be asking too much, so sometimes I try it with two -incat statements. Note: for the purpose of this experiment only, the search results are kept as is and not processed further; the goal of the effort would be to copy or move each of these nine files to the proper category or subcategory of Jan van Gool. A renewed search would then return 0 items, which shows the job is done. (In my Commons search preferences, 'exact' and 'standard' are enabled.) Peli (talk) 11:15, 26 January 2022 (UTC)
    @Pelikana: Ah! That makes sense: this is about the searchnotincat gadget! Sorry, I didn't have it enabled so didn't recognise the "Search not in category" label. Adding quotes around the search term (page title) should be a simple fix. I've left a note on the technical Village Pump asking about this change. To also remove files in subcategories from the search, does the following search work? "Jan van Gool" -deepcat:"Jan_van_Gool" SWilson (WMF) (talk) 01:18, 27 January 2022 (UTC)
    @Pelikana: I've forked the gadget and made the above changes. Instructions for trying it out: commons:User:Samwilson/Searchnotincat. Let me know what you think. I'll archive this proposal, and we can continue discussion over on Commons. Sam Wilson 04:16, 28 January 2022 (UTC)
    Ya. Thank you. And -deepcat worked fine. I will try more and comment. Peli (talk) 04:23, 28 January 2022 (UTC)
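For readers who want to try the `-deepcat` variant from this thread outside the gadget, the following Python sketch builds a search link. It is a rough illustration under stated assumptions, not the forked gadget's implementation: `search_url` is a hypothetical helper, and the base URL plus the `title`, `search` and `ns6` (File namespace) parameters are assumed to follow the usual MediaWiki search URL layout.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumption: the standard MediaWiki entry point on Commons.
SEARCH_PAGE = "https://commons.wikimedia.org/w/index.php"


def search_url(title: str, include_subcategories: bool = True) -> str:
    """Build a link for a quoted 'not in this category' search (hypothetical helper).

    With include_subcategories=True the query uses -deepcat, as suggested in the
    thread, so files already sorted into subcategories are excluded as well.
    """
    keyword = "deepcat" if include_subcategories else "incategory"
    category = title.replace(" ", "_")
    query = f'"{title}" -{keyword}:"{category}"'
    # Assumption: ns6=1 restricts the search to the File namespace.
    params = {"title": "Special:Search", "search": query, "ns6": "1"}
    return SEARCH_PAGE + "?" + urlencode(params)


url = search_url("Jan van Gool")
# Decode the link again to show the query that would be sent.
print(parse_qs(urlparse(url).query)["search"][0])
# "Jan van Gool" -deepcat:"Jan_van_Gool"
```

Once every file has been moved into the category tree, opening the same link again should show an empty result list, which matches the "0 items means the job is done" check described above.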