Guidance/facilitation of categorization of files at upload in UploadWizard (Community Wishlist/W526)

Under review
From Meta, a Wikimedia project coordination wiki

Description
Uncategorized files on Commons over time
For example, as of 2025 there are 191,749 files without a category in c:Category:All media needing categories. c:Commons:WikiProject Minimum One Category is a new WikiProject that aims to reduce numbers like this. It recently finished all files up to and including 2020 (every file got at least one non-hidden category), but this is a lot of work for volunteers, and the categories that get set are often quite broad.

It would be better to instead reduce the fraction of files that have no category after upload by helping uploaders set fitting categories.

Uploaders often know better which categories fit the file (e.g. they know the location of the photo) and would need less time to set them. The file would also become available in the given category much earlier, and less valuable time would be drained from active volunteer contributors, who could then use it for other contributions.

In addition, the problem is not limited to files without any category: files that get uploaded often lack a key category or the main category where one would look for a file like it. For example, I've gone through the feed of recent video uploads for many months and found that most videos I checked were missing some essential category. Many had only categories like "Videos of 2023" or "City name" but no category for what the video is about or what it shows (the subject is essential), such as "Videos of polar aurora", or for its language (also key), such as "Videos in Chinese".

More categorization needs to be done at the source, at the time of upload.

Implementation

Please help and encourage more users to categorize their files themselves during and/or right after uploading.

  • For example, show some guidance underneath or in a box on the right, with information on how to think of relevant categories (like "Think about what type of image it is, e.g. a photo, diagram, video or chart, and which language it is in"), when the user tries to upload in the Upload Wizard with no categories (or just one) added.
  • The information displayed could be dynamic, based for example on the file title, on the categories already set (if any), and on the media type (e.g. image or video). One could perhaps also show concrete category suggestions there for the user to confirm, though these may work best when based on categories the user has already set rather than on e.g. the file title. Once at least some category guidance is in place, it could be improved over the years.
  • One could also show prompts asking the user things like "Which language is the chart in? Please use a category like c:Category:English-language charts" if the user has only set the broad category c:Category:Charts.
  • If the file is a video, one could show prompts such as "What language is the video in?" or "Tip: here you could specify which language the video is in, if any".
  • Categories could be suggested based on file title, description, metadata, and content, similar to c:User:Alaexis/Diffusor (that tool can only be used if the file is already in at least one category). The user doesn't have to pick these categories, so it doesn't matter much if some suggestions don't fit: the user is expected to select the ones that fit, or to use them as help in understanding which kinds of categories are available and how they are named. A very useful type of suggestion would be location categories based on GPS coordinates in the EXIF metadata, if present.
  • If no categories or only one or two are set, one could display a notice that the file is less likely to be found and used unless the user adds more categories.
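
The prompt rules in the list above could be sketched roughly as follows. This is only an illustration in Python, not UploadWizard code: `PendingUpload`, `guidance_prompts`, the thresholds and the exact prompt wording are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class PendingUpload:
    """Hypothetical view of a file in the upload form (not a real UploadWizard type)."""
    media_type: str                                   # e.g. "image" or "video"
    categories: list[str] = field(default_factory=list)


def guidance_prompts(upload: PendingUpload) -> list[str]:
    """Return the hints to display before the user submits the upload."""
    prompts = []
    count = len(upload.categories)

    # No category or just one: show general guidance on finding categories.
    if count <= 1:
        prompts.append(
            "Think about what type of file it is (e.g. a photo, diagram, "
            "video or chart) and which language it is in."
        )

    # Up to two categories: point out the effect on findability.
    if count <= 2:
        prompts.append(
            "Files with few categories are less likely to be found and used."
        )

    # Videos: ask for the language.
    if upload.media_type == "video":
        prompts.append("What language is the video in?")

    # The broad category "Charts" is set: ask for a more specific one.
    if "Charts" in upload.categories:
        prompts.append(
            "Which language is the chart in? Please use a category like "
            "Category:English-language charts."
        )

    return prompts
```

For a video that only has "Videos of 2023" set, this sketch would return the general guidance, the findability notice and the language question. A real implementation would need localized messages and a maintained mapping from broad to more specific categories.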

Ultimately, this is largely a two-stage process: adding initial categories is stage 1, and diffusion into more specific categories is stage 2. Categorization can be improved a lot once initial categories are set, provided they are about the main topic, usefulness, or uniqueness of the file. Both stages probably need some development.
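
For stage 1, title-based suggestions of the kind mentioned in the list above could start from something as simple as word overlap. The following is a minimal sketch under stated assumptions: `suggest_categories` and the `known_categories` list are hypothetical, and a real tool would query existing Commons categories and would likely also use the description, metadata and content.

```python
import re


def suggest_categories(title: str, known_categories: list[str], limit: int = 5) -> list[str]:
    """Rank known category names by how many of their words occur in the file title."""
    title_words = set(re.findall(r"[a-z0-9]+", title.lower()))
    scored = []
    for name in known_categories:
        words = set(re.findall(r"[a-z0-9]+", name.lower()))
        overlap = len(words & title_words)
        if overlap:
            scored.append((overlap, name))
    # Highest overlap first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:limit]]


# Hypothetical example input; the uploader confirms or ignores each suggestion.
suggestions = suggest_categories(
    "Polar aurora over Tromso 2023.webm",
    ["Videos of polar aurora", "Videos of 2023", "Videos in Chinese", "Charts"],
)
```

Because the uploader only confirms suggestions that fit, a crude ranking like this can tolerate false positives. It ignores stop words, non-Latin scripts and synonyms, all of which a production version would have to handle.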

Reasons/uses for better categorization

Categories can then be used…

Basically, a file about a topic should not be missing in the category branch about that topic.
Likewise, specifying the language a video or diagram is in makes it easier to translate files (e.g. to see which English-language diagrams are used in your non-English Wikipedia and translate them) or to search and filter for a specific language. It also enables identifying media gaps and seeing which other files are or are not already available on a subject. Content creators can also use categories to quickly find good-quality media for their needs. There are countless reasons why it's good for files to be well categorized.

Prior proposals
Related topics

These are not proposed here but could be discussed separately:

  • Categorization should be aided not only when files get uploaded via UploadWizard but also when they get imported via tools such as Video2Commons and Flickr2Commons
  • Lack of categorization can be addressed not only at the time of upload but also afterwards, e.g. with a reminder on users' talk pages if they did not categorize their files, and/or by using file uses in mainspace to suggest the Commons category linked to the article where the file is used (this is currently done manually, partly using tools like GLAMorgan)
Assigned focus area

Unassigned

Type of wish
Feature request
Tags
Affected users

People using Commons and people uploading to Commons

Other details
Voting

This wish currently has 5 supporters. Voting for this wish is open until it is completed.

Supporters of this wish
Support Prototyperspective (talk) 13:02, 22 March 2026
Support The website's upload wizard should take many hints from the Commons Android app which has far superior categorization. For instance start by asking for Wikidata depictions, which are localized (category search only works for speakers of one language). Give suggestions based on EXIF, filename, uploader's history, maybe even on privacy-friendly LLM analysis hosted on Wikimedia's server. These suggestions can then be selected by the uploader, who best knows the subject. Syced (talk) 01:38, 23 March 2026
Support This is great. Even the probing questions would be of huge benefit. Hiàn (talk) 21:42, 23 March 2026
Support Pppery (talk) 23:06, 23 March 2026
Support We need to mitigate the risk of generating a very large backlog of uncategorized files that cannot be resolved by volunteers. NearEMPTiness (talk) 21:00, 27 March 2026