Community Wishlist/Wishes/Add machine translated category titles on WMC

From Meta, a Wikimedia project coordination wiki
Add machine translated category titles on WMC Open

Description

Until now, internationalization of category pages on Wikimedia Commons has consisted of Commons editors, in quite rare cases, adding a small number of descriptions in arbitrarily chosen languages other than English via templates like {{ja|日本語表記}} or {{en|English description}}.

For example c:Category:Retention ponds is one of the small subset of categories with descriptions in several languages: in this case, Hungarian, English, German, and Japanese.
In contrast, even very large categories linked on the Main page have only the English title and either no description or only an English one. These manually added translations:

  • clutter the pages for editors who have the default configuration, in which all translations are shown
  • are a place where lots of vandalism or inaccuracies can go undetected (even people who see them often can't read them)
  • aren't useful or reliable, as only a few categories have them
  • aren't useful or reliable, as the categories that do have translations have only a few of them (often only in fairly small languages rather than larger ones in terms of overall speakers or share of readers)
  • are a large time sink: adding and checking them takes away time editors could use for other contributions

Machine translation (MT) is becoming very accurate; for example, try DeepL or other MT tools like SoniTranslate.

Category titles are usually quite short (typically only a few words or a short phrase), and at these lengths MT is nearly always quite accurate.

The rare cases where it gets things wrong are a) outweighed by the benefits and b) correctable using a machine translation correction system in which people can see the machine translations for the languages they speak and adjust them if necessary.

  • When a translation is adjusted, the other translations either get auto-adjusted as well (e.g. by re-translating the changed part from the adjusted translation) or the page gets tagged as needing review of the MTs. For example, "light" can refer to three different concepts ('lighting', 'bright' or 'lightweight'), and when the German translation is adjusted from "Licht" to "leicht" (lightweight), the other translations could be adjusted automatically as well.
  • The MT system could also make use of connected Wikimedia items, such as Wikipedia articles in other languages, to derive the most likely translation. Take the example of c:Category:Light painting: of the linked items, the one in Czech WP is not ambiguous and is called "Luminografie". From this unambiguous WP title one could disambiguate the proper machine translation. Later, one could also use something like retrieval-augmented generation (RAG) that reads the lead section of several connected items to create the best translation.
  • One could also show a small note near the category title in target languages stating that the title has been machine-translated.
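The three points above could fit together roughly as follows. This is only a minimal sketch, not an existing Commons or MediaWiki feature: the class names, the `source` labels, the `fake_mt` stand-in for a real MT service, and the hard-coded `sitelinks` dictionary are all hypothetical. It prefers the title of a connected Wikipedia article when one exists, falls back to machine translation otherwise, shows the "machine-translated" note only for MT output, and flags the remaining MT titles for review once a human corrects one language.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class TitleTranslation:
    text: str
    source: str  # hypothetical labels: "sitelink", "mt", or "human"
    needs_review: bool = False

    @property
    def show_mt_note(self) -> bool:
        # Only machine-produced titles carry the "machine-translated" note.
        return self.source == "mt"


@dataclass
class CategoryTitles:
    english_title: str
    translations: Dict[str, TitleTranslation] = field(default_factory=dict)

    def resolve(
        self,
        lang: str,
        sitelinks: Dict[str, str],
        machine_translate: Callable[[str, str], str],
    ) -> TitleTranslation:
        # Prefer the title of a connected article; fall back to MT.
        linked_title = sitelinks.get(lang)
        if linked_title:
            entry = TitleTranslation(linked_title, "sitelink")
        else:
            mt_text = machine_translate(self.english_title, lang)
            entry = TitleTranslation(mt_text, "mt")
        self.translations[lang] = entry
        return entry

    def correct(self, lang: str, new_text: str) -> None:
        # A human fix in one language marks the other MT titles for review,
        # since they may share the same misreading of an ambiguous word.
        self.translations[lang] = TitleTranslation(new_text, "human")
        for other_lang, entry in self.translations.items():
            if other_lang != lang and entry.source == "mt":
                entry.needs_review = True


def fake_mt(text: str, lang: str) -> str:
    # Placeholder for a real MT call (assumption, returns a dummy string).
    return f"[{lang}] {text}"


# Example data: only the Czech title is taken from the text above.
sitelinks = {"cs": "Luminografie"}
cat = CategoryTitles("Light painting")
for lang in ("cs", "es", "de"):
    cat.resolve(lang, sitelinks, fake_mt)

# A German speaker replaces the MT output with a placeholder correction.
cat.correct("de", "(corrected German title)")
```

After the correction, the Spanish title is still the MT output but is flagged for review, while the Czech title, which came from a connected article, is left alone. Auto-adjusting the other languages instead of flagging them would need an additional re-translation step that is not sketched here.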

One could try this first with some of the best-working languages, such as Spanish. At a later point, this could also be done for category descriptions where they exist (most categories are self-explanatory from the title and don't have any). Even further down the line, this could also be implemented for English Wikipedia categories, but that is a different subject and just an idea that is out of scope here; it may not even be a good idea, given that it is the English Wikipedia. Related wish: Wikipedia Machine Translation Project. It would also solve what is discussed here, as people could simply see the category title in their own language.

Two key benefits of this are:

  • It would improve the multilingualism of Wikimedia Commons, making the site much more accessible and understandable (with connected benefits potentially including more diverse contributors and greater popularity of the site). These translated category titles could perhaps also be selectable via HotCat or when uploading images, so that more people categorize their files well and are better able to find relevant results with Wikimedia search for non-English search terms. People rarely create redirects in other languages (example), which can also be used with HotCat, for instance; the approach proposed here is more scalable and much better than creating redirects like that.
  • This may also allow many more people to find the WMC pages if they search the Web in their own language and indexing works well (this benefits these readers, makes the time and effort spent on Commons more worthwhile, and makes the Wikimedia sites more useful and more widely used).

Assigned focus area

Unassigned.

Type of wish

Feature request

Wikimedia Commons

Affected users

Wikimedia Commons contributors & readers

Other details

  • Created: 17:05, 4 August 2024 (UTC)
  • Last updated: 16:29, 9 August 2024 (UTC)
  • Author: Prototyperspective (talk)