Wikidata/Notes/Language fallback

From Meta, a Wikimedia project coordination wiki

From the perspective of a client (e.g. the French Wikipedia) that displays data from a Wikidata item in an infobox, some properties of the item may have no label in the currently selected language. In this case, a fallback mechanism should be used to show an available label from another language.

Fallback can be based on:

  • similarities between languages (e.g. en-GB -> en)
  • the languages a signed-in user speaks
  • the languages signaled by the browser
  • the language of the client (e.g. en for en.Wikipedia.org)
  • as a last resort, *any* language will do.
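
The sources listed above could be combined into one ordered chain. The following is a minimal sketch, assuming each source has already been reduced to a list of language codes. The function name `merge_language_sources` and the order in which the sources are passed are assumptions made for illustration, not something this page decides.

```python
from typing import Iterable, List


def merge_language_sources(*sources: Iterable[str]) -> List[str]:
    """Concatenate several language lists into one chain without duplicates.

    Earlier sources win: a language keeps the position of its first occurrence.
    """
    chain: List[str] = []
    for source in sources:
        for lang in source:
            if lang not in chain:
                chain.append(lang)
    return chain


# Hypothetical inputs: similar languages, user languages, browser languages, client language.
example_chain = merge_language_sources(["en-gb", "en"], ["fr", "en"], ["de"], ["en"])
```

Whatever language remains after this chain is exhausted corresponds to the last bullet: any available label is used.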

If a value from a different language is shown in the UI, then

    • the value's language should be indicated (small colored text?)
    • there should be a button to approve the value for the current language ("yes, it's the same in my language")

User fallback chain

Each user's preferred languages are defined in Special:Preferences. There are limits on where these preferences can be used: it seems likely they can be used when the API is accessed, when the rendered content isn't cached, or when the cached objects can be specialized to a given language. For now the user fallback chain is a single set of alternate languages that all have the same weight. Later it might be possible to weight the languages, either explicitly or through their order. Other hints could also be used, such as the weights set in the browser.

If a user has defined alternate languages, all of these languages can be tried in sequence, but the main (primary) language is tried first. Content (messages, labels, or other strings) may or may not exist in the given fallback language.
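
A minimal sketch of this lookup, assuming labels are held in a plain mapping from language code to string. The function name `pick_label` and the data layout are hypothetical. The function returns the language together with the label, so that a caller can indicate which language the value came from.

```python
from typing import Mapping, Optional, Sequence, Tuple


def pick_label(
    labels: Mapping[str, str],
    primary: str,
    alternates: Sequence[str],
) -> Optional[Tuple[str, str]]:
    """Try the primary language first, then each alternate in sequence."""
    seen = set()
    for lang in [primary, *alternates]:
        if lang in seen:
            continue  # skip languages listed twice
        seen.add(lang)
        label = labels.get(lang)
        if label:  # the content may or may not exist in this language
            return lang, label
    return None
```

For example, `pick_label({"de": "Kopenhagen", "en": "Copenhagen"}, "nb", ["sv", "en", "de"])` skips `nb` and `sv`, which have no label, and returns the English entry.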

For anonymous users, the list of preferred languages defined in the web browser may substitute for the MediaWiki language preference settings.
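
Browsers usually send their preferred languages in the HTTP `Accept-Language` header, with optional `q` weights. The sketch below shows one way to turn such a header into an ordered language list. The function name `parse_accept_language` is hypothetical, and lowercasing the tags and dropping the `*` wildcard are simplifying assumptions.

```python
from typing import List, Tuple


def parse_accept_language(header: str) -> List[str]:
    """Return language tags ordered by descending q weight, then by position."""
    entries: List[Tuple[float, int, str]] = []
    for position, part in enumerate(header.split(",")):
        tag, _, params = part.strip().partition(";")
        tag = tag.strip().lower()
        if not tag or tag == "*":
            continue  # ignore empty entries and the wildcard
        quality = 1.0  # default weight when no q parameter is given
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        if quality > 0:
            entries.append((-quality, position, tag))
    entries.sort()
    return [tag for _, _, tag in entries]
```

The resulting list could stand in for the alternate languages of a signed-in user.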

Global fallback chain

Languages are related to each other, and these relationships can be used to build fallback chains, so that a label in a closely related language can be used if the label in a given language is missing. For example, the South Saami language is part of the Southern group, which in turn is part of the Western group of Saami languages. [1] That group also contains a Northern group, whose languages are Northern, Lule and Pite Saami.

MediaWiki defines fallbacks for languages (in the languages/messages/MessagesXY.php files, see $fallback). For a specific language a list of fallbacks can be obtained, with a default language as the final fallback. Each of the fallbacks can then be tested against whatever strings exist in any of the languages, and the string for the first matching language can be returned. If no match is found before the end of the chain, a second pass can be done, this time checking the fallbacks of the languages in the initial list.
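
The two passes can be sketched as follows. This is an illustration under stated assumptions, not MediaWiki code: the `FALLBACKS` table holds invented values that are not copied from any MessagesXY.php file, `DEFAULT_LANGUAGE` is assumed to be `en`, and the second pass is read as "look up the fallbacks of every language in the first chain".

```python
from typing import Dict, List, Mapping, Optional, Tuple

DEFAULT_LANGUAGE = "en"  # assumed final fallback

# Invented example data; real definitions would come from the $fallback settings.
FALLBACKS: Dict[str, List[str]] = {
    "smj": ["se"],
    "se": ["nb"],
    "nb": ["nn"],
}


def first_pass_chain(lang: str) -> List[str]:
    """The language itself, its declared fallbacks, then the default language."""
    chain = [lang, *FALLBACKS.get(lang, [])]
    if DEFAULT_LANGUAGE not in chain:
        chain.append(DEFAULT_LANGUAGE)
    return chain


def second_pass_chain(first: List[str]) -> List[str]:
    """Fallbacks of the languages in the first chain that were not tried yet."""
    extra: List[str] = []
    for lang in first:
        for fallback in FALLBACKS.get(lang, []):
            if fallback not in first and fallback not in extra:
                extra.append(fallback)
    return extra


def resolve(labels: Mapping[str, str], lang: str) -> Optional[Tuple[str, str]]:
    """Return (language, label) from the first pass, else from the second pass."""
    first = first_pass_chain(lang)
    for candidate in first + second_pass_chain(first):
        if labels.get(candidate):
            return candidate, labels[candidate]
    return None
```

With this table, a request for `smj` first tries `smj`, `se` and `en`, and only then `nb`, which is reachable through the fallback of `se`.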

At present only a few languages define a fallback. This is rather awkward for our use, but it is perhaps the wanted behavior for an ordinary wiki. A first attempt could be to extend the fallbacks with languages from the same group, perhaps also the supergroup; if this is impossible, we could use a hook and do something similar but specific to our extension.

Client fallback chain (e.g. Wikipedia-language)

In the case of Wikipedia clients, the most important fallback chain will be provided by the client. When a user changes the user interface language, on most Wikimedia Foundation projects (known exception: Commons) and most (all?) Wikipedias, only the user interface changes, not the content. Changing the content of infoboxes in cases where the user language preference conflicts with the content language of the Wikipedia might be surprising. A decision whether this is ultimately desirable would probably have to be left to the community of each Wikipedia.

The base behaviour would therefore be fallback chains that start with the language of the Wikipedia (= the Wikidata client). The fallback chain for a German Wikipedia might then be, e.g., "de -> en -> fr -> nl -> es -> it -> first-language-present". A necessary decision will be whether to provide a community-specific means to define and discuss this, or whether only a large set of global fallback chains is centrally defined for each language that corresponds to a Wikipedia. To support community decisions, a property "Property:language_fallback_chain" could be defined and set on each item page inside Wikidata that represents a Wikipedia. This property information could then either be used live by Wikidata, or it could be harvested at regular intervals (similar to information from translation wiki).
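
A chain written in the arrow notation above could be parsed and applied as in the following sketch. The function names are hypothetical, and treating "first-language-present" as "the first label found in the item, in stored order" is an assumption about what that placeholder means.

```python
from typing import List, Mapping, Optional, Tuple

ANY_LANGUAGE = "first-language-present"  # placeholder used in the example chain


def parse_chain(text: str) -> List[str]:
    """Split 'de -> en -> fr' into ['de', 'en', 'fr']."""
    return [part.strip() for part in text.split("->") if part.strip()]


def resolve_with_chain(
    labels: Mapping[str, str], chain: List[str]
) -> Optional[Tuple[str, str]]:
    """Return (language, label) for the first chain entry that has a label."""
    for entry in chain:
        if entry == ANY_LANGUAGE:
            # Accept whatever label exists, in the mapping's own order.
            for lang, label in labels.items():
                if label:
                    return lang, label
            continue
        label = labels.get(entry)
        if label:
            return entry, label
    return None


german_chain = parse_chain("de -> en -> fr -> nl -> es -> it -> first-language-present")
```

With this chain, an item that only has Hebrew and French labels resolves to French, and an item that only has a Hebrew label still resolves to Hebrew through the final placeholder.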

Support for fallback in the Wikidata editing interface

A special problem occurs if a new value is entered in a language with a script that only a relatively small percentage of people read (and can thus help to transliterate). An example is the name of the mayor of a city in Israel, where the original name and spelling are Hebrew; see the mailing list thread "[Wikidata-l] watching Wikidata changes that affect my wiki" of 14.8.2012.

Automatic transliteration is possible, but error-prone. The Wikidata editing interface may therefore encourage (not force) editors, in addition to entering information in the chosen language, to also enter transliterated and sometimes translated labels in one or several fallback languages. The list of fallback languages could be automatically determined by frequency of occurrence in fallback definitions (in practice, English will probably be the most frequent one).
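
Ranking fallback languages by how often they occur could look like the sketch below. The function name `top_fallback_languages` and the shape of the input (a mapping from language code to its list of fallbacks) are assumptions; ties are broken alphabetically to keep the result stable.

```python
from collections import Counter
from typing import List, Mapping, Sequence


def top_fallback_languages(
    definitions: Mapping[str, Sequence[str]], limit: int = 5
) -> List[str]:
    """Rank languages by how many fallback definitions mention them."""
    counts = Counter(
        fallback for chain in definitions.values() for fallback in chain
    )
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [lang for lang, _ in ranked[:limit]]


# Invented example definitions, for illustration only.
example_definitions = {
    "de-at": ["de", "en"],
    "de-ch": ["de", "en"],
    "fr-ca": ["fr", "en"],
    "nl": ["en"],
}
```

In this invented data set `en` occurs four times and therefore ranks first, followed by `de` and `fr`.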

The user interface interaction would require:

  • a list of languages for which transliteration or translation is especially desired (if desired this step can be skipped, making this the default behavior for all languages)
  • if a label is entered in one of these languages, the software checks whether at least one label in one of the top 5 fallback languages is available.
  • if not, it offers the top 5 fallback languages and prompts the user to consider entering at least one.
  • In a second phase, a web service hook for automatic transliteration should be built in. The result returned by this service should not be entered directly as a new label, but could be displayed next to it, ready to be accepted or modified.
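
The first three steps above can be sketched as a single check. The function name `languages_to_prompt` and its parameters are hypothetical. Passing `watched=None` models the case where the first step is skipped and the check applies to all languages. The transliteration hook of the second phase is not covered.

```python
from typing import Collection, List, Mapping, Optional, Sequence


def languages_to_prompt(
    labels: Mapping[str, str],
    entered_lang: str,
    watched: Optional[Collection[str]],
    top_fallbacks: Sequence[str],
) -> List[str]:
    """Return the fallback languages to offer the editor, or [] if no prompt is needed."""
    # Step 1: only react to languages on the watch list (None means all languages).
    if watched is not None and entered_lang not in watched:
        return []
    top = list(top_fallbacks)[:5]
    # Step 2: one existing label in a top fallback language is enough.
    if any(labels.get(lang) for lang in top):
        return []
    # Step 3: offer the top fallback languages to the editor.
    return top
```

The editor is only prompted, never blocked: an empty result means the interface shows nothing extra.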