Interlanguage use case

From Meta, a Wikimedia project coordination wiki

This is an example of an article that was originally written in several languages, translated into many more, and associated with related articles of greater or lesser detail. The motivating question is how best to present interlanguage links for these articles.

Basic questions

  1. Should there be a single list of interlang links to which each of these articles is connected?
  2. Should the question of which links are associated with an article be left up to the discretion of its encompassing language-project?
    • Should my changes to the it: wiki affect the observed interlang links of related articles on other wikis?
  3. Should it be possible to associate an article with more than one article in a target language?
    • If the answer above is "no", how best to handle the inevitable many-to-one issues that spring from variations in article granularity and linguistic specificity among languages?
    • If the answer above is "yes", how to keep the interlang interface from being overloaded with options?
  4. Should all interlang links be bidirectional? Should it be acceptable (in an ideal world) for a new language to link out to an article on en:, with no en: article linking back to one in that new language?

Related notions to consider:

  • The cost of moving a page with many interlang links (currently an O(N) problem best cleaned up by a bot)
  • The prevalence of asymmetric interlang links, with most articles linking to en and en linking back, but other langs not linking to one another.

Parallel abstracted questions to consider:

  • How this discussion reflects decisions re: article naming within a language
    • The cost of cleaning up redirects and history merges when moving a page with many incoming links (another O(N) problem best cleaned up by a bot)
    • The unavailability of universal article-IDs (relying on article name to track an article's history and incoming links)

Use case

Originals
E1 en:early-foo
E2 en:middle-foo
E3 en:late-foo
D  de:foeo
J  ja:fooiro

Translations, Similar arts
E' en:foobar
N1 nl:aef
N2 nl:oef
I  it:fooletto
I' it:fooferaw
F  fi:uffala
Z  zh:fooxi
Z' zh:Keng-foo
  1. Jane creates E1,E2,E3.
  2. John notices E1, and connects it to N1
    • N1 covers part of E1 and part of E2; someone also links N1 to E2, and links N2 to E2 and E3
  3. Moritz creates D, and links it to I.
    • Later he finds F and links that to I also.
  4. Carlo manages I; sees the link to D, and changes that link from D to I'.
    • Moritz has no idea that D and F now link to different it: pages
  5. On zh:, two different people link fooxi and Keng-foo (a subvariant of the rather broad term fooxi, with many specific examples; the longer article) to different combinations of the meaty articles E1, D, and J.
  6. Someone accidentally links F to the popular but unrelated E', assuming (without reading the article) that it is the nonexistent overview article en:foo. There is an ongoing edit war on E' regarding placement of a dab notice redirecting to E1-E3 and other foo-like articles; the dab is currently only on the Talk: page, with a single link there from ==See also==.

IRC discussion

TimS: What we have is a complex data structure, so like any complex data structure, the best way to handle it is to display it to users in all its glory, preferably in some kind of graphical representation, and allow them to manipulate it.

What you have is a cluster of articles. Now in theory, interlanguage links could zig-zag away from the original meaning, following slightly different shades of meaning in different languages... but that would be bad, so we should discourage that on a structural level. So we have a well-defined cluster of articles, composed of all the articles you can get to by following interlanguage links.
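The "cluster" described here can be read as a connected component of the link graph. The following is only a sketch under that reading: the `cluster` function and the link pairs are hypothetical, loosely based on the use case above, and it assumes that a link also connects its target back to its source.

```python
from collections import defaultdict, deque


def cluster(links, start):
    """Return every article reachable from `start` via interlang links."""
    neighbours = defaultdict(set)
    for src, dst in links:
        neighbours[src].add(dst)
        neighbours[dst].add(src)  # assumption: a link also connects back

    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in neighbours[page]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical (source, target) pairs taken from steps 2 and 3 of the use case.
links = [
    ("nl:aef", "en:early-foo"),
    ("nl:aef", "en:middle-foo"),
    ("nl:oef", "en:middle-foo"),
    ("nl:oef", "en:late-foo"),
    ("de:foeo", "it:fooletto"),
    ("fi:uffala", "it:fooletto"),
]

print(sorted(cluster(links, "en:early-foo")))
```

With these links the en:/nl: articles form one cluster and the de:/it:/fi: articles form another, since nothing connects the two groups.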

Sj: As far as I'm concerned, 99% of *good* sets of interlang links are about a single lang-neutral concept. If this were true, a collection of lang-links (~one link per lang, one list per concept) could provide the base case. On top of this there could be (possibly inefficient) patches for exceptions.

Other advantages of one-list-per-concept:

  • Choice of concept cluster
    Consider an article that is in more than one cluster (imagine a small wiki with few articles, w/only one art about a dual-meaning topic, rather than a dab).
    If you add that article as an interlang link to a different art, a smart interface will show you how many possible concept-lists you might want to join, and offer you a choice among them.
    Of course if you *really* object, you could create yet another concept-list, and join that one.
  • 99% of lists would be stored with N links, rather than N^2 links (where N is the number of langs the topic exists in)
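To make the N versus N^2 comparison concrete, here is a small sketch. The function names are made up for illustration, and it assumes that "N^2" refers to every article storing its own link to each of the other N-1 articles in the cluster.

```python
def pairwise_link_count(n):
    """Links stored when each of n articles links to the other n - 1."""
    return n * (n - 1)


def list_link_count(n):
    """Links stored when one shared list holds a single entry per article."""
    return n


for n in (5, 20, 100):
    print(n, pairwise_link_count(n), list_link_count(n))
```

For a topic that exists in 20 languages, that is 380 stored links in the pairwise case against 20 list entries.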

"Graph" model

Imagine the graph of all articles connected to one another (or implicitly connected back to one that has connected to them) via interlang links.

Say C links to A, but then someone later decides that B2 is a better match to C. They should be able to change that link without affecting the rest of the interlanguage links. So perhaps we have implicit links and explicit links, with explicit ones taking precedence. First there's an explicit link from C to A, which causes an implicit link from C to B1; then another explicit link may be created, from C to B2.

We can still have explicit link conflicts:
B1-->A
B1-->D1
 C-->A
 C-->D2
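One way such a conflict could be detected is sketched below. This is an illustration only: `explicit_conflicts` is a hypothetical helper, the first letter of an article name is assumed to identify its language (as in the notation above), and a "conflict" is assumed to mean that two articles sharing an explicit target disagree about the target in some other language.

```python
from collections import defaultdict
from itertools import combinations


def lang(article):
    """Assumption: the first letter of the name identifies the language."""
    return article[0]


def explicit_conflicts(links):
    """Find pairs of sources that share a target but disagree elsewhere."""
    targets = defaultdict(set)
    for src, dst in links:
        targets[src].add(dst)

    conflicts = []
    for a, b in combinations(sorted(targets), 2):
        if not (targets[a] & targets[b]):
            continue  # no shared target, so nothing ties a and b together
        for x in sorted(targets[a]):
            for y in sorted(targets[b]):
                if lang(x) == lang(y) and x != y:
                    conflicts.append((a, x, b, y))
    return conflicts


links = [("B1", "A"), ("B1", "D1"), ("C", "A"), ("C", "D2")]
print(explicit_conflicts(links))
```

For the four links above, B1 and C both point at A but name different D-language articles (D1 and D2), which is the conflict reported.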

Q & A

Then, when does an update to C's links propagate to other langs? If C changes to link to A' should B1-B3 also change their links?

Well, implicit links will be reconfigured immediately, but not explicit links.
It should be possible to rename A without affecting the graph at all.

sj: If B1 and B2 both link to A, but A doesn't link to either, how should A be linked? What if meanwhile, B1 and B2 each have separate graphs of their own; does A only pay attention to the one it picked?

A would have to trace out the whole graph. Say C linked to B1, and D linked to B2: then A would have to display links to both C and D, which really makes it hard to justify selecting either B1 or B2 to display.

TimS: the thing is, if you allow multiple items to be displayed, the graph model becomes equivalent to the list model: if every member of the graph is equivalent and displayed, then all those members can be stored as part of a list.

I think there is an aspect of "choosing a concept" in the list model

which I don't see in the graph model (which effectively says "show everything that links to this page in any context")

you can choose multiple concepts... an article can be in more than one list... ah, there's the difference: in the graph model, the list of relevant articles is the same for every article in the graph, whereas in the list model, the list membership may change from article to article.

the original concept of (occasionally overlapping) clusters ... could be implemented as a list-with-choice or a graph-with-choice, but choice of cluster is important, and so is avoiding meaningless superclusters

Distance method


Things get interesting in the graph model if you start selecting articles to display based on distance. Say B links to A, C1 links to A, and C2 links to B. Then C1 is "closer" to A (1 step instead of 2), so it gets the link from A to language C.

This can also get awfully misleading: you could end up with half the interlang links about one topic and the other half about an entirely different topic.
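The distance-based selection could be sketched as a breadth-first search. The function name is hypothetical, and the sketch assumes that links count as two-way when measuring distance, that the first letter of a name identifies the language, and that ties go to whichever article the search reaches first.

```python
from collections import defaultdict, deque


def closest_per_language(links, start):
    """For each other language, pick the article fewest steps from `start`."""
    neighbours = defaultdict(set)
    for src, dst in links:
        neighbours[src].add(dst)
        neighbours[dst].add(src)  # assumption: distance ignores direction

    # Breadth-first search gives the step count to every reachable article.
    distance = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in sorted(neighbours[page]):
            if nxt not in distance:
                distance[nxt] = distance[page] + 1
                queue.append(nxt)

    best = {}
    for page, steps in distance.items():
        language = page[0]  # assumption: first letter is the language
        if language == start[0]:
            continue
        if language not in best or steps < distance[best[language]]:
            best[language] = page
    return best


links = [("B", "A"), ("C1", "A"), ("C2", "B")]
print(closest_per_language(links, "A"))
```

Here C1 is one step from A and C2 is two steps away, so C1 is chosen for language C. Nothing in this selection checks that the chosen articles are about the same topic, which is the risk noted above.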

"List" model

sj: I think the ideal solution for lang links is to have a list of article links, not associated with any particular language, for each cluster/abstract topic. Some rare articles might be part of more than one cluster/topic, and therefore on more than one list of links. Then there could be a table to store links between articles and lists, so that a given article can be automatically associated with all relevant articles in other languages.

When an editor adds an interlang link, she would be adding the article in question to an existing cluster, or creating a new cluster.
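A rough sketch of the article-to-list table follows. The `ConceptLists` class, its method names, and the list ids are all invented for illustration; nothing here reflects an existing MediaWiki schema.

```python
from collections import defaultdict


class ConceptLists:
    """In-memory stand-in for a table linking articles to concept lists."""

    def __init__(self):
        self.members = defaultdict(set)   # list id -> articles on that list
        self.lists_of = defaultdict(set)  # article -> ids of lists it is on

    def add(self, list_id, article):
        """Add an article to a list, creating the list if it is new."""
        self.members[list_id].add(article)
        self.lists_of[article].add(list_id)

    def related(self, article):
        """Return every other article sharing at least one list."""
        found = set()
        for list_id in self.lists_of.get(article, ()):
            found |= self.members[list_id]
        found.discard(article)
        return found


table = ConceptLists()
table.add("early-foo", "en:early-foo")
table.add("early-foo", "nl:aef")
table.add("middle-foo", "en:middle-foo")
table.add("middle-foo", "nl:aef")  # nl:aef sits on two lists
table.add("middle-foo", "nl:oef")

print(sorted(table.related("nl:aef")))
print(sorted(table.related("en:early-foo")))
```

In this sketch nl:aef belongs to two lists and so sees the members of both, while en:early-foo sees only its own list. That is the difference from the graph model, where every article in the graph would see the same set.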

TimS: maybe it's better to link to disambiguation pages in the target language, than to link to more than one article in the language

What needs "disambiguation" for a foreign speaker might not require a dab page for native speakers (e.g., Pedagogy v. Education), but I suppose we could train people to create special for-interlang-only pages...

Yes, a disambiguation page for a foreign word. If two articles in language A correspond to one article in language B, then the links can't be bidirectional if only one link for each language is allowed: A:1 and A:2 both link to B; B links to A:dab.

Q & A

TimS: if you have one list per concept, does that mean only one article on a given language can be a member of a given list?

following an interlang link for a fringe case (i.e., more than one related art in a given language) could show you a little tree structure of the relevant arts in that lang. In the sidebar itself, you could show "# of related arts" attributes next to the lang name:
English(3) (if there are 3 equally important arts)
Deutsch(1+2) (if there is one primary art and 2 others)
of course this extra information would only be turned on as a pref
users within a lang adding Yet Another interlang link from a specific to a general article would be shown the current tree and asked to add theirs in the appropriate place.
Later, users of the target lang could better redistribute the specific articles so that they link directly to specific subarts
(rather than automatically generating these trees, we're letting users actively change them, as per earlier statements about exposing gory details to the user)
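The sidebar counters suggested above could be produced along these lines. `sidebar_label` is a hypothetical helper and the de: article names are placeholders; showing no counter for a language with a single article is an assumption, not something stated in the discussion.

```python
def sidebar_label(language, articles, primary=None):
    """Build a sidebar label such as 'English(3)' or 'Deutsch(1+2)'."""
    count = len(articles)
    if count <= 1:
        return language  # assumption: the common case shows no counter
    if primary is None:
        return f"{language}({count})"  # all articles equally important
    if primary not in articles:
        raise ValueError("primary must be one of the articles")
    return f"{language}(1+{count - 1})"  # one primary art plus the others


english = ["en:early-foo", "en:middle-foo", "en:late-foo"]
german = ["de:main", "de:sub1", "de:sub2"]  # placeholder names

print(sidebar_label("English", english))
print(sidebar_label("Deutsch", german, primary="de:main"))
```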

Older answer:

could go either way. If we do one article per language, then there could still be many arts in that language which link to the list, but only one 'primary' one. For instance, "Origins of the American Civil War" (a four-article set with a single highest-level art): each subarticle could show the same list of interlang links, but someone coming in from outside would always be directed to the primary art.
If we allow multiple arts per language, what if another language gets the same 4 subpages? The interlanguage links would always point to the main article, right?

they wouldn't go between subpages?

Right. You could create separate lists of links b/t subpages, but those wouldn't include the links to the langs without detailed subpages.