Tips for resolving interwiki conflicts

From Meta, a Wikimedia project coordination wiki
This is an essay. It expresses the opinions and ideas of some Wikimedians but may not have wide support. This is not policy on Meta, but it may be a policy or guideline on other Wikimedia projects. Feel free to update this page as needed, or use the discussion page to propose major changes.
 I said it in Hebrew—I said it in Dutch—
     I said it in German and Greek:
 But I wholly forgot (and it vexes me much)
     That English is what you speak!

Lewis Carroll, The Hunting of the Snark, Fit the Fourth.

This is a list of ideas for fixing hard problems in interwiki (interlanguage) links, also known as the dreaded Interwiki conflicts.

Update: Now that Wikidata is on, this document is slightly outdated, but only slightly: Wikidata by itself doesn't solve the conflicts, but only makes them easier to resolve. The ways to find the different articles are still quite the same as they were before Wikidata.

Really simple things

These tips only require you to do very simple things, such as installing a piece of free software or making your worldview more positive.

Be bold and ignore all rules

Be bold and ignore all rules. It is right for everything you do in Wikipedia, and it is required here a bit more than for other things. When you just browse foreign Wikipedias, you are a tourist; when you fix them, you're a volunteer missionary. Know that you are doing a good thing, helping fellow people. It will be worth your time and effort.

Don't count on the bots

The bots are helpful, but limited. They don't know human languages, so they can only do their work when humans did it well before them. Sometimes humans make mistakes and then bots are helpless or even harmful.

Remember that in a conflict between a bot and a human editor, the human is right by definition. If you don't like something that the bot does, it is almost certainly a bug or a limitation. "Almost", because the bot may be doing that because of a community decision. In this case, Be Bold and consider fixing the community.

Use Mozilla Firefox

Please, get Firefox and use it.

This tip is relevant for nearly everything you do on the web, but is particularly useful when browsing foreign wikis and fixing interlanguage links. Mozilla has excellent tabbed browsing, next-to-perfect font support and several free and very useful add-ons, which will make your work more fun and productive.[1]

Use tabbed browsing

Tabbed browsing is not a new idea, but it is not yet used by all people. For the purpose of fixing interlanguage links it is extremely useful, as it lets you quickly open several web pages simultaneously to compare them.

Mozilla Firefox has particularly good tabbed browsing, but essentially all modern browsers have it too. Learn to use it efficiently:

  • Ctrl-T usually opens a new tab (T stands for Tab)
  • Ctrl-W closes the current tab (W stands for Window)
  • middle click on a link opens it in a new tab
  • middle-click on the tab itself closes it.

Remember: The middle click is your friend![2]

Install all possible fonts, keyboard layouts and writing systems on your computer

How to do this really depends on your operating system, but do yourself a favor and install absolutely everything you can.

Microsoft Windows has supported foreign languages quite well since 2000 (Windows ME being a particularly unfortunate exception). Go to your Control Panel, browse through "Regional and language options" and make sure you have all boxes checked in the "Languages" tab.

Mac and GNU/Linux systems also support foreign languages well. Consult the manuals for your version of KDE, Gnome, XFCE or Mac OS for language support installation instructions.

You can find useful free fonts here: http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&cat_id=FontDownloads

Consider installing Code2000. It includes almost all of Unicode. It's not free-as-in-freedom software, but its licensing terms are less obnoxious than those of much other non-free software, and you can essentially use it for free or after paying a very small amount of money.

Try browsing Wikipedias that use less common scripts - Gothic, Tibetan, Amharic, Buginese, Cherokee. If you see squares or question marks instead of letters, look for the font installation instructions on the main page of that Wikipedia.

External editor is your friend

Firefox has a pretty good editing field and some add-ons may make it even better, but for some tasks you may find an external editor more useful.

On Windows your best options are Notepad and Notepad++. Notepad is the text editor that comes with Windows. It has pretty stable Unicode support and also supports right-to-left very well, but it's not good for almost anything else. For advanced tasks such as sorting, clever search-and-replace, etc., you should try Notepad++. Its only notable weakness is the problematic support for right-to-left languages.

If you use GNU/Linux, you probably already have a favorite text editor. vi and emacs are not great for right-to-left editing, but are editing powerhouses for anything else. If these Unix-style programs are too hard for you, consider some of the editors that come with your KDE or Gnome installation, such as Kate or gEdit. Much like Notepad++, they support Unicode well and have many advanced features, but you need to get used to their flaky right-to-left support.

Word processors such as OpenOffice or Microsoft Word are not supposed to be used as text editors, but you may try using them. Both of them have good Unicode and right-to-left support and do most of the simple tasks needed for interwiki editing right.

(The original author of this article has never seriously used a Mac, so consider writing the paragraph about Mac editors right here.)

Install and use a few Firefox add-ons

  • Transliterator (formerly ToCyrillic) - this is an unbelievably simple and useful add-on that does two things: First, it lets you easily type in foreign scripts without messing with your operating system's keyboard settings and without having to memorize foreign and confusing keyboard layouts. Second, it lets you transliterate text from Latin to foreign characters and vice-versa. It supports several types of Cyrillic transliteration and also Georgian, Cherokee, Japanese Kana, Chinese Pinyin and several other writing systems.
  • Install spelling dictionaries for a few useful languages. This may come in handy when you want to edit a few words in a foreign wiki to show friendliness and consideration.
  • Feel free to add more add-ons that you find useful!

Use Unified login and set up a user page

Merge your accounts and set up your Single-User Login (SUL). See Help:Unified login.

Set up a user page that tells what you're doing. If you don't know the language of the target wiki well enough to write it, use English or whatever seems fit.

Using a unified user account and explaining to the editors in the target wiki what you're doing goes a long way towards preventing them from reverting your precious edits and blocking you for vandalism out of a silly misunderstanding.

Use edit summaries

Use meaningful edit summaries. Point to a discussion in Interwiki synchronization if there is one.

Even if you do use meaningful edit summaries, be patient when a resident editor in the target wiki accuses you of vandalism, racism or spam. It happens. If you know that you are acting in Good Faith, it's not supposed to hurt you.

Fix other things on the way

When fixing interwiki links, you may find other things to fix. Wrong spacing, invalid templates, tables, or images. You may also encounter vandalism - friends of gays sometimes vent their frustration in foreign languages. Be bold and fix it.

When you are fixing an article in a foreign language that you encounter for the first time, read the article about that language in the Wikipedia in a language you know well. Consider improving that article. Consider reading a little about the language's grammar and syntax. If you are not sure that you know what syntax is, read the article about it, and consider improving it, too.

Resolving the conflict

Here we get to the actual work.

Find yourself an article to work on

As of this writing, full lists of pages with static interwiki conflicts are ready only for the Wikipedias in Hebrew, in Russian and in Esperanto. You can start working on the Esperanto list right now (and you don't need to know Esperanto to use it):

Lists for other languages are being gradually prepared. Esperanto should be reasonably readable for almost anyone, but if you really want to work with the Wikipedia in your language as the starting point, drop a line to Amir E. Aharoni and ask him to prepare a list for you.

If Amir is busy In Real Life, try contacting interwiki bot operators. These can be easily found in the Recent changes on any Wikipedia. They will gladly provide you a list of conflicts to work on.

Of course, you can also run into an interwiki conflict by sheer chance. In any case, good for you. Start working.

Open all the foreign articles to which a given Wikipedia is linked

"All" literally means "All". "Linked" means not just "directly linked" - the indirect links are the ones that are causing the conflict!

You can of course use a bot to do that for you, but bots have several disadvantages: they are hard to install for many people who aren't well-versed in programming, and their output about interwiki conflicts is very confusing. This is the reality at the time of this writing, and the bots' software is constantly improving, but the point is that even with the best bot you'll have to read some of the foreign articles anyway, even if the bot doesn't report that they cause a conflict! So forget the bot and use your browser.

To open all links, open an article in one Wikipedia and middle-click all the interwiki links in its sidebar. This will open the directly linked articles in separate tabs. For articles with a lot of interwiki links, such as San Marino, that may mean a lot of tabs: opening too many of them may make your browser's memory usage go crazy, and all the scrolling back and forth is not too convenient. So be wise and don't open them all at once, but do it gradually - for example, open 10 tabs, close them, open the next 10, etc.

Opening a page adds it to your browser's history and that means that the link to it will be colored differently. In the default settings unvisited links are usually blue and visited links become purple. After you open all the directly linked articles, go over all of them and check whether any of them still has blue links. Middle-click on all the blue links that you encounter.

At this point you may notice that the blue links that you have opened are articles that you already saw earlier. There may be two reasons for this. One is that the link was to a redirection page - it eventually leads to the same article, but the redirection page is not listed in your browser's history, so it remains blue. Another reason may be that the link is not spelled exactly the same as it appears in the foreign Wikipedia, most often because its first letter is small and not capital. Yet again, the link leads to the same article, but your browser doesn't know it. Bots do not handle such things perfectly, so they should be corrected: interwiki links should be spelled precisely and consistently and they should not point to redirects.[3]
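The spelling problem described above can be sketched in a few lines of Python. This is only an illustration, not what any interwiki bot actually does: the function names are made up for this essay, and the sketch assumes that the target wiki capitalizes the first letter of every title, which is not true for every wiki. It cannot detect redirects; for those you still have to open the page.

```python
def normalize_link(link: str) -> tuple[str, str]:
    """Return (language code, title) in one canonical spelling."""
    lang, _, title = link.partition(":")
    lang = lang.strip().lower()
    # Treat underscores as spaces and collapse repeated whitespace.
    title = " ".join(title.replace("_", " ").split())
    # Assumption: the target wiki capitalizes the first letter of every title.
    if title:
        title = title[0].upper() + title[1:]
    return lang, title


def same_article(a: str, b: str) -> bool:
    """True if two link spellings give the same title after normalizing."""
    return normalize_link(a) == normalize_link(b)
```

For example, `same_article("ru:заголовок", "ru:Заголовок")` is true under this assumption, so the blue link and the purple link are really the same page.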

Repeat this operation until you have visited the whole "cloud" of interwiki-linked articles.
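If you like to think in code, the "repeat until nothing is blue" procedure is a breadth-first walk over a graph. The sketch below assumes you have already written down, by hand, which page links to which; the `links` table is hypothetical sample data based on the imaginary articles used later in this essay, and nothing here talks to a real wiki.

```python
from collections import deque


def collect_cloud(start, links):
    """Walk the interwiki links breadth-first and return every page reached."""
    visited = {start}      # plays the role of the browser history
    order = [start]        # pages in the order they were first opened
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in visited:   # a link that is still "blue"
                visited.add(target)
                order.append(target)
                queue.append(target)
    return order


# Hypothetical hand-made table: page -> interwiki links found on that page.
links = {
    "en:Article name": ["de:Artikelname", "ru:Заголовок"],
    "de:Artikelname": ["en:Article name"],
    "ru:Заголовок": ["en:Title", "fr:Nom"],
    "en:Title": ["ru:Название"],
}

cloud = collect_cloud("en:Article name", links)
```

Starting from en:Article name, the walk also reaches en:Title, fr:Nom and ru:Название, none of which is linked directly from the starting page. Those indirect links are exactly the ones that cause the conflict.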

Sorting the pages

Every time you open an article, make short notes about it. For example, our starting point is an imaginary article called "Article name" in the English wiki. Your list may look like this:

* en:Article name - is about articles
* de:Artikelname - is about articles
* ru:Заголовок - is about titles in general
* fr:Nom - is about names in general
* ru:Артикль - is not about articles as pieces of text, but about the grammatical article
* en:Title - is both about titles as names and as aristocratic titles
* ru:Название - is about names of things
* he:שם - is about names in general
* de:Artikel - is about grammatical article

When you're finished with this list, sort it by topic:

about encyclopedic articles:
* en:Article name
* de:Artikelname

about titles in general:
* ru:Заголовок

about titles as names and as aristocratic designations:
* en:Title

about names in general:
* fr:Nom
* he:שם

about names of things:
* ru:Название

about the grammatical article:
* ru:Артикль
* de:Artikel

Now this starts to look like a list with which you can start an Interwiki synchronization discussion, but it can still be improved a little. For example - be bold and split [[en:Title]] into [[en:Title (name)]] and [[en:Title (aristocratic designation)]].

At this point you should sort the links in every group according to the language code. This actually contradicts the rules of some Wikipedias - for example, the English Wikipedia sorts languages by their names and their codes, the Hebrew and Hungarian Wikipedias put English first, etc. (See Interwiki sorting order for a full list.) You should nevertheless sort them this way, because it will help you avoid duplicates - you don't want to put more than one link to the same language in one article. When you actually make the correction, you should try - as much as you can - to sort the links according to the local rules; however, the bots will do it anyway, so don't worry too much about it.
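The grouping, sorting and duplicate check can also be done with a small script instead of a text editor. The following is a minimal sketch under the assumption that your notes are kept as (link, topic) pairs; the function names are invented for this illustration.

```python
from collections import defaultdict


def lang_code(link):
    """Language code of a 'code:Title' link."""
    return link.split(":", 1)[0]


def group_by_topic(notes):
    """Group (link, topic) notes by topic, each group sorted by language code."""
    groups = defaultdict(list)
    for link, topic in notes:
        groups[topic].append(link)
    return {
        topic: sorted(group, key=lambda link: (lang_code(link), link))
        for topic, group in groups.items()
    }


def duplicate_languages(group):
    """Language codes that appear more than once in one group."""
    codes = [lang_code(link) for link in group]
    return sorted({code for code in codes if codes.count(code) > 1})
```

A group for which `duplicate_languages` returns anything needs more work before you publish it: either two of its articles should be merged, or one of them belongs to a different group.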

Request for comment

Now, publish your list at Interwiki synchronization. Even if no significant discussion follows, it is still useful as a task list for yourself, and as a central page to which you can link when you want to leave a useful edit summary in foreign Wikipedias.

If you are not sure that everyone will agree with your changes, consider writing a comment at least on some of the articles' talk pages saying that you plan to update the interwikis and linking to the Interwiki synchronization that you just created.

Start fixing!

Have you made a sorted list of all the relevant pages? Do you think there is a consensus for it? Then the real fun begins.

This paragraph describes a manual process. If you are very nice, have some spare time and can program a bot that will do it automatically, go on and do it. Furthermore, this process will become easier when the Interlanguage extension is enabled. (Soon, hopefully.)

Open the first article in the first group of articles in the list that you created. Take a deep breath and replace all the interlanguage links in it with the new links upon which you have decided. Make sure you remove the link to the same language - if you're in the English WP, you don't need an en: link in the article. Now save the page. The new links will appear in the sidebar. Go on and middle-click them and repeat the procedure. To explain to the local editors what you're doing, link to the list you created in meta:Interwiki synchronization.
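To avoid typing the same block of links into every article by hand, you can generate it. This sketch assumes the old-style wikitext interlanguage links written as [[code:Title]] at the bottom of the page; the function name is made up, and the output is sorted by language code, not by the local rules of the target wiki.

```python
def interlanguage_block(group, own_lang):
    """Wikitext links for one article, leaving out the article's own language."""
    lines = []
    for link in sorted(group):
        code = link.split(":", 1)[0]
        if code == own_lang:
            continue  # an article never links to its own language
        lines.append(f"[[{link}]]")
    return "\n".join(lines)


group = ["en:Article name", "de:Artikelname"]
block_for_en = interlanguage_block(group, "en")
block_for_de = interlanguage_block(group, "de")
```

Here `block_for_en` contains only the de: link and `block_for_de` only the en: link, so each page gets every link of its group except the one to itself.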

Aftermath

Consider checking the pages you fixed with an interwiki bot; if you cannot operate a bot yourself, ask a bot operator. You may be surprised at the results.

Check that your changes were not reverted soon after you made them. It happens that the people in the foreign Wikipedia misunderstand your good intentions. Suggested times: check all foreign pages once after an hour, and again after about a week. If you see that your changes were reverted and mixed up, explain to the user who did it what happened. If they don't understand your language, well... we never told you that it is supposed to be so easy ;)

Language

Learn to read foreign alphabets

This doesn't mean "learn a lot of foreign languages" - we'll get to that in a minute. This just means that you should learn to read non-Latin alphabets. (You are reading this essay in English, so it is assumed that you can read the Latin alphabet; in fact, many languages that are written in the Latin alphabet are much easier to read than English.)

Knowing foreign alphabets won't make you able to actually understand foreign languages, but you'll be able to read many foreign words and names, which appear frequently in encyclopedias. This is quite helpful - often it is enough to understand what the article is about.

Knowing Cyrillic is very useful, as it opens for you the large Wikipedias in Russian (ru), Ukrainian (uk), Serbian (sr) and Bulgarian (bg). It is also used by the small, but growing, Wikipedias in Mongolian (mn) and in the languages of the former USSR - Belarusian (be-x-old, be), Chuvash (cv), Ossetian (os), Chechen (ce) and others.

If you master Cyrillic, then the Greek alphabet should be a breeze (and vice-versa). It is only used in the Greek Wikipedia (el), but studying it is easy, and you'll be able to brag in front of your friends that you know the Greek alphabet and to read street names when you finally stop editing Wikipedia and go on a quiet vacation in Greece or Cyprus instead.

Georgian (ka) and Armenian (hy) are significantly different from Latin and they are only used by these two languages. On the other hand, they are very simple and straightforward, so you may try them. (You can also use the above-mentioned Firefox extension "Transliterator" to read them.)

The Hebrew alphabet is a bit troublesome - it omits most vowels and is written right-to-left. Nevertheless, knowing it is very useful, as the Hebrew Wikipedia is in the top 40 by article count at the time of this writing and is quickly growing. (See below for tips about right-to-left writing.)

Once you master Hebrew, try Arabic - like Hebrew, it omits vowels and is right-to-left; it is also cursive, which means that most letters in a word are connected and many of them have different shapes depending on their position in the word. It sounds a bit tricky, and it really is a bit tricky, but once you get the hang of it, you'll be able to understand some key words in the big Wikipedias in Arabic (ar), Urdu (ur) and Persian (fa).

Japanese (ja) and Chinese (zh-*) are terribly hard. Their writing systems have thousands of characters, so mastering them takes years. The Wikipedias in these languages are very large, however, so you should at least learn a few tricks that may help you. First, learn to see the difference between Kanji and Katakana in Japanese. There are fewer than 100 Katakana characters and they have a rather simple shape and style which becomes recognizable after very little practice. They are used for transcribing foreign words and names, such as Mozart, Computer and McDonalds, and this piece of knowledge may often help you guess what the article is about.

Chinese doesn't have anything like Katakana. Chinese is all very hard. Sorry. But there's always machine translation (see below).

At first look Korean (ko) appears as hard as Chinese and Japanese, but actually it is very simple. It is not an ideographic writing system with thousands of characters, but almost an alphabet. Read the article Hangul for more on that. After a few hours' practice you'll be able to read foreign names in it very easily.

The Cherokee Wikipedia (chr) is still small, but you may encounter articles in it on your interwiki fixing trips. Some letters of the Cherokee syllabary appear similar to Latin letters, but actually they have nothing to do with them. To decipher this script, you may use the Transliterator extension for Firefox (see above).

The author of this essay doesn't know much about Indic scripts, such as Devanagari, Tamil, Thai or Tibetan, but if you do, consider replacing this paragraph with a few encouraging words, telling how easy it is to learn them.[4]

Learn a lot of languages

Well, this is hard.

But what may help you is some knowledge of one language in each group. For example, if you know one Slavic language, then you'll be able to read any other Slavic language and understand what the article is about.

Here's an ultra-simplistic cheat sheet for language groups relevant for Wikipedia:

  • North Germanic, a.k.a Scandinavian languages - Danish (da), Swedish (sv), Norwegian (no, nn). Icelandic (is) is also related, but quite different, although it is similar to Faroese (fo).
  • German (de) is close to its many regional varieties - bar, nds, als, pdc and others. Dutch (nl) and Afrikaans (af) are also rather similar to them. If you know German and bothered to learn the Hebrew alphabet, you'll be able to read Yiddish (yi) easily. (The Yiddish spelling in Hebrew characters is much easier than the spelling of Hebrew itself.)
  • Slavic languages are all very close. Learning one of them is probably the biggest favor you can do yourself as an interwiki links fixer. In written form the Slavic languages are mostly mutually intelligible. This means that learning Russian (ru), for example, will not only make you able to read "Crime and Punishment" and "War and Peace" - a good thing by itself - but will also help you understand Polish (pl), Ukrainian (uk), Serbian (sr), Bulgarian (bg), Czech (cs), Slovak (sk), Slovene (sl), Croatian (hr), Bosnian (bs), Macedonian (mk), Belarusian (be, be-x-old) and a few other smaller Wikipedias. This will also help you understand the Baltic languages a little (see below).[5]
  • Lithuanian (lt), Latvian (lv) and Samogitian (bat-smg) are all Baltic languages and learning one will give you basic understanding of the other two. Also note that there are many similarities between Baltic and Slavic languages.
  • Romance languages are all quite similar. The major ones are French (fr), Spanish (es), Italian (it), Portuguese (pt), Catalan (ca), Galician (gl) and Romanian (ro). There are also the smaller Wikipedias in regional Romance languages, among them Occitan (oc), Aragonese (an), Sicilian (scn), Lombard (lmo) and others. Latin (la) has similar vocabulary, but a very different grammar; nevertheless, understanding what the article is about shouldn't be hard if you know any other Romance language. The popular artificial languages Esperanto (eo), Ido (io) and Interlingua (ia) use much the same vocabulary as the Romance languages and their grammar is very simple; learning one of them may also help you get started with the natural Romance languages.
  • Turkish (tr) is similar to other Turkic languages, such as Azeri (az), Uzbek (uz), Tatar (tt), Kazakh (kk), Turkmen (tk) and Uyghur (ug).
  • The two major Semitic languages, Hebrew (he) and Arabic (ar), are quite similar to each other. If you know one of them well, the other one won't seem completely foreign. Their alphabets are significantly different, but once you study the right-to-left tricks in one of them, it will work the same way in another. Besides, you'll be able to read the Bible or the Qur'an in their original tongue. The Maltese language (mt) is very close to Arabic and it is written in the Latin alphabet, which makes learning it even simpler; unfortunately the Wikipedia in this language is still developing.[6]

Basque (eu) is not similar to anything at all. Armenian (hy), Albanian (sq) and Greek (el) are Indo-European, like Slavic, Germanic and Romance, but that doesn't help you much. Finno-Ugric languages, such as Finnish (fi), Hungarian (hu), Estonian (et) or Moksha (mdf), are not similar to anything else and, with the exception of Finnish, Estonian and Võro, they are hardly similar to one another. Bummer.

And of course, African, Asian and Native American languages are not similar to anything European, but that's the fun of it.

Strangely enough, English is not very close to any other language, except Scots (sco), whose Wikipedia is unfortunately quite small. It is related to German, Dutch, Frisian (fy) and the Scandinavian languages, but even the most remote members of the Slavic family, such as Czech and Russian, are still closer to each other than English and German are.

Use machine translation

For languages you don't know, you may try using machine translation. Here's a list of popular free machine translation sites:

Feel free to add more sites that may be useful!

Machine translation is imperfect, but it may give you an idea about the content of the article.

Use the embassy

Most Wikipedias, even the smallest ones, have an embassy, where you can try to get help about their respective language, local editing policy, etc. There are links to all embassies from the English Wikipedia Embassy.

A tip: when you find a list of users who speak your language, check their contributions. Sometimes well-meaning Wikipedians write their name in the embassy and later retire from the project, yet remain listed as ambassadors. The one who made the latest contributions is more likely to reply to you promptly.

Take a look at categories

If machine translation doesn't help you much, consider taking a look at the categories into which the foreign article is sorted. Just click on all the categories and look at the interwiki links of the categories. It may give you very useful clues! If the categories don't have interwiki links, try looking up their names in a dictionary or a machine translation site.

Take a look at the existing interwikis

It's a bit of a circular argument, but the existing interwiki links on a foreign article may hint at the article's content. Since it's those links that you are fixing, don't trust this blindly, but try it when all else fails.

References

  1. Disclaimer: At the time of this writing Mozilla Firefox is not so good at handling the Mongolian script, which is a bummer, but it shouldn't bother you much if you're just fixing interwiki links.
  2. If you are using a Mac, then the-combination-of-some-key-and-your-only-mouse-button-that-opens-a-link-in-a-new-tab is your friend.
  3. That's the current technology. There have been a few minor discussions on making interwiki links and bots behave smarter with redirects and misspelled links, but these are just thoughts. In the meantime, there should not be interwiki links to redirects.
  4. Yeah, right.
  5. Disclaimer: The mother tongue of the writer of these lines is Russian, but don't let that discourage you.
  6. Disclaimer: The writer of these lines speaks fluent Hebrew and can read basic Arabic, but don't let that discourage you.