Talk:Sharing Taxonomic information in a mulitlingual environment

From Meta, a Wikimedia project coordination wiki

German standards differ..

Sorry, but this is a very bad idea that will not work for the German WP. We have a different standard of taxobox layout (e.g. we use the new table syntax), and we use German rather than Latin names for most of our articles (even for the Asian elephant), so the taxobox must be translated anyway, and we probably group certain taxa together. It would be very nice to discuss things like this beforehand instead of presenting us with a fait accompli. Please don't assume that everything that holds true for the English WP also holds true here. Uli

This idea is maybe not good for the German Wikipedia. But there are many Wikipedias, so some may be interested. This is a demonstration of the system, not an attempt to enforce a new way of doing things. GerardM, I think you had better send this to the mailing list so that the other Wikipedias know about it and can join this system if they like it. Walter 19:42, 23 Mar 2004 (UTC)
I can make you this box with this technique. The only problem I have is that at this moment I have no equivalent for "Reihe" (check out my German user talk page; there you will find the parts that I can translate). I can take this to the nl: and the en: Wikipedia and it will translate correctly.
So, as long as the rules of taxonomy are the same in each country (and they are), I CAN translate for your environment. The point of this discussion is to arrive at something that makes life easier for all, particularly the smaller Wikipedias. GerardM 19:55, 23 Mar 2004 (UTC)
The problem is not the name of the level (species, family), but the name of the referred taxon (e.g. the class in de:Kaulbarsch would have to reference de:Knochenfische, not de:Osteichthyes). If one has to do this sort of translation, one can just as well create the taxobox from scratch (or better, copy it from a closely related species and just change the species name, which is much faster). Uli
The Osteichthyes are the Knochenfische, so the information is at least factually correct. As you do, in Holland we prefer to have the Dutch name in the taxobox as well. But being able to start with correct Latin information is already a boon. I have translated many English taxoboxes because there WERE no related organisms. It is for those instances that a straight copy across would have been nice.
The benefit of all this will be particularly great for the smaller Wikipedias. With standardised taxonomic info you can still nationalise the information. The thing is to have both Latin and German (or any other language). I hope you can agree with me that it does help and that it is easy to implement. GerardM 21:09, 23 Mar 2004 (UTC)


Hi Gerard, the idea sounds interesting but I am still trying to understand how these things will work. As a precondition, there needs to be a redirect from the Latin names to the vernacular names, right? Then Uli is right insofar as we have different standards. In en: the taxoboxes just include family, order, class and phylum, while we decided to have tribes, subfamilies, superfamilies etc. in the taxoboxes as well (and in order to keep the taxobox small, we have only five to six taxa per box). Of course, these conventions are not carved in granite and may be changed, but that would need some discussion. In the current state, German and English taxoboxes are so different that the easiest way is to create them from scratch.

I had a look at your Elefanten example. There you basically replaced taxonomic level designations with MediaWiki elements. But I think that your idea goes further than that, doesn't it? -- Baldhur 07:28, 24 Mar 2004 (UTC)

For the French project Arbre de la vie (Tree of Life), I try to follow one standard in taxoboxes:

  • first, the vernacular name of the taxon
  • upper taxonomic levels in Latin
  • the taxon in Latin
  • lower taxonomic levels in Latin with (vernacular name)

Generally the article title is the vernacular name, so I need [[Vernacular name|Latin name]] in the taxobox for the upper levels. jeffdelonge

The idea sounds good to me, and if there is a way to create a Latin:German (and other languages) database to get the vernacular names, there will be no problem for classic systematics. I don't think it will work for cladistics too, so there may be no way, or only a harder way, to change our systematics some day (I hope we will). I think we can try out this project in the German Wikipedia; if it does not work well, we can correct the failures. I don't see any problem in having only the full levels without sub- or superlevels (including Reihe). How about trying it on the orders of birds, where we normally have no taxoboxes yet? Greetings, de:Benutzer:Necrophorus

The scope

  • I started this out of frustration at having to translate these taxoboxes time and again.
  • Another thing is the boxes: they are all a bit different, and having one standard box would help.
  • The messages are maintained like this on the nl: wikipedia. This can be copied to an appropriate place in any wikipedia; it just needs filling with the correct links.
  • The labels like [[genus (biology)]] can be translated by a robot into {{msg:genus}}.
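The robot in the last point could look roughly like the following. This is only a sketch: of the mapping, only [[genus (biology)]] and {{msg:genus}} come from this page; the other link targets and message names are assumed for illustration, and `labels_to_msg` is a made-up name.

```python
import re

# Link target (lower-cased) -> message name.
# Only the genus entry appears on this page; the rest are assumptions.
RANK_MESSAGES = {
    "genus (biology)": "genus",
    "family (biology)": "family",
    "order (biology)": "order",
}

# Matches [[Target]] and [[Target|label]].
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def labels_to_msg(wikitext):
    """Replace links to rank articles by {{msg:...}} elements."""
    def repl(match):
        target = match.group(1).strip().lower()
        name = RANK_MESSAGES.get(target)
        # Leave every link that is not a known rank label untouched.
        return "{{msg:%s}}" % name if name else match.group(0)
    return LINK.sub(repl, wikitext)


print(labels_to_msg("[[genus (biology)|Genus]]: [[Picus]]"))
```

Links to the taxa themselves (such as [[Picus]]) are deliberately left alone; only the rank labels are touched.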

When you start with taxoboxes for the "Picidae" (in fr: Picidé), you could start off with an all-Latin box. When you have decided that [[Picidae]] needs to be "[[Picidé]] - (Picidae)", you could have a robot that checks for this and changes it. Given the text-oriented nature of wiki, this is what is achievable.

Another thing that would help is to have redirects from the Latin name to the vernacular name. The benefit is that when a taxobox is imported and the higher taxa exist, the links work from the beginning. I expect this too can be done by robot; if a page "Picidae" does not exist, it should be possible to create it as a redirect to "Picidé".

The idea of having a database with all these links could be in the format of

de: # [[Picidae]] # Specht # [[Specht]] - (Picidae)
en: # [[Picidae]] # Woodpecker # [[Woodpecker]] - (Picidae)
fr: # [[Picidae]] # Picidé # [[Picidé]] - (Picidae)
nl: # [[Picidae]] # Specht # [[Specht]] - (Picidae)

This could also be used by a "Taxobot": find (2) in wikipedia (1) and replace it by (4). And when (2) does not exist in wikipedia (1), create a redirect to (4).
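As a rough sketch of such a Taxobot, assuming the four "#"-separated fields shown above: the function names are invented here, and the redirect is pointed at the vernacular page named in field (3), which is my reading of "redirect to (4)". The set of existing page titles is passed in by hand; a real bot would have to ask the wiki.

```python
def parse_database(lines):
    """Return {wiki: [(latin_link, vernacular, replacement), ...]}."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split("#")]
        if len(fields) != 4:
            raise ValueError("expected 4 fields: %r" % line)
        wiki, latin_link, vernacular, replacement = fields
        entry = (latin_link, vernacular, replacement)
        table.setdefault(wiki.rstrip(":"), []).append(entry)
    return table


def apply_taxobot(wiki, wikitext, table, existing_titles):
    """Replace field (2) by field (4); list redirects that are missing."""
    redirects = []
    for latin_link, vernacular, replacement in table.get(wiki, []):
        wikitext = wikitext.replace(latin_link, replacement)
        latin_title = latin_link.strip("[]")
        if latin_title not in existing_titles:
            # Assumed interpretation: the Latin title redirects to the
            # vernacular article.
            redirects.append((latin_title, "#REDIRECT [[%s]]" % vernacular))
    return wikitext, redirects


db = parse_database(["fr: # [[Picidae]] # Picidé # [[Picidé]] - (Picidae)"])
text, redirects = apply_taxobot("fr", "Family: [[Picidae]]", db, {"Picidé"})
print(text)
print(redirects)
```

Because the replacement text no longer contains the link [[Picidae]], running the bot a second time over the same page changes nothing.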

I think this can be done; however, it does require organisation, consensus and a lot of work. GerardM 14:11, 24 Mar 2004 (UTC)

How to fill a new wikipedia with taxonomic data

When a wikipedia is interested in importing relatively raw data, this can be done by adding the taxoboxes with pictures under the Latin name as the page name. When the users are ready to add info in their language, they can rename the page to the vernacular name or alternatively make the vernacular name a redirect to the Latin name.

When the taxobox has a begin and an end tag in it, it should be relatively straightforward to export the box. Even pictures can be included, which is always one of the more time-consuming tasks.

When the taxolist conforms to a standard like


  • Order: Name - Vernacular name
    • Family: Name - Vernacular name2

The data can be stripped of the vernaculars, and the "*order:" label can be replaced by a local string.
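A small sketch of that stripping step, assuming the list is written with wiki bullets ("*", "**") in exactly the "Rank: Name - Vernacular name" form shown above. The function name, the sample taxa and the Dutch labels are only illustrative.

```python
import re

# "* Rank: Latin name - Vernacular name"; the vernacular part is optional.
ENTRY = re.compile(r"^(\*+)\s*(\w+):\s*(.+?)(?:\s+-\s+.*)?$")


def localise_taxolist(lines, rank_labels):
    """Drop the vernacular names and swap the rank labels for local ones."""
    result = []
    for line in lines:
        match = ENTRY.match(line.strip())
        if not match:
            result.append(line)  # not a taxolist entry: keep it as it is
            continue
        bullets, rank, latin = match.groups()
        label = rank_labels.get(rank.lower(), rank)
        result.append("%s %s: %s" % (bullets, label, latin))
    return result


source = [
    "* Order: Piciformes - Woodpeckers and allies",
    "** Family: Picidae - Woodpeckers",
]
for line in localise_taxolist(source, {"order": "Orde", "family": "Familie"}):
    print(line)
```

A rank that has no entry in the label table keeps its original label, so an incomplete table does no harm.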

The links to other languages can be entered at the same time; they are known in the exporting language.

The easiest thing to implement is the MediaWiki element in place of the taxonomic level. I will suggest on de: that we change our taxobox design now in order to implement this. This way a very young Wikipedia could adopt our taxoboxes, if they want to have our design, and they would just have to replace the vernacular names.
As for creating large databases containing the names in different languages, I do not think that it is worth the work. Replacing the vernacular names manually is not very time-consuming and can be done in a few seconds. -- Baldhur 11:02, 27 Mar 2004 (UTC)
When you translate a taxobox and it says "Animalia", you will find this for ALL animals. So it amounts to a LOT of time. When "[[Animalia]]" should become "Animalia - [[Dieren]]" and a robot finds an occurrence of "[[Animalia]]", it takes no human effort at all. The only thing needed is a one-time effort of filling a three-field table with "nl", "[[Animalia]]" and "Animalia - [[Dieren]]", and the robot can do its job at runtime for the nl: wikipedia. Translating to the vernacular has its place. GerardM 18:08, 28 Mar 2004 (UTC)

Usage of the msg technique

Just out of curiosity..

Counterproposal: translation bot

I don't like the msg technique too much because it makes the wiki source code harder to read and moves it even further away from WYSIWYG.

I'm one of the developers of pywikipedia bot, which is a series of Python scripts users can run locally on their PC. Pywikipedia bot is already being used for multiple purposes on many Wikipedias, e.g. for adding interwiki links, converting tables to wiki syntax, solving disambiguations etc. See Interwiki bot for details.

Today, I started working on a bot that copies and translates tables from one Wikipedia to another. It works this way:

  1. Someone starts the script with e.g. Kanisedoj -lang:eo -from:de. This means that the bot should copy a table for the Esperanto article eo:Kanisedoj from the German Wikipedia and add it there.
  2. The bot opens eo:Kanisedoj and finds out that there is in fact an interwiki link to the German Wikipedia: de:Hunde.
  3. It opens de:Hunde and searches the first table inside the article. (The taxobox is usually the first table inside an article.) Of course it also supports cascaded tables (tables inside tables).
  4. It copies this table and replaces [[Familie (Biologie)|Familie]] with [[Familio]] etc. (translate all the stuff that's similar in all these tables).
  5. The translated table is added at the top of the Esperanto article.
  6. The user manually translates the words that are still in German, as he would do in your msg proposal.

All these steps are already working, except for step 5 which I will now work on.
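For readers who want to see the idea in code, steps 2 to 5 can be sketched as below. This is not the actual pywikipedia code: the function names are invented, fetching and saving pages is left out, and it assumes the taxobox is written in the new table syntax ({| ... |}).

```python
import re


def find_interwiki(wikitext, lang):
    """Step 2: return the title in [[lang:Title]], or None if there is none."""
    match = re.search(r"\[\[%s:([^\]|]+)\]\]" % re.escape(lang), wikitext)
    return match.group(1).strip() if match else None


def first_table(wikitext):
    """Step 3: return the first {| ... |} table, nested tables included."""
    start = wikitext.find("{|")
    if start == -1:
        return None
    depth = 0
    pos = start
    while pos < len(wikitext):
        if wikitext.startswith("{|", pos):
            depth += 1  # a table (or nested table) opens
            pos += 2
        elif wikitext.startswith("|}", pos):
            depth -= 1  # a table closes
            pos += 2
            if depth == 0:
                return wikitext[start:pos]
        else:
            pos += 1
    return None  # table markup is not balanced


def translate_labels(table, translations):
    """Step 4: replace the links that are known to the translation table."""
    for source, target in translations.items():
        table = table.replace(source, target)
    return table


def copy_table(target_text, source_text, translations):
    """Steps 3 to 5: put the translated table at the top of the target."""
    table = first_table(source_text)
    if table is None:
        return target_text
    return translate_labels(table, translations) + "\n" + target_text


eo_text = "Teksto.\n[[de:Hunde]]"
de_text = "{|\n| [[Familie (Biologie)|Familie]]\n|}\nText"
print(find_interwiki(eo_text, "de"))
print(copy_table(eo_text, de_text, {"[[Familie (Biologie)|Familie]]": "[[Familio]]"}))
```

Counting the opening and closing markers is what keeps a table inside a table together. A "|}" that belongs to something other than a table would confuse this simple counter.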

The bot isn't too hard to install, all you need is the freely available Python programming language interpreter and the open source bot which will be available on sourceforge. --Head 12:52, 29 Mar 2004 (UTC)

UPDATE: step 5 is now working, but the translation database needs to be filled up. Until now, it only knows some words in English, German and Dutch. --Head 16:27, 29 Mar 2004 (UTC)

Difference in vision

I know that scripts like the bot proposed by Head exist. The scenario I envision is:
  • The xx: wikipedia is the source; the yy: wikipedia is the target.
  • xx: has x0,000 taxoboxes; yy: has none. All taxoboxes, including pictures, are copied to the yy: wikipedia with the Latin name as the article name.
  • A yy: editor wants to create text on "Homo sapiens" in his language. He renames "Homo sapiens" to the local lemma.
  • On yy:, the interwiki links to all languages known on xx: are added.
  • On xx:, the interwiki link for the yy: lemma is added.
  • When xx: has a taxolist, the list is stripped of its vernacular content, the Latin names are wikified, and it is included in the yy: article.
  • When the article is renamed, the existing references are updated by a bot that checks for new references and changes the interwiki links for yy: on all other language wikipedias.

The aim of my proposal is higher than what is currently on offer.

Thanks, GerardM 13:16, 29 Mar 2004 (UTC)