Talk:General Multilingual Environmental Thesaurus

From Meta, a Wikimedia project coordination wiki

I received some more info:


Dear Gerard,

I am happy about this offer and would like to share some of the issues with the colleagues of mine who are involved. I actually promoted the idea of including GEMET in "the Wiki world" at an international meeting on environmental terminologies in April in Geneva, http://ecoinfo.eionet.eu.int/ (see the ECOTERM April workshop), and got very positive feedback from colleagues.

I am pasting this message to the following colleagues active in the Semantic Web, Ontologies, Thesauri, Topic Maps, XML, RDF, OASIS ...:

- Bernard Vatant at Mondeca, www.mondeca.com
- Thomas Bandholtz, http://www.bandholtz.info/index_en.html
- Alistair Miles, http://www.w3.org/2001/sw/Europe/reports/thes/ (here you will find links to GEMET in SKOS-RDF, which I guess is the terminology format of the Semantic Web)
- Miruna Badescu, http://www.finsiel.ro/, who is currently building a web service that allows machines to pick up GEMET content (first version available later in September)
- Soren Roug at the EEA, who has integrated some GEMET content into http://cr.eionet.eu.int/search_expert.jsp, a multipurpose RDF harvester

Let us exchange views on what to do next.

Kind regards,


Stefan



Hi.

I am very interested in that prospect myself. We were also contacted in June by the owner of a website, http://www.planetecologie.org/Fr_default.html, who is willing to offer us content as well. Although the owner considers most of this content encyclopedic, I think most of it is rather of a dictionary type and fully in line with GEMET (see http://www.planetecologie.org/forumdd/Dicdevdur.html for more information). We have already been thinking about ways to integrate this content. I think we will put the encyclopedic material in Wikipedia and the rest in Wiktionary, after we ensure its neutrality.

AFAIK, Mondeca is one of the partners of Planetecologie.

Anthere 17:22, 1 Sep 2004 (UTC)

Parsing the English text file and the layout of an article - RFC

When the HTML page is saved as a text file, it is relatively easy to parse.

The string

*Selected language: English*

defines the language. It is followed by an

*Index* 

and then the keywords follow. Each keyword is on a separate line and ends with its identifier:

access to the courts <#desc13285>
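A minimal Python sketch of reading one index line. It assumes every index line ends with an identifier of the form `<#descNNN>`, as in the example above. The function name `parse_index_line` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed layout: keyword text, optional whitespace, then "<#descNNN>" at line end.
INDEX_LINE = re.compile(r"^(?P<keyword>.+?)\s*<#desc(?P<id>\d+)>\s*$")


def parse_index_line(line: str) -> Optional[Tuple[str, int]]:
    """Return (keyword, identifier) for an index line, or None if it does not match."""
    match = INDEX_LINE.match(line)
    if match is None:
        return None
    return match.group("keyword").strip(), int(match.group("id"))


print(parse_index_line("access to the courts <#desc13285>"))
# → ('access to the courts', 13285)
```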

The index is finished by a line:

------------------------------------------------------------------------
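The markers described so far are enough to cut the saved file into its three parts: the language, the index lines and the definitions block. The sketch below is minimal and makes two assumptions. The markers appear exactly as quoted above, and the closing rule is a line of at least 20 hyphens. `split_export` is a hypothetical name.

```python
import re

# "*Selected language: English*" -> captures "English"
LANGUAGE_LINE = re.compile(r"\*Selected language:\s*(?P<language>[^*]+?)\s*\*")
# Assumption: the rule that closes the index is a line of 20 or more hyphens.
RULE = re.compile(r"^-{20,}[ \t]*$", re.MULTILINE)


def split_export(text):
    """Split a saved GEMET page into (language, index_lines, definitions_text)."""
    lang_match = LANGUAGE_LINE.search(text)
    if lang_match is None:
        raise ValueError("no '*Selected language: ...*' marker found")
    language = lang_match.group("language")

    index_start = text.find("*Index*", lang_match.end())
    if index_start == -1:
        raise ValueError("no '*Index*' marker found")
    index_start += len("*Index*")

    rule = RULE.search(text, index_start)
    if rule is None:
        raise ValueError("no horizontal rule closing the index")

    # Keep only the non-empty lines between "*Index*" and the rule.
    index_lines = [
        line.strip()
        for line in text[index_start:rule.start()].splitlines()
        if line.strip()
    ]
    definitions = text[rule.end():]
    return language, index_lines, definitions
```

Each entry in `index_lines` can then be passed to a per-line parser, and `definitions` is handed on to the next stage.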

Then the definitions follow. Each one starts with the keyword, which begins with two asterisks and ends with one:

* * abandoned industrial site*
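The definitions block could be grouped into one record per keyword as in the minimal sketch below. It assumes that a heading line starts with two asterisks, which may be separated by a space as in the sample above, and ends with one. Lines such as `*/Definition:/*` do not match this pattern because their second character is a slash. `split_records` is a hypothetical name.

```python
import re

# Two leading asterisks (optionally space-separated), the keyword, one closing asterisk.
HEADING = re.compile(r"^\*\s*\*\s*(?P<keyword>[^*]+?)\s*\*\s*$")


def split_records(definitions):
    """Group the definitions block into (keyword, body_lines) pairs."""
    records = []
    for line in definitions.splitlines():
        heading = HEADING.match(line)
        if heading:
            # A new heading opens a new record with an empty body.
            records.append((heading.group("keyword"), []))
        elif records and line.strip():
            # Any other non-empty line belongs to the most recent record.
            records[-1][1].append(line.strip())
    return records
```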

The keyword is followed by its translations; a three-letter indicator identifies each language. Note that US English is treated as a separate language.

The definition starts with

*/Definition:/* 

and ends with a source indicator, e.g. (Source: LBC / WRIGHT).
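The definition text and its source could be pulled out together with a single regular expression. This minimal sketch assumes that the definition runs from `*/Definition:/*` to the first `(Source: ...)` marker, possibly across several lines, and that the source itself contains no closing parenthesis. `parse_definition` is a hypothetical name.

```python
import re

DEFINITION = re.compile(
    r"\*/Definition:/\*\s*(?P<text>.*?)\s*\(Source:\s*(?P<source>[^)]*?)\s*\)",
    re.DOTALL,  # let the definition span line breaks
)


def parse_definition(body):
    """Return (definition_text, source), or None when no definition is present."""
    match = DEFINITION.search(body)
    if match is None:
        return None
    # Collapse line breaks and repeated spaces inside the definition.
    text = " ".join(match.group("text").split())
    return text, match.group("source")
```

The returned source string (for example `LBC / WRIGHT`) is what the layout below would need for its source category.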

It is followed by

*/Broader Terms:/* followed by the broader keywords, each followed by its identifier and separated by commas.

These are followed by

*/Narrower Terms:/*, which follow the same pattern.
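Both relation lines could be read by the same loop. This minimal sketch makes two assumptions. Each related keyword carries the same `<#descNNN>` identifier as in the index, and keywords contain no commas. `parse_relations` is a hypothetical name.

```python
import re

RELATION = re.compile(r"\*/(?P<kind>Broader|Narrower) Terms:/\*(?P<rest>.*)")
# One related term: keyword text (no commas or angle brackets) plus its identifier.
TERM = re.compile(r"(?P<keyword>[^,<>]+?)\s*<#desc(?P<id>\d+)>")


def parse_relations(body_lines):
    """Collect broader and narrower terms as lists of (keyword, identifier)."""
    relations = {"Broader": [], "Narrower": []}
    for line in body_lines:
        match = RELATION.match(line)
        if match is None:
            continue
        for term in TERM.finditer(match.group("rest")):
            relations[match.group("kind")].append(
                (term.group("keyword").strip(), int(term.group("id")))
            )
    return relations
```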

The text file for English is currently 4.62 MB.

These strings should be assembled so that a bot can create a page. My strategy will be:

  1. The language will be indicated by {{-xx-}}, where xx is replaced by the applicable ISO 639 language code.
  2. The keyword will be written like #keyword; followed by the definition.
  3. The translations will start with {{-trans-}}
  4. The three-character language identifiers will be replaced by {{xx}}, where xx is replaced by the applicable ISO 639 language code.
  5. A translated string will look like: :*{{-xx-}}: [[translated string]]
  6. Two categories will be added: one to indicate GEMET and the other to indicate the source of the definition.
  7. The broader and narrower terms will each have their own separate part.
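The seven-step layout above could be assembled into page text as in the following minimal sketch. Several details are assumptions and are marked as such in the code:

- The three-letter-to-ISO 639 mapping is hypothetical.
- Step 2 is rendered literally as `#keyword; definition`.
- The bold headings for the broader and narrower parts are made up.
- The category names are placeholders.

The proposal writes the language template as {{xx}} in step 4 and as {{-xx-}} in step 5; the sketch follows step 4. `render_page` is a hypothetical name.

```python
# Hypothetical mapping from GEMET three-letter indicators to ISO 639-1 codes;
# the real indicator set would have to be read from the GEMET export.
LANGUAGE_CODES = {"eng": "en", "dut": "nl", "ger": "de", "fre": "fr", "ita": "it"}


def render_page(language, keyword, definition, source, translations, broader, narrower):
    """Assemble wiki text for one GEMET entry following the proposed layout."""
    lines = [f"{{{{-{language}-}}}}"]                    # step 1: {{-xx-}}
    lines.append(f"#{keyword}; {definition}")            # step 2 (literal reading)
    lines.append("{{-trans-}}")                          # step 3
    for indicator, translated in translations:           # steps 4 and 5
        # Unknown indicators are passed through unchanged.
        code = LANGUAGE_CODES.get(indicator, indicator)
        lines.append(f":*{{{{{code}}}}}: [[{translated}]]")
    lines.append("'''Broader terms'''")                  # step 7 (heading style assumed)
    lines.extend(f"* [[{term}]]" for term in broader)
    lines.append("'''Narrower terms'''")
    lines.extend(f"* [[{term}]]" for term in narrower)
    lines.append("[[Category:GEMET]]")                   # step 6: GEMET category
    lines.append(f"[[Category:{source}]]")               # step 6: source category (name assumed)
    return "\n".join(lines)
```

Keeping the layout in one function means a later change, for instance switching the translation template to {{-xx-}}, touches a single line.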

Comments on Parsing the English text file and the layout of an article - RFC

Some more discussion pieces on WIKI/GEMET (Sep 2 2004)

Good morning Sabine and everybody,

Great to see more people joining the conversation. I am not a frequent user of the wiki discussion tools that Gerard proposes to use for this, but I'll try to move things over there. (And I see Bernard is already using them.) Some points:

a) I understand Gerard wants to start with the Dutch (NL) part. Please make sure you do not lose the link to the (English) definitions. In fact, maybe English, as the nucleus, should go in first.

b) If my technical know-how carries me far enough, I would consider using XML/RDF as the technical transfer medium and SKOS as the "description model". I think this "streamlining" could be beneficial, since the community would not have to handle various technical solutions in parallel. But this might not work with the wiki technology; other folks have to decide on that.

c) Sabine supports the open content approach and correctly points out the reluctance of other initiatives to go the same way. I like the philosophy a lot in general, and we in the environment domain are well advised not to sit on our stuff (data, terminologies, etc.) if we want to make an impact. Besides that, my main reason for opting for open content is maintenance. While GEMET has a good overall quality standard, there are problems here and there, and sharing it through a wiki would allow others (users) to comment on and edit the content. They could also add and link to other terminologies (glossaries, thesauri, whatever). I am aware that this can lead to chaos, but my experience with Wiktionary is that it (sometimes surprisingly for me) provides quality content without running into this trouble.

d) Maybe it is necessary for somebody (Gerard?) to start with something (NL/EN), picking up the content from the GEMET area http://www.eionet.eu.int/gemet in SKOS format (see also SWAD-Europe for reference).

e) By the way, there are goodies available, like definitions in Russian and Bulgarian ... there is also an initiative under way to get the content in Chinese ...
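Point b) suggests XML/RDF as the transfer medium and SKOS as the description model. As an illustration only, the minimal Python sketch below serialises one thesaurus entry as a SKOS concept in RDF/XML, using just the standard library. The base URI `http://example.org/gemet/concept/` and the function name `skos_concept` are hypothetical; the real concept URIs would come from GEMET's own SKOS files.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
BASE = "http://example.org/gemet/concept/"  # hypothetical base URI

ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)


def skos_concept(concept_id, labels, broader_ids):
    """Return RDF/XML text for one skos:Concept with language-tagged labels."""
    root = ET.Element(f"{{{RDF}}}RDF")
    concept = ET.SubElement(
        root, f"{{{SKOS}}}Concept", {f"{{{RDF}}}about": f"{BASE}{concept_id}"}
    )
    # One skos:prefLabel per language, e.g. {"en": "access to the courts"}.
    for lang, label in labels.items():
        pref = ET.SubElement(concept, f"{{{SKOS}}}prefLabel")
        pref.set(XML_LANG, lang)
        pref.text = label
    # Broader terms point at other concepts by URI.
    for broader_id in broader_ids:
        ET.SubElement(
            concept, f"{{{SKOS}}}broader", {f"{{{RDF}}}resource": f"{BASE}{broader_id}"}
        )
    return ET.tostring(root, encoding="unicode")
```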

>>> I am now pasting this mail into the META discussion area and hope this is a beneficial move ...

A sunny day from Copenhagen,
Stefan




-----Original Message-----

From: Sabine Cretella
Sent: 01 September 2004 19:26
To: wiktionary-l@Wikipedia.org
Cc: wikitech-l@Wikipedia.org; Stefan Jensen; Frankee; ALBatro; urwo@hispeed.ch; glossarplus@yahoogroups.de; Daniele Maso; transref@yahoogroups.com
Subject: Re: [Wiktionary-l] GEMET and Wikipedia


Hi, I'd like to answer and at the same time forward this message to several people involved in "glossary creation and maintenance", as well as to interested user groups.

As some of you already know, I'll have a similar task: uploading "colours" in as many languages as possible to the Italian Wiktionary (the list is still being completed). At the same time I am trying to create other thematic lists to be translated by colleagues and then uploaded to Wiktionary and used in the wsi-glossary project as well as the Embedded DICTionary PROJect.

I know GEMET as a very valuable source for translators (I use it quite often myself), so its release as OpenContent would be a wonderful achievement, and not only for Wiktionary. So I would kindly invite the people on other lists and my direct contacts to join this project and combine forces.

The meta page Gerard is talking about can be found here: http://meta.wikimedia.org/wiki/GEneral_Multilingual_Environmental_Thesaurus

If any of you need further explanation on how to add comments at the link mentioned above and let us know what you can do, please just ask. Maintaining such a huge glossary in so many languages is a lot of work, and every helping hand is needed (don't forget that there are many simple jobs as well).

I know that not many of us are convinced about OpenSource and OpenContent, but most of us use them every day, sometimes without knowing how much work is behind them.

So again: I invite you to join the project and to make other people aware of it - maybe writing and forwarding this message in other languages as well.

And to Gerard: instead of breaking your teeth, could you please tell us how we can help you? ;-)

Have a great evening!

Ciao, Sabine

Sabine Cretella
s.cretella@wordsandmore.it
www.wordsandmore.it
Meetingplace for translators www.wesolveitnet.com


Gerard Meijssen wrote:

> GEMET is the "GEneral Multilingual Environmental Thesaurus". This
> thesaurus is maintained by the European Environment Agency. It
> contains a glossary of 5200+ terms with translations into 20+
> languages and descriptions in a few. The EEA wants to have this
> information in a wiki format; the data is open content.
>
> Mr Stefan Jensen, the project manager, sent me a mail as I am
> preparing to upload a botanical glossary into Wiktionary (I said so on
> the Wiktionary list). As the GEMET data is already online and much
> better structured than the data that I have, I would first break my
> teeth on this one and then progress to my own glossary.
>
> However, I received a mail which shows how much he would appreciate
> cooperation; all kinds of people with expertise in areas like
> Semantic Web, Ontologies, Thesauri, Topic Maps, XML, RDF, OASIS ...
> have had my mail forwarded to them. I have created an article on Meta,
> GEneral Multilingual Environmental Thesaurus, and copied the mail to
> the talk page.
>
> My plans are simple: I want to upload this stuff into nl:Wiktionary. I
> also plan to tag it with Categorie:GEMET. When I have been
> successful, I will also be able to upload it to other wiktionaries.
> If somebody beats me to it, I will only be pleased.
>
> As I have often seen stuff about XML etc. on Wikitech, it might
> be a good idea to synchronise what GEMET does and what we do. So
> please discuss this, preferably on Meta or on the lists. I have no
> good idea how difficult this may prove to be. Again,
> information on META.
>
> Thanks,
> GerardM
>
> _______________________________________________
> Wiktionary-l mailing list
> Wiktionary-l@Wikipedia.org
> http://mail.wikipedia.org/mailman/listinfo/wiktionary-l