IFAP proposal for building a many-to-many dictionary
With specific attention to languages that have little presence on the Internet
The Wikimedia Foundation, a non-profit organization based in Florida, USA, is known for its multilingual Internet encyclopedia, Wikipedia. The English-language version has almost half a million articles and is a popular and heavily used reference work. Anyone can use Wikipedia, and anyone can contribute to it. Wikipedia has proven to be a valuable resource. One noteworthy aspect is the attention that the World Heritage Program receives: it is covered in 23 languages, and many heritage sites are described in these languages.
Wiktionary is a project of the Wikimedia Foundation to create a free many-to-many dictionary. In its present incarnation it is in practice a one-to-many dictionary. There are plans for a software update to make it a many-to-many dictionary; this will require dedicated programming effort.
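The difference between the two models can be illustrated with a minimal sketch. It is not the design of the planned software; the class and method names (Expression, Meaning, ManyToManyDictionary, link, translate) and the language codes are assumptions made for illustration. The idea is that every word is attached to a language-neutral meaning, so that a word in any language can be looked up from any other language.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Expression:
    """A word as written in one specific language (hypothetical model)."""
    language: str  # illustrative language code, e.g. "nl"
    spelling: str


@dataclass
class Meaning:
    """A language-neutral concept that words in any language can share."""
    meaning_id: int
    definitions: dict = field(default_factory=dict)  # language code -> description
    expressions: set = field(default_factory=set)    # Expression objects


class ManyToManyDictionary:
    def __init__(self):
        self._meanings = {}                      # meaning id -> Meaning
        self._by_expression = defaultdict(set)   # Expression -> meaning ids

    def add_meaning(self, meaning_id, definitions=None):
        self._meanings[meaning_id] = Meaning(meaning_id, dict(definitions or {}))

    def link(self, meaning_id, language, spelling):
        """Attach a word in some language to an existing meaning."""
        expr = Expression(language, spelling)
        self._meanings[meaning_id].expressions.add(expr)
        self._by_expression[expr].add(meaning_id)

    def translate(self, language, spelling, target_language):
        """Return the sorted words in target_language that share a meaning."""
        expr = Expression(language, spelling)
        found = set()
        for meaning_id in self._by_expression.get(expr, ()):
            for other in self._meanings[meaning_id].expressions:
                if other.language == target_language:
                    found.add(other.spelling)
        return sorted(found)


d = ManyToManyDictionary()
d.add_meaning(1, {"en": "a domesticated canine"})
d.link(1, "en", "dog")
d.link(1, "nl", "hond")
d.link(1, "fr", "chien")
print(d.translate("nl", "hond", "fr"))
```

In a one-to-many setup, the Dutch and French words would each be linked only to the English entry; here no language is privileged, so a Dutch to French lookup needs no intermediary.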
Wiktionary was started in 2002, but 2004 saw the addition of many new languages to the project. Over the last year, these new projects have grown steadily. More interesting still is the interest that professional translators have shown in these projects. We have received many word lists that can be imported into these projects, giving them topical interest. There is also some cooperation with the European Community; we have received permission to host the Ecological and the Medical thesaurus.
Because Wiktionary aims to be a many-to-many dictionary, its stated aim is to include words in all languages. This would mean that the words of a language are available to any user who has an interest in that language. The Dutch Wiktionary, for instance, boasts that it has words in 253 languages; it has 9750 words and more than 1400 pronunciations of Dutch words that students of the language can listen to.
The content of the Wiktionary project, like that of the Wikipedia project, is created by volunteers. This means that well-known languages do not need any funding. It is the less well-known languages, the languages that have no presence on the Internet, that would be the prime target for this grant.
Many languages lack the modern tools that are taken for granted in the Western world: they have no computer system localized for their language and no word processor with a spell checker for their language. This project aims to describe words so that they fit into the Wiktionary project, to record the pronunciation of these words, and to make both available on the Internet. A secondary goal is to create a file that can be used to generate a spell checker for each language. As the Open Office project aims to localize its software for many languages, we would make certain that the data we provide is available to that project. The languages that are being localized would be a prime target for our project.
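As a rough sketch of the secondary goal, the function below turns recorded words into a plain word list for one language. The function name build_word_list and the (language code, spelling) input format are assumptions made for illustration. The output follows the style of MySpell/Hunspell .dic files, which begin with a word count; whether this is the format Open Office would require is also an assumption.

```python
def build_word_list(entries, language):
    """Return the text of a plain word list for one language.

    entries: iterable of (language_code, spelling) pairs (hypothetical format).
    The first line holds the number of words; the remaining lines are the
    unique words in sorted order, one per line.
    """
    words = sorted({
        spelling.strip()
        for lang, spelling in entries
        if lang == language and spelling.strip()
    })
    return "\n".join([str(len(words))] + words) + "\n"


entries = [("nl", "hond"), ("nl", "kat"), ("en", "dog"), ("nl", "hond")]
print(build_word_list(entries, "nl"), end="")
```

Duplicates and empty spellings are dropped, so the same list can be regenerated at any time as volunteers add words.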
What does Wiktionary hope to accomplish?
- With a many-to-many dictionary, ultimately all words and their meanings are linked to either an equivalent or a description in another language. This will result in much better communication between the speakers of different languages.
- By including glossaries and thesauri in Wiktionary, we ensure that there will be less misunderstanding between people who speak different languages. Even translation between European languages often goes through an intermediary language, so the problems of translating to less common languages are much bigger. Resources for languages like Papiamento are hard to find. This has an economic impact, as it increases the cost of translating to these languages.
What would the funding of this project accomplish?
- Creating the software necessary to run a many-to-many Wiktionary.
- Paying people to record the pronunciation of the words and write down the meaning of the words.
- All words need to have either an equivalent or a description in at least two other languages. One language would be English, Spanish, French, Russian or Chinese; the second would be a language known in the area of the language being worked on. The benefit is that this would help integrate two local languages into Wiktionary and connect them to one of the dominant languages.
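The requirement in the last point can be expressed as a simple check. The sketch below is only an illustration: the function name meets_link_requirement, the language codes, and the shape of the links dictionary are assumptions. It also assumes that the two languages must be distinct, so a single language cannot count as both the dominant and the regional one.

```python
# Illustrative codes for English, Spanish, French, Russian and Chinese.
MAJOR_LANGUAGES = {"en", "es", "fr", "ru", "zh"}


def meets_link_requirement(entry_language, links, regional_languages):
    """Check whether a word is linked to enough other languages.

    links: dict mapping a language code to an equivalent or a description.
    regional_languages: codes of languages known in the area of the entry.
    """
    # Only non-empty links to languages other than the entry's own count.
    linked = {
        lang
        for lang, text in links.items()
        if lang != entry_language and text and text.strip()
    }
    majors = linked & MAJOR_LANGUAGES
    regionals = linked & set(regional_languages)
    # One dominant language plus a different regional language is required.
    return any(m != r for m in majors for r in regionals)


# A Papiamento entry linked to English and Dutch, with Dutch and Spanish
# assumed to be the languages known in the area.
print(meets_link_requirement("pap", {"en": "dog", "nl": "hond"}, {"nl", "es"}))
```

A check like this could be run over newly contributed words to see which ones still need a second link before the work is counted as complete.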
How would the money be spent?
- The programmers who write for the Wikimedia Foundation are either volunteers or accept a less than market rate for the work that needs doing. For the base functionality of the software needed for an "ultimate Wiktionary", we need EUR 5000.
- We would try to make use of the same people that the Open Office project uses; these are also volunteers, but they need money to be able to devote their time. The wages these people are paid are reasonable.
- For many languages we have fledgling Wikipedia projects. Using the knowledge of the people in these projects, we should be able to get in contact with people who could be interested in creating content for our projects. The people contacted in this way would be paid local wages for the work that is done.
- As the Wikimedia organization is very open and has a flat organizational structure, many people will scrutinize all financial dealings. Many people will also have ideas on how to achieve things at the lowest price.
What words would we ask to have included?
- When cooperating with Open Office we would work on the words that are in their spell check list.
- We have a basic list of words in English. All words that are relevant in the target language should be included. Culture-specific words like "football" may not be relevant.
- We would ask them to include words that appear in a newspaper or a book that is recognized as a good example of that language.
- If possible we would try to get people to find the words for the thesauri that we have. It would be really beneficial to add to the medical and environmental thesauri that have been compiled by the EU.
Success / Performance indicators
- Realisation of the software that allows for a many-to-many dictionary.
- Conversion of the Wiktionary projects to this new database.
- A large increase in the pronunciations recorded in our Commons repository.
- A large increase in the number of words in languages that are not specifically targeted under the program.
- Effective cooperation with the Open Office project.
Monitoring and evaluation
As the project enables the evolution of the software and allows for paid contributions, the two parts can be evaluated separately.
- The software has distinct objectives which will enable us to see how well we are doing.
- The cooperation with Open Office can also easily be monitored; when we achieve it, we will probably also achieve a larger number of words recorded and noted in the dictionary. It may, however, also mean that some money is used to pay for the localisation of the Open Office software. This would not be bad, as long as it is quantified, as support of the Open Office localisation is one aim of this project.
License of the Wiktionary
Wiktionary is published under the GFDL. If the Open Office license is incompatible with the GFDL, the work done in cooperation with the Open Office organization will be dual licensed.