Wiki resources in semantic technologies: Tunisian experience
Mohamed Ben Aouicha and Mohamed Ali Hadj Taieb
Faculty of Sciences of Sfax, University of Sfax
Semantic relatedness (SR) quantifies any type of relationship between two concepts or two words. This quantification relies on semantic information extracted from large corpora or from structured resources such as knowledge bases. Corpora can be used to extract the concepts or words that co-occur, since co-occurrence can indicate a relationship between them. This type of information is very useful for determining the SR between concepts.
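As an illustration of how co-occurrence information can be turned into a relatedness score, the following minimal sketch computes pointwise mutual information (PMI) from co-occurrence counts. PMI is an assumption chosen here for its simplicity, not necessarily the measure used in our work, and the function names and the toy contexts are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(contexts):
    """Count words and unordered word pairs, once per context (e.g. a sentence)."""
    word_counts = Counter()
    pair_counts = Counter()
    for context in contexts:
        unique = sorted(set(context))  # sorted so each pair has one canonical order
        word_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return word_counts, pair_counts


def pmi(w1, w2, word_counts, pair_counts, n_contexts):
    """Pointwise mutual information (base 2) of two words over the contexts."""
    joint = pair_counts.get(tuple(sorted((w1, w2))), 0)
    if joint == 0:
        return float("-inf")  # the words never share a context
    p_joint = joint / n_contexts
    p1 = word_counts[w1] / n_contexts
    p2 = word_counts[w2] / n_contexts
    return math.log2(p_joint / (p1 * p2))


# Toy contexts, invented for illustration only.
contexts = [["cat", "dog"], ["cat", "dog"], ["cat", "fish"], ["bird", "tree"]]
word_counts, pair_counts = cooccurrence_counts(contexts)
score = pmi("cat", "dog", word_counts, pair_counts, len(contexts))
```

A positive score means the two words share a context more often than their individual frequencies would suggest, while pairs that never co-occur receive negative infinity in this sketch.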

Wikipedia is frequently exploited as a resource for extracting co-occurrences between words. In our proposed approach, we keep only the words from this encyclopedia that belong to one of four parts of speech: nouns, verbs, adverbs and adjectives. We also use Wiktionary to determine which words belong to these categories.
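The filtering step can be sketched as follows, assuming a lexicon that maps each word to its parts of speech has already been derived from Wiktionary. The `TOY_LEXICON` dictionary, the function names and the window size are hypothetical placeholders for illustration, not the actual implementation.

```python
import re
from collections import Counter

# The four parts of speech that are kept.
CONTENT_POS = {"noun", "verb", "adverb", "adjective"}

# Hypothetical stand-in for a word -> parts-of-speech lexicon built from Wiktionary.
TOY_LEXICON = {
    "cat": {"noun"},
    "chases": {"verb"},
    "quickly": {"adverb"},
    "small": {"adjective"},
    "mouse": {"noun"},
    "the": {"article"},
    "a": {"article"},
}


def content_words(text, lexicon):
    """Lowercase and tokenize the text, keeping only words with a retained part of speech."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [t for t in tokens if lexicon.get(t, set()) & CONTENT_POS]


def window_cooccurrences(words, window=2):
    """Count unordered pairs of distinct words at most `window` positions apart."""
    counts = Counter()
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                counts[tuple(sorted((word, other)))] += 1
    return counts


words = content_words("The cat quickly chases a small mouse.", TOY_LEXICON)
pairs = window_cooccurrences(words, window=2)
```

In this sketch the articles are dropped before counting, so the window spans only retained words. Counting over a fixed window is one possible choice; whole sentences or whole articles could serve as the co-occurrence unit instead.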

The process consists of filtering Wikipedia articles and then designing and developing an application that provides a set of services offering statistics on co-occurring words. The first part is a preliminary study presenting the two exploited resources, Wikipedia and Wiktionary. The second part is dedicated to the project design, which presents the functional and technical requirements of the developed system.

Attendees will learn that Wikipedia and its sister projects are not only useful to people seeking scientific information. They can also help promote natural language processing for the languages supported by WMF wikis.

  • Wikimedia Research
  • Technology, Interface & Infrastructure
30 minutes
