Research talk:Newsletter/2015/September

knoWitiary - a brief description

knoWitiary is a resource that provides a reorganized version of Wiktionary's information in machine-readable format, covering word senses and their definitions, etymological information, translations, and word relations (synonymy, antonymy, derived from, related, anagrams, etc.). It can be downloaded here. Compared to what would be obtained using JWKTL, this resource also contains etymological information extracted from the free-form etymological paragraphs. It was built from the English Wiktionary dump but, like Wiktionary, it contains entries in numerous languages; 29 of the languages represented have more than 10,000 entries each. An analysis of the vocabulary and word relation coverage, relative to existing and frequently used resources in Natural Language Processing, was performed for the English and Italian entries. —The preceding unsigned comment was added by 93.65.94.27 (talk)

Thanks, I corrected. Nemo 06:53, 9 October 2015 (UTC)

About the DBnary dataset.

Thanks for your interest in our work. Some of the criticisms (especially regarding quality assessment) are answered in our paper in the Semantic Web Journal:

http://www.semantic-web-journal.net/content/dbnary-wiktionary-lemon-based-multilingual-lexical-resource-rdf

Note that, contrary to what the review seems to imply, we do not use interwiki links in the extractor; we rely only on the lexical data expressed in the entry itself.

Our extractor does not adapt itself to changes in the macrostructure, but we have several tools that track discrepancies between the entry structure and the extractor. See, for example, the extraction history of the French data at: http://kaiko.getalp.org/about-dbnary/dataset/statistics-on-french-extraction/ You'll see that the extractor's performance dropped in January 2014; that made us aware of a change in the entry structure, and we adapted the extractor to it.

Finally, the extracted data has several uses and may be used as is, provided that you know how to handle RDF data. —The preceding unsigned comment was added by Dodecaplex (talk) 12:06, 10 October 2015

Accuracy

Regarding this post-publication edit: Nope, accuracy is not "defined as (1-error)". (Or would 147.105.3.100 also talk about "102% accuracy"?) Usage varies and perhaps "precision" would have been a more precise term instead, but it's quite OK to paraphrase this paper's statement as 2% accuracy. It's not about a binary classification task where 98% accuracy = guessing 98% right, but about estimating a value.

Regards, Tbayer (WMF) (talk) 07:27, 19 March 2017 (UTC)