Grants:Project/Rapid/CAWI (Concept sampling & assessment of Wikidata and Wiktionary)

From Meta, a Wikimedia project coordination wiki
status: Not Funded
Marco Ciaramella/CAWI
Assessment of Wiktionary and Wikidata in relation to a sampling of concepts (entities) extracted from ST (scientific and technical) sources.
target: Wikidata, Wiktionary
start date: April 2
end date: May 2
budget (local currency): 1.700 Eur
budget (USD): 1,928.82 USD
grant type: individual
grantee: Marco Ciaramella

Project Goal

Assessment of high-quality dictionaries

Project Plan


Tell us how you'll carry out your project. What will you and other organizers spend your time doing?

I will carry this out as an individual research activity. The planned elapsed time for the project is 4 weeks at half time (I will be on partial leave from work and will not be paid by my company for this leave). I will use samples of Scientific and Technical (ST) terms extracted from the literature (open documents) and analyze the coverage and quality of these sample terms in Wiktionary and in Wikidata. The objective of this analysis is to identify differences between Wiktionary and Wikidata. In the future, these differences could also be used to implement an automatic procedure to mutually enrich Wikidata and Wiktionary terms, also taking Wikipedia into account; in any case, that evolution will be proposed only after this quick evaluation project.
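As a rough illustration of the coverage comparison described above, the sketch below sorts each sampled term into one of four buckets (found in both projects, in only one, or in neither) and derives a coverage rate per project. It is only a sketch under stated assumptions: the function names `compare_coverage` and `coverage_rate`, the bucket labels, and the placeholder terms are all hypothetical, and the two lookup callables stand in for whatever method is eventually used to check a term against Wiktionary and Wikidata. It does not address the quality assessment.

```python
from typing import Callable, Dict, Iterable, List


def compare_coverage(
    terms: Iterable[str],
    in_wiktionary: Callable[[str], bool],
    in_wikidata: Callable[[str], bool],
) -> Dict[str, List[str]]:
    """Sort each sampled term into one of four coverage buckets.

    The two callables are hypothetical lookups: each returns True
    when the term has an entry in the corresponding project.
    """
    buckets: Dict[str, List[str]] = {
        "both": [],
        "wiktionary_only": [],
        "wikidata_only": [],
        "neither": [],
    }
    for term in terms:
        found_wikt = in_wiktionary(term)
        found_wd = in_wikidata(term)
        if found_wikt and found_wd:
            key = "both"
        elif found_wikt:
            key = "wiktionary_only"
        elif found_wd:
            key = "wikidata_only"
        else:
            key = "neither"
        buckets[key].append(term)
    return buckets


def coverage_rate(buckets: Dict[str, List[str]], project: str) -> float:
    """Fraction of sampled terms found in `project` ("wiktionary" or "wikidata")."""
    total = sum(len(found) for found in buckets.values())
    if total == 0:
        return 0.0
    covered = len(buckets["both"]) + len(buckets[f"{project}_only"])
    return covered / total


# Placeholder data only: these sets are not real lookup results.
sample_terms = ["term_a", "term_b", "term_c", "term_d"]
fake_wiktionary = {"term_a", "term_b"}
fake_wikidata = {"term_b", "term_c"}

report = compare_coverage(
    sample_terms,
    lambda t: t in fake_wiktionary,
    lambda t: t in fake_wikidata,
)
```

The `wiktionary_only` and `wikidata_only` buckets are the differences that a later enrichment procedure could start from.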

How will you let others in your community know about your project (please provide links to where relevant communities have been notified of your proposal, and to any other relevant community discussions)? Why are you targeting a specific audience?

I will stay in regular contact with the research community through the Wiki-research-l mailing list. There are two kinds of audience: a) the research community around Wikimedia, mainly in Natural Language technologies; b) Wikimedia users of Scientific and Technical terms.

What will you have done at the end of your project? How will you follow-up with people that are involved with your project?

The deliverable will be an open-access paper (also peer-reviewed) presenting the results of this project. It will discuss in particular the differences in coverage and quality between Wikidata and Wiktionary for terms sampled from Scientific and Technical literature, which are both relevant to some users and challenging from a scientific point of view (large presence of neologisms). The added value for Wikimedia projects will be a baseline for future evolution of Wiktionary/Wikidata in the Scientific and Technical area. The paper will be indexed by Google Scholar, and it will be shared and discussed on the Wiki-research-l mailing list, also after the end of the project.


How will you know if the project is successful and you've met your goals? Please include the following targets and feel free to add more specific to your project:

The final report, published as a paper, will document results for a significant sample of ST terms (at least 1,000 terms from different ST domains).


What resources do you have? Include information on who is organizing the project, what they will do, and if you will receive support from anywhere else (in-kind donations or additional funding).

I will draw on my experience as a natural language processing engineer (see scholar:) and as a Wikimedia contributor since 2005.

What resources do you need? For your funding request, list bullet points for each expense:

  • The planned elapsed time for the project is 4 weeks, and I will work on it half time. During the project I will be on leave from work and will not be paid by my company for this leave. Hence, I ask for a partial reimbursement (1,928.82 USD) for this activity (a full reimbursement would in fact be 2,500 USD).