Grants talk:Project/Automatic Extraction of Multi-lingual Text and Concept Similarity
Comments of Glrx
I would decline this proposal. Is this a research project or a software development project? In either case, the proposal does not provide concrete examples of what it will attempt to do. It sounds like it wants to do research, but then it talks about benefiting WP users by taking one article and comparing it to another; failing that, it would find similar articles. The proposal does not follow the given advice about describing contemplated tests and measurements.
What is the similarity metric? What is the training corpus? What are the parameters? See en:Word2vec.
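To make the metric question concrete, here is a minimal sketch of one candidate the proposal could have named: cosine similarity between averaged word vectors. It assumes both articles have already been embedded in a shared, cross-lingually aligned vector space, which the proposal does not describe. The function names and toy vectors are hypothetical, and nothing here is taken from the proposal itself.

```python
import math

def mean_vector(vectors):
    """Average a list of equal-length word vectors into one document vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy, made-up vectors standing in for word embeddings of two articles.
article_en = mean_vector([[1.0, 0.0], [1.0, 2.0]])
article_ru = mean_vector([[2.0, 2.0]])
score = cosine_similarity(article_en, article_ru)
```

Even with a metric fixed like this, the training corpus and the model parameters would still need to be stated before accuracy claims could be evaluated.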
The implication is this project will find similar topics when person A writes one article in en and person B writes a similar article in ru. I need some explanation about how that matching will work and how accurate the matching will be. In practice, person A might write about jellyfish, and person B might write about медуза; in that case, finding similar articles may not be hard; word matching in titles may suffice. I want to see some rationale that argues there are many cases when more sophisticated techniques are needed to find matching articles.
There may be other techniques beyond searching for similar words. If two articles reference the same DOI, then they may cover similar topics.
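As a rough illustration of the DOI idea, the sketch below extracts DOIs from two article texts and scores their overlap with a Jaccard index. The regular expression is only an approximation of common DOI syntax, the helper names are hypothetical, and the sample wikitext is invented for the example.

```python
import re

# Approximate pattern for modern DOIs; real DOIs can contain characters
# that this simplified expression cuts off.
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>|}\]]+')

def extract_dois(wikitext):
    """Return the set of DOIs found in the text, lowercased for comparison."""
    return {m.rstrip('.,;)').lower() for m in DOI_RE.findall(wikitext)}

def doi_jaccard(text_a, text_b):
    """Jaccard overlap of the DOI sets of two texts; 0.0 if either has none."""
    a, b = extract_dois(text_a), extract_dois(text_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Invented sample citations, not taken from any real article.
en_text = "{{cite journal |doi=10.1000/xyz123 |title=A}} {{cite journal |doi=10.1000/abc456}}"
ru_text = "{{cite journal|doi=10.1000/XYZ123}}"
overlap = doi_jaccard(en_text, ru_text)
```

A measure like this is language-independent, but it only helps for articles that cite DOI-bearing sources, so it would complement text-based matching and not replace it.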
Furthermore, the goal does not seem like a problem that WP needs solved right now. Say a WP user uses such a tool. What does she do next? If two pages are similar, is the intent to make an interlanguage link? In that case, the user should probably have skills in both languages, and that means she may be able to judge or find the similar pages herself. If there is no matching page, then the user would need multilingual skills to translate the page.
The project does not have a well-defined scope.
This is a similarity measure between texts based on a Markov model. It is pretty straightforward to do; the tricky part is aligning the language models. I would say split the project in two, where one part focuses solely on generalizing the building of aligned language models. The whole thing is a lot more implementation than research. One year on this seems little to me, unless some of the work is already done.
The most interesting use of this is to be able to detect articles diverging in content on different languages, and this is a real problem that needs a proper solution. — Jeblad 13:57, 1 April 2017 (UTC)
Eligibility confirmed, round 1 2017
This Project Grants proposal is under review!
We've confirmed your proposal is eligible for round 1 2017 review. Please feel free to ask questions and make changes to this proposal as discussions continue during the community comments period, through the end of 4 April 2017.
The committee's formal review for round 1 2017 begins on 5 April 2017, and grants will be announced 19 May. See the schedule for more details.
You write "C. Detection of erroneous links between concepts in different languages." We don't link concepts in different languages against each other; instead, we link every concept to the right Wikidata item. The fact that this grant proposal doesn't mention Wikidata even once suggests to me that the writers of the proposal haven't thought enough about how their proposal interacts with the existing architecture. ChristianKl (talk) 14:54, 29 May 2017 (UTC)
Round 1 2017 decision
This project has not been selected for a Project Grant at this time.
We love that you took the chance to creatively improve the Wikimedia movement. The committee has reviewed this proposal and not recommended it for funding. This was a very competitive round with many good ideas, not all of which could be funded in spite of many merits. We appreciate your participation, and we hope you'll continue to stay engaged in the Wikimedia context.
Next steps: Applicants whose proposals are declined are welcome to consider resubmitting their applications in the future. You are welcome to request a consultation with staff to review any concerns with your proposal that contributed to a decline decision, and to help you determine whether resubmission makes sense for your proposal.
Over the last year, the Wikimedia Foundation has been undergoing a community consultation process to launch a new grants strategy. Our proposed programs are posted on Meta here: Grants Strategy Relaunch 2020-2021. If you have suggestions about how we can improve our programs in the future, you can find information about how to give feedback here: Get involved. We are also currently seeking candidates to serve on regional grants committees and we'd appreciate it if you could help us spread the word to strong candidates--you can find out more here. We will launch our new programs in July 2021. If you are interested in submitting future proposals for funding, stay tuned to learn more about our future programs.