Grants talk:Project/Jeblad/Better support for imported data in wikitext

From Meta, a Wikimedia project coordination wiki

October 11 Proposal Deadline: Reminder to change status to 'proposed'

The deadline for Project Grant submissions this round is October 11th, 2016. To submit your proposal, you must (1) complete the proposal entirely, filling in all empty fields, and (2) change the status from "draft" to "proposed." As soon as you’re ready, you should begin to invite any communities affected by your project to provide feedback on your proposal talkpage. If you have any questions about finishing up or would like to brainstorm with us about your proposal, there are still two proposal help sessions before the deadline in Google Hangouts:

Warm regards,
Alex Wang (WMF) (talk) 03:16, 6 October 2016 (UTC)

Comments of Ruslik0

I have a few comments:

  • I think you should spell out abbreviations (NLP) at the first mention.
  • As I understand it, you intend to use only Stanford Core NLP? If so, you should state this more clearly in the Activities subsection (instead of "a well-supported NLP engine").
  • Can you clarify at which stage of wikitext processing (magic word substitution, template transclusion/substitution, etc.) the NLP engine is supposed to run?
  • Is the chosen NLP engine capable of processing wikitext? Won't it be confused by wikitext markup?
  • How will grammatically or otherwise incorrect sentences be processed?
  • Have similar projects been tried before? And what was the result?
  • You should notify relevant communities: MediaWiki, enwiki and other large projects. I think it will make it much more likely that the project is funded if other Wikimedia developers provide supportive reviews.

Ruslik (talk) 17:10, 19 October 2016 (UTC)

Some replies below. — Jeblad 21:40, 27 October 2016 (UTC)
  • Done
  • Some example code using Stanford Core NLP made me realize that this problem was in fact solvable. It is, however, likely that other NLP engines can do the same thing. Choosing the optimum engine could be a project in itself, but I'll only try out one process chain to see if this is feasible.
  • It can run at several places in the pipeline, but most likely after all of them.
  • It will process text with some HTML markup.
  • Values will be materialized (inserted into the text) only if sufficient context is found to confirm they are correct. Markup that is not replaced will generate a tracking category.
  • The closest similar projects are articles generated and uploaded by bots. I have made some of them. They are horrible.
  • I have not done much advocacy of this, …
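A minimal sketch of the replacement behaviour described in these replies, assuming the single-brace placeholder syntax from the proposal's example (such as {width}) and a hypothetical tracking category name. Placeholders with a known value are materialized; any that remain unresolved keep their markup and tag the page. The context check that decides whether a value is "correct" is deliberately left out here: a value is simply available or not.

```python
import re

# Placeholder syntax as in the proposal's example text, e.g. {width}.
# The lookarounds skip double-brace template markup such as {{PAGENAME}}.
PLACEHOLDER = re.compile(r"(?<!\{)\{(\w+)\}(?!\})")

# Hypothetical category name, used only for this illustration.
TRACKING_CATEGORY = "[[Category:Pages with unresolved data placeholders]]"


def materialize(wikitext, values):
    """Replace {name} with values[name] where a value is available.

    Returns the new text and the list of placeholder names that could not
    be resolved. Unresolved markup is kept as-is, and the tracking category
    is appended so editors can find the page.
    """
    unresolved = []

    def substitute(match):
        name = match.group(1)
        if name in values:
            return values[name]
        unresolved.append(name)
        return match.group(0)  # leave the original placeholder untouched

    result = PLACEHOLDER.sub(substitute, wikitext)
    if unresolved:
        result += "\n" + TRACKING_CATEGORY
    return result, unresolved


text = "The lake is {wide} wide, and its greatest depth is {depth}."
print(materialize(text, {"wide": "15 km"}))
```

Keeping the unresolved placeholder in the text, rather than dropping it, means an editor can still see what was intended once the page shows up in the category.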

Comments of Glrx

I would decline this proposal. The proposal does not lay out the importance of the research effort. The examples suggest more corner-case issues than broad relevance. Populating a paragraph with dull facts about a lake doesn't seem that important. Furthermore, those facts do not change often, so there seems to be little concern about methods to achieve verb-subject agreement or writing complicated plural syntax. Worrying about whether an island will suddenly get one or two inhabitants does not seem to be a pressing issue and may not fit the encyclopedic goal of WP. For example, en:Nikumaroro is currently uninhabited, but in the 1950s the population was about 100. If its population suddenly went to 1 or 2 people, we'd want to know what those people were doing. During WWII, there was a small Coast Guard detachment manning a Loran station. The number of people manning that Loran station is not going to change.

Yes, pulling facts from Wikidata is an admirable thing, but how prevalent is the problem this proposal wants to solve? Yes, it is irksome to see a sentence that says, "Shangri La has 1 inhabitants," but readers understand what that means. How many articles have such bad grammar? When an editor stumbles across bad grammar, it's easy to fix. And maybe fixing that little goof will get the editor engaged in bigger efforts. Also, what about a simpler alternative of running a grammar checker on the HTML output?
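The agreement problem behind "1 inhabitants" can be shown with a small sketch. It assumes English, which only distinguishes one from everything else; in wikitext, MediaWiki's PLURAL parser function covers this case and also the additional plural forms many other languages need. The function names below are hypothetical.

```python
def count_phrase(count, singular, plural):
    """Return '<count> <noun>' with the English noun form that agrees with count."""
    noun = singular if count == 1 else plural
    return f"{count} {noun}"


def population_sentence(place, inhabitants):
    # Hypothetical template sentence for an imported population value.
    return f"{place} has {count_phrase(inhabitants, 'inhabitant', 'inhabitants')}."


for n in (0, 1, 2, 100):
    print(population_sentence("Shangri La", n))
```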

The goals are also confused. There is no discussion about the problems that wiki markup presents above and beyond the text that an NLP engine wants. Is an NLP engine going to understand "the boat's top speed is {{convert|32.5|kn|km/h}}"? How much effort is required to parse such text? If such text is not parsed, does the perceived benefit disappear? If one is stating the area of a lake, should the area be given in both acres and hectares using the convert template?
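On whether an NLP engine understands {{convert|32.5|kn|km/h}}: if it runs after template expansion, it sees plain text instead of template markup. The sketch below is a hypothetical, much simplified stand-in for that expansion that handles only this one knots-to-km/h form. The real Template:Convert has many options and its own rounding and output formatting, which this does not reproduce.

```python
import re

# One knot is one nautical mile (1852 m) per hour, so exactly 1.852 km/h.
KMH_PER_KNOT = 1.852

# Matches only the simple form {{convert|<number>|kn|km/h}}.
CONVERT_KN = re.compile(r"\{\{convert\|(\d+(?:\.\d+)?)\|kn\|km/h\}\}")


def expand_convert(wikitext):
    """Replace simple knot conversions with plain text an NLP engine can read."""
    def render(match):
        knots = float(match.group(1))
        return f"{match.group(1)} knots ({knots * KMH_PER_KNOT:.1f} km/h)"
    return CONVERT_KN.sub(render, wikitext)


print(expand_convert("the boat's top speed is {{convert|32.5|kn|km/h}}"))
```

The design point is the ordering: anything the engine has to parse should already be expanded, so the remaining question is whether the expanded text can be mapped back to the template arguments.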

Measuring performance is unclear. The primary and secondary goals are mentioned, but they are not discussed in any depth. There is nothing about the proposer's qualifications and no community notification.

The proposal needs clearer goals and more substance. Glrx (talk) 21:10, 23 October 2016 (UTC)

Thanks for your input, I will take it into consideration. — Jeblad 21:42, 27 October 2016 (UTC)

From proposal:

At its widest, near [[Hamar]], the lake is {wide} wide. It is {area} in area and its volume is estimated at {volum}; normally its surface is {altitude} above sea level, and its greatest depth is {depth}.

Insert properties (https://en.wikipedia.org/wiki/Wikipedia:Wikidata#Inserting_Wikidata_values_into_Wikipedia_articles)

At its widest, near [[Hamar]], the lake is {{#property:width|from=Q212209}} wide. It is {{#property:area|from=Q212209}} in area and its volume is estimated at {{#property:volume|from=Q212209}}; normally its surface is {{#property:elevation above sea level|from=Q212209}} above sea level, and its greatest depth is {{#property:depth|from=Q212209}}.
At its widest, near Hamar, the lake is 15±1 kilometre wide. It is 369.322128 square kilometre in area and its volume is estimated at ; normally its surface is 123.2±0.1 metre above sea level, and its greatest depth is

Failed to render property depth: depth property not found.

.

Glrx (talk) 18:43, 28 October 2016 (UTC)
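The rewrite from the proposal's placeholders to #property calls, as in the quoted example, is mechanical. A sketch, assuming a mapping from placeholder names to the property labels used in the quoted #property version (the mapping is read off the two quoted texts, not taken from any existing tool). As the rendered output shows, the rewrite alone does not help when the item lacks a property such as depth.

```python
import re

# Placeholder names in the proposal text -> property labels in the #property calls.
PROPERTY_FOR = {
    "wide": "width",
    "area": "area",
    "volum": "volume",
    "altitude": "elevation above sea level",
    "depth": "depth",
}

# Single-brace placeholders only; double-brace template markup is skipped.
PLACEHOLDER = re.compile(r"(?<!\{)\{(\w+)\}(?!\})")


def to_property_calls(text, item):
    """Rewrite {name} placeholders as {{#property:label|from=item}} calls."""
    def rewrite(match):
        label = PROPERTY_FOR.get(match.group(1))
        if label is None:
            return match.group(0)  # unknown placeholder: leave it for an editor
        return "{{#property:%s|from=%s}}" % (label, item)
    return PLACEHOLDER.sub(rewrite, text)


print(to_property_calls("the lake is {wide} wide", "Q212209"))
```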

Thanks for your findings, which are as expected. It is only an example. — Jeblad 20:11, 28 October 2016 (UTC)

Looking at the updated proposal, I would still decline. I do not see a clear proposal but rather a confused collection of ideas. At one point there's a claim about doing research. Then there's a claim that objective 1 is "mostly an engineering problem". There's a goal of taking some stated text and "updating" it to use alternative markup with syntax such as {width}. But then there's a statement that "If we instead could write something like in example 2". OK, who is writing what text? Then we get, "The actual creation of a language model and parsing of the text according to that language model is outside the scope of the experiment." At one point the proposal is talking about matching 365 or 369 to 369.4533±0.0001 km². OK, a machine sees some text with one value in it, a database has some property with a "close" value, so can the machine conclude the text is talking about that property? Then there are some complexity comments about settling on using a single hash key. For research, I'd rather see hand-made estimates of the benefits of a program and a hand survey of effective techniques before any coding is done. If I take 100 WP articles at random (press the random article button), how many of them would gain a significant benefit from pulling information from a known data source? That has significant benefit even outside of NLP. In those 100 articles, how many have significant NLP problems (say the article discusses New York City during the reign of Boss Tweed, and there's a sentence about the city's population being X; we don't want X replaced by the city's current population)? The first step should be a paper feasibility study rather than a prototype; that research can investigate ways to mark and tag wikitext. For research, I need evaluation against a null hypothesis; research without measurement is nothing. For engineering, I need an effective method; "build it and maybe it will work" is not good practice. Glrx (talk) 20:27, 1 November 2016 (UTC)
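The "close value" question can be made concrete. A sketch, assuming a simple relative-tolerance test with math.isclose; the tolerances are arbitrary choices for illustration. It shows that the decision depends entirely on the tolerance, and that numeric closeness says nothing about whether the sentence refers to that property or to the same point in time (the Boss Tweed case).

```python
import math

STORED_AREA_KM2 = 369.4533  # the stored value quoted in the comment above


def plausible_match(text_value, stored_value, rel_tol):
    """True if a number in the text is within rel_tol of the stored value.

    This only tests numeric closeness; it cannot tell whether the sentence
    is about this property, this entity, or the present day.
    """
    return math.isclose(text_value, stored_value, rel_tol=rel_tol)


for tol in (0.005, 0.02):
    print(tol, [plausible_match(v, STORED_AREA_KM2, tol) for v in (365, 369)])
```

With a 0.5 % tolerance only 369 matches; at 2 % both 365 and 369 do, so the tolerance itself becomes an unmeasured design choice.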

Thank you for the comment; I will take it into consideration. (What do you imagine the null hypothesis should be in this case? A null hypothesis would imply there is no connection between the two phenomena, or in this case that there is no connection between the text and the available data. Or do you imagine using a null hypothesis for the gain from using a system like this, or some derivative, compared with a system with no such support? That is, a hypothesis that there is no gain at all?) — Jeblad 15:34, 2 November 2016 (UTC)
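One way to phrase the second reading, a null hypothesis of no gain, is a paired comparison: reviewers compare the same passages rendered with and without data support and say which reads better. Under H0 each non-tied pair is equally likely to go either way, so an exact one-sided sign test applies. This is a sketch of one possible evaluation design, not something in the proposal, and the counts below are invented.

```python
from math import comb


def sign_test_p_value(wins, losses):
    """One-sided exact sign test for paired comparisons (ties excluded).

    H0: the system gives no gain, so the number of pairs favouring it is
    Binomial(n, 0.5). Returns P(X >= wins) under H0.
    """
    n = wins + losses
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


# Invented counts: 9 pairs favour the supported version, 1 does not.
print(round(sign_test_p_value(9, 1), 4))
```

With these invented counts the p-value is about 0.0107, small enough to reject "no gain" at the usual 5 % level; with an even split it would be far from significant.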

Questions about your proposal

Hello Jeblad,

Thank you for submitting this proposal. In reviewing your proposal for eligibility, I have some concerns that I'd like to clarify with you before finalizing a decision. In one place, you say that your goal is to run some experiments, but not deliver a finished product. I understand this to mean that you would build a prototype in a labs context in order to supplement a wider discussion. However, in your measures of success, you indicate that by the end of the project you anticipate "a few articles from Wikipedia in some core languages (supported by the chosen NLP engine) that parses correctly, and where the chosen NLP engine provides some core values and/or adaptations from an external source (most likely Wikidata)." This suggests something beyond a prototype, since it would be enabled in Wikipedia. This is beyond an experiment and makes this project unfeasible within the scope of Project Grants. Many things would need to happen before this could be enabled on Wikipedia, including serious code review and security review. Consequently, this project is only eligible in Project Grants as an experiment in labs. The scope is too large.

Given that feedback, can you clarify the intent of your project, both here on your talkpage, and also in the proposal itself?

Another question, not with respect to eligibility, but with respect to impact: It looks as though you are proposing a direct bridge between NLP sources and Wikipedia articles. Is that true? If so, would it make sense to work on feeding better data to Wikidata? Or to explore how to integrate better content from Wikidata into the context of articles? Going straight from NLP to Wikipedia seems to me to miss a key step of Wikidata integration.

For this project to be funded, you will most likely need to demonstrate that there is strong community desire for the experiment you are proposing, since there are many open questions about how volunteer editors will respond to article integration. If it appears the community is not ready to support the outcomes of this project, it may be premature to run this experiment at this time.

At this point, however, the most important next step is to confirm whether your project is eligible, so your prompt response to my first question is the priority.

Thank you!

--Marti (WMF) (talk) 17:33, 27 October 2016 (UTC)

My fault, the project description should have been formulated much more clearly. It is possible to read it as an experiment on live articles on Wikipedia, even though it was meant to be just an experiment with articles from Wikipedia. A lot is missing before it is possible to run something like this on Wikipedia, as this is just a first step to see if it is possible at all. The core question is whether we can create the necessary pipeline, and whether the additional load is manageable. To answer that question my proposal is to build a small test system, either as a local Vagrant instance or as a public Labs instance.
What you call NLP sources is in fact the same wikitext as in the Wikipedia articles, but annotated so that further analysis and synthesis is possible. If we want to insert data from Wikidata into Wikipedia, then we must be able to adjust the wikitext accordingly. Such alignment between words is usually quite simple in English, often just an -s for the plural or a clitic 's for the genitive, but in other languages several other alignments may be necessary, such as for the locative case. Without such alignment the text will be rather awkward. Another explanation (your question rephrased) could simply be "this is an exploration of how to integrate Wikidata into article text on Wikipedia with adjusted alignment". The field is often described as computational discourse. See for example Jurafsky and Martin, Speech and Language Processing, chapter 21, Computational Discourse (pp. 715-758).
Extraction of information, whether from Wikipedia or some other source, is another field, and even though it is very interesting, I have not done anything in this area. This is described in Speech and Language Processing, chapter 22, Information Extraction (pp. 759-798). — Jeblad 16:00, 31 October 2016 (UTC)
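The English alignment mentioned here is small enough to sketch. A minimal example, assuming simplified rules (regular plurals add -s; regular plurals ending in -s take only an apostrophe in the genitive, everything else takes 's) and hypothetical function names. Cases such as the locative in other languages need per-language rules or a lexicon and are not covered.

```python
def genitive(noun, plural=False):
    """Attach the English genitive clitic to an inserted noun phrase (simplified)."""
    if plural and noun.endswith("s"):
        return noun + "'"  # regular plural: the lakes' shores
    return noun + "'s"     # singular or irregular plural: Hamar's, children's


def plural_noun(noun, count):
    """Regular English plural with -s; irregular nouns need a lexicon."""
    return noun if count == 1 else noun + "s"


print(f"{genitive('Hamar')} population")
print(f"the {genitive(plural_noun('lake', 3), plural=True)} shores")
```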

Comments of Kevin Unhammer

This seems quite ambitious; anaphora/coreference resolution is much harder than things like parsing or NE, and the open-domain state of the art F-scores are typically not much over 60 % (e.g. http://nlp.stanford.edu/software/dcoref.shtml#About ). Even a system with 75 % precision will make an error every fourth time (and with any language other than English, the state of the art is much worse). So errors will happen – how do you mitigate against mistaggings? Are the resolutions considered suggestions to be reviewed and accepted by humans? (What then if the text changes?) Or is it a completely "online" system? (How then to mark errors?) --Unhammer (talk) 09:23, 31 October 2016 (UTC)
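The arithmetic behind the "every fourth time" remark, as a small sketch: precision is the share of returned resolutions that are correct, so 75 % precision means one in four is wrong; recall is the share of correct resolutions that are found; and the F-score (F1 here) is their harmonic mean. The counts are illustrative only.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Precision, recall and F1 from raw counts (0.0 where undefined)."""
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts: 75 correct resolutions, 25 wrong ones, 25 missed.
p, r, f1 = precision_recall_f1(75, 25, 25)
print(p, r, f1, "error rate among returned:", 1 - p)
```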

This is not about creating a new system for anaphora/coreference resolution; I'm only going to use whatever is already available. The experiment is whether some existing system can create annotations that we are able to realign with wikitext. The actual creation of a language model and parsing of the text according to that language model is outside the scope of the experiment. If the score of the anaphora/coreference resolution is too low, it will create mismatches during lookup of the value materialization methods, but quantifying that is part of what would be tested. In a future setup there would be two models that must be matched, one for the language and one for the semantic data. Matching those two should improve the recall of a valid interpretation. If all interpretations can be rejected (and we can determine that without attempting to match the models), then editors can be notified. That would be pretty much similar to mw:Help:Spec. Still, let me emphasize that this experiment is for a basic pipeline for matching existing annotations; it is neither about improving the chosen NLP engine nor about adding editor tools for extracting or inserting semantic data. — Jeblad 15:23, 31 October 2016 (UTC)
I didn't mean to imply you were creating a new engine; my question was rather what happens when the engine you chose makes an error. Is the answer then that "editors can be notified"? Wouldn't you have to notify editors every time, since you don't know when the system makes an error? (By the way, I think you mean precision, not recall?) --Unhammer (talk) 08:07, 1 November 2016 (UTC)
Editors will be notified if a replacement rule isn't found, but fixing this problem is a step beyond this experiment. Lack of precision in the NLP engine will cause a failure to recall the correct method; that is, the constructed identifier will give a hash miss. In a later experiment the hash lookup can be replaced by some other method, but then the load will skyrocket.
NLP engines usually try to find one optimum interpretation and return that, while what we want is an interpretation that is valid given the instantiation with the available values. That is, the interpretation of the text fragment must match the available values; otherwise the interpretation is not valid. A simple hashing scheme (like the optimum interpretation) is likely to fail, so recall would be low but precision would be high. In this case the load would be low. With a better lookup, the chosen interpretation given an instantiation could give much higher recall, but then precision will fall. In this case the load will be high.
The experiment can be described as an attempt to check whether the simple hashing scheme is sufficient for wikitext, or whether a more sophisticated scheme must be used. I suspect the latter, but we need some code to verify that.
Notifying editors when it's not possible to instantiate the variables could be done similarly to mw:Help:Spec, but this isn't really part of the experiment. (The proposal text can be read as if this were a core part of the experiment, but that is an error.) — Jeblad 13:39, 1 November 2016 (UTC)
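The hash-lookup scheme described in this exchange can be sketched as an exact-key dictionary lookup. The key fields (predicate, argument type, language) and the rendering functions are hypothetical, since the proposal does not specify them. A miss returns None, which is where an editor notification would hook in. Any mistagging by the NLP engine changes the key and produces a miss, which is why this scheme favours precision over recall while keeping the load low.

```python
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (predicate, argument type, language): hypothetical

# Hypothetical table of value materialization methods.
METHODS: Dict[Key, Callable[[float], str]] = {
    ("area", "lake", "en"): lambda v: f"{v:g} km²",
    ("population", "city", "en"): lambda v: f"{v:,.0f} inhabitants",
}


def render_value(predicate: str, arg_type: str, language: str,
                 value: float) -> Optional[str]:
    """Look up a materialization method by exact key; None on a hash miss."""
    method = METHODS.get((predicate, arg_type, language))
    if method is None:
        return None  # no replacement rule: candidate for editor notification
    return method(value)


print(render_value("area", "lake", "en", 369))
print(render_value("area", "country", "en", 1000))  # mistagged argument: a miss
```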
Say you're writing the article about Finland, which contains the fragment "Saimaa is the largest lake in the country Finland. It has an area of {area}." and the engine says "Finland" is the argument of "area", while the author intended "Saimaa" to be the argument. Both referents have areas, and it's likely that an article about Finland would refer to information about Finland. --Unhammer (talk) 15:03, 1 November 2016 (UTC)
You have constructed a text with a dual meaning, and you assume the NLP engine will choose the wrong interpretation. Yes, the editor writing the text will see the error, but no, there is no automatic means proposed in this experiment to find such errors. This experiment will only use hash lookup and will not be able to distinguish between alternative interpretations. It is, however, possible to discern between different possible meanings, but that is not part of this experiment. (An interesting solution is to let the editor explicitly set the argument, and then use that information to train a neural network. Check out The Microsoft 2016 Conversational Speech Recognition System [1].)
I wonder if you have the wrong impression of the proposed experiment. This is a first attempt to match existing annotations with wikitext and to act upon those annotations; it is not about fixing all kinds of errors in any given annotation. What you are asking about is whether we can run when we can't even crawl. ;) — Jeblad 11:56, 2 November 2016 (UTC)
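The Saimaa/Finland example can be illustrated with a sketch of ambiguity detection, which, as noted, is outside the proposed experiment. It assumes the mentions in the sentence are already known and that we can ask which entities have a given property; the facts below are placeholders, not real data. When more than one mention could supply the value, the sketch declines to choose instead of guessing.

```python
from typing import Iterable, Optional, Set, Tuple

# Placeholder facts: which entities have which properties available.
AVAILABLE: Set[Tuple[str, str]] = {
    ("Saimaa", "area"),
    ("Finland", "area"),
    ("Finland", "population"),
}


def resolve_argument(mentions: Iterable[str], prop: str) -> Optional[str]:
    """Return the single mention that has prop, or None if zero or several do."""
    candidates = [m for m in mentions if (m, prop) in AVAILABLE]
    if len(candidates) == 1:
        return candidates[0]
    return None  # ambiguous or unsupported: leave the choice to an editor


mentions = ["Saimaa", "Finland"]
print(resolve_argument(mentions, "area"))        # both have an area, so None
print(resolve_argument(mentions, "population"))  # only Finland qualifies
```

Refusing to pick is the conservative choice here: it trades recall for not silently inserting the wrong entity's value.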

Eligibility confirmed, round 2 2016


This Project Grants proposal is under review!

We've confirmed your proposal is eligible for round 2 2016 review. Please feel free to ask questions and make changes to this proposal as discussions continue during this community comments period.

The committee's formal review for round 2 2016 begins on 2 November 2016, and grants will be announced in December. See the schedule for more details.

Questions? Contact us.

--Marti (WMF) (talk) 20:08, 1 November 2016 (UTC)

Aggregated feedback from the committee for Better support for imported data in wikitext

Scoring rubric and scores
(A) Impact potential: 5.9
  • Does it have the potential to increase gender diversity in Wikimedia projects, either in terms of content, contributors, or both?
  • Does it have the potential for online impact?
  • Can it be sustained, scaled, or adapted elsewhere after the grant ends?
(B) Community engagement: 5.9
  • Does it have a specific target community and plan to engage it often?
  • Does it have community support?
(C) Ability to execute: 6.0
  • Can the scope be accomplished in the proposed timeframe?
  • Is the budget realistic/efficient?
  • Do the participants have the necessary skills/experience?
(D) Measures of success: 4.3
  • Are there both quantitative and qualitative measures of success?
  • Are they realistic?
  • Can they be measured?
Additional comments from the Committee:
  • While the proposal has a potential for online impact it is not clear how significant this impact can be - it is not entirely clear if automatically pulling the information from Wikidata will bring large benefits. On the other hand, it can create problems in the form of errors, which will be hard to find and fix. The proposal needs more work to clarify these issues.
  • This is not a tool project, it's more a research project but I don't believe it will be sustained: NLP is work that is already performed by different institutions that have their own way and we won't be able to build on that.
  • This is a research project on a potential need for easing content creation about specific data. This is clearly in accordance with our priorities. I think the research aspect is the core of this proposal.
  • The approach is innovative but risks are relatively large. Success can be measured but it may be difficult.
  • The idea is good but I am concerned about its application: it could be used for automatic processing of external sites to get info about population, climate and other statistics; it could even create some stubs in multiple languages. However, in this case the grantee concentrates on presenting wikitext more than on generating tables/stubs.
  • This could lead to an innovation. But we need to have more of the theoretical context for what is being proposed. Before getting onto developing and releasing proposed code, we should improve research on the need that is potentially being addressed here.
  • The participant has necessary skills and budget is realistic. However it is difficult to say what can be accomplished in 6 months.
  • Community outreach appears to be limited.
  • The project is experimental by nature; the community engagement is low, but I think it is better to wait until there are actual results to show before engaging the community.
  • The sentence: "Note that this project will mostly be experiments on how to use natural language processing in live wikitext, it will probably not be a final solution, and it will most likely create more questions than answer." does not give me much confidence in the potential impact of the project.
  • The author of the proposal should reach out to the community and inquire whether what he proposes is actually necessary. The proposal description should better balance the possible benefits and the problems that the proposed tool will create.
  • This could be revised into a smaller proposal, focusing exclusively on researching the need it appears to be trying to address. This activity if funded should be done in interaction with the WMF tech team.

This project has not been selected for a Project Grant at this time.

We love that you took the chance to creatively improve the Wikimedia movement. The committee has reviewed this proposal and not recommended it for funding. This was a very competitive round with many good ideas, not all of which could be funded in spite of many merits. We appreciate your participation, and we hope you'll continue to stay engaged in the Wikimedia context.


Next steps: Applicants whose proposals are declined are welcome to consider resubmitting their application in the future. You are welcome to request a consultation with staff to review any concerns with your proposal that contributed to a decline decision, and help you determine whether resubmission makes sense for your proposal.

Over the last year, the Wikimedia Foundation has been undergoing a community consultation process to launch a new grants strategy. Our proposed programs are posted on Meta here: Grants Strategy Relaunch 2020-2021. If you have suggestions about how we can improve our programs in the future, you can find information about how to give feedback here: Get involved. We are also currently seeking candidates to serve on regional grants committees and we'd appreciate it if you could help us spread the word to strong candidates; you can find out more here. We will launch our new programs in July 2021. If you are interested in submitting future proposals for funding, stay tuned to learn more about our future programs.