Grants talk:Project/DBpedia/GlobalFactSyncRE/Archive 1
This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Cover Letter
Since this is a resubmission, we would like to give an overview of what has happened in the last six months insofar as it is relevant to the proposal, and to summarize the changes to the proposal itself:
Updates regarding technological capabilities
- Things are already in motion as we envisioned them in the proposal. Compare: phab:T69659 and Grants:Project/DBpedia/GlobalFactSyncRE#Mapping_maintenance. The proposal will speed up this process and support it with technology that gathers the information needed for syncing.
- DBpedia has switched to a monthly release cycle for all extractions. Data is therefore available much sooner, which greatly increases the incentive to edit mappings and makes the data easier to sync.
Summary of changes to the proposal itself
- Updated the template mapping statistics; coverage is up by 3% (see the statistics page). Note: we talked about the inclusion of Freebase in Wikidata. The main reason so few facts were included seemed to be that mapping all the data was too much work. The DBpedia community has spent an immense effort mapping all infoboxes in 41 languages, so the data from Wikipedia's infoboxes can be brought into Wikidata far more effectively.
- During 2018 we developed a general DBpedia release platform. Developing a software platform is always a trade-off between long-term goals and immediate results, which requires branching out. We reduced the DBpedia contribution to the project because some features are already finished; we therefore do not need to invest as much in that trade-off, i.e. we do not need to branch out as much to guarantee that the data is available for GlobalFactSync.
- We removed previous endorsements and linked to the old proposal.
| Eiffel Tower | |
| --- | --- |
| Tour Eiffel | |
| General information | |
| Type | Observation tower, broadcasting tower |
| Location | 7th arrondissement, Paris, France |
| Coordinates | 48.858222°N 2.294500°E |
| Construction started | 28 January 1887 |
| Completed | 15 March 1889 |
| Opening | 31 March 1889 (129 years ago) |
| Owner | City of Paris, France |
| Management | Société d'Exploitation de la Tour Eiffel (SETE) |
| Height | |
| Architectural | 300 m (984 ft)[1] |
| Tip | 324 m (1,063 ft)[1] |
| Top floor | 276 m (906 ft)[1] |
| Technical details | |
| Floor count | 3[2] |
| Lifts/elevators | 8[2] |
| Design and construction | |
| Architect | Stephen Sauvestre |
| Structural engineer | Maurice Koechlin, Émile Nouguier |
| Main contractor | Compagnie des Etablissements Eiffel |
| Website | |
| References | |
| 1. ^ Tower at Emporis | |
New prototype and new ideas
On the talk page of the last proposal we gave an example of how infoboxes could link to GlobalFactSync. We have updated the links next to the infobox properties. We follow agile development and rapid prototyping, which means prototypes are produced frequently and then changed based on discussion. Here is what changed and what can still change:
- We updated the UI a bit, but the data still covers only a handful of languages plus Wikidata, and it is old data
- The sync symbols can be shown with a gadget
- At the moment it runs on an external page; however, we use a simple MongoDB store and a JSON API, so it could also be included directly via a JavaScript pop-up (a minimal sketch of such a JSON endpoint is given after this list). Editors would not need to leave the Wikipedia version they are editing, and the same holds for Wikidata editors. The information about other infoboxes can be inserted seamlessly into the editor workflow.
- We have already run some tests and included data from the Dutch and German national libraries, so external datasets can also be included for viewing and decision making.
- A live version might also be feasible, i.e. extracting data from Wikipedia on request, comparing it to Wikidata and then displaying the result.
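The following is a minimal sketch, not the actual prototype code, of what a JSON endpoint over such a MongoDB cache could look like; the database name, collection layout, field names and route are assumptions made for illustration.

```python
# Sketch only: serve the cached infobox values for one entity/property pair
# as JSON, so a gadget or pop-up can fetch and display them.
from flask import Flask, jsonify
from pymongo import MongoClient

app = Flask(__name__)
facts = MongoClient()["globalfactsync"]["facts"]  # hypothetical db/collection

@app.route("/facts/<entity>/<prop>")
def fact_values(entity, prop):
    """Return every known value for one property of one entity,
    across Wikipedia language editions and Wikidata."""
    cursor = facts.find({"entity": entity, "property": prop},
                        {"_id": 0, "source": 1, "value": 1, "reference": 1})
    return jsonify(list(cursor))

if __name__ == "__main__":
    app.run()
```

A gadget could then request, for example, /facts/Eiffel_Tower/height and render the returned values next to the corresponding infobox row.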
Ping of previous people who took part in the discussion
I used a regex to get all the users who posted something on the previous proposals: `grep "\[\[User:[a-zA-Z0-9\ \.]*|" -o`. If I got the regex right, this should be everybody, i.e. nobody critical has been filtered out.
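For reference, a Python equivalent of that grep command (a sketch; the input file name is hypothetical and the character class has the same limitations as the original pattern):

```python
# Extract the unique user names linked as [[User:...|...]] in the talk page dump.
import re

with open("previous_proposal_talk.wiki", encoding="utf-8") as f:
    text = f.read()

users = sorted(set(re.findall(r"\[\[User:([a-zA-Z0-9 .]*)\|", text)))
print("\n".join(users))
```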
Dear @BrillLyle:, @Crazy1880:, @Dan scott:, @Donald Trung:, @Ehrlich91:, @GerardM:, @He7d3r:, @Juliaholze:, @Jura1:, @Kiril Simeonovski:, @KrzysztofWecel:, @Lewoniewski:, @M1ci:, @Metrónomo:, @Mgns:, @Mike Peel:, @Multichill:, @Rachmat04:, @Sabas88:, @SebastianHellmann:, @Sj:, @S.karampatakis:, @Slowking4:, @TomT0m:, @Vojtěch Dostál:, @X black X:, @YULdigitalpreservation:, thank you for your input on the previous proposal. We wrote a cover letter on this talk page, so you can easily find the changes we made for the resubmission. All feedback welcome.
Details on reference extraction, relationship with existing tools
Thanks for resubmitting this proposal! I have a few questions:
- What sort of reference extraction process are you aiming for? For instance, how will the following references be parsed, if at all?
<ref>http://example.com/</ref>
<ref>{{Cite journal| last1 = Pape| first1 = Martin| last2 = Streicher| first2 = Thomas| title = Computability in Basic Quantum Mechanics|arxiv=1610.09209 | accessdate = 2016-12-15| date = 2016-10-27}}</ref>
<ref name="zisman1972">Michel Zisman, Topologie algébrique élémentaire, Armand Colin, 1972, p. 10.</ref>
<ref>[[Saunders Mac Lane]] (1998): ''Categories for the Working Mathematician'', Graduate Texts in Mathematics 5, Springer; ISBN 0-387-98403-8</ref>
<ref name="zisman1972" />
- At the moment infoboxes are harvested into Wikidata with tools such as Harvest Templates - do you plan any interaction with these projects? Could they be improved with the technology that you develop (for instance reference extraction)? Would CrossWikiFact provide similar batch editing functionality? Having a tool to manually review the discrepancy of infoboxes on one particular item can be useful, but improving tools such as Harvest Templates would have a much bigger impact on the quality of references in Wikidata, as this tool is already widely used and responsible for a lot of the existing Wikidata statements.
− Pintoch (talk) 12:47, 2 December 2018 (UTC)
- References have already been extracted several times. Here is one old dataset: data, preview. I count 913,695 references for English, but that was in 2016. The following things need to be done:
- 1. stabilize this extraction so it runs frequently, either once a month or live (on request)
- 2. adapt the format and information extracted, so it is directly useful for Wikipedia/Wikidata/Wikicite. URIs do not need to be citation.dbpedia.org for example.
- 3. references were extracted as a flat list, i.e. without a relation to the infobox statements they support. We need to make this connection; the mock-up infobox above, for example, contains six references for individual statements. It is not hard, but it takes some work to get it done decently. SebastianHellmann (talk) 23:24, 2 December 2018 (UTC)
- @Pintoch: Do you know the mappings wiki at http://mappings.dbpedia.org ? Around 300 people have worked there for over 6 years to map 80% of all infoboxes in 41 languages to the DBpedia Ontology, which is very close to the infoboxes and therefore also to Wikidata's properties. Here are the statistics: http://mappings.dbpedia.org/server/statistics/en/ With the mappings we know almost exactly which infobox properties relate to which Wikidata properties. As far as I can see, the Harvest Templates tool could benefit in the following ways:
- 1. Instead of a one-time import, the tool could sync, i.e. compare the current state of the loaded infobox with the current state of Wikidata and show the differences. I am not sure whether a log is created per template and parameter; such a log would be similar to the mappings. We would definitely want that log in order to have the mapping from template+parameter to Wikidata property.
- 2. This paper contains a rough description of the parser we use (page 4): http://jens-lehmann.org/files/2009/dbpedia_jws.pdf Over 40 people have worked on the software in the last 10 years; it should be quite a bit better than any rewritten ad hoc parser.
- 3. Harvest Templates does not extract the references, I assume?
- If you go here: http://mappings.dbpedia.org/index.php/Mapping_en:Infobox_Company and click on "Test this mapping", you can see an example of what is extracted from an example infobox. All this tooling has been running for years, and it is all open source. SebastianHellmann (talk) 23:24, 2 December 2018 (UTC)
- @SebastianHellmann: thanks for the thorough reply! Yes I agree with you about the possible improvements to HarvestTemplates. The tool does compare the existing Wikidata statements to the value from the infoboxes (your point 1.) - in fact it even uses the Wikidata constraint system to check if the value to be added will violate constraints (which is very useful for quality assurance). Logging the runs to keep track of the mappings sounds like a great idea, and it would be fantastic to add some reference extraction to it. May I ask again which of the examples above would be parsed by your system? Can you extract more than URLs? Extracting structured information from citation templates is doable, either directly from the wikicode or from the COinS generated in the HTML output.
- @Pintoch: "May I ask again which of the examples above would be parsed by your system?" What do you mean? We parse all articles for 120 Wikipedia languages. Normally the full information is extracted, no matter whether it is a string or a URL; there are about 20 different parsers and the most appropriate one is selected. Quality depends on the infobox parameters, some of which are very cryptic. It would be a systematic approach, basically joining the DBpedia and Harvest Templates workflows. We also have statistics on which template parameters are not yet mapped/extracted: http://mappings.dbpedia.org/server/statistics/ SebastianHellmann (talk) 08:35, 3 December 2018 (UTC)
- @SebastianHellmann: I am asking specifically about the parsing of the references, not the parsing of the values they support. By reference parsing, I mean extracting various fields (title, publisher, author, URL, retrieved date, identifiers) from a reference represented in Wikitext. This is necessary to store references in Wikidata as Wikibase represents references in structured form. Do I understand correctly that solving this problem is in the scope of your project? There are various reference parsing systems out there, which support various sorts of domains and target bibliographic models. Some just look for bibliographic identifiers using regular expressions. Some others extract information from citation templates. Advanced citation parsers using machine learning can parse free-form citations. For instance, Bilbo can be trained to parse free-form references from Wikipedia (it will transform strings such as Michel Zisman, Topologie algébrique élémentaire, 1972, p. 10 to
<author><surname>Michel</surname> <forename>Zisman</forename></author>, <title>Topologie algébrique élémentaire</title>, 1972, <abbr>p.</abbr> 10
, which could be the basis of a Wikidata reference). So, what sort of reference parser do you plan to use, and could you give an example of the sort of bibliographic fields it would extract on the examples above? − Pintoch (talk) 10:59, 3 December 2018 (UTC)
- @Pintoch: The current citation extraction from DBpedia works at the wikitext level: it parses all the citation templates and extracts all the parameter keys and values. Using some heuristics, it assigns either a URL from the citation (if there is one) or a hash of the complete citation, to be able to identify the same "anonymous" citation across different pages (an illustrative sketch of this heuristic is given at the end of this section). For an example see this file: citation data, preview. It contains a lot of metadata about the citation that could potentially be inserted into Wikidata along with the Wikidata reference entry. Additionally, it tracks which references are used in which article, which can drive many different statistics: citation links, preview. You may find some statistics examples here. − Jimkont (talk) 07:22, 4 December 2018 (UTC)
- @Jimkont: Brilliant! That sounds like a fantastic parser. I can't wait to see these references imported in Wikidata. − Pintoch (talk) 08:34, 4 December 2018 (UTC)
- I do agree with you that a lot of effort has been put into your extraction framework - it would be great to reuse it for Wikidata indeed! My point is that your project would have a much bigger impact if the outcome is a tool that works like HarvestTemplates, where semi-automatic batch extraction supervised by a Wikidata editor can be done. That is the dominant workflow in the community at the moment. The risk with a prototype like your demo is that it is not picked up by the community because it does not match their needs as nicely, even if the extraction algorithm is more principled.
- Ok, I see your point. The prototype has two foci: one is WP2WP and the other WP2WD, so this would be for Wikipedia2Wikidata. Starting from HarvestTemplates would be a good idea, thanks for the hint. SebastianHellmann (talk) 08:35, 3 December 2018 (UTC)
- So I would encourage you to get in touch with Pasleim to coordinate on this, maybe? − Pintoch (talk) 00:21, 3 December 2018 (UTC)
- already did leave a message: https://www.wikidata.org/wiki/User_talk:Pasleim SebastianHellmann (talk) 10:27, 3 December 2018 (UTC)
- Awesome! − Pintoch (talk) 10:59, 3 December 2018 (UTC)
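To illustrate the citation identifier heuristic Jimkont describes above, here is a short Python sketch (using mwparserfromhell for template parsing; function and field names are chosen for illustration, and this is not the DBpedia extraction framework code):

```python
# Take a URL from the citation template if present, otherwise hash the whole
# citation so identical "anonymous" citations get the same id across pages.
import hashlib
import mwparserfromhell

def citation_identifier(citation_wikitext):
    code = mwparserfromhell.parse(citation_wikitext)
    for template in code.filter_templates():
        if str(template.name).strip().lower().startswith("cite"):
            params = {str(p.name).strip(): str(p.value).strip()
                      for p in template.params}
            if params.get("url"):
                return params["url"]  # URL-based identifier
    digest = hashlib.sha1(citation_wikitext.strip().encode("utf-8")).hexdigest()
    return "citation:" + digest       # hash-based identifier for anonymous citations
```

Under this sketch, a {{Cite journal}} reference would be identified by its URL if it has one, while a free-form reference such as the Zisman example receives a stable hash identifier instead.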
Connection with Harvest Templates
Hi @Pasleim:, Pintoch suggested integrating some of the DBpedia tools into the HarvestTemplates UI instead of building a new interface, which makes total sense. Some questions:
- Do you keep a log of the Template Parameter field for each request?
- What parser are you using?
- In terms of your agenda, what are concrete points that you are working on?
SebastianHellmann (talk) 08:44, 3 December 2018 (UTC)
Lack of community connection and a sample test
I looked at the number of contributions of the proposers:
You don't appear to be the most active members of the Wikidata community. What makes you think you understand the problems and challenges of the community? How did you learn this? How are you going to convince the Wikidata community to actually use your tool?
I took the first couple of records of your birthdate sample to see how many records it would take to find something good to add:
- d:Q1004132 - "1917-02-19" -Already has date of birth
- d:Q102110 - "1915-01-12" - Already has a date of birth, and the Wikipedias don't agree on it
- d:Q1029178 - "1902-11-26" - Sibling duo shouldn't have a date of birth
- d:Q10294096 - "1928-12-17" - Seems to be a fictional character, not sure
- d:Q103082 - "1545-12-23" - German Wikipedia says before 1545-12-23, not that date
- d:Q10310267 - "1970-04-10" - Already has this date of birth since 2016
- d:Q10311057 - "1822-12-26" - Already has this date of birth
- d:Q10312485 - "1780-01-25" - Already has this date of birth
- d:Q10320912 - "1973-06-07" - Already has this date of birth
- d:Q10349945 - "1885-04-29" - Already has this date of birth
- d:Q1038545 - "1473-06-05" - Another fictional redirect human
- d:Q1039362 - "1957-09-20" - Disambiguation pages don't have date of birth
- d:Q10491607 - "1960-06-19" - This one might be correct, but someone explicitly set it to unknown, and that's usually an indication of problems. Hard to check with the broken source
- d:Q1063111 - "1959-06-06" - Duo shouldn't have a date of birth
- d:Q1063466 - "1963-05-12" - It's a band
- d:Q1063706 - "1992-06-30" - Another band
- d:Q10749122 - "1972-09-27" - List with an interesting edit history
- d:Q10860144 - "1976-11-04" - Already has this date of birth
- d:Q1094052 - "1980-09-29" - Another band
- d:Q11023748 - "1410-12-18" - Already has a date of birth, but this one seems a bit better
So that's 20 records to find maybe one improvement? I would conclude from that that our existing tools are pretty good, and that we cared more about importing dates of birth than importing populations. What would the new tool add to this? Multichill (talk) 21:43, 4 December 2018 (UTC)
- You are right, the tool in this form is not usable, but the proposal is about shaping the tool. At the moment, the tool just shows that we have all this data, i.e. pretty much complete information about all infoboxes plus an 80% mapping of values to Wikidata. The tool is also not meant to tackle the easy challenges, but the hard ones:
- * Data follows the Pareto principle, i.e. the first 80% are easy and the last 20% are much harder. To be adopted well by Wikipedia, Wikidata needs much more completeness, so it is about finding exactly the infobox data and references missing in Wikidata and building a power tool to fill the gaps.
- * If the Wikipedias adopt Wikidata for most of the templates, the principle of a single source of truth applies, i.e. the data lives in Wikidata and is loaded and shown in Wikipedia. However, this is not really happening, so there is a co-evolution problem, hence the syncing approach.
- * Your list is quite good input; it pretty much covers all the cases that need to be filtered out to be effective. From the 20 you listed, I would say we can filter out around 16 automatically, so only 4 would remain to be inspected (a sketch of such a filter is given at the end of this section).
- * References are a big plus if they can be migrated to Wikidata or Wikicite.
- * Did you see the discussion about Harvest Templates above? We made a simple prototype based on an entity-centric view. It seems, though, that the Wikidata community might prefer a template-centric process, or more narrowly a template-parameter-centric approach, or even an approach that focuses on the same parameter across multiple templates in multiple languages. We are quite flexible here and suggested rapid prototyping to get feedback on what the right amount of data is. We probably have to find out whether Wikipedians are more focused on articles (as the prototype is now), domains or templates.
- * SJ is more focused on Wikicite and Nicolastorzec was from Yahoo; both are volunteers/advisors. As for myself, I represent DBpedia, a 12-year-old open data project based on Wikipedia's infoboxes. A short history: DBpedia and Freebase started around the same time. Semantic MediaWiki (Denny was a member) was not adopted by Wikipedia. Freebase was bought by Google; Google shut Freebase down and then donated a large sum to Denny/Wikimedia to create Wikidata (that is how I remember it; I might be wrong on the donation part, but several million were involved for the first year), which is now free for Google to use: https://blog.wikimedia.org/2017/10/30/wikidata-fifth-birthday/ There is also a survey comparing the different data projects: http://www.semantic-web-journal.net/system/files/swj1465.pdf So we definitely have a bird's-eye perspective, which can seem quite far from everyday editing.
- @Multichill: To summarize: maybe you could think of how the tool should work to be most useful for you personally. What would you find practical? SebastianHellmann (talk) 23:42, 4 December 2018 (UTC)
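A sketch of what the automatic filtering mentioned above could look like, using Wikidata's public EntityData interface; the skip list is illustrative and incomplete, and the function name is hypothetical:

```python
# Skip items that already have a date of birth (P569) or whose instance-of
# (P31) marks them as something that should not have one (bands,
# disambiguation pages, list articles, ...).
import requests

SKIP_CLASSES = {
    "Q215380",    # musical group
    "Q4167410",   # Wikimedia disambiguation page
    "Q13406463",  # Wikimedia list article
}

def needs_review(qid):
    """True if the item is still a candidate for a manual birth-date check."""
    url = f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"
    entities = requests.get(url).json()["entities"]
    claims = next(iter(entities.values())).get("claims", {})  # tolerates redirects
    if "P569" in claims:
        return False  # date of birth already present
    instance_of = {
        c["mainsnak"]["datavalue"]["value"]["id"]
        for c in claims.get("P31", [])
        if "datavalue" in c["mainsnak"]
    }
    return not (instance_of & SKIP_CLASSES)
```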
Focus of Tool for Wikidata
The main problem here is that, for us, the prototype first of all shows that we have the data, which we could exploit in several ways; for Wikipedians and Wikidata users, however, the process of using it is the main focus. We assumed that an article-centric view would be best for Wikipedians, i.e. you can directly compare one article's infobox with all other articles and Wikidata. For Wikidata, however, the article/entity-centric view does not seem practical, and we would like feedback on this. The options for GlobalFactSync are:
- entity-centric view, as it is now: the same infobox across all Wikipedias and Wikidata for one article/entity
- template-centric (this will not work, as there are no, or only very few, equivalent infoboxes across Wikipedias)
- template-parameter-centric: this is the current focus of Harvest Templates, i.e. one parameter in one template in one language (https://tools.wmflabs.org/pltools/harvesttemplates/). One improvement DBpedia could make here is to reuse our mappings from parameter to DBpedia to Wikidata. Another is that we could save the logs and persist the mappings entered by users in order to do a continuous sync; at the moment it is a one-time import.
- multilingual-template-parameter-centric or Wikidata-property-centric, i.e. one parameter / one Wikidata property across multiple templates in multiple languages. This supercharges HarvestTemplates, but as a power tool for syncing it gets more complex and keeping an overview is difficult (a sketch of the kind of record such a sync would operate on is given below).
SebastianHellmann (talk) 09:19, 6 December 2018 (UTC)
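To make the last two options more concrete, here is a sketch of the kind of record a (multilingual) template-parameter-centric sync could operate on; the class and field names are illustrative only, not an existing schema:

```python
# Combine the template+parameter -> Wikidata property mapping with the
# extracted values and classify each candidate for syncing.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ParameterMapping:
    language: str            # e.g. "en"
    template: str            # e.g. "Infobox company"
    parameter: str           # e.g. "founded"
    wikidata_property: str   # e.g. "P571"

@dataclass
class SyncCandidate:
    entity: str                    # Wikidata item id, e.g. "Q..."
    mapping: ParameterMapping
    infobox_value: str             # value currently in the infobox
    wikidata_value: Optional[str]  # value currently in Wikidata, if any

    def status(self) -> str:
        if self.wikidata_value is None:
            return "missing in Wikidata"
        if self.wikidata_value == self.infobox_value:
            return "in sync"
        return "conflict"
```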
Eligibility confirmed, round 2 2018
We've confirmed your proposal is eligible for round 2 2018 review. Please feel free to ask questions and make changes to this proposal as discussions continue during the community comments period, through January 2, 2019.
The Project Grant committee's formal review for round 2 2018 will occur January 3-January 28, 2019. Grantees will be announced March 1, 2019. See the schedule for more details.
Questions? Contact us.--I JethroBT (WMF) (talk) 03:07, 8 December 2018 (UTC)
Questions about the budget
This proposal looks very interesting. Thanks for sharing it. I have two questions.
1. Your budget calls for funding a developer full time for one year. How did you come up with this estimate of time needed for the developer?
- In the previous proposal we only had 9 months full time, but it was suggested to go to 12 months, so the extra months can be used to bring the code base to production status and clean it up. This feedback made a lot of sense. Following rapid prototyping and agile methods requires a lot of interaction to gather and track feedback, so it is effective to have one person who has the overview and leadership here and who has the time to react fast. With somebody working half-time, valuable discussions might go stale and features could be forgotten or implemented too late. SebastianHellmann (talk) 09:17, 2 January 2019 (UTC)
2. Your budget for this full time developer for one year is 63,000€/71,621USD. For a full time developer for one year, that compensation seems low from my US perspective. Experienced developers here can easily make over $100,000 per year plus substantial benefits such as medical insurance, stock options, vacation pay, and retirement savings. How certain are you that you can hire someone who is a good fit for the job for 63,000€/71,621USD for a full year of work?
Thanks, --Pine✉ 03:54, 30 December 2018 (UTC)
- The pay scale is this one: TV-L 13 (medical insurance, vacation pay and retirement savings are included in the 63,000€). It is decent for central Europe; wages further east or south-west in Europe are much lower, while the UK, Ireland and France are a bit higher on average. DBpedia is quite popular, we have a huge network of universities and companies as well as very good developers and a lot of students, and there is a small waiting list of people who want to work on topics like this. It is clear that any developer could get a better-paid job doing boring work in a closed-source environment.
- Regarding the skill level: the work is difficult in places, but since we have a good team in the background plus the DBpedia community, a person who is just beginning her career is totally fine and can get a lot of help if stuck. I think the most important qualities are to be open-minded/flexible, communicative/reactive, and a fast learner. SebastianHellmann (talk) 09:17, 2 January 2019 (UTC)
Question about population sample
We discussed the sample with population data in some detail last time and I don't really see any difference (Grants_talk:Project/DBpedia/GlobalFactSync#Population_number_sample). Last time it was suggested that it's too late to change it at that stage, but what was done since? --Jura1 (talk) 05:23, 2 January 2019 (UTC)
- @Jura1: In the meantime we have published a Wikidata adoption (or rather non-adoption) report in the second half of this email: https://lists.wikimedia.org/pipermail/wikidata/2018-December/012681.html
- The report shows conclusive evidence that 584 million facts sit in Wikipedia infoboxes across all languages without using Wikidata. It does not show how many of these 584 million facts are already in Wikidata; those numbers could be produced as part of this proposal. If we could decrease that number by 30-40% and have the data synced across Wikipedias and Wikidata, that would be a great impact.
- I re-read our last discussion; its point was that the example was not great. For the actual work we proposed to concentrate on 10 sync targets, focusing on properties that work well. There is still a lot of duplicated work, i.e. the same value maintained in potentially 122+ different places (Wikidata, Commons, 120 Wikipedias). SebastianHellmann (talk) 08:43, 2 January 2019 (UTC)
- The problem is that the other sample with dates of birth isn't that great either. Furthermore, a discussion at Wikidata of references potentially found for such dates in infoboxes didn't suggest that it's a conclusive approach. There are several other tools in place to work on such dates. Obviously, I can't really think of a better team to work on infobox mapping, but I find it somewhat regrettable that Wikidata hasn't had much benefit earlier. Creating another Wikidata now isn't necessarily improving things for the 122+ places. Jura1 (talk) 09:47, 8 January 2019 (UTC)
- I am not quite sure what you mean by "creating another Wikidata now". We are proposing a transparent middleware: DBpedia has state-of-the-art tools for data extraction and mappings, we can cache all data extracted from the 122+ sources to get global information on where which kind of data lives (even in Commons), and then provide power tools to editors to help update the sources (such as an extended HarvestTemplates, or a gadget as a solid foundation for a Visual Editor extension). Wikidata could have been seeded with DBpedia's data in the beginning; this would have saved a lot of edits and botting, and the data would have been closer to the data in the infoboxes, thus helping the adoption of Wikidata in Wikipedia. SebastianHellmann (talk) 12:01, 8 January 2019 (UTC)
- I have to admit that we approached this proposal in the wrong way. We wrote a lot of text explaining what we can do, and built a poor prototype. If we had to do it again, we would first invest all the work in building a useful prototype, show some value (people using it, good examples, i.e. neither birth date nor population count), and then write very little. But we did not know that when we started proposing here. SebastianHellmann (talk) 12:01, 8 January 2019 (UTC)
Aggregated feedback from the committee for DBpedia/GlobalFactSyncRE
| Scoring rubric | Score |
| --- | --- |
| (A) Impact potential | 8.0 |
| (B) Community engagement | 8.0 |
| (C) Ability to execute | 8.6 |
| (D) Measures of success | 7.2 |
Additional comments from the Committee:
This proposal has been recommended for due diligence review.
The Project Grants Committee has conducted a preliminary assessment of your proposal and recommended it for due diligence review. This means that a majority of the committee reviewers favorably assessed this proposal and have requested further investigation by Wikimedia Foundation staff.
Next steps:
- Aggregated comments from the committee are posted above. Note that these comments may vary, or even contradict each other, since they reflect the conclusions of multiple individual committee members who independently reviewed this proposal. We recommend that you review all the feedback and post any responses, clarifications or questions on this talk page.
- Following due diligence review, a final funding decision will be announced on March 1st, 2019.
Comments on the Aggregated Feedback
First of all, we would like to thank the reviewers for the positive rating. We have received quite a lot of comments from reviewers and the community over the last years and hope that we have finally managed to accommodate them.
High potential of cross-wiki impact, as high-quality data with references on Wikidata is likely to improve a lot of Wikimedia projects. Minor concern regarding sustainability once the grant ends (tools should continue to be live and maintained)
- Regarding sustainability: DBpedia also has a stake in this project. Like Google, we are one of the big consumers of Wikipedia's and Wikidata's data. In my opinion this is a very important aspect, as I don't see the project as a form of contractual work, but rather as a way to establish a long-term symbiotic system. At the moment it is a system where everybody spends a lot of work on the same things: editors on infoboxes across languages, Wikidata as an extra place to edit data with slow adoption, and DBpedians writing mappings for everything. Six or seven years ago we already tried to bring the DBpedia mappings (e.g. Infobox Ambassador) into the template documentation, which is why they are in template syntax. The main goal here is also not to improve the DBpedia system. If we manage either to raise infobox adoption of Wikidata in Wikipedia, to fill the missing template data in Wikidata, or to get a better mapping from infobox parameters to Wikidata properties (or from Wikidata properties to infobox parameters), that would be great for us to pick up in the future. SebastianHellmann (talk) 10:21, 18 February 2019 (UTC)
The community engagement appears to be limited - basically only the Wikidata community - the same as with DBpedia/GlobalFactSync and CrossWikiFact. This lack of engagement led to a negative review from WMF Contributors Product Team last time.
- As written in the other comments, we approached the proposal in the wrong way, i.e. from a high-level, systematic perspective. It would have been easier to start with a working prototype that shows some value and has some users, and then write a much shorter proposal with a generalisation/improvement path. Now we have too much text and a mock-up prototype that is not functional enough. SebastianHellmann (talk) 10:21, 18 February 2019 (UTC)
I supported the previous variants of this proposal - DBpedia/GlobalFactSync and CrossWikiFact and, in fact, I just copied portions of my last review here. I will support this time for consistency although it may fail staff, WMF Contributors Product Team and Wikidata team reviews this time as in the past.
- This point is confusing for us. In general we have contacts with staff such as Denny, Daniel Kinzler (who also comes from Leipzig), Stas, Dario and Tamara in SF, and in particular we had two brainstorming sessions with Lydia Pintscher, who suggested the gadget as a pre-prototype for Visual Editor integration. However, we didn't push hard on this end, as we didn't know staff were involved here. Lydia was the only one giving us feedback, and we incorporated it. We don't know about any other criteria. SebastianHellmann (talk) 10:21, 18 February 2019 (UTC)
- Just to clarify: contact with other staff besides Lydia has been ongoing, if sporadic, for several years; it was not specific to this proposal, but served as indirect input. SebastianHellmann (talk) 13:02, 18 February 2019 (UTC)
Round 2 2018 decision
Congratulations! Your proposal has been selected for a Project Grant.
The committee has recommended this proposal and WMF has approved funding for the full amount of your request, 63,000 EUR / $71,577 USD.
Comments regarding this decision:
The committee supports the development of GlobalFactSync and appreciates the applicants' updates since the last application and their significant efforts to respond to feedback. Furthermore, the development of this tool was supported as an opportunity for direct collaboration between DBpedia and the Wikidata community, and as an effective approach to reducing unreferenced statements on Wikidata.
Next steps:
- You will be contacted to sign a grant agreement and setup a monthly check-in schedule.
- Review the information for grantees.
- Use the new buttons on your original proposal to create your project pages.
- Start work on your project!
Upcoming changes to Wikimedia Foundation Grants
Over the last year, the Wikimedia Foundation has been undergoing a community consultation process to launch a new grants strategy. Our proposed programs are posted on Meta here: Grants Strategy Relaunch 2020-2021. If you have suggestions about how we can improve our programs in the future, you can find information about how to give feedback here: Get involved. We are also currently seeking candidates to serve on regional grants committees and we'd appreciate it if you could help us spread the word to strong candidates--you can find out more here. We will launch our new programs in July 2021. If you are interested in submitting future proposals for funding, stay tuned to learn more about our future programs.
I JethroBT (WMF) (talk) 15:04, 1 March 2019 (UTC)