Grants:Project/DBpedia/CrossWikiFact


Unfortunately, this project proposal was not funded in the December 2017 round. We thank all the reviewers for their feedback. Taking all of your valuable feedback into account, we will further improve the proposal for a future grant application. Please see GlobalFactSync for the follow-up proposal.


status: not selected
CrossWikiFact
summary: DBpedia, which frequently crawls and analyses over 120 Wikipedia language editions, has near-complete information about (1) which facts are referenced in infoboxes across all Wikipedias and (2) where Wikidata is already used in those infoboxes. CrossWikiFact will produce a website that detects and displays differences across infobox facts, fills Wikidata with missing information and its references, and suggests changes to Wikipedia users.
target: All Wikipedia language editions + Wikidata
amount: 45,120€ / 53,420 USD
contact: dbpedia@infai.org
organization: DBpedia Association
created on: 09:39, 7 September 2017 (UTC)


Project idea[edit]

What is the problem you're trying to solve?[edit]

What problem are you trying to solve by doing this project? This problem should be small enough that you expect it to be completely or mostly resolved by the end of this project. Remember to review the tutorial for tips on how to answer this question.


Wikipedians have put great effort into collecting facts in infoboxes over the last decade. The problems that come with infoboxes are well known in the Wikipedia community: infoboxes are tedious to maintain and structure, and facts cannot be imported from other language versions with richer infoboxes. Furthermore, many different and sometimes conflicting statements are made across languages for the same articles, and these are difficult to spot and improve. While these issues have ultimately led to the establishment of Wikidata, the richness of information curated by thousands of Wikipedia editors has not yet been transferred to Wikidata, and adoption is slow.

DBpedia has crawled and extracted infobox facts from all Wikipedia language editions for the past 10 years and has continuously refined the freely available extraction software, thus disseminating this great treasure of information in machine-readable formats to a plethora of IT systems. As discussed with Lydia Pintscher at the DBpedia Community Meeting in Leipzig in 2016, DBpedia data could not be used for Wikidata directly, as only the facts are extracted from infoboxes, not the references for these facts.

Problem 1: Manually created facts from Wikipedia infoboxes are not included in Wikidata.

Problem 2: Facts from Wikipedia infoboxes are partially conflicting.

To support the magnitude of this proposal, we have created a prototypical website that demonstrates how we envision the CrossWikiFact website, one of the major deliverables of this proposal. We have analysed the Wikidata entities Q1 to Q10000 and compared their core values with infoboxes from 128 language versions:

The CrossWikiFact early prototype is accessible by adding the Q number to the following URL: http://downloads.dbpedia.org/temporary/crosswikifact/results/qXXX.html, e.g. Earth (Q2) and Berlin (Q64). An index can be found here. Overall, for the first 10,000 Q's, we found 156,538 values in Wikipedia infoboxes that may be missing in Wikidata, which extrapolates to approximately 300 million facts for the whole of Wikidata (20 million articles). An actual corrected value will be determined as part of this project.
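For transparency, the rough extrapolation behind this figure is: 156,538 candidate values / 10,000 items ≈ 15.7 values per item, and 15.7 × 20,000,000 articles ≈ 313 million, which we round down to about 300 million.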

According to the guidelines, "statements on Wikidata should be [...] supported by referenceable sources of information", whereas the use of Wikipedia itself as a primary source is discouraged. DBpedia cannot serve as a source either, as its information currently derives mainly from Wikipedia. It is, after all, quite common that statements lack any reference or that Wikipedia is the only reference given (e.g. all existing references for statements of Poznań (Q268) refer to Wikipedia, such as "imported from (P143): Russian Wikipedia (Q206855)" or "reference URL (P854): https://en.wikipedia.org/wiki/Pozna%C5%84").
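To make this concrete, the following minimal sketch (an illustration on our side, assuming Python with the requests library; the property choices P143 and P854 follow the Poznań example above) fetches an item's claims from the public Wikidata API and lists statements that are unreferenced or whose references only point back to a Wikipedia edition:

# Sketch: list Wikidata statements that are unreferenced or referenced
# only via Wikipedia (cf. the Poznań (Q268) example above).
import requests

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def is_wikipedia_only(reference):
    snaks = reference.get("snaks", {})
    if "P143" in snaks:                       # "imported from Wikimedia project"
        return True
    for snak in snaks.get("P854", []):        # "reference URL"
        url = snak.get("datavalue", {}).get("value", "")
        if "wikipedia.org" in url:
            return True
    return False

def weakly_referenced_statements(qid):
    resp = requests.get(WIKIDATA_API, params={
        "action": "wbgetclaims", "entity": qid, "format": "json"})
    resp.raise_for_status()
    flagged = []
    for prop, statements in resp.json().get("claims", {}).items():
        for statement in statements:
            refs = statement.get("references", [])
            if not refs or all(is_wikipedia_only(r) for r in refs):
                flagged.append((prop, statement["id"]))
    return flagged

if __name__ == "__main__":
    for prop, statement_id in weakly_referenced_statements("Q268"):
        print(prop, statement_id)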

Problem 3: Wikidata statements often lack references to specific sources.

Wikipedia editors already spend much effort on linking references in articles; for example, the German Wikipedia article for Posen provides a proper reference for the population of the city. While DBpedia extracts over 14 billion facts from infoboxes in 120 Wikipedia versions twice a year, the primary source references are only partially extracted and are not yet in the format required for import into Wikidata.

Problem 4: Currently, DBpedia cannot provide adequate and broad coverage of references for its facts.

At the moment, DBpedia extracts very few references. Over the last 10 years, the DBpedia community has had many discussions about how to contribute back to Wikipedia and Wikidata. While the general idea is very welcome among DBpedians, we have not found good channels to interact with Wikimedia projects. Although we extract many facts from Wikipedia (14 billion facts twice a year), some parts that are relevant to connecting the projects, especially the references, are not yet extracted.

What is your solution to this problem?[edit]

For the problem you identified in the previous section, briefly describe how you would like to address this problem. We recognize that there are many ways to solve a problem. We’d like to understand why you chose this particular solution, and why you think it is worth pursuing. Remember to review the tutorial for tips on how to answer this question.


Solutions[edit]

CrossWikiFact will provide three solutions to the mentioned problems:

  1. CrossWikiFact website: a website that supports editors in spotting differing values and thus helps them to quickly edit Wikidata or Wikipedia. We envision this to be a power tool, providing direct edit links, integration with the VisualEditor, and suggestions of pre-filled forms or Wikipedia templates that use Wikidata directly for infoboxes. The prototype we created is still very far from this vision: it only shows the currently available data, it is static (no updates), and all interactive functionality is still missing.
  2. Data and Reference Acquisition: in the coming three months, DBpedia will extensively focus on innovating its data cleaning, linking and fusion methods (new identifiers, new ways to edit ontologies and mappings, more automatic data tests). While this serves as a good basis for having more and better data available, it provides no direct benefit to Wikidata/Wikipedia. We apply for this grant to have the resources to bridge the gap between Wikidata and Wikipedia and create a feedback cycle. Finding source references for facts is one of the goals here.
  3. CrossWikiFact import: while the main idea is to use the CrossWikiFact website to let editors decide what to import, we believe that part of the data can be imported in bulk. Prerequisites are, of course, that the data has been thoroughly evaluated and that references exist.

CrossWikiFact website[edit]

As a major outcome of the project, we will continuously improve the prototype, which is already online. The website will allow per-page access, give a detailed overview of the available data and therefore serve as an access point to check data completeness and the state of the infoboxes in Wikipedia, and it will highlight the values that differ between the Wikipedia versions and Wikidata. Besides the intended effect of improving the factual data, we also see great potential in bringing all communities closer together to work towards the unified goal of providing knowledge to the world.

Data freshness: At the moment, the data is loaded statically. To be effective, we will try to keep the data updated and as fresh as possible by integrating the Wikidata API and also by using live extraction mechanisms (DBpedia Live, http://live.dbpedia.org/) that analyse infoboxes on the fly. Further APIs, such as Diffbot and StrepHit, can also be integrated.
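As an illustration of this freshness idea (a sketch only: infobox_values is a hypothetical stand-in for the values delivered by the DBpedia Live extraction, and the property choice is ours), the current Wikidata value of a property can be fetched on demand and compared against the per-language infobox values:

# Sketch: compare the live Wikidata value of a property with per-language
# infobox values. infobox_values is a hypothetical placeholder for the
# output of the DBpedia (Live) extraction.
import requests

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def plain(value):
    """Reduce a Wikidata datavalue to a plain comparable value (quantities only)."""
    if isinstance(value, dict) and "amount" in value:
        return float(value["amount"])
    return value

def wikidata_value(qid, prop):
    """Fetch the first current value of `prop` on item `qid` via the Wikidata API."""
    resp = requests.get(WIKIDATA_API, params={
        "action": "wbgetclaims", "entity": qid,
        "property": prop, "format": "json"})
    resp.raise_for_status()
    claims = resp.json().get("claims", {}).get(prop, [])
    if not claims:
        return None
    return plain(claims[0]["mainsnak"].get("datavalue", {}).get("value"))

def differing_editions(qid, prop, infobox_values):
    """infobox_values: dict {language: value} as extracted from infoboxes."""
    current = wikidata_value(qid, prop)
    return {lang: v for lang, v in infobox_values.items() if plain(v) != current}

# Hypothetical usage: population (P1082) of Poznań (Q268), illustrative numbers only.
# print(differing_editions("Q268", "P1082", {"en": 546829.0, "de": 538633.0}))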

Interaction: Wikipedians should be able to interact in several ways. We will include proper edit links, pre-filled forms (where possible), template suggestions and better sorting (i.e. highlighting the most obvious and pressing differences and moving them to the front) to allow editors to quickly work through pages. Usability feedback will be collected through several channels, so that we can improve interactivity based on the wishes of the community.

Data and Reference Acquisition[edit]

We propose to tackle the problems mentioned above with several different data acquisition strategies, some of which are straightforward and others of which require more advanced methods such as relation extraction from text. The core of our solution is to provide well-referenced data to Wikidata in an automatic way. We are confident that we can automate this transfer of data with high accuracy, as we see it as a transformation of existing human-curated facts. Below, we describe the individual ways to acquire Wikidata-compatible facts (i.e. including references and qualifiers):

  1. As can be seen in the prototype, the DBpedia extraction framework (Github link) is already able to extract most of the facts from infoboxes. As a conclusion of the discussion in Leipzig in 2016, we agreed that it would be straightforward to import these facts via DBpedia as a middle layer into Wikidata, provided that the existing citations and qualifiers in Wikipedia are extracted as well (see the sketch after this list). Within the last year, however, only minor progress has been made in this direction due to missing resources. We estimate that this can be achieved in the first months of the CrossWikiFact project by extending the Scala software and by establishing RML mappings of Wikipedia citation templates. The editing of RML mappings is done by volunteers of the DBpedia community.
  2. In addition to 1, references for facts can be found by mining existing authoritative datasets such as the CIA World Factbook, library data (e.g. the Virtual International Authority File), national government datasets (https://index.okfn.org/) or company registers (e.g. OpenCorporates, the BRIS European register interconnection initiative, and relevant work in the euBusinessGraph project). Previous work by the Spanish DBpedia Chapter exists. Furthermore, DBpedia's agenda focuses on implementing a fusion and cross-checking pipeline in the coming months. Because this work is at the core of DBpedia, it is, of course, not funded via this proposal; we mention it here because the results of this endeavour can be exploited to improve data quality (accuracy, qualifiers, references) before the transfer into Wikidata. The work in CrossWikiFact will focus on selecting facts that sufficiently meet the criteria of Wikidata and on guaranteeing that they are compatible and can be loaded.
  3. While we expect that 1 and 2 will already significantly increase the availability of data, we further plan to optimise the results by open web extraction and reference mining. For this we will integrate StrepHit (created by Marco Fossati, a leader of the Italian DBpedia Chapter) and the API provided by DBpedia community member Diffbot. Both approaches focus on analysing textual references on the World Wide Web that provide additional support for the accuracy of the facts. We expect that this will further increase the number of facts we can transfer into Wikidata, improve their accuracy and provide rich references. The work in CrossWikiFact will build on the existing work and reuse it in the pipeline.
  4. Many of the citations in the Wikipedia language editions are not directly attached to the infobox but referenced in the text. DBpedia has recently started to focus on extracting the article text in a structured format (cf. the text extraction challenge), including links, tables, equations, etc., and then doing relation extraction, as the article text is a much richer source of factual information than the infobox, albeit harder for machines to process. The core work on relation extraction will be funded by Diffbot. For CrossWikiFact, we are primarily interested in sentences from the article text that contain references. If a fact extracted from such a referenced sentence matches the infobox facts of this article, of equivalent articles in one of the 120 language versions, or existing data in Wikidata, the fact is relevant for CrossWikiFact.
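As a rough, non-authoritative illustration of item 1 (the actual implementation is the Scala extraction framework plus RML mappings of citation templates), the following Python sketch uses the third-party mwparserfromhell library to harvest the reference URLs from common citation templates in an article's wikitext:

# Sketch only: harvest reference URLs from citation templates in wikitext.
# The production pipeline uses the DBpedia extraction framework (Scala) and
# RML mappings; this merely illustrates the kind of data being collected.
import mwparserfromhell

CITE_TEMPLATES = {"cite web", "cite news", "cite book", "cite journal"}

def citation_urls(wikitext):
    """Return the url= parameters of known citation templates."""
    urls = []
    for template in mwparserfromhell.parse(wikitext).filter_templates():
        name = str(template.name).strip().lower()
        if name in CITE_TEMPLATES and template.has("url"):
            urls.append(str(template.get("url").value).strip())
    return urls

# Illustrative wikitext snippet with placeholder values.
example = ("{{Infobox settlement|population_total=546,829}} "
           "<ref>{{cite web|url=https://stat.gov.pl/|title=GUS}}</ref>")
print(citation_urls(example))   # ['https://stat.gov.pl/']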

CrossWikiFact import (transfer of data to Wikidata)[edit]

Once the data is sufficiently prepared, we will, as a first step, set up an ingestion workflow via the Primary Sources Tool. Beyond this manual approach, we will evaluate the possibility of importing part of the data in bulk. Prerequisites for such selected portions are, of course, that the data has been thoroughly evaluated and that references exist. We are confident that the quality of the attached references will be higher than that of the references provided by the simple bots which have so far been importing statements from Wikipedia language editions.
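To sketch what the ingestion-ready output could look like (an assumption on our side for illustration; the exact format will be aligned with the Primary Sources Tool and the bot approval process), statements could be serialized as QuickStatements-style tab-separated lines with the supporting URL attached as an S854 source:

# Sketch: serialize verified facts as QuickStatements-style TSV lines.
# Property choices and values below are illustrative placeholders; the final
# ingestion format will be agreed on with the Wikidata community.
def to_quickstatements(qid, prop, value, reference_url=None):
    """One statement per line: item, property, value, optional S854 source URL."""
    if isinstance(value, str):
        value = '"%s"' % value              # string values are quoted
    fields = [qid, prop, str(value)]
    if reference_url:
        fields += ["S854", '"%s"' % reference_url]
    return "\t".join(fields)

facts = [
    ("Q268", "P1082", 546829, "https://stat.gov.pl/"),   # population (illustrative)
    ("Q64",  "P2046", 891.7,  None),                     # area in km² (illustrative)
]
for qid, prop, value, ref in facts:
    print(to_quickstatements(qid, prop, value, ref))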

Figure: Data flow in CrossWikiFact

Project goals[edit]

What are your goals for this project? Your goals should describe the top two or three benefits that will come out of your project. These should be benefits to the Wikimedia projects or Wikimedia communities. They should not be benefits to you individually. Remember to review the tutorial for tips on how to answer this question.


Note: we have left out most of the technical details about our goals and approaches with respect to the data and the references, as the proposal is quite long already.

1. Visibility and Awareness[edit]

The CrossWikiFact website will increase visibility and awareness of data consistency across Wikipedia, Wikidata and DBpedia. We envision the website as an enabler for these three communities to work together better in the future via this common platform. Visitors can quickly see what data is available and where their help is needed most. Supporting metrics and other data quality measures will make it possible to judge the overall progress of unification. The information provided will help Wikipedia editors to better judge the correctness and completeness of the current infobox values in their Wikipedia edition by comparing them with other versions and with Wikidata. Furthermore, any edits made in Wikipedia infoboxes will be visible to other editors and thus allow information to spread among the three systems.

2. Improvement of Wikipedia Infoboxes[edit]

Wikipedia infoboxes are maintained by Wikipedians who know the guidelines and best practices of their Wikipedia language version best. The CrossWikiFact website will leave the final decision on what to include in their infoboxes to these editors. We see our project mainly as a support tool that provides better information to editors. Besides the facts shown in our prototype, DBpedia also has extensive technical information about which template is used with which values on which Wikipedia pages, and this can be exploited. Editors can receive suggestions and snippets that they can copy into Wikipedia pages, which will greatly ease their editing effort. Overall, we target a higher degree of automation for infobox edits and an increased usage of Wikidata in Wikipedia.

3. Improvement of Wikidata[edit]

Statements on Wikidata items are primarily edited by Wikidatans, whereas data donations (such as the Freebase dataset) are ingested via the Primary Sources Tool. The CrossWikiFact project will contribute to Wikidata in the form of a dataset containing verified statements with their respective references. These facts can then be ingested via the Primary Sources Tool in order to add missing statements to Wikidata and to add references to already existing claims. Existing statements in Wikidata that currently reference DBpedia itself or a specific Wikipedia language edition can be supplemented with more reliable references, e.g. the citations found in the respective Wikipedia articles. These additions will increase the completeness and trustworthiness of Wikidata statements. Beyond the data contributions created during this project, the software stack will be made available for continuous application and improvement.

Project impact[edit]

How will you know if you have met your goals?[edit]

For each of your goals, we’d like you to answer the following questions:

  1. During your project, what will you do to achieve this goal? (These are your outputs.)
  2. Once your project is over, how will it continue to positively impact the Wikimedia community or projects? (These are your outcomes.)

For each of your answers, think about how you will capture this information. Will you capture it with a survey? With a story? Will you measure it with a number? Remember, if you plan to measure a number, you will need to set a numeric target in your proposal (i.e. 45 people, 10 articles, 100 scanned documents). Remember to review the tutorial for tips on how to answer this question.


1. Visibility and Awareness[edit]

  1. Output: The CrossWikiFact website is already online and will stay online during the whole project lifetime. Measures that will guide the development are community feedback and fulfilled feature requests. Over the project duration, we hope to incorporate over 100 issues (bug fixes / feature requests) from community channels (mailing lists, wiki discussion fora and the issue tracker). Another important measure of success is the number of visitors to the website. We aim for over 500 unique visitors per month by project end. We expect these visitors to be core editors who can work with the website effectively, using it as a hub that redirects them to improve the data landscape in Wikipedia and Wikidata.
  2. Outcome: While the development of the website is funded by this proposal, the DBpedia Association is able to provide hosting and maintenance after project end. We see the data and tools of DBpedia as instrumental for the website initially. In the long run, DBpedia also benefits from better data and structure in Wikipedia and Wikidata, which creates an incentive to maintain and further develop the system created here. Overall, we hope that this will bring the communities closer together (we cannot give a number for this, however).

2. Improvements of Infoboxes[edit]

  1. Output: On the CrossWikiFact prototype, we currently use only Wikidata identifiers as links. It is only a small step, however, to link to the individual article pages where the values have been found; this allows visitors to be redirected directly to the infoboxes, and Wikidata provides the necessary interlanguage links. Visitors might be interested in a specific language only, so we can implement filters that allow focusing on one language. Regarding the differences found, we have to distinguish between properties whose values occur at most once (examples are birth dates, geo-coordinates, married-to) and multi-valued properties; information about this can be loaded from the DBpedia ontology. For single-valued properties, we can assume that only one value should be predominant across all Wikipedias and thus highlight them specifically as items needing attention (there are a few exceptions, such as people with several spouses, but these can be marked). In a later step, we can also link references into the website, so editors can add missing sources or compare and verify sources. As a further feature, visitors can receive infobox snippets and suggestions that they can copy & paste into the articles.
  2. Outcome: We expect that the CrossWikiFact website will work very well for single-valued properties. We therefore expect to reach a very high uniformity measure after 6 months. For these single-valued properties, a majority agreement is normally reached when at least 6 or 7 values agree. Uniformity is then measured as (total number of values - number of values diverging from the majority) / total number of values; with full conformity, this value is 100% (see the sketch below). We will measure this value monthly and expect an average higher than 90% at the end of the project, meaning that 90% of single-valued properties will be consistent across all Wikipedias and Wikidata. Another important measurement is the adoption of Wikidata-based templates in Wikipedia. We can measure this directly via the number of Wikidata infobox snippets generated from our website and indirectly via the Wikidata usages we find with the DBpedia extraction framework. No projected figures are given now, but progress will be tracked starting in month 6 of this proposal.
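A minimal sketch of this uniformity measure as we read it (Python, for illustration only): for one single-valued property of one item, the share of extracted values that agree with the majority value.

# Sketch of the uniformity measure for a single-valued property: the share of
# extracted values (across language editions and Wikidata) that agree with the
# majority value. 1.0 corresponds to 100% conformity.
from collections import Counter

def uniformity(values):
    """values: the value found for one property of one item in each source,
    e.g. the birth date extracted from each language edition plus Wikidata."""
    if not values:
        return None
    majority_count = Counter(values).most_common(1)[0][1]
    diverging = len(values) - majority_count
    return (len(values) - diverging) / len(values)

print(uniformity(["1952-03-11"] * 7 + ["1952-03-12"]))   # 0.875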

3. Improvements of Wikidata[edit]

  1. Outcomes: In order to improve the completeness and trustworthiness of Wikidata statements, we will (1) add high-quality statements to Wikidata and (2) reference these statements, and statements already present in Wikidata, with appropriate primary sources. This final goal is preceded by several steps. We therefore measure the success of this goal with two numeric indicators. The number of statements ready for inclusion is the number of statements that are extracted and enriched with sufficient metadata to be in suitable condition for inclusion in Wikidata; it indicates to what extent we can improve Wikidata with the described approach. We intend to provide at least 1,000,000 statements with citation-ready references that can be added to Wikidata. The number of statements actually added then indicates to what extent we have improved Wikidata. We intend to add at least 100,000 missing curated statements with proper references to Wikidata. A large number of existing Wikidata statements have missing or insufficient references, which shall be backed by reliable primary sources. On the condition that these completions can be made automatically[1], we foresee adding at least 500,000 missing references to already existing statements.
  2. Output: In this project, a workflow will be set up that generates valuable datasets for ingestion into Wikidata. This dataset has to be of high quality and must therefore obey the following data quality rules: facts should be backed by multiple (2+) Wikipedia language editions, there should be no or only slight (<20%) contradiction between different language editions, facts need a reference in at least one language edition, and the references should be sufficiently described (see the sketch after this list). The software created during this project will be made available for further improvement and application. As DBpedia is continuously improving its data and reference extraction capabilities, the CrossWikiFact tool chain will show its value in the long run as data is curated via the Primary Sources Tool. It is therefore of great importance to involve the community in the development of these processes. We will provide a workflow that delivers continuously updated data for future ingestion.
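The quality rules listed above translate roughly into a filter of the following shape (a sketch under our own assumptions: the layout of the "fact" record is hypothetical, and we interpret the <20% rule as the share of values disagreeing with the majority):

# Sketch of the ingestion filter implied by the quality rules above; the
# structure of a "fact" record is a hypothetical placeholder.
from collections import Counter

def ingestion_ready(fact):
    """fact = {
        "values_by_lang": {"en": v, "de": v, ...},         # value per language edition
        "references_by_lang": {"en": ["https://..."], ...} # citations found per edition
    }"""
    values = list(fact["values_by_lang"].values())
    if len(values) < 2:                                    # backed by 2+ editions
        return False
    majority = Counter(values).most_common(1)[0][1]
    if (len(values) - majority) / len(values) >= 0.20:     # <20% contradiction
        return False
    if not any(fact["references_by_lang"].values()):       # at least one reference
        return False
    return True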

Do you have any goals around participation or content?[edit]

Are any of your goals related to increasing participation within the Wikimedia movement, or increasing/improving the content on Wikimedia projects? If so, we ask that you look through these three metrics, and include any that are relevant to your project. Please set a numeric target against the metrics, if applicable.


See above.

Project plan[edit]

Activities[edit]

Tell us how you'll carry out your project. What will you and other organizers spend your time doing? What will you have done at the end of your project? How will you follow-up with people that are involved with your project?

Our proposal involves the following eight tasks:

A1 (M1-M3) Data and Reference Acquisition: DBpedia will extensively focus on innovating its data cleansing, linking and fusion methods. Concludes with an initial fusion dataset.
A2 (M4-M6) Source References for Facts (1): Identify source references for facts within Wikipedia infoboxes. Concludes with an initial ingestion-ready dataset.
A3 (M4-M9) Source References for Facts (2): Identify source references for facts within Wikipedia articles using the Diffbot API. Concludes with reference additions to the ingestion-ready dataset.
A4 (M6-M9) Third-party Data and Reference Integration: Integrate facts from, and references to, external datasets. Concludes with data and reference additions to the ingestion-ready dataset.
A5 (M1-M6) CrossWikiFact website: Extend the current early prototype with new features (language filter, snippet suggestions). Concludes with a user-friendly CrossWikiFact website.
A6 (M4-M9) CrossWikiFact Wikidata ingest: Develop the workflow to populate Wikidata via the Primary Sources Tool and through a bot. Delivers data in the Primary Sources Tool in M6 and concludes with bot-ingestable data in M9.
A7 (M6 & M9) CrossWikiFact Sprints: Conduct two CrossWikiFact Sprints with the help of the community; execute and evaluate one sprint in M6 and a second in M9.
A8 (M1-M9) Community dissemination: Promote the project and present the CrossWikiFact website at various community events.

Budget[edit]

How will you use the funds you are requesting? List bullet points for each expense. (You can create a table later if needed.) Don't forget to include a total amount, and update this amount in the Probox at the top of your page too!

Budget table

The total amount requested is 45,120€/53,420 USD.

  • Item 1, Personnel cost: Software developer time, 9 person-months (PM; full-time position, 40 hrs/week). Total cost: 45,120€. Source of funding: this grant.
  • Item 2, Personnel cost: Software developer time, 4.5 PM (half-time position, 20 hrs/week, spread over 9 months). Total cost: 24,435€. Source of funding: Diffbot.
  • Item 3, Personnel cost: Data Acquisition Support, 4.5 PM (two developers from the DBpedia Association working on extraction and fusion will support the project; we expect a workload of 10 h/week each). Total cost: 24,435€. Source of funding: DBpedia Association.
  • Item 4, Travel: Travel budget and accommodation for the developer to attend Wikimania 2018 (Cape Town), 1 trip. Total cost: 2,500€. Source of funding: DBpedia Association.
  • Item 5, Travel: Travel budget and accommodation for the developer to attend WikidataCon 2018, 1 trip. Total cost: 1,000€. Source of funding: DBpedia Association.
  • Item 6, Equipment: Laptop used by the developer during their work, 1 unit. Total cost: 1,000€. Source of funding: DBpedia Association.

Project support obtained from Diffbot: 24,435€
Project support obtained from the DBpedia Association: 28,935€
Project funding requested from the Wikimedia Foundation: 45,120€
Total project cost: 98,490€
Total amount requested: 45,120€ / 53,420 USD

Community engagement[edit]

How will you let others in your community know about your project? Why are you targeting a specific audience? How will you engage the community you’re aiming to serve at various points during your project? Community input and participation helps make projects successful.

The community engagement strategy aims to provide communication and exchange platforms to discuss the progress of the project, to interface with users and to gather their feedback. In particular, data science researchers and developers will be involved and asked to give feedback. The community engagement process will include postings on social media and mailing lists as well as presentations of the project results at community events and conferences.

DBpedia Community Meetings

We will present this proposal and our ideas regarding the CrossWikiFact website at the 11th DBpedia Community Meeting in Cupertino, as well as at the subsequent Community Meetings, which are held twice a year.

Wikidata Conference 2018

We aim to have the preliminary prototype ready by WikidataCon 2018 and to gather feedback for further development.

Wikimania 2018

We will present the tool, which supports the editing of infoboxes, at Wikimania 2018 in Cape Town.

CrossWikiFact Sprint

In addition to the community events, we will send out a Call for Contributions: members of the communities are asked to use the Primary Sources Tool to bring ingestion-ready data (statements and references) into Wikidata. DBpedia has had good experiences with user involvement through its annual Mapping Sprint, where users contributed mappings of Wikipedia templates to the DBpedia ontology.

The following communities (without any claim to completeness) will be notified and will be involved in the project:

Strategic Partners[edit]

We target collaboration with the following list of partners to maximize the outcomes of this project:

Get involved[edit]

Participants[edit]

Please use this section to tell us more about who is working on this project. For each member of the team, please describe any project-related skills, experience, or other background you have that might help contribute to making this idea a success.

DBpedia Association

The DBpedia Association was founded in 2014 to support DBpedia and the DBpedia Community. Since then, we have been making steady progress towards professionalizing DBpedia for its users and forming an effective network out of the loosely organised DBpedia Community. The DBpedia Association is currently situated in Leipzig, Germany, and affiliated with the non-profit organisation Institute for Applied Informatics (InfAI) e.V.

Diffbot

Diffbot is a team of AI engineers building a universal database of structured information to provide knowledge as a service to intelligent applications. This knowledge base currently comprises 1.3 billion entities. The team extracts data from web pages using new and innovative content extraction methods, often achieving better-than-human results.

TBA (Software Developer)

  • Software Development, Data Science, Frontend Development
  • Skills: Scala programming, deep knowledge about DBpedia and Wikidata
  • Developer will be hired/selected from the community.

TBA (Software Developer at Diffbot)

  • Post-doc position concerned with relation extraction in article text, head of the DBpedia NLP department.

Markus Freudenberg (DBpedia Release Manager)

  • Data Extraction and Preparation; Markus developed DataID as part of his work at DBpedia and is currently integrating Apache Spark into UnifiedViews and the DBpedia Extraction Framework.

Julia Holze (DBpedia Association)

  • Organization & Community Outreach, support in organizing and spreading the CrossWikiFact Sprints

Sebastian Hellmann (DBpedia Association and AKSW/KILT) completed his PhD thesis under the guidance of Jens Lehmann and Sören Auer at the University of Leipzig in 2014 on the transformation of NLP tool output to RDF. Sebastian is a senior member of the "Agile Knowledge Engineering and Semantic Web" (AKSW) research center, which currently has 50 researchers (PhDs and senior researchers) focusing on semantic technology research, often in combination with other areas such as machine learning, databases, and natural language processing. Sebastian is head of the "Knowledge Integration and Language Technologies (KILT)" Competence Center at InfAI. He is also the executive director and a board member of the non-profit DBpedia Association. Sebastian is a contributor to various open-source projects and communities such as DBpedia, NLP2RDF, DL-Learner and OWLG and has written code in Java, PHP, JavaScript, Scala, C & C++, MatLab, Prolog and Smodels, but now does everything in Bash and Zsh since he discovered the Ubuntu terminal. Sebastian is the author of over 80 peer-reviewed scientific publications (h-index of 21, over 4,300 citations) and started the Wikipedia article on Knowledge Extraction.

Magnus Knuth (DBpedia Head of Technical Development) is a research member of the AKSW research group at Leipzig University and a former member of the "Semantic Multimedia" research group at the Hasso Plattner Institute. Magnus has in-depth knowledge of data extraction pipelines and Linked Data, focusing on data quality and change management.

Community notification[edit]

You are responsible for notifying relevant communities of your proposal, so that they can help you! Depending on your project, notification may be most appropriate on a Village Pump, talk page, mailing list, etc. Please paste links below to where relevant communities have been notified of your proposal, and to any other relevant community discussions. Need notification tips?

Wikimedia Community

DBpedia Community

Social Media

Endorsements[edit]

Do you think this project should be selected for a Project Grant? Please add your name and rationale for endorsing this project below! (Other constructive feedback is welcome on the discussion page).

  • Support As a commercial vendor of semantic solutions and text analytics, Ontotext is highly interested in this proposal. Wikidata and DBpedia have complementary strengths (quality vs breadth of data respectively), and effective Data Fusion between them will raise the state of Wikipedia-based LOD significantly. We have again and again struggled with such data fusion, and it would be great if we can use it "from the source". Being a member of the DBpedia Association and active contributor to DBpedia Data Quality and Ontology activities, we'd appreciate being directly involved in this work as well. --Vladimir Alexiev (talk) 11:07, 25 September 2017 (UTC)
  • Support This proposal is an important initiative for both Wikipedia and Wikidata, and is critical to supporting outreach with a valuable partner to improve content within our community in both a strategic as well as data-focused way. This project will support the smart forward growth of our communities. Our metadata would become so much more highly usable -- and most importantly this project would establish crosswalks and pathways. -- Erika aka BrillLyle (talk) 13:14, 25 September 2017 (UTC)
  • Support This is a very interesting and useful project, that aims to use DBpedia (which is basically built upon Wikipedia) as input to Wikidata and then back to Wikipedia. Gpublio (talk) 15:02, 25 September 2017 (UTC)
  • If this proposal works it will be very useful. Even a partial solution can be useful. However, this is a major proposal with several components - getting sources for more DBpedia facts from Wikipedia, getting sources for more (DBpedia) facts from external sources, integrating DBpedia facts and Wikidata facts (including handling sub-properties, matching literal values against objects, and handling slop in literal values). Each of these by themselves is ambitious. There appears to be some tools that can be used for part of the effort. It would be very nice to see how these tools work (perhaps hand-massaging the inputs to the tool and outputs from the tool) on several prominent pages, e.g., Berlin, Douglas Adams, and Vegetable, showing how they can help in easy cases, e.g., GDP/Nominal for Berlin, in hard cases, e.g., Area and Population for Berlin, and where errors show up, e.g., type for Tree in Italian DBpedia. Peter F. Patel-Schneider (talk) 23:25, 25 September 2017 (UTC)
  • I think it addresses a highly relevant issue in Wikipedias. Jakub.klimek (talk) 15:15, 27 September 2017 (UTC)
  • because it will allow editors to spot differing values and thus help them to quickly edit Wikidata or Wikipedia. Asanchez75 (talk) 20:35, 27 September 2017 (UTC)
  • Support - great initiative, for the Wikipedia-Wikidata bridge, Wikidata consistency and the future of fact-checking (!). Valuable partners and fundations. Krauss (talk)
  • Support Not only that the proposal is exciting from the research and innovation point of view, but as a professor lecturing on linked open data, I see it as an excellent demonstration of the synergy between different open data initiatives grafted upon Wikipedia, thus making the overall LOD technology more convincing as being endorsed by an agile and coherent community. This is what we have been somewhat lacking for years! svatek (talk)
  • Support − it is great to see more collaboration between DBpedia and Wikidata. This project would definitely fill an obvious gap: a lot of effort has been put on DBpedia's side into the extraction of information from Wikipedia, and Wikidata has been re-doing some of this work independently. The proposal looks pragmatic and sound. − Pintoch (talk) 13:22, 4 October 2017 (UTC)
  • Support - great initiative aimed at bridging existing gaps between Wikipedia, Wikidata and DBpedia. Improving data quality and provenance will be a major benefit for humans and machines using the available data in these environments. Enno Meijers
  • Support -- it sounds like a great idea to have greater ties between DBPedia and Wikidata. --A3nm (talk) 12:29, 31 October 2017 (UTC)
  • Support ChristianKl (talk) 14:31, 4 November 2017 (UTC)
  • Support -- GerardM (talk) 11:19, 7 November 2017 (UTC)
  • Support -- Jimkont (talk) 09:19, 20 November 2017 (UTC)
[1] d:Wikidata:Requests_for_permissions/Bot