Grants:Project/DBpedia/GlobalFactSyncRE/Timeline/Tasks
Next GFS Call
Tuesday, Oct. 22nd @ 1:30pm
Tasks
Second Release
- (Johannes) Mapping package/snapshot/prototype
- 1. problem analysis
infobox param -> DBpedia property <->/-> Wikidata property
=> infobox param <-> Wikidata property (publish with release)
- https://docs.google.com/document/d/17hvTvIcnlPKe9LREx11ffOjZYtc0JUc9vI7-LHg0pm0/edit
- https://docs.google.com/document/d/1ZtpI1HCdjbBRZWPhjuwEoqmq3Kk9IL9tflRiaiDYYi0/edit
- 2. (later) inclusion of DBpedia into Wikidata (sameAs and owl:equivalent(P|C))
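The two-hop chain from the problem analysis (infobox param -> DBpedia property -> Wikidata property) collapses into the direct infobox-param-to-Wikidata mapping that is to be published. A minimal sketch of that composition; the parameter and property names here are illustrative examples, not the actual GFS mapping tables:

```python
# Sketch: collapse a two-hop mapping (infobox param -> DBpedia property
# -> Wikidata property) into a direct infobox param -> Wikidata mapping.
# The entries below are illustrative, not the real mapping tables.

infobox_to_dbpedia = {
    "birth_place": "dbo:birthPlace",
    "height": "dbo:height",
}

dbpedia_to_wikidata = {
    "dbo:birthPlace": "P19",   # place of birth
    "dbo:height": "P2048",     # height
}

def collapse(a, b):
    """Compose two mapping dicts, keeping only params that map all the way."""
    return {param: b[prop] for param, prop in a.items() if prop in b}

infobox_to_wikidata = collapse(infobox_to_dbpedia, dbpedia_to_wikidata)
print(infobox_to_wikidata)  # {'birth_place': 'P19', 'height': 'P2048'}
```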
- Wikidata non-adoption report (count of properties extracted by generic extraction: 580 million) (Sebastian)
- measure all values of infobox parameters to detect where an infobox does not use Wikidata for a given parameter
- add template counterexample here (@Lewoniewski:)
- Previous email: https://lists.wikimedia.org/pipermail/wikidata/2018-December/012681.html
- Create a databus-client docker to load GFS and references (Marvin, Johannes)
- integrate parser of Wikipedia Citation template into Python code (Wlodzimierz)
- https://en.wikipedia.org/wiki/Template:Citation/gfs
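As a rough sketch of what the Citation-template parser has to do: split a flat `{{cite ...}}` call into its named parameters. Real citation markup also allows nested templates, links, and positional parameters, which this toy version ignores:

```python
def parse_citation(wikitext):
    """Toy sketch: extract named params from a flat {{cite ...}} call.

    Assumes no nested templates, links, or positional parameters
    inside the template - real citation wikitext is messier.
    """
    inner = wikitext.strip()[2:-2]          # drop the {{ and }}
    parts = inner.split("|")
    template_name = parts[0].strip()
    params = {}
    for part in parts[1:]:
        if "=" in part:
            key, _, value = part.partition("=")
            params[key.strip()] = value.strip()
    return template_name, params

name, params = parse_citation(
    "{{cite web |url=https://example.org |title=Example |access-date=2019-10-22}}"
)
print(name)           # cite web
print(params["url"])  # https://example.org
```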
- (potentially) reach out to WikiCite community (Wlodzimierz)
- reach out to Pasleim / Harvest Template (Wlodzimierz)
- integrate reference information and information about reference/language popularity into the prototype (Johannes, Marvin, Wlodzimierz)
- external data needs to be mapped to infobox
- DONE create statistics about domains and URLs of references in Wikipedia infoboxes and Wikidata (Wlodzimierz)
- find most popular references and check if its data is in a downloadable format (Tina)
- find fast way to integrate external sources
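The domain/URL statistics mentioned in the DONE item above amount to grouping reference URLs by host. A standard-library sketch (the URLs are made-up stand-ins for extracted references):

```python
from collections import Counter
from urllib.parse import urlparse

# Made-up reference URLs standing in for those extracted from infoboxes.
reference_urls = [
    "https://www.nba.com/player/123",
    "https://musicbrainz.org/artist/abc",
    "https://www.nba.com/player/456",
    "http://books.google.com/books?id=xyz",
]

# Count references per domain to find the most popular sources.
domains = Counter(urlparse(u).netloc for u in reference_urls)
for domain, count in domains.most_common():
    print(domain, count)
# www.nba.com 2
# musicbrainz.org 1
# books.google.com 1
```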
Study / Scouting for good examples
preliminary study of sync targets
- integration of MusicBrainz:
- mapping of 5 properties (Johannes)
- (potentially) contact user Jc86035 (Johannes)
- deploy web-service that shows mappings (Johannes)
- integration of MusicBrainz into FlexiFusion (Marvin)
- define a set of sync targets to start testing the GFS Data Browser (Sebastian, Tina)
- suitable properties from NBA players (weight, height, birthplace)
- release data from music albums
- geo-coordinates
- population counts (French or Polish cities?)
- improve mappings for the set of sync targets (Johannes, Marvin)
Dissemination plan
- WikidataCon 25–26 October 2019 | Berlin, Germany (Marvin, Sebastian)
- Open: Village pumps and WikiCite
Other
Back-end:
- check out Scala (Johannes, Wlodzimierz)
- Can template extractions in the extraction framework be used with Python code?
- new Wikidata release (Marvin)
- find best structure of the references
Front-end:
- GFS Data Browser:
- percent-encoded URIs not readable in GFS data browser app
- development of better statistical tool (Marvin/Jan?)
- tool/query to find the most likely errors (Marvin)
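The percent-encoded-URI issue noted above is the usual display problem with DBpedia-style resource URIs; decoding the label for display is a single standard-library call (example URI chosen for illustration):

```python
from urllib.parse import unquote

# A DBpedia-style resource URI whose local name is percent-encoded.
uri = "http://dbpedia.org/resource/S%C3%A3o_Paulo"

# Decode the last path segment and turn underscores into spaces for display.
label = unquote(uri.rsplit("/", 1)[-1]).replace("_", " ")
print(label)  # São Paulo
```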
Misc.:
- post GFS challenge (Tina)
Completed Tasks
Getting ready:
- DONE accounts (Tina)
- DONE establish means of communication for the group (Tina, Sebastian)
- DONE make GFS server ready, @JohannesFre: any news on this? (Sebastian)
- DONE Wikimania presentation format specification (Johannes)
- DONE (Johannes/Sebastian) enable more/all extractors - provide list of possible values for extractors
- DONE move Python code from https://git.informatik.uni-leipzig.de/kwecel/infoboxes-refs to github https://github.com/dbpedia/
- DONE web-server (Marvin)
First Release:
- DONE provide Python code for reference extraction (Krzysztof, Wlodzimierz)
- DONE deployment of MongoDB prefusion (Marvin)
- DONE publish reference dump and deploy a micro-service for the current Python extraction as is: `?article=http://en.wikipedia.org/wiki/Arthur_Schopenhauer` outputs CSV (Wlodzimierz)
- DONE deploy DIEF (extraction framework) micro-service on the GFS server (Johannes)
- DONE (blogpost) - MongoDB prefusion - example queries (Marvin)
- DONE - Study and Categorization (Tina)
Study / Scouting for good examples:
- DONE - see preliminary study of sync targets
- problem: four layers of complexity: Subject variation / fixed vs. varying property / reference (inferred from 1 and 2) / normalisation of values (currency, inch/cm, ...)
- NBA Players and Cloud types (Tina)
- Videogames (easy disambiguations)
- films: budget (e.g. 100k) is fixed, while the revenue parameter varies across language editions
- Cars & Products (complex)
- organisations (page for a group)
- Sports
- Cities (easy disambiguation)
- Difficult examples:
- subjects/articles are of a different granularity
- city & population: core, close area and county
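The fourth complexity layer listed above, normalisation of values (currency, inch/cm, ...), is mostly unit conversion. A sketch for the NBA player height example; the two input formats handled here are illustrative, real infobox values are far messier:

```python
import re

def height_to_cm(value):
    """Normalise a height value to centimetres.

    Handles two illustrative formats: metric ("2.06 m") and
    imperial ("6 ft 9 in"); real infobox values are far messier.
    """
    value = value.strip()
    m = re.fullmatch(r"([\d.]+)\s*m", value)
    if m:
        return round(float(m.group(1)) * 100, 1)
    m = re.fullmatch(r"(\d+)\s*ft\s*(\d+)\s*in", value)
    if m:
        feet, inches = int(m.group(1)), int(m.group(2))
        return round((feet * 12 + inches) * 2.54, 1)
    raise ValueError(f"unrecognised height: {value!r}")

print(height_to_cm("2.06 m"))     # 206.0
print(height_to_cm("6 ft 9 in"))  # 205.7
```

Only after such normalisation can values from different language editions be compared for consensus.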
- integration of MusicBrainz:
- DONE check how well it is mapped (Johannes)
- DONE check NBA sources using google structured data tool - see here (Tina)
Second Release:
- DONE edit FactualConsensusFinder so user can insert Wikipedia URIs (Marvin)
Exploitation/Dissemination:
- DONE draft release note
- DONE news + feedback squad, talk page, email, lists
- DONE Wikimania 16-18 August | Stockholm, Sweden (Johannes will go)
- DONE DBpedia Day | 12 September 2019 | Karlsruhe, Germany (Wlodzimierz)
Other:
Back-end:
- ...
Front-end:
- Factual Consensus Finder:
- DONE UI needs of average user (Tina)
Misc.:
- DONE check your profile and edit if necessary (everyone)
- DONE write project announcement (Sebastian, Tina)