Grants:Project/DBpedia/GlobalFactSyncRE/Timeline/Tasks

From Meta, a Wikimedia project coordination wiki

Next GFS Call

Wednesday August 28, 11am

Tasks

Second Release

  • (Johannes) Mapping package/snapshot/prototype
1. problem analysis
 infobox param -> DBpedia property <->/-> Wikidata property
 --------------------------------------------------------
 infobox param <-> Wikidata property (publish with release)
2. (later) inclusion of DBpedia into Wikidata (sameAs and owl:equivalentProperty/owl:equivalentClass)
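
The derivation in step 1 can be read as composing two mapping tables: infobox parameter to DBpedia property, then DBpedia property to Wikidata property. The following is a minimal sketch of that idea, not the project's implementation. The dictionary names, the infobox parameter names, and the `dbo:someUnmappedProperty` placeholder are assumptions made for illustration and are not taken from the GFS mapping package.

```python
# Illustrative data only (assumption): not taken from the GFS mapping package.
param_to_dbpedia = {
    "birth_date": "dbo:birthDate",
    "population_total": "dbo:populationTotal",
    "some_param": "dbo:someUnmappedProperty",  # hypothetical placeholder
}

# DBpedia property -> Wikidata property (the "<->/->" leg of the diagram).
dbpedia_to_wikidata = {
    "dbo:birthDate": "wdt:P569",
    "dbo:populationTotal": "wdt:P1082",
}


def compose(first, second):
    """Chain two mappings; drop keys whose intermediate value has no entry in `second`."""
    return {key: second[mid] for key, mid in first.items() if mid in second}


# Direct infobox param -> Wikidata property mapping (the line below the rule).
param_to_wikidata = compose(param_to_dbpedia, dbpedia_to_wikidata)
print(param_to_wikidata)
```

Parameters whose DBpedia property has no Wikidata counterpart simply drop out of the result, which gives a quick way to list the gaps that still need a mapping.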



  • integrate reference information into the prototype (Johannes, Marvin, Wlodzimierz)

Study / Scouting for good examples

preliminary study of sync targets

  • integration of MusicBrainz:
    • mapping of 5 properties (Johannes)
    • (potentially) contact user Jc86035 (Johannes)
    • deploy web-service that shows mappings (Johannes)
    • integration of MusicBrainz into FlexiFusion (Marvin)
  • define a set of sync targets to start testing the GFS Data Browser (Sebastian, Tina)
  • improve mappings for the set of sync targets (Johannes, Marvin)

Dissemination plan

  • news + feedback squad, talk page, email, lists
  • Wikimania 16-18 August | Stockholm, Sweden (Johannes will go)
  • WikidataCon 25-26 October 2019 | Berlin, Germany (open, not Johannes)
  • Open: Village pumps and Wikicite

Other

Back-end:

  • check out Scala (Johannes, Wlodzimierz)
    • Can template extractions in the extraction framework be used with Python code?
  • new Wikidata release (Marvin)
  • find best structure of the references


Front-end:

  • GFS Data Browser:
    • development of better statistical tool (Marvin/Jan?)
    • tool/query to find the most likely errors (Marvin)


Misc.:

Completed Tasks

Getting ready:

First Release:

  • DONE provide Python code for reference extraction (Krzysztof, Wlodzimierz)
  • DONE deploy MongoDB prefusion (Marvin)
  • DONE publish reference dump and deploy a micro-service for the current Python extraction as is: a request with `?article=http://en.wikipedia.org/wiki/Arthur_Schopenhauer` outputs CSV as is (Wlodzimierz)
  • DONE deploy DIEF (extraction framework) micro-service on the GFS server (Johannes)
  • DONE (blogpost) - MongoDB prefusion - example queries (Marvin)
  • DONE - Study and Categorization (Tina)
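
The `?article=` parameter of the reference-extraction micro-service listed above can be illustrated with a short request-building sketch. The base URL below is a hypothetical placeholder, since the service address is not given on this page, and percent-encoding the article URI is an assumption about what the service accepts (the example above shows the URI unencoded).

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical base URL (assumption): the real service address is not stated here.
BASE_URL = "http://example.org/references"


def build_request_url(article_uri, base_url=BASE_URL):
    """Return the request URL with the article URI passed as the `article` query parameter."""
    return f"{base_url}?{urlencode({'article': article_uri})}"


article = "http://en.wikipedia.org/wiki/Arthur_Schopenhauer"
url = build_request_url(article)
print(url)

# Round trip: decoding the query string gives back the original article URI.
decoded = parse_qs(urlsplit(url).query)["article"][0]
print(decoded)
```

The response is described only as CSV output "as is", so parsing it is left out here; the column layout would have to be checked against the service itself.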

Study / Scouting for good examples:

  • DONE - see preliminary study of sync targets
  • problem: four layers of complexity: Subject variation / fixed vs. varying property / reference (inferred from 1 and 2) / normalisation of values (currency, inch/cm, ...)
    • NBA Players and Cloud types (Tina)
    • Videogames (easy disambiguations)
    • films 100k: budget is fixed and the revenue parameter varies by language
    • Cars & Products (complex)
    • organisations (page for a group)
    • Sports
    • Cities (easy disambiguation)
  • Difficult examples:
    • subjects/articles are of a different granularity
      • city & population: core, close area and county
  • integration of MusicBrainz:
    • DONE check how well it is mapped (Johannes)
  • DONE check NBA sources using the Google structured data tool - see here (Tina)
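
The fourth complexity layer named above, normalisation of values (inch/cm), can be sketched as follows. This is only an illustration of the idea, not project code: the function names, the unit labels, and the 0.5 cm tolerance are assumptions chosen for the example.

```python
from decimal import Decimal

CM_PER_INCH = Decimal("2.54")  # exact by definition of the international inch


def to_cm(value, unit):
    """Convert a length given in 'cm' or 'in' to centimetres."""
    value = Decimal(str(value))
    if unit == "cm":
        return value
    if unit == "in":
        return value * CM_PER_INCH
    raise ValueError(f"unsupported unit: {unit!r}")


def same_length(a, b, tolerance=Decimal("0.5")):
    """Treat two (value, unit) pairs as equal if they differ by at most `tolerance` cm.

    The tolerance is an assumption; it absorbs rounding done by the sources.
    """
    return abs(to_cm(*a) - to_cm(*b)) <= tolerance


# A height given as 78 in by one source and 198 cm by another.
print(to_cm(78, "in"))
print(same_length((78, "in"), (198, "cm")))
```

Comparing values only after conversion to a common unit avoids flagging a mismatch when two sources agree but report in different units.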

Second Release:

  • DONE edit FactualConsensusFinder so user can insert Wikipedia URIs (Marvin)

Exploitation:

  • DONE draft release note

Other:

Back-end:

  • ...

Front-end:

Misc.:

  • DONE check your profile and edit if necessary (everyone)
  • DONE write project announcement (Sebastian, Tina)