Grants:Project/DBpedia/GlobalFactSyncRE/Timeline/Tasks

From Meta, a Wikimedia project coordination wiki

Next GFS Call

Tuesday, Oct. 22nd @ 1:30pm

Tasks

Second Release

  • (Johannes) Mapping package/snapshot/prototype
1. problem analysis
 infobox param -> DBpedia property <->/-> Wikidata property
 --------------------------------------------------------
 infobox param <-> Wikidata property (publish with release)
2. (later) inclusion of DBpedia into Wikidata (owl:sameAs and owl:equivalentProperty/owl:equivalentClass)
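
The problem analysis in step 1 amounts to joining two mapping tables on the shared DBpedia property. A minimal sketch of that join follows; the function name, the infobox parameter names, and the sample table contents are illustrative assumptions, not the project's actual mapping data or code.

```python
def compose_mappings(infobox_to_dbpedia, dbpedia_to_wikidata):
    """Derive infobox param -> Wikidata property by joining on the DBpedia property.

    Both arguments are plain dicts; this is an assumed, simplified data model.
    """
    direct = {}
    for param, dbo_prop in infobox_to_dbpedia.items():
        wd_prop = dbpedia_to_wikidata.get(dbo_prop)
        if wd_prop is not None:  # keep only params that reach Wikidata
            direct[param] = wd_prop
    return direct


# Hypothetical sample tables for illustration only.
infobox_to_dbpedia = {
    "birth_place": "dbo:birthPlace",
    "height": "dbo:height",
    "nickname": "dbo:alias",  # no Wikidata counterpart in the sample table
}
dbpedia_to_wikidata = {
    "dbo:birthPlace": "P19",
    "dbo:height": "P2048",
}

direct_mapping = compose_mappings(infobox_to_dbpedia, dbpedia_to_wikidata)
```

Parameters whose DBpedia property has no Wikidata counterpart are dropped, so the result lists only the pairs that could be published with the release.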


  • Create a databus-client docker to load GFS and references (Marvin, Johannes)
  • integrate parser of Wikipedia Citation template into Python code (Wlodzimierz)
  • https://en.wikipedia.org/wiki/Template:Citation/gfs
  • (potentially) reach out to WikiCite community (Wlodzimierz)
  • reach out to Pasleim / Harvest Template (Wlodzimierz)


  • integrate reference information and information about reference/language popularity into the prototype (Johannes, Marvin, Wlodzimierz)
  • external data needs to be mapped to infobox
  • DONE create statistics about domains and URLs of references in Wikipedia infoboxes and Wikidata (Wlodzimierz)
  • find the most popular references and check whether their data is available in a downloadable format (Tina)
  • find fast way to integrate external sources
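
As an illustration of the domain statistics mentioned above, reference URLs can be grouped by host name with the Python standard library. This is a sketch under assumptions: the function name and the sample URLs are hypothetical, and it is not the code used to produce the project's statistics.

```python
from collections import Counter
from urllib.parse import urlparse


def count_reference_domains(urls):
    """Count how often each domain occurs in a list of reference URLs."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]  # treat www.example.org and example.org as one domain
        if host:  # skip strings that do not parse to a host
            counts[host] += 1
    return counts


# Hypothetical reference URLs for illustration only.
sample_urls = [
    "https://www.example.org/players/1",
    "http://example.org/teams/2",
    "https://stats.example.com/a",
    "not a url",
]
domain_counts = count_reference_domains(sample_urls)
```

Calling `domain_counts.most_common()` then ranks the domains, which is one way to shortlist popular references for the downloadable-format check.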


Study / Scouting for good examples

preliminary study of sync targets

  • integration of MusicBrainz:
    • mapping of 5 properties (Johannes)
    • (potentially) contact user Jc86035 (Johannes)
    • deploy web-service that shows mappings (Johannes)
    • integration of MusicBrainz into FlexiFusion (Marvin)
  • define a set of sync targets to start testing the GFS Data Browser (Sebastian, Tina)
    • suitable properties from NBA players (weight, height, birthplace)
    • release data from music albums
    • geo-coordinates
    • population counts (French or Polish cities?)
  • improve mappings for the set of sync targets (Johannes, Marvin)


Dissemination plan

  • WikidataCon 25 – 26 October 2019 | Berlin, Germany (Marvin, Sebastian)
  • Open: Village pumps and WikiCite


Other

Back-end:

  • check out Scala (Johannes, Wlodzimierz)
    • Can template extractions in the extraction framework be used from Python code?
  • new Wikidata release (Marvin)
  • find best structure of the references


Front-end:

  • GFS Data Browser:
    • percent-encoded URIs not readable in GFS data browser app
    • development of better statistical tool (Marvin/Jan?)
    • tool/query to find the most likely errors (Marvin)
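
For the percent-encoding issue listed above, one possible fix is to decode URIs for display only while keeping the encoded form as the identifier. The sketch below uses the standard library's `urllib.parse.unquote`; the helper name and the sample URI are assumptions, and the GFS Data Browser's actual implementation is not shown in this page.

```python
from urllib.parse import unquote


def readable_uri(uri):
    """Return a human-readable form of a percent-encoded URI (UTF-8 assumed).

    Intended for display only: the original encoded URI should still be
    used for lookups and links.
    """
    return unquote(uri)


# Hypothetical example URI for illustration only.
encoded = "http://dbpedia.org/resource/W%C5%82adys%C5%82aw"
display_label = readable_uri(encoded)
```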


Misc.:

Completed Tasks

Getting ready:

First Release:

  • DONE provide Python code for reference extraction (Krzysztof, Wlodzimierz)
  • DONE deploy MongoDB prefusion (Marvin)
  • DONE publish reference dump and deploy a micro-service for the current Python extraction as is: `?article=http://en.wikipedia.org/wiki/Arthur_Schopenhauer` outputs CSV (Wlodzimierz)
  • DONE deploy DIEF (extraction framework) micro-service on the GFS server (Johannes)
  • DONE (blogpost) - MongoDB prefusion - example queries (Marvin)
  • DONE - Study and Categorization (Tina)

Study / Scouting for good examples:

  • DONE - see preliminary study of sync targets
  • problem: four layers of complexity: Subject variation / fixed vs. varying property / reference (inferred from 1 and 2) / normalisation of values (currency, inch/cm, ...)
    • NBA Players and Cloud types (Tina)
    • Videogames (easy disambiguations)
    • films 100k: budget is fixed, while the revenue parameter varies by language
    • Cars & Products (complex)
    • organisations (page for a group)
    • Sports
    • Cities (easy disambiguation)
  • Difficult examples:
    • subjects/articles are of a different granularity
      • city & population: core, close area and county
  • integration of MusicBrainz:
    • DONE check how well it is mapped (Johannes)
  • DONE check NBA sources using Google structured data tool - see here (Tina)

Second Release:

  • DONE edit FactualConsensusFinder so user can insert Wikipedia URIs (Marvin)

Exploitation/Dissemination:

  • DONE draft release note
  • DONE news + feedback squad, talk page, email, lists
  • DONE Wikimania 16-18 August | Stockholm, Sweden (Johannes will go)
  • DONE DBpedia Day | 12 September 2019 | Karlsruhe, Germany (Wlodzimierz)

Other:

Back-end:

  • ...

Front-end:

Misc.:

  • DONE check your profile and edit if necessary (everyone)
  • DONE write project announcement (Sebastian, Tina)