

From Meta, a Wikimedia project coordination wiki

Next GFS Call

Friday August 2, 11am


Getting ready

First Release

  • publish the reference dump and deploy a micro-service for the current Python extraction as is: a request with `?article=http://en.wikipedia.org/wiki/Arthur_Schopenhauer` outputs CSV (Wlodzimierz)
  • deploy the DIEF (DBpedia Information Extraction Framework) micro-service on the GFS server (Johannes)
  • MongoDB prefusion: example queries (Marvin)
  • DONE study and categorization (Tina)
  • Wikidata non-adoption report (count of properties extracted by the generic extraction: 580 million) (Sebastian)
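A minimal client sketch for the Python-extraction micro-service, assuming it is reached over HTTP with the `article` query parameter shown above and answers with CSV. The base URL `http://gfs.example.org/extract` is hypothetical, and the CSV column layout is not specified in these notes, so rows are returned as plain lists.

```python
import csv
import io
from urllib.parse import urlencode

# Hypothetical endpoint; the real micro-service URL is not given in these notes.
BASE_URL = "http://gfs.example.org/extract"


def build_request_url(article_url: str, base_url: str = BASE_URL) -> str:
    """Build the query URL, percent-encoding the article URL in the `article` parameter."""
    return f"{base_url}?{urlencode({'article': article_url})}"


def parse_csv_response(body: str) -> list[list[str]]:
    """Split a CSV response body into rows; the meaning of the columns is left to the caller."""
    return list(csv.reader(io.StringIO(body)))


if __name__ == "__main__":
    print(build_request_url("http://en.wikipedia.org/wiki/Arthur_Schopenhauer"))
    # http://gfs.example.org/extract?article=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FArthur_Schopenhauer
```

Percent-encoding the article URL keeps its own `:` and `/` from being confused with the service URL; a standard web server decodes it back to the plain form shown in the task.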

Second Release

  • Mapping package/snapshot/prototype (Johannes):
    1. problem analysis:
       • infobox param -> DBpedia property <-> / -> Wikidata property
       • infobox param <-> Wikidata property (publish with release)
    2. (later) inclusion of DBpedia into Wikidata (owl:sameAs and owl:equivalentProperty / owl:equivalentClass)
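The first chain in the problem analysis (infobox param -> DBpedia property -> Wikidata property) can be read as composing two lookups. A minimal sketch with plain dictionaries; the entries (`birth_date`, `dbo:birthDate`, `P569`, ...) are illustrative examples chosen for this sketch, not taken from the actual mapping package.

```python
# Illustrative entries only; they are not copied from the GFS mapping package.
INFOBOX_TO_DBPEDIA = {
    "birth_date": "dbo:birthDate",
    "birth_place": "dbo:birthPlace",
    "occupation": "dbo:occupation",
}
DBPEDIA_TO_WIKIDATA = {
    "dbo:birthDate": "P569",   # Wikidata: date of birth
    "dbo:birthPlace": "P19",   # Wikidata: place of birth
    # dbo:occupation is deliberately left out to show an incomplete chain
}


def compose(first: dict[str, str], second: dict[str, str]) -> dict[str, str]:
    """Chain two mappings: keep key k only if first[k] is itself a key of second."""
    return {k: second[v] for k, v in first.items() if v in second}


def broken_chains(first: dict[str, str], second: dict[str, str]) -> list[str]:
    """Keys of first whose target has no entry in second (params without a Wikidata property yet)."""
    return sorted(k for k, v in first.items() if v not in second)


print(compose(INFOBOX_TO_DBPEDIA, DBPEDIA_TO_WIKIDATA))
# {'birth_date': 'P569', 'birth_place': 'P19'}
print(broken_chains(INFOBOX_TO_DBPEDIA, DBPEDIA_TO_WIKIDATA))
# ['occupation']
```

In this sketch the composed dictionary plays the role of a direct infobox param <-> Wikidata property table, and `broken_chains` lists the parameters that would still need a mapping.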

Study / Scouting for good examples

  • DONE preliminary study of sync targets
  • integration of MusicBrainz:
    • check how well it is mapped (Johannes)
    • mapping of 5 properties (Johannes)
    • contact user Jc86035 (Johannes)
    • integration of MusicBrainz into FlexiFusion (Marvin)


  • Wikimania 16-18 August | Stockholm, Sweden (Johannes will go)
  • WikidataCon 25–26 October 2019 | Berlin, Germany (open, not Johannes)
  • DONE draft release note



  • check out Scala (Johannes, Wlodzimierz)
    • Can template extractions in the extraction framework be used with Python code?
  • new Wikidata release (Marvin)
  • find the best structure for the references


  • Factual Consensus Finder:
    • development of better statistical tool (Marvin/Jan?)
    • tool/query to find the most likely errors (Marvin)
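The notes do not say which statistic the Factual Consensus Finder uses. As a baseline only, assuming each language edition contributes one value for the same subject and property, a majority vote marks the values that disagree with a clear consensus as the most likely errors. The threshold `min_agreement` and the sample values are made up for illustration.

```python
from collections import Counter
from typing import Optional


def consensus(values_by_lang: dict[str, str]) -> tuple[Optional[str], float]:
    """Return the most frequent value and the share of language editions that report it."""
    if not values_by_lang:
        return None, 0.0
    value, count = Counter(values_by_lang.values()).most_common(1)[0]
    return value, count / len(values_by_lang)


def likely_errors(values_by_lang: dict[str, str], min_agreement: float = 0.5) -> list[str]:
    """List the languages that disagree with the majority value.

    Nothing is flagged unless the majority share is strictly greater than
    min_agreement, so a split vote does not single out any edition.
    """
    value, share = consensus(values_by_lang)
    if value is None or share <= min_agreement:
        return []
    return sorted(lang for lang, v in values_by_lang.items() if v != value)


# Made-up population values for one city, keyed by Wikipedia language edition.
sample = {"en": "1000000", "de": "1000000", "fr": "1000000", "it": "980000"}
print(consensus(sample))      # ('1000000', 0.75)
print(likely_errors(sample))  # ['it']
```

Values are compared as exact strings, so they would need to be normalised (units, number formats) before voting; a better statistical tool would also weigh references and tolerate small numeric differences.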


  • DONE write project announcement (Sebastian, Tina)
  • post GFS challenge (Tina)

Completed Tasks

Getting ready:

  • DONE accounts (Tina)
  • DONE make GFS server ready, @JohannesFre: any news on this? (Sebastian)
  • DONE Wikimania presentation format specification (Johannes)

First Release:

  • DONE deployment of MongoDB prefusion (Marvin)

Study / Scouting for good examples:

  • DONE - see preliminary study of sync targets
  • problem: four layers of complexity:
    1. subject variation
    2. fixed vs. varying property
    3. reference (inferred from 1 and 2)
    4. normalisation of values (currency, inch/cm, ...)
  • NBA Players and Cloud types (Tina)
  • Videogames (easy disambiguations)
  • films (100k): budget is fixed, and the revenue parameter varies between languages
  • Cars & Products (complex)
  • organisations (page for a group)
  • Sports
  • Cities (easy disambiguation)
  • Difficult examples:
    • subjects/articles are of a different granularity
      • city & population: core, close area and county
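The normalisation layer (currency, inch/cm, ...) means bringing values to a common unit before they can be compared across languages. A minimal sketch for lengths, assuming values arrive as a number plus a unit string; the accepted unit spellings and the 0.5 cm tolerance are assumptions, and currency is left out because it also needs exchange rates and a reference date.

```python
# Centimetres per unit; the accepted unit spellings are an assumption for this sketch.
CM_PER_UNIT = {
    "mm": 0.1,
    "cm": 1.0,
    "m": 100.0,
    "in": 2.54,
    "inch": 2.54,
    "ft": 30.48,
}


def to_cm(amount: float, unit: str) -> float:
    """Convert a length to centimetres; raise ValueError for units not in the table."""
    key = unit.strip().lower()
    if key not in CM_PER_UNIT:
        raise ValueError(f"unsupported unit: {unit!r}")
    return amount * CM_PER_UNIT[key]


def same_length(a: tuple[float, str], b: tuple[float, str], tolerance_cm: float = 0.5) -> bool:
    """Compare two (amount, unit) pairs after normalisation, allowing for rounding in the source text."""
    return abs(to_cm(*a) - to_cm(*b)) <= tolerance_cm


# 72 in is about 182.88 cm, so it matches 183 cm within the tolerance.
print(same_length((72, "in"), (183, "cm")))  # True
print(same_length((72, "in"), (190, "cm")))  # False
```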


  • ...



  • ...



  • DONE check your profile and edit if necessary (everyone)