Grants:Project/DBpedia/GlobalFactSyncRE/Timeline

From Meta, a Wikimedia project coordination wiki

Project Grants This project is funded by a Project Grant



Timeline for DBpedia

Timeline Date
Study (choose two initial sync targets and analyse the lack of references in Wikidata) Day Month Year
GlobalFactSync tool (extend the current prototype with new features) Day Month Year
Mapping Refinements Day Month Year
GlobalFactSync WikiData ingest Day Month Year
GlobalFactSync Sprints Day Month Year


Monthly updates

Please prepare a brief project update each month, in a format of your choice, to share progress and learnings with the community along the way. Submit the link below as you complete each update.

Current tasks

A log of current tasks is kept here. Ongoing discussions should be held using the corresponding discussion page.

(Preparation) April/May

June 2019 (official start)

July 2019

First Release Report: A first release containing detailed information about our micro-services is published on the DBpedia Blog

Containing:

  • First success story
  • Deployment of first micro-services on the server
    1. Initial User Interface here
    2. PreFusion JSON API here (user: read, pw: gfs)
    3. Reference Extraction Service here
    4. Reference Data Download here
    5. Infobox Extraction Service here
    6. ID service here
  1. definition of a set of problems with different layers of complexity
  2. analysis of various groups of subjects with respect to these synchronization problems

August 2019

  • Continued improvement of the first deployments, which will be an ongoing process. In particular, the GFS Data Browser is being worked on:
    • users can now insert any Wikipedia URL into the subject search field
    • overall layout improvements
    • reference information is being added
  • Johannes Frey presented the GFS project at Wikimania
  • We created a news page within our Meta-Wiki project page framework for volunteers to keep them in the loop and encourage exchange. So far this has led to three more volunteers signing up for our 'GFS Feedback Squad' and two users leaving feedback about our sync target study.

September 2019

  • more work towards the sync target study, with a focus on targets that were brought up by Wikidata users (e.g., geo coordinates, employer, Nobel Prize)
  • intensive work on creating the complement to Wikidata and Wikipedia by collecting and providing data that is currently missing in both

October 2019

November 2019

  • re-extraction of GFS data and fusion
  • some work on the UI
  • identifying and testing ways to generate lists of the Wikipedia articles related to selected topics: categories, infoboxes, Wikidata queries and other articles (lists).

December 2019

  • extraction of reference data for Polish cities; studied sources: BDL - Bank Danych Lokalnych, Wikipedia, Wikidata
  • analysis of available mappings between various geographical identifiers for Polish administrative units
  • GFSre - reference datasets for Polish cities.pdf
    showing current understanding of the fusion challenge

January 2020

February 2020

March 2020

  • experiment prototype for improved harvesttemplate
    • index Infoboxes / Templates

April 2020

  • experiment prototype for improved harvesttemplate
    • index Infoboxes / Templates

May 2020

  • watch for feedback on the new mockup

June 2020

  • incorporate demo (hard-coded) references view into GFS browser using the novel JSON references dump

Planned Next Steps for July, August and September 2020

  • incorporate demo (hard-coded) references view into GFS browser using the novel JSON references dump
  • GFS browser features
    • include mapping management to allow search for properties of new external sources





Extension request

September 30, 2020

In the last month, the output of our project was largely invisible because we (1) worked a lot on the data and (2) had to deal with the coronavirus pandemic and its consequences, such as missing child care. On the positive side, we have quite a lot of budget (9000€) left and would like to extend the project by four months as a budget-neutral extension. We still need time until the end of September 2020. Project-wise, we found this dump: enwiki-20200401-wbc_entity_usage.sql.gz

- It tracks which pages use which Wikidata items or properties and which aspect (e.g. item label) is used. We therefore consider it realistic to provide the following:

- We have one of the best infobox parsers and we have full information about all properties there. This means we can produce a reliable Wikidata adoption report, which shows how widely Wikidata is adopted, where it is well adopted in Wikipedia, and where adoption can be improved.

- We can use this to calculate "good imports" from Wikipedia to Wikidata, i.e. where data in WP infoboxes is especially plentiful and well referenced, but missing in Wikidata

- With the improvements on https://tools.wmflabs.org/pltools/harvesttemplates/ we would have a powerful user interface to tackle exactly these spots
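The adoption report and the "good imports" calculation described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: it assumes the entity usage dump has already been parsed into (entity_id, aspect, page_id) tuples and that the infobox parser yields one record per extracted fact. All names, record layouts, aspect strings and sample values below are hypothetical.

```python
from collections import defaultdict

# Hypothetical, already-parsed rows of the entity usage dump:
# (entity_id, aspect, page_id). The aspect strings are illustrative only.
USAGE_ROWS = [
    ("Q1", "C.P1082", 101),
    ("Q1", "L", 101),
    ("Q2", "S", 102),
]

# Hypothetical infobox extraction output, one record per extracted fact.
INFOBOX_FACTS = [
    {"page_id": 101, "item": "Q1", "property": "P1082", "referenced": True},
    {"page_id": 102, "item": "Q2", "property": "P1082", "referenced": True},
    {"page_id": 103, "item": "Q3", "property": "P625", "referenced": False},
]

# Hypothetical set of (item, property) pairs that already have a Wikidata statement.
WIKIDATA_STATEMENTS = {("Q1", "P1082")}


def adoption_report(usage_rows, infobox_facts):
    """Split pages with infobox facts into those that use Wikidata and those that do not."""
    aspects_by_page = defaultdict(set)
    for _entity_id, aspect, page_id in usage_rows:
        aspects_by_page[page_id].add(aspect)

    pages = sorted({fact["page_id"] for fact in infobox_facts})
    adopted = [p for p in pages if p in aspects_by_page]
    not_adopted = [p for p in pages if p not in aspects_by_page]
    share = len(adopted) / len(pages) if pages else 0.0
    return {"adopted": adopted, "not_adopted": not_adopted, "share": share}


def good_import_candidates(infobox_facts, wikidata_statements):
    """Return referenced infobox facts whose (item, property) pair is missing in Wikidata."""
    return [
        (fact["item"], fact["property"])
        for fact in infobox_facts
        if fact["referenced"]
        and (fact["item"], fact["property"]) not in wikidata_statements
    ]


report = adoption_report(USAGE_ROWS, INFOBOX_FACTS)
candidates = good_import_candidates(INFOBOX_FACTS, WIKIDATA_STATEMENTS)
```

A real report would probably break the adoption share down by infobox template and by aspect rather than by page only; the sketch keeps to the page level for brevity.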

In addition, we started to index authoritative datasets that are often referenced in WP and WD. Taking this data from the source, we can build an interface, e.g. a user script, to suggest relevant data points from these datasets to users for inclusion. This part might be experimental, but it would work like this: On https://pl.wikipedia.org/wiki/Pozna%C5%84 the infobox states: Populacja (30.06.2019) • liczba ludności 535 802[3] (population as of 30 June 2019: 535 802)

[3] is the population count from stat.gov.pl holding the official census for Poland. If this gets updated, we might be able to autodetect that a change is required either in the infobox or on Wikidata (that is up to the community policy).
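The autodetection idea can be illustrated with a small comparison routine. This is only a sketch under stated assumptions: the function names are hypothetical, the infobox value is assumed to arrive as text with the footnote marker already stripped, and the "newer" source figure in the usage example is made up purely for illustration.

```python
def parse_count(text):
    """Parse a count such as '535 802' that uses (non-breaking) spaces as separators."""
    digits = "".join(ch for ch in text if ch in "0123456789")
    if not digits:
        raise ValueError(f"no digits found in {text!r}")
    return int(digits)


def find_outdated(source_count, infobox_text, wikidata_count):
    """Return which of 'infobox' and 'wikidata' disagree with the authoritative source."""
    outdated = []
    if parse_count(infobox_text) != source_count:
        outdated.append("infobox")
    if wikidata_count != source_count:
        outdated.append("wikidata")
    return outdated


# Value quoted from the infobox example above; everything agrees with the source.
in_sync = find_outdated(535802, "535 802", 535802)

# Hypothetical updated source figure (invented for illustration only).
needs_update = find_outdated(530000, "535 802", 535802)
```

Whether a detected difference leads to an edit in the infobox, on Wikidata, or only to a suggestion shown to editors is left open here, in line with the community-policy caveat above.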

This will not be complete, but it will probably work for 10-50 million entries in Wikipedia and Wikidata, depending on the quality of the source and how official it is. In the next few months we need to work on the following topics:

- incorporate demo (hard-coded) references view into GFS browser using the novel JSON references dump

- GFS browser features

- include mapping management to allow search for properties of new external sources

@Juliaholze: Hi Julia, thanks for this request and context over your remaining budget as well as the disruptions you experienced due to the pandemic. We can appreciate that work on the project needed to be paused in order to focus on other, more important priorities, as we have experienced these same needs at the Wikimedia Foundation as well. This extension until 30 September 2020 to complete the above activities is formally approved. Your final report will be due on 30 October 2020. I JethroBT (WMF) (talk) 21:25, 6 July 2020 (UTC)
@JethroBT (WMF): Hi Chris, many thanks for your reply. We will complete the above activities and tasks.

Extension request

November 30, 2020

We would like to request another budget-neutral extension. The main reason is very similar to the previous one. We are currently in the process of adding many authoritative datasets to the GFS browser, which will then enable "official" data from the appropriate sources to be included in Wikipedia/Wikidata. In the next two months we need to work on the following topics:

  • GFS browser features
  • include mapping management to allow search for properties of new external sources

Please also see our email to the WMF Grants Administrator.

Extension request approved

This request is approved. Your new Project end date is November 30, 2020, and your Final Report is due on December 30, 2020.

Marti (WMF) (talk) 19:08, 15 October 2020 (UTC)