Grants:Project/ContentMine/WikiFactMine/Planning

From Meta, a Wikimedia project coordination wiki

This page is for rough planning of the broken-down tasks that need to be completed. It will eventually become the Timeline, once we have a better handle on the order of the individual tasks and how long each may take. The emphasis here is on software development work, rather than on the Wikimedian in residence or outreach to the scientific community.

Workflow for creating a pool of facts that is updated daily

  • Port the existing workflow (called canary) to Tool Labs, if possible
    • Gain access to the Elasticsearch cluster on Tool Labs
    • Adapt canary to run on the grid, either by:
      • Securing the currently unsecured web interface and running it on the web grid, or
      • Rewriting its components as command-line tools
    • Ensure it runs daily without intervention
  • Build a fact pool
    • Decide whether it will initially be an Elasticsearch (ES) index or a MySQL database
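To make the fact pool decision more concrete, here is a minimal sketch of one possible relational layout. It uses SQLite only as a self-contained stand-in for MySQL, and the table name, column names and the idea of one row per matched Wikidata item are assumptions for illustration, not a decided schema. Indexing by load date and by Wikidata ID anticipates the two kinds of queries planned in the sections that follow.

```python
import sqlite3
from datetime import date

# Hypothetical schema: one row per fact (a Wikidata item matched in a paper).
# SQLite stands in for MySQL here so the sketch runs without a server.
SCHEMA = """
CREATE TABLE IF NOT EXISTS facts (
    id          INTEGER PRIMARY KEY,
    wikidata_id TEXT NOT NULL,  -- the matched Wikidata item
    paper_id    TEXT NOT NULL,  -- e.g. a PMCID
    snippet     TEXT,           -- text surrounding the match
    loaded_on   TEXT NOT NULL   -- ISO date the fact entered the pool
);
CREATE INDEX IF NOT EXISTS facts_by_day  ON facts (loaded_on);
CREATE INDEX IF NOT EXISTS facts_by_item ON facts (wikidata_id);
"""


def open_pool(path=":memory:"):
    """Open (or create) the fact pool and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_facts(conn, facts, loaded_on):
    """Insert (wikidata_id, paper_id, snippet) tuples tagged with the load date."""
    rows = [(q, p, s, loaded_on.isoformat()) for q, p, s in facts]
    with conn:  # commits the daily batch as one transaction
        conn.executemany(
            "INSERT INTO facts (wikidata_id, paper_id, snippet, loaded_on) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )


# Placeholder IDs, not real Wikidata items or papers.
conn = open_pool()
add_facts(conn, [("Q_A", "paper-1", "first match"),
                 ("Q_B", "paper-1", "second match")], date(2016, 5, 1))
print(conn.execute("SELECT COUNT(*) FROM facts").fetchone()[0])
```

If the pool ends up in Elasticsearch instead, the same fields would map naturally onto one document per fact.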

API to return facts by day loaded into fact pool

  • Build an API that takes a date or date range and returns a list of facts
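A date-range API could be as simple as the sketch below: parse `start` and an optional `end` from the query string and return facts whose load date falls in that inclusive range. The parameter names, the `Fact` record and the in-memory list are assumptions made to keep the example self-contained; in practice the filter would become a query against the fact pool and be wrapped by whatever web framework the tool uses.

```python
from dataclasses import dataclass
from datetime import date
from urllib.parse import parse_qs


@dataclass(frozen=True)
class Fact:
    wikidata_id: str
    paper_id: str
    loaded_on: date


def facts_for_range(facts, start, end=None):
    """Return facts loaded between start and end inclusive (a single day if end is None)."""
    end = end or start
    if end < start:
        raise ValueError("end date precedes start date")
    return [f for f in facts if start <= f.loaded_on <= end]


def handle_query(facts, query_string):
    """Handle a hypothetical 'start=YYYY-MM-DD&end=YYYY-MM-DD' query string."""
    params = parse_qs(query_string)
    start = date.fromisoformat(params["start"][0])
    end = date.fromisoformat(params["end"][0]) if "end" in params else None
    return facts_for_range(facts, start, end)


# Placeholder facts standing in for the pool.
pool = [
    Fact("Q_A", "paper-1", date(2016, 5, 1)),
    Fact("Q_B", "paper-2", date(2016, 5, 2)),
    Fact("Q_C", "paper-3", date(2016, 5, 3)),
]
print(handle_query(pool, "start=2016-05-01&end=2016-05-02"))
```

Treating a missing `end` as "just that day" lets the same endpoint serve both single-day and range requests.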

Tool to present and browse facts by day

  • Adapt the factvis tool so it can also load facts via AJAX from the API, rather than from disk or a static HTTP location
  • Build an interface for selecting a date or date range to pass to the API

API to return relevant papers when queried with a Wikidata ID

  • API that takes a Wikidata ID and returns all facts from papers that include this Wikidata ID
  • Sort the results by the number of facts per paper
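The two steps above amount to a filter followed by a group-and-sort, sketched below on an in-memory list. The `Fact` tuple and the placeholder IDs are assumptions for illustration; against a real fact pool the same logic would likely be a query grouped by paper.

```python
from collections import defaultdict
from typing import NamedTuple


class Fact(NamedTuple):
    wikidata_id: str
    paper_id: str


def facts_for_item(facts, wikidata_id):
    """Return [(paper_id, [facts...]), ...] for papers mentioning wikidata_id,
    with the papers containing the most facts first."""
    # Papers in which the requested item was found at least once.
    papers = {f.paper_id for f in facts if f.wikidata_id == wikidata_id}
    # Collect every fact from those papers, grouped by paper.
    by_paper = defaultdict(list)
    for f in facts:
        if f.paper_id in papers:
            by_paper[f.paper_id].append(f)
    # Sort papers by how many facts they contribute, largest first.
    return sorted(by_paper.items(), key=lambda kv: len(kv[1]), reverse=True)


# Placeholder data: Q_A appears in paper-1 and paper-2 but not paper-3.
pool = [
    Fact("Q_A", "paper-1"), Fact("Q_B", "paper-1"), Fact("Q_C", "paper-1"),
    Fact("Q_A", "paper-2"), Fact("Q_D", "paper-2"),
    Fact("Q_E", "paper-3"),
]
for paper, paper_facts in facts_for_item(pool, "Q_A"):
    print(paper, len(paper_facts))
```

The top n entries of this list are also what a sidebar gadget would need to suggest papers for an article or item.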

Gadget to suggest papers relevant to article/item on Wikipedia/Wikidata

  • A small sidebar tool that suggests the top n papers, ranked by occurrences of the Wikidata item

API to return papers which have a co-occurrence of Wikidata IDs

  • API that takes two Wikidata IDs and returns the facts from all papers that contain both IDs
  • Decide how to rank these co-occurrences
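Since the ranking is still to be decided, the sketch below shows just one candidate: score each paper by the smaller of the two mention counts (so a paper that mentions both items often ranks above one dominated by a single item), breaking ties by total mentions. The scoring rule, the `(wikidata_id, paper_id)` tuple format and the placeholder IDs are all assumptions for illustration.

```python
from collections import Counter, defaultdict


def co_occurrences(facts, id_a, id_b):
    """Return [(paper_id, score, facts_in_paper), ...] for papers containing
    both items. Hypothetical score: (min(count_a, count_b), count_a + count_b)."""
    by_paper = defaultdict(list)
    for qid, paper in facts:
        by_paper[paper].append(qid)

    results = []
    for paper, qids in by_paper.items():
        counts = Counter(qids)  # missing items count as 0
        if counts[id_a] and counts[id_b]:
            score = (min(counts[id_a], counts[id_b]), counts[id_a] + counts[id_b])
            results.append((paper, score, qids))

    # Highest score first; tuples compare element by element.
    results.sort(key=lambda r: r[1], reverse=True)
    return results


# Placeholder (wikidata_id, paper_id) facts.
facts = [
    ("Q_A", "paper-1"), ("Q_B", "paper-1"),
    ("Q_A", "paper-2"), ("Q_A", "paper-2"), ("Q_B", "paper-2"), ("Q_B", "paper-2"),
    ("Q_A", "paper-3"),
]
for paper, score, _ in co_occurrences(facts, "Q_A", "Q_B"):
    print(paper, score)
```

Other options worth comparing include counting sentence-level co-occurrences or normalising by paper length.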

Tool to suggest references for unreferenced Wikidata statements

  • An extension to the Wikidata Distributed Game for suggesting references
    • It should automatically include the reference, built from the paper metadata we have downloaded
    • It should show the relevant sections of an open-access paper to help editors decide
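The two requirements above could be prototyped as below: build a reference from paper metadata, and pick out the sentences of a paper that mention the statement's subject or object so an editor can judge it. The reference uses the real Wikidata properties P248 ("stated in") and P813 ("retrieved") but is a simplified `{property: value}` dict, not the full Wikibase JSON or the Distributed Game's own tile format. Splitting sentences on punctuation is a naive assumption, and the item IDs and text are placeholders.

```python
import re
from datetime import date


def suggest_reference(paper_qid, retrieved):
    """Simplified reference: P248 = 'stated in' (the paper's item), P813 = 'retrieved'."""
    return {"P248": paper_qid, "P813": retrieved.isoformat()}


def relevant_sections(text, terms, window=1):
    """Return sentences mentioning any term, plus `window` neighbouring
    sentences on each side for context (naive sentence splitting)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    lowered = [t.lower() for t in terms]
    keep = set()
    for i, sentence in enumerate(sentences):
        if any(t in sentence.lower() for t in lowered):
            keep.update(range(max(0, i - window), min(len(sentences), i + window + 1)))
    return [sentences[i] for i in sorted(keep)]


# Placeholder paper item and text.
print(suggest_reference("Q_PAPER", date(2016, 5, 1)))
text = ("Aspirin inhibits COX-1. It is widely used. "
        "Unrelated sentence here. Another one.")
print(relevant_sections(text, ["COX-1"]))
```

In a real game tile, the terms would presumably be the labels and aliases of the statement's subject and object items.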

API to return Wikidata items related to papers

  • API that takes the Wikidata ID of a paper
    • Further development would allow selecting a paper by an external ID, such as a DOI or PMCID
  • API that returns a ranked list of facts from that paper
    • Ranking will initially be by the number of facts within that paper
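A minimal version of this API resolves the requested identifier to the paper's Wikidata ID and counts how often each item occurs among that paper's facts, most frequent first. The `external_ids` lookup table (standing in for a DOI/PMCID mapping), the `(wikidata_id, paper_id)` tuples and all IDs are hypothetical placeholders.

```python
from collections import Counter


def resolve_paper(identifier, external_ids):
    """Map a DOI or PMCID to the paper's Wikidata ID via a hypothetical
    lookup table; identifiers not in the table are assumed to be Wikidata IDs."""
    return external_ids.get(identifier, identifier)


def items_for_paper(facts, identifier, external_ids=None):
    """Return [(wikidata_id, count), ...] for one paper, most frequent first."""
    paper_id = resolve_paper(identifier, external_ids or {})
    counts = Counter(qid for qid, paper in facts if paper == paper_id)
    return counts.most_common()


# Placeholder data.
facts = [("Q_A", "Q_PAPER1"), ("Q_B", "Q_PAPER1"), ("Q_A", "Q_PAPER1"),
         ("Q_C", "Q_PAPER2")]
external_ids = {"PMC_EXAMPLE": "Q_PAPER1"}
print(items_for_paper(facts, "Q_PAPER1"))
print(items_for_paper(facts, "PMC_EXAMPLE", external_ids))
```

`Counter.most_common()` keeps items with equal counts in first-seen order, which gives a stable ordering until a better ranking is chosen.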