WikiCite 2016/Proposals/Generation of referenced Wikidata statements with StrepHit

From Meta, a Wikimedia project coordination wiki



Data quality in Wikidata is crucial, and references to trustworthy third-party sources are one way to ensure it. Many Wikidata statements are either unsourced or sourced only to Wikimedia sister projects (typically Wikipedia, via bots). Adding references to such small units of information is a cumbersome task for human editors.

StrepHit aims to reduce this effort. It is a Natural Language Processing system that reads documents from reliable Web sources and produces referenced Wikidata statements.
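To make "referenced statement" concrete, here is a minimal sketch of one such statement and how it could be serialized as a single QuickStatements-style line, the tab-separated format that datasets for the primary sources tool have commonly used. The class and method names are hypothetical and do not come from StrepHit's code; the item, property and URL values are purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferencedStatement:
    """Hypothetical container: one Wikidata claim plus the Web page it came from."""

    subject: str     # item ID of the entity the claim is about (illustrative)
    prop: str        # property ID of the claim (illustrative)
    value: str       # item ID, or an already-quoted literal
    source_url: str  # page the extraction system read (illustrative)

    def to_quickstatements(self) -> str:
        # Tab-separated subject, property, value, followed by a source pair:
        # "S854" attaches the reference URL property (P854) as the claim's source,
        # and string values are wrapped in double quotes.
        return "\t".join(
            [self.subject, self.prop, self.value, "S854", f'"{self.source_url}"']
        )


stmt = ReferencedStatement("Q42", "P19", "Q350", "https://example.org/bio")
print(stmt.to_quickstatements())
```

Keeping the source URL on the same line as the claim means every statement a reviewer sees in the primary sources tool carries its provenance with it.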


  1. Explore the current StrepHit dataset (biographies in English);
  2. create and fill in a Request for Comments;
  3. encourage referenced data donations through the primary sources tool.


To explore the StrepHit dataset, install the primary sources tool gadget; instructions are at wikidata:Wikidata:Primary_sources_tool#How_to_use

Skills needed

  • Basic understanding of how Wikidata works;
  • communication strategies for community engagement, in order to:
    • raise awareness of StrepHit's potential impact;
    • attract new primary sources tool users.

Phabricator task

None yet.

See also