WikiCite 2016/Report/Group 6

Group 6: Make Wikidata the central hub on license information on databases

Room 129
Etherpad: Group 6

Attendees

  1. Andra Waagmeester (Micelio)

Notes

Last week Wikidata listed 366 biological databases, of which 2 contained statements on licenses: https://twitter.com/andrawaag/status/732503294510129152. After a short shout-out on Twitter we managed to increase those numbers. This shows that with a little effort we can make Wikidata the central hub for license metadata.

Original proposal: WikiCite_2016/Proposals/Making_wikidata_a_central_hub_on_data_licensing

Steps involved to add license information

Steps to add license metadata on biological resources to Wikidata:

  • Select a Wikidata item from this query that does not yet contain a license (P275) statement.
  • Browse the web to find the applicable license and add it as a statement to Wikidata.
  • If no website is listed, please add that as well, using the official website property (P856).
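
The query referred to in the first step is not reproduced in these notes. A minimal sketch of what such a query could look like is given below. It assumes that Q4117139 is the Wikidata item for "biological database" and that the public SPARQL endpoint at https://query.wikidata.org/sparql is used; the function names are illustrative and not part of any existing tool.

```python
import json
import urllib.parse
import urllib.request

# Assumptions (not stated in the notes): Q4117139 is the Wikidata item for
# "biological database", and queries go to the public query.wikidata.org endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"
BIOLOGICAL_DATABASE = "Q4117139"


def missing_license_query(class_qid=BIOLOGICAL_DATABASE, limit=50):
    """Build a SPARQL query for instances of class_qid without a license (P275)."""
    return f"""
SELECT ?item ?itemLabel ?website WHERE {{
  ?item wdt:P31 wd:{class_qid} .
  FILTER NOT EXISTS {{ ?item wdt:P275 ?license . }}
  OPTIONAL {{ ?item wdt:P856 ?website . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {int(limit)}
""".strip()


def parse_bindings(result):
    """Flatten SPARQL JSON results into dicts of variable name -> value."""
    rows = []
    for binding in result["results"]["bindings"]:
        rows.append({name: cell["value"] for name, cell in binding.items()})
    return rows


def fetch(query):
    """Run the query against the endpoint (needs network access)."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url, headers={"User-Agent": "license-hub-sketch/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `parse_bindings(fetch(missing_license_query()))` would return a work list of items, each with its label and, where present, its official website as a starting point for finding the license.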

Steps involved to add metadata on a database to Wikidata

  • Pick a database listed in one of the following repositories:
  1. NAR: https://www.oxfordjournals.org/our_journals/nar/database/cap/
  2. Miriam: http://www.ebi.ac.uk/miriam/main/collections/
  3. Biosharing: https://biosharing.org/databases/
  • If the resource is not yet listed in Wikidata, create an item for it and add at least that it is an instance of (P31) biological database, its official website (P856) and its license (P275).
  • If the license is not listed, that item needs to be created as well.
  • Please add references to the source stating the applicable license.
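
For batches of new items, the statements listed above could be prepared as commands for the QuickStatements tool rather than typed in by hand. The sketch below assumes QuickStatements' tab-separated version 1 command format (with `S854` as the reference URL on a statement) and, again, that Q4117139 is the item for "biological database". The helper name and the example values are hypothetical.

```python
# Assumption: Q4117139 is the Wikidata item for "biological database".
BIOLOGICAL_DATABASE = "Q4117139"


def quickstatements_for_database(label, website, license_qid, license_source_url):
    """Return QuickStatements (v1, tab-separated) commands for a new database item.

    The license statement carries a reference URL (S854) pointing at the page
    that states the license, as the last step above asks for.
    """
    def quoted(text):
        # String and URL values are wrapped in double quotes in this format.
        return f'"{text}"'

    lines = [
        "CREATE",
        "\t".join(["LAST", "Len", quoted(label)]),        # English label
        "\t".join(["LAST", "P31", BIOLOGICAL_DATABASE]),  # instance of
        "\t".join(["LAST", "P856", quoted(website)]),     # official website
        "\t".join(
            ["LAST", "P275", license_qid, "S854", quoted(license_source_url)]
        ),                                                # license + reference
    ]
    return "\n".join(lines)
```

For example, `quickstatements_for_database("Example DB", "https://db.example.org", "Q6938433", "https://db.example.org/terms")` yields five command lines for a made-up database; Q6938433 is assumed here to be the item for CC0. The license QID should always be checked against Wikidata before the commands are run.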

Open question: could a bot do this for, e.g., all journals listed in DOAJ (https://doaj.org)?

Intermediate results

On May 17th, Wikidata listed 366 biological databases, of which 2 contained statements on licenses. On June 7th 2016, it listed 409 biological databases, of which 88 list license information. This increase in less than a month shows the potential Wikidata has to become a central source on data licensing.

Harvesting license information remains challenging, since it mainly means manually browsing official websites. Having the license information in a central place such as Wikidata changes that: it can then be accessed through Wikidata's query engine.

Next steps include (1) continuing to enrich Wikidata with license information through crowdsourcing, mainly by reaching out through social media (e.g. Twitter), and (2) trying to engage data owners to add their metadata to Wikidata directly. There is a direct benefit for them as well. Quite a few resources do not explicitly state a data license, which implies a proprietary license (this explains in part the large proportion of proprietary licenses in the figure below). The licenses listed in Wikidata could act as templates for prospective data owners to pick from.

Bubble chart on licenses listed for biological databases in Wikidata
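
The counts behind a chart like this can be retrieved from the query engine. The sketch below is one possible way to do so, not necessarily the query used for the figure. It assumes that Q4117139 is the Wikidata item for "biological database"; the `coverage` helper is illustrative and simply turns the counts reported above into percentages.

```python
# Assumption: Q4117139 is the Wikidata item for "biological database".
# The query counts biological databases per license (P275), most common first.
LICENSE_COUNT_QUERY = """
SELECT ?license ?licenseLabel (COUNT(?item) AS ?databases) WHERE {
  ?item wdt:P31 wd:Q4117139 ;
        wdt:P275 ?license .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?license ?licenseLabel
ORDER BY DESC(?databases)
""".strip()


def coverage(with_license, total):
    """Percentage of listed databases that carry a license statement."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * with_license / total


# Counts taken from the notes: 2 of 366 on May 17th, 88 of 409 on June 7th.
may = coverage(2, 366)
june = coverage(88, 409)
print(f"May 17th: {may:.1f}%  June 7th: {june:.1f}%")
```

With the numbers from the notes, coverage rose from about 0.5% to about 21.5% of the listed biological databases.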