From Meta, a Wikimedia project coordination wiki

Project name

What is a short name for the activity/project you are proposing?

Improve the quality and coverage of main topics for medical papers in Wikidata, and of the MeSH (Medical Subject Headings) controlled vocabulary that these topics draw on

Contact information

your name and/or Wikimedia username

Jo Brook (User:Jkcm)

your contact e-mail

"Allow other users to email me" is set

your nearest city and country

Cambridge, Cambridgeshire, UK

Details of team members (optional)
If this application is for a team project, each additional team member (up to 5 total) should have their name/username, project role, location, and contact details here:

The activity/project

Tell us about your proposed project. What will you get achieved with this time?
Describe why you think this project is important, and how the project is different from your normal volunteer contributions.

Preliminary investigations have identified cases which existing software tools such as Mix'n'Match do not cover. Therefore we propose to build two pieces of software:

  1. a small bot to significantly increase coverage of MeSH descriptor text taken directly from NLM
  2. an interactive web tool to allow manual review, corrections and additions of main topics of medical papers (P921)

This SPARQL query shows that, as of 30 September 2020, 18897 of these descriptions are missing from Wikidata. Filling each gap requires adding a qualifier to an existing Wikidata item, which we will do programmatically.
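As a minimal sketch of what "programmatically" could look like, the function below builds one QuickStatements (V1, tab-separated) command that adds a qualifier to an item's MeSH descriptor ID (P486) statement. The proposal does not name the qualifier property, so `qualifier_pid` is a parameter, and the value P1810 in the usage note is only a placeholder assumption. The function name and its validation rules are hypothetical, not part of any existing tool.

```python
import re

MESH_DESCRIPTOR_PID = "P486"  # MeSH descriptor ID, as named in the proposal


def qualifier_command(item_qid, mesh_id, qualifier_pid, descriptor_text):
    """Return one QuickStatements V1 line: item, property, value, qualifier, qualifier value."""
    if not re.fullmatch(r"Q[1-9]\d*", item_qid):
        raise ValueError(f"not a Wikidata item ID: {item_qid!r}")
    if not re.fullmatch(r"P[1-9]\d*", qualifier_pid):
        raise ValueError(f"not a Wikidata property ID: {qualifier_pid!r}")
    # Conservative assumption: reject characters that would break the
    # tab-separated, double-quoted string format instead of escaping them.
    if any(ch in descriptor_text for ch in '"\t\n\r'):
        raise ValueError("descriptor text contains a quote, tab or line break")
    return "\t".join(
        [
            item_qid,
            MESH_DESCRIPTOR_PID,
            f'"{mesh_id}"',
            qualifier_pid,
            f'"{descriptor_text}"',
        ]
    )
```

For example, `qualifier_command("Q4115189", "D000000", "P1810", "Example descriptor")` produces a single line targeting the Wikidata Sandbox item with a made-up MeSH ID; real input would come from the NLM descriptor data and the query results.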

Medical papers are published in great numbers and the MeSH vocabulary is updated annually. These tools would address the current backlog of missing items and would also reduce manual work for future updates.

The following volunteer activities would be carried out as a preliminary step to update, cleanse and improve completeness of MeSH data:

  1. Resolve existing single-value and unique-value P486 constraint violations for D-numbers (diseases). As of 28 August there were 1179 items (including potential false positives).
  2. Complete matching into Wikidata for the disease (D-number) catalog

Creation of a single new 2018-present Mix'n'Match catalog of D-numbers
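The two kinds of constraint violation in step 1 can be illustrated with a short sketch. It assumes the input is a list of (item ID, MeSH descriptor ID) pairs, for instance exported from a query; the function name and the sample IDs are hypothetical. A single-value violation is an item carrying more than one P486 value, and a unique-value violation is one P486 value shared by more than one item.

```python
from collections import defaultdict


def find_p486_violations(pairs):
    """Group (item_qid, mesh_id) pairs and report both kinds of violation."""
    values_by_item = defaultdict(set)
    items_by_value = defaultdict(set)
    for item_qid, mesh_id in pairs:
        values_by_item[item_qid].add(mesh_id)
        items_by_value[mesh_id].add(item_qid)

    # Single-value violations: one item, several MeSH descriptor IDs.
    single_value = {
        qid: sorted(ids) for qid, ids in values_by_item.items() if len(ids) > 1
    }
    # Unique-value violations: one MeSH descriptor ID, several items.
    unique_value = {
        mesh: sorted(qids) for mesh, qids in items_by_value.items() if len(qids) > 1
    }
    return single_value, unique_value


# Made-up sample data for illustration only.
sample = [
    ("Q1", "D000001"),
    ("Q1", "D000002"),  # Q1 has two values
    ("Q2", "D000003"),
    ("Q3", "D000003"),  # D000003 appears on two items
    ("Q4", "D000004"),
]
single_value, unique_value = find_p486_violations(sample)
```

Each reported entry would still need manual review, since some of them may be the false positives mentioned above.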

Your qualifications

Describe how you (and, if applicable, your team) are able to achieve this project. What skills, expertise, and motivation do you have which will enable you to succeed?

I am an experienced software developer with over 20 years' professional software development experience and 3 years' current experience in text and data mining and web frontend technologies based on JavaScript, Node.js, PHP etc. I have previous experience of working with Wikidata tools such as QuickStatements and with federated Wikibase installations. I also have previous experience of working with PubMed's APIs and data sources, including processing and importing over 10000 MeSH tree code items into Wikidata (P672).

A local volunteer (not included in the grant) who currently works with MeSH on Wikidata and Wikipedia would assist with ingestion of datasets created in the initial phase of the software, using QuickStatements, Mix'n'Match etc., and would provide additional guidance specific to biomedical data.

Proposed activity dates

When will you undertake this project? (This may be two, three or four days, not necessarily consecutive.) The latest allowable date is 1 May 2021.

The work would take place across four Fridays in November 2020.


Optional: Community members are encouraged to endorse your proposal and leave a rationale here.

  • This project would be a valuable addition to our efforts to help editors identify the most suitable sources for medical content on all Wikipedias. As a programmer with over 40 years of experience and the Chair of Wikimedia Medicine, I'd be happy to give any assistance, pro bono, that Jo might find useful. --RexxS (talk) 14:15, 5 October 2020 (UTC)
  • ...