What is a short name for the activity/project you are proposing?
Improve the quality and coverage of main topics for medical papers in Wikidata, and of the MeSH (Medical Subject Headings) controlled vocabulary that these topics draw on: https://www.nlm.nih.gov/mesh/meshhome.html
- your name and/or Wikimedia username
Jo Brook (User:Jkcm)
- your contact e-mail
"Allow other users to email me" is set
- your nearest city and country
Cambridge, Cambridgeshire, UK
Details of team members (optional)
If this application is for a team project, each additional team member (up to 5 total) should have their name/username, project role, location, and contact details here:
Tell us about your proposed project. What will you achieve with this time?
Describe why you think this project is important, and how the project is different from your normal volunteer contributions.
Preliminary investigations have identified cases which existing software tools such as Mix'n'Match do not cover. Therefore we propose to build two pieces of software:
- a small bot to significantly increase coverage of MeSH descriptor text taken directly from NLM
- an interactive web tool to allow manual review, corrections and additions of main topics of medical papers (P921)
A SPARQL query shows that, as of 30 September 2020, 18,897 of these descriptions are missing from Wikidata. Completing them requires adding a qualifier to a statement on an existing Wikidata item, which we will do programmatically.
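As a rough illustration of the programmatic step, the sketch below formats tab-separated QuickStatements V1 commands that restate an item's existing MeSH descriptor ID (P486) together with a qualifier holding the descriptor text. The proposal does not name the qualifier property or the bot's tooling, so the use of P1810 ("subject named as"), the function name and the sample row are assumptions for illustration only. Q4115189 is the Wikidata Sandbox item, used here as a placeholder.

```python
import re

MESH_PROPERTY = "P486"       # MeSH descriptor ID
ASSUMED_QUALIFIER = "P1810"  # assumption: the proposal does not name the qualifier


def qualifier_command(qid, mesh_id, descriptor_text, qualifier=ASSUMED_QUALIFIER):
    """Return one QuickStatements V1 line: item, property, value, qualifier, text."""
    if not re.fullmatch(r"Q\d+", qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    # MeSH descriptor IDs are 'D' followed by six or nine digits.
    if not re.fullmatch(r"D\d{6}(\d{3})?", mesh_id):
        raise ValueError(f"not a MeSH descriptor ID: {mesh_id!r}")
    # Tabs, newlines and double quotes would break the tab-separated format.
    if any(ch in descriptor_text for ch in '\t\n"'):
        raise ValueError(f"unsupported character in {descriptor_text!r}")
    return "\t".join(
        [qid, MESH_PROPERTY, f'"{mesh_id}"', qualifier, f'"{descriptor_text}"']
    )


# Hypothetical input row: (item, descriptor ID, descriptor text taken from NLM).
rows = [("Q4115189", "D000001", "Calcimycin")]
for row in rows:
    print(qualifier_command(*row))
```

When the item, property and value match an existing statement, QuickStatements is expected to attach the qualifier to that statement rather than create a duplicate; this should be confirmed on a test item before any bulk run.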
Medical papers are published in great numbers and the MeSH vocabulary is updated annually. These tools would be applied first to the current backlog of missing items, and would also reduce the manual work needed for future updates.
The following volunteer activities would be carried out as a preliminary step to update, cleanse and improve completeness of MeSH data:
- Resolve existing single-value and unique-value P486 constraint violations for D-numbers (diseases). As of 28 August there were 1179 items (including potential false positives)
- Complete matching into Wikidata for disease (D-number) catalog
- Create a single new 2018-present Mix'n'Match catalog of D-numbers
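The two kinds of P486 constraint violation mentioned above can be checked offline against an exported list of (item, D-number) pairs. The sketch below is a minimal illustration under that assumption; the function name and sample data are hypothetical, and it does not replace Wikidata's own constraint reports.

```python
from collections import defaultdict


def find_violations(pairs):
    """Split (item, MeSH ID) pairs into single-value and unique-value violations."""
    ids_by_item = defaultdict(set)
    items_by_id = defaultdict(set)
    for qid, mesh_id in pairs:
        ids_by_item[qid].add(mesh_id)
        items_by_id[mesh_id].add(qid)
    # Single-value violation: one item carries more than one D-number.
    single_value = {q: sorted(ids) for q, ids in ids_by_item.items() if len(ids) > 1}
    # Unique-value violation: one D-number appears on more than one item.
    unique_value = {m: sorted(qs) for m, qs in items_by_id.items() if len(qs) > 1}
    return single_value, unique_value


# Hypothetical sample data, not real Wikidata statements.
sample = [
    ("Q100", "D000001"),
    ("Q100", "D000002"),  # Q100 has two D-numbers
    ("Q200", "D000003"),
    ("Q300", "D000003"),  # D000003 is used on two items
]
single, unique = find_violations(sample)
print("single-value violations:", single)
print("unique-value violations:", unique)
```

Each flagged entry would still need manual review, since some apparent violations are false positives.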
A local volunteer (not included in the grant) who currently works with MeSH on Wikidata and Wikipedia would assist with ingestion of the datasets created in the initial phase of the software, using QuickStatements, Mix'n'Match, etc., and would provide additional guidance specific to biomedical data.
Proposed activity dates
When will you undertake this project? (This may be two, three or four days. Not necessarily consecutive). The latest allowable date is 1 May 2021.
The work would take place across four Fridays in November 2020.
Optional: Community members are encouraged to endorse your proposal and leave a rationale here.
- This project would be a valuable addition to our efforts to help editors identify the most suitable sources for medical content on all wikipedias. As a programmer with over 40 years of experience and the Chair of Wikimedia Medicine, I'd be happy to give any assistance, pro bono, that Jo might find useful. --RexxS (talk) 14:15, 5 October 2020 (UTC)