Cochrane Collaboration-Wikipedia Initiative/Article and citation matching research 2018
Wikipedia editors seek to summarize the best available sources of information in Wikipedia articles. An early bottleneck in article development is identifying reliable sources and matching them to the Wikipedia articles in which editors could summarize them. Traditionally, suggesting sources for Wikipedia articles has depended on a volunteer with domain knowledge spending time and labor to present a source in an article, as the first step toward integrating information from that source into the article.
Cochrane journals have been favorite sources among Wikipedia editors working on medical articles. As of June 2018, Cochrane offers 7000 up-to-date medical articles, of which English Wikipedia cites 2000. Since at least 2011, the Wikipedia community has sought to summarize and cite most or all Cochrane publications in Wikipedia articles.
This project seeks to use machine learning to make recommendations on which Wikipedia article would be a good place to summarize each Cochrane publication.
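As a rough illustration of what such a recommendation could look like, the sketch below ranks candidate Wikipedia articles for one Cochrane publication by TF-IDF cosine similarity. This is a simple stand-in baseline, not the method the project researchers used. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `recommend`) and the sample texts are hypothetical and exist only for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(publication_text, articles):
    """Rank article titles by similarity to the publication text.

    `articles` maps a title to its text. Returns (score, title) pairs,
    best match first.
    """
    titles = list(articles)
    vectors = tfidf_vectors([publication_text] + [articles[t] for t in titles])
    query = vectors[0]
    scored = [(cosine(query, vec), title) for title, vec in zip(titles, vectors[1:])]
    return sorted(scored, reverse=True)


# Hypothetical sample data, invented for this sketch only.
articles = {
    "Asthma": "Asthma is a chronic inflammatory disease of the airways "
              "treated with inhaled corticosteroids and bronchodilators",
    "Migraine": "Migraine is a headache disorder characterized by recurrent headaches",
    "Hypertension": "Hypertension is a condition of persistently elevated "
                    "blood pressure in the arteries",
}
publication = "Inhaled corticosteroids for children with persistent asthma"

for score, title in recommend(publication, articles):
    print(f"{title}: {score:.3f}")
```

A word-overlap baseline like this only matches publications and articles that share vocabulary, so it would miss pairs that describe the same topic in different terms.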
The researchers are
Lane Rasberry (user:bluerasberry) represents the Data Science Institute at the University of Virginia while the graduate student researchers carry out their project. For more information about the university's Wikimedia projects see University of Virginia.
Community outreach plans
Persons involved in this project will present it at the following events:
- The medicine series at Wikimania in Cape Town, July 2018
- WikiConference North America in Columbus, Ohio, October 2018
- Cochrane Colloquium 2018
- WikiCite in San Francisco November 2018
- Cochrane Colloquium 2019 (J. Dawson to share tool prototype)
- Submitted for publication in IEEE
Cochrane papers in Wikidata
Wikimedia data sets
- https://dumps.wikimedia.org/, "A complete copy of all Wikimedia wikis, in the form of wikitext source and metadata embedded in XML."
- d:Wikidata:Data access
- d:Wikidata:How to use data on Wikimedia projects
- Research:Quarry, a tool for doing SQL queries on Wikimedia projects
- collection of miscellaneous datasets
- What are the ten most cited sources on Wikipedia? Let’s ask the data.
- proposal draft
- sort out ticket:2019051010004861 (private OTRS)
- Yang, Jingnan; Ward, Justin; Gharavi, Erfaneh; Dawson, Jennifer; Alvarado, Rafael (22 May 2019). "Bi-directional Relevance Matching between Medical Corpora (preprint)". Zenodo (Zenodo). doi:10.5281/zenodo.3155624.
- Yang, Jingnan; Ward, Justin; Gharavi, Erfaneh; Dawson, Jennifer; Alvarado, Rafael (13 June 2019). "Bi-directional Relevance Matching between Medical Corpora". 2019 Systems and Information Engineering Design Symposium (SIEDS) (Institute of Electrical and Electronics Engineers). doi:10.1109/SIEDS.2019.8735639.
Article and citation matching tool: http://www.mmmapp.org/
presentation at SIEDS Conference 26 April 2019
presentation at Wikimania 2019