Research talk:Investigating Wikipedia's role as a gateway to medical content
This is a fascinating project, and I would be quite interested in seeing this data analyzed myself.
For this project to be implemented, significant engineering effort would be needed for (1) instrumenting all outgoing links and (2) collecting and streaming click data into our Hadoop cluster. We would also need a thorough review by our security team of our ability to store the data in a way consistent with our privacy and data retention policies.
There might have been some progress I am unaware of (I believe I was copied on a potential data model for instrumenting the links), but the main issue is where to find engineering support to do the work and how to make sure this is kosher from a security and legal perspective.
I am afraid these are pretty serious blockers: we are finding it quite challenging to set up researchers with access to standard request log data that we passively collect (for some high priority projects and collaborations). Instrumenting the code to collect new data at a large scale is a whole different story and I'm afraid not something my team can help with.
I am sorry for the disappointing news. If there's any update on the engineering front that you would like me to look at and help with, I'd be more than happy to.--Dario (WMF) (talk) 18:01, 29 April 2016 (UTC)
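For anyone following the thread, here is a rough sketch of what a privacy-conscious record for one instrumented outgoing click might look like. It is not the data model mentioned above, which is not reproduced on this page. The names `OutboundClick`, `make_click_event` and `to_json_line`, and the choice of fields, are all hypothetical. The sketch assumes only the linking article, the target domain and an hour-level timestamp are kept, so that no user identifier or full URL is stored.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from urllib.parse import urlsplit


@dataclass(frozen=True)
class OutboundClick:
    """Hypothetical record for one click on an external link."""
    source_page: str    # title of the article that contains the link
    target_domain: str  # host only; path and query string are dropped
    hour_bucket: str    # UTC timestamp truncated to the hour


def make_click_event(source_page: str, target_url: str, when: datetime) -> OutboundClick:
    """Reduce a raw click to the coarse fields assumed above.

    `when` should be timezone-aware; a naive datetime would be
    interpreted as local time by astimezone().
    """
    domain = urlsplit(target_url).hostname or ""
    bucket = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:00Z")
    return OutboundClick(source_page, domain, bucket)


def to_json_line(event: OutboundClick) -> str:
    """Serialize one event as a single JSON line, e.g. for a log stream."""
    return json.dumps(asdict(event), sort_keys=True)
```

Whether fields this coarse would satisfy the privacy and data retention policies is exactly the question the security review would have to answer.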
Access to data about clickthrough
If a research participant voluntarily agrees to be tracked, then we can get medical clickthrough rates.
This project has data. I am not sure if you connected already.
Open access status
The article notes:
For determining the number of pages, length of pages, the number of external links, and the number of “freely accessible” links added by editors as sources, a single day’s worth of database and XML dump files were captured from late in the study period (April 20th, 2019). As the database and XML dump files had only 0.5% more external links than on April 1st, 2019, the sample from April 20th was felt to be sufficiently representative to serve as the source for all static data counts.
While Citation bot has been adding OA URLs and PMC IDs from Unpaywall since 2017 or so, the addition has proceeded slowly. The OA status icons became reliable only after OAbot was allowed to add PMC IDs automatically: as seen from https://xtools.wmflabs.org/ec/en.wikipedia.org/OAbot and its contributions, this backlog was erased only in August 2019, although a large part of the work had been done in May–August 2018. I don't think it's a coincidence that pageviews for w:en:PubMed Central significantly increased in September 2018. Nemo 15:20, 22 April 2020 (UTC)
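To check the pageview claim, one could query the Wikimedia Pageviews REST API for monthly counts around September 2018. The sketch below only builds the request URL and computes a relative change. The helper names `pageviews_url` and `relative_change` are made up for illustration, and the defaults (English Wikipedia, all access methods, `user` agent, monthly granularity) are assumptions about which traffic is of interest.

```python
from urllib.parse import quote

# Per-article endpoint of the Wikimedia Pageviews REST API.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(title: str, start: str, end: str,
                  project: str = "en.wikipedia",
                  access: str = "all-access",
                  agent: str = "user",
                  granularity: str = "monthly") -> str:
    """Build the request URL; start and end are dates in YYYYMMDD form."""
    # Article titles use underscores for spaces and must be URL-encoded.
    article = quote(title.replace(" ", "_"), safe="")
    return f"{BASE}/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}"


def relative_change(before: float, after: float) -> float:
    """Fractional change from `before` to `after` (0.5 means +50%)."""
    return (after - before) / before


if __name__ == "__main__":
    # Prints the URL to fetch; the counts themselves are not reproduced here.
    print(pageviews_url("PubMed Central", "20180801", "20180930"))
```

Fetching that URL should return a JSON object whose `items` carry one `views` count per month, which can then be passed to `relative_change`.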