Research talk:Investigating Wikipedia's role as a gateway to medical content

From Meta, a Wikimedia project coordination wiki

Feasibility

This is a fascinating project and data I would be quite interested in seeing analyzed myself.

There is one major blocker to this project and that is, to put it simply, the availability of the data. We don't collect data on clicks on outgoing links at all: our traffic data collection is limited to standard HTTP request logs. We do have some instrumentation in place to collect additional data on user interaction with the site via JavaScript on the client side, but this happens primarily in the context of individual feature engineering, not by default across the site.

For this project to be implemented, significant engineering effort would be needed for (1) instrumenting all outgoing links and (2) collecting and streaming click data into our Hadoop cluster. We would also need a thorough review by our security team of our ability to store the data in a way consistent with our privacy and data retention policies.

There might have been some progress I am unaware of (I believe I was copied on a potential data model for instrumenting the links), but the main issue is where to find engineering support to do the work and how to make sure this is kosher from a security and legal perspective.

I am afraid these are pretty serious blockers: we are finding it quite challenging to set up researchers with access to standard request log data that we passively collect (for some high priority projects and collaborations). Instrumenting the code to collect new data at a large scale is a whole different story and I'm afraid not something my team can help with.

I am sorry for the disappointing news. If there's any update on the engineering front that you would like me to look at and help with, I'd be more than happy to.--Dario (WMF) (talk) 18:01, 29 April 2016 (UTC)

Access to data about clickthrough

@Willinsky and Lauren maggio:

If a research participant voluntarily agrees to be tracked then we can get medical clickthrough rates.

This project has data. I am not sure if you connected already.

Blue Rasberry (talk) 12:09, 11 February 2019 (UTC)

Posted link to published paper to Wikiproject Medicine

Blue Rasberry (talk) 22:45, 5 April 2020 (UTC)

Open access status

The article notes:

For determining the number of pages, length of pages, the number of external links, and the number of “freely accessible” links added by editors as sources, a single day’s worth of database and XML dump files were captured from late in the study period (April 20th, 2019). As the database and XML dump files had only 0.5% more external links than on April 1st, 2019, the sample from April 20th was felt to be sufficiently representative to serve as the source for all static data counts.

While Citation bot has been adding OA URLs and PMC IDs from Unpaywall since 2017 or so, the additions have proceeded slowly. The OA status icons became reliable only once OAbot was allowed to add PMC IDs automatically: as seen from https://xtools.wmflabs.org/ec/en.wikipedia.org/OAbot and its contributions, this backlog was erased only in August 2019, although a large part of the work had been done in May–August 2018. I don't think it's a coincidence that pageviews for w:en:PubMed Central significantly increased in September 2018. Nemo 15:20, 22 April 2020 (UTC)