Grants:IEG/Open Access Reader/Midpoint


Welcome to this project's midpoint report! This report shares progress and learnings from the Individual Engagement Grantee's first 3 months.

Summary

Open Access Reader (OAR) is a project to systematically ensure that all significant open access research is cited in Wikipedia by

  • developing a tool that identifies missing citations
  • nurturing a volunteer community to use this tool

We'd like to apply for more funding to:

  • Commission the CORE team to produce the backend functionality we need ($15,000)
  • Build a basic UX ($?)
  • Start growing a test community ($?)

The long-term goals for the project should be:

  • growing a large, productive user community for the tool, drawn from the existing Wikipedia contributor community and other aligned communities
  • sustaining stable product development support

Methods and activities

How have you set up your project, and what work has been completed so far?

Describe how you've set up your experiment or pilot, sharing your key focuses so far and including links to any background research or past learning that has guided your decisions. List and describe the activities you've undertaken as part of your project to this point.


In the initial 3 month exploratory period, we validated and socialised our concept and produced a small but promising proof of concept, as well as doing the research necessary to build the first functioning prototype.

More specifically, we produced:

  • Strong recommendations to use CORE as the source for OA metadata.
  • A proof of concept generated from a static metadata dump from CORE.
  • Discovery of research outlining a method of matching OA articles from CORE to Wikipedia categories.
  • A set of wireframes for a desktop UI, with mockups coming soon.
  • A proposal from the CORE team to produce and support:
    • the backend required to supply open access metadata in the form we require for OAR by augmenting their existing API.
    • a considered and justified ranking methodology.
  • Discovery of Citoid, a service that automatically generates correctly formatted citations (see the sketch after this list).
  • A press list for a campaign to develop a crowdsourcing community.
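
To make the Citoid discovery concrete, here is a minimal sketch (in Python) of querying Citoid's public REST endpoint on English Wikipedia to turn a DOI into citation metadata. The DOI and User-Agent values are arbitrary placeholders, and error handling is omitted.

    import json
    import urllib.parse
    import urllib.request

    def citoid_lookup(doi):
        """Ask the Citoid service for citation metadata matching a DOI."""
        url = ("https://en.wikipedia.org/api/rest_v1/data/citation/mediawiki/"
               + urllib.parse.quote(doi, safe=""))
        request = urllib.request.Request(url, headers={"User-Agent": "OAR-sketch/0.1"})
        with urllib.request.urlopen(request) as response:
            return json.load(response)  # a list of citation objects

    # Example with an arbitrary real DOI: print the first citation's title.
    print(citoid_lookup("10.1371/journal.pmed.0020124")[0].get("title"))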

Overview & Motivation

Organising and programming Wikimania gave me a broad overview of the current developments and challenges across the Wikimedia movement as a whole. This project was inspired by a few strands of that experience, particularly the following observations:

  • The Open Access movement in academia is only a few years old, but already millions of papers a year are being released under open licenses.
  • Academic output is a significant source of content for Wikipedia.
  • Due to the sheer volume, discoverability of academic papers is very poor, despite modern tools like Google Scholar. This is a challenge even for seasoned academics, and a major reason why research tends to gather in topical silos - itself a pattern Wikipedia could help address in the long term.
  • Academics don't typically engage with Wikipedia much, though this is starting to change.
  • In contrast, OA publishers find Wikipedia to be a major source of traffic - i.e. people find academic work largely through Wikipedia (!)
  • Most academic papers are surprisingly intelligible to a layman, but most laymen don't know how or where to find them (and historically they have been behind paywalls).
  • The data ecosystem around Wikipedia (Wikidata and Labs) is maturing quickly, but the potential isn't widely known within the WP community.
  • New (Wikimedia) contributors find it difficult to imagine tasks that are simultaneously within their competence and that feel significant enough to be motivating.

Open Access Reader is built around one core operation: taking an Open Access library and removing from it the papers that are already cited in Wikipedia (sketched after the list below). This functionality could be used:

  • As a tool for experienced editors, allowing subject or even article level discovery of new papers to integrate => better quality articles
  • As a tool for new volunteers, giving them motivating yet well defined editing tasks => new contributors
  • As part of a campaign to strengthen the link between the academic community and Wikimedia => expert contributors / increased legitimacy for Wikimedia
  • As an example of a project using open data, both from the Wikimedia ecosystem, and from the Open Access ecosystem, that can be understood by people that aren’t particularly familiar with either, perhaps inspiring similar projects from the open data community in general => more technical contributors
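
A minimal sketch of that core operation, assuming (purely for illustration) that both the OA library records and the Wikipedia citation data expose DOIs:

    def uncited_papers(oa_papers, cited_dois):
        """Remove from an OA library the papers Wikipedia already cites."""
        cited = {doi.lower() for doi in cited_dois}
        return [p for p in oa_papers if p.get("doi", "").lower() not in cited]

    # Toy data standing in for a CORE dump and DOIs harvested from citations:
    oa_papers = [
        {"doi": "10.1000/example.1", "title": "Already cited"},
        {"doi": "10.1000/example.2", "title": "Missing from Wikipedia"},
    ]
    print(uncited_papers(oa_papers, ["10.1000/EXAMPLE.1"]))  # second paper only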

Furthermore, the project seems to:

  • have a quickly achievable MVP, or at least an impressive proof-of-concept demo
  • have an output that would be easy to measure and evaluate (users, citations added)
  • not replicate any similar work
  • be useful in perpetuity (if supported)

The main steps in the project are:

  • Source OA metadata
  • Remove existing citations
  • Rank papers
  • Filter papers
  • Provide a UX

Doing these things in a basic way is quite straightforward, but the quality of the output would be low. Most of the exploratory period was spent researching and communicating with subject matter experts, online and at events, to find out exactly which resources and methods already exist, and which parts of the project would be most challenging.
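
As a rough picture of how these steps compose, each stage can be treated as a function from a list of paper records to a smaller or reordered list, with the UX rendering whatever comes out the end. Every stage body below is a throwaway placeholder, not a proposed implementation:

    def source_metadata():
        # Step 1: in production this would come from CORE, not inline data.
        return [{"doi": "10.1/a", "cites": 5}, {"doi": "10.1/b", "cites": 9}]

    pipeline = [
        lambda ps: [p for p in ps if p["doi"] != "10.1/a"],             # 2: remove cited
        lambda ps: sorted(ps, key=lambda p: p["cites"], reverse=True),  # 3: rank
        lambda ps: ps[:10],                                             # 4: filter
    ]

    papers = source_metadata()
    for stage in pipeline:
        papers = stage(papers)
    print(papers)  # Step 5: the UX would present this list to editors.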

Midpoint outcomes

What are the results of your project or any experiments you’ve worked on so far?

Please discuss anything you have created or changed (organized, built, grown, etc) as a result of your project to date.

Scoping out the project path

Given the amount of funding, my primary aim was to invest time in reviewing the relevant expertise available, to inform good decisions, and to produce a viable proof-of-concept demo. Relevant activities were:

  • Attending conferences
  • Meeting with Petra from CORE
  • Corresponding with project advisors to seek feedback on feasibility and potential pitfalls
  • Researching bibliometrics to inform decisions on the significance filter

Proof of Concept Demo

We decided to prioritise some kind of example output. Producing one would force us to tackle every step of the process at some level, and stop us getting distracted by difficulties in any particular area. Additionally, a convincing demo helps grow support for a project, attracting talent, funding and volunteers.

  • Identify the best Open Access aggregator - https://tools.wmflabs.org/oar/samplemetapretty.json
  • Find a simplistic but quick-to-implement significance filter for the sample - the number of citations received from other literature in that sample (sketched after this list)
  • Explore correspondence between aggregator papers and Wikipedia citations
  • Publish static output & elicit feedback
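
A sketch of that quick significance filter, under the assumption that each sample record carries a list of referenced DOIs (the `references` field name is invented for illustration; CORE's actual record shape may differ):

    from collections import Counter

    def in_sample_citation_counts(papers):
        """For each DOI, count citations from other papers in the same sample."""
        dois = {p["doi"] for p in papers}
        counts = Counter()
        for paper in papers:
            for ref in paper.get("references", []):
                if ref in dois and ref != paper["doi"]:
                    counts[ref] += 1
        return counts

    sample = [
        {"doi": "10.1/a", "references": ["10.1/b"]},
        {"doi": "10.1/b", "references": []},
        {"doi": "10.1/c", "references": ["10.1/b", "10.1/a"]},
    ]
    print(in_sample_citation_counts(sample).most_common())  # b cited twice, a once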

Creation of wireframes for the UX

This is a cosmetic exercise, but it was important to demonstrate the user interface in order to show the applicability of the project to different audiences. It helps to illustrate how the API can be implemented as part of a workflow, and the potential for a relatively inexperienced user to rapidly and efficiently improve the quality of articles with highly relevant academic research.

Development of a productive partnership with CORE

This is central to the ongoing viability of the project. The most cost-effective way of delivering the functionality of OAR is to piggyback on an established community and resource with aligned goals. Petra's willingness to quote a projected cost for the API augmentation, and her investment in its future successful integration, make this good value and far lower risk than attempting to set up our own team from scratch. It is highly likely that this partnership can continue and support the complicated task of creating a functional topic filter.

Finances

Budget spent up to the midpoint: 3,275 USD (1,971 GBP as of 10 Nov 2014).

Learning

The best thing about trying something new is that you learn from it. We want to follow in your footsteps and learn along with you, and we want to know that you are taking enough risks to learn something really interesting! Please use the sections below to describe what is working and what you plan to change for the second half of your project.

What are the challenges

What challenges or obstacles have you encountered? What will you do differently going forward? Please list these as short bullet points.

What is working well

What have you found works best so far? To help spread successful strategies so that they can be of use to others in the movement, rather than writing lots of text here, we'd like you to share your findings in the form of a link to a learning pattern.

  • Your learning pattern link goes here

Next steps and opportunities

Simple Citation Hitlist Tool

Taking our proof of concept to something that we can deploy to users.

  • Design a robust significance ranking

The current system is the simplest and most convenient, but it isn't accurate or reliable. The CORE team are experts in bibliometrics and have invited us to commission them to create a better ranking.

  • Produce a system that generates a live list of the most significant papers.

The current system runs off a static and partial data dump. We will build a product that runs off the live CORE and live Wikipedia databases.
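
For the Wikipedia side, one possible approach (a sketch of an assumption on our part, not the settled design) is to ask the live MediaWiki search API whether a DOI already appears in any article's wikitext:

    import json
    import urllib.parse
    import urllib.request

    def doi_cited_on_enwiki(doi):
        """True if some English Wikipedia article mentions the DOI in its wikitext."""
        params = urllib.parse.urlencode({
            "action": "query", "list": "search", "format": "json",
            "srsearch": 'insource:"%s"' % doi, "srlimit": 1,
        })
        request = urllib.request.Request(
            "https://en.wikipedia.org/w/api.php?" + params,
            headers={"User-Agent": "OAR-sketch/0.1"})
        with urllib.request.urlopen(request) as response:
            return json.load(response)["query"]["searchinfo"]["totalhits"] > 0

    print(doi_cited_on_enwiki("10.1371/journal.pmed.0020124"))

Checking one DOI per request would be far too slow for a whole library, so the production system would more likely work from database replicas or dumps; a call like this is only useful for spot checks.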

  • UX design
  • Design metrics

We'll decide which actions the tool will measure and create an analytics dashboard, allowing us to evaluate results and continue to improve the design.

  • Share functional tool around:
    • General Wikimedia communities

Via mailing lists, community spaces (village pump), active wikiprojects, etc.

    • Open (Access) communities

Via mailing lists, publications, influencers.

    • Potential volunteer communities

Via PR.

Mature tool with topic filtering

Create topic-filtering functionality, making the tool more useful for specific topic communities (e.g. wikiprojects, university courses). A toy illustration follows the list below.

  • Assess paper metadata
  • Research, Design & implement topic filtering
  • Update UX
  • Publicise within topic communities (wikiprojects, etc.)
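
The matching method from the research mentioned earlier is not reproduced here, so this toy illustration falls back to a naive keyword overlap between a paper's subject terms (a field name we have invented) and a topic's terms:

    def matches_topic(paper, topic_terms):
        """Keep a paper if any of its subject keywords overlaps the topic's terms."""
        keywords = {k.lower() for k in paper.get("subjects", [])}
        return bool(keywords & {t.lower() for t in topic_terms})

    papers = [
        {"title": "CRISPR screening", "subjects": ["genetics", "biology"]},
        {"title": "Dark matter halos", "subjects": ["astrophysics"]},
    ]
    print([p["title"] for p in papers
           if matches_topic(p, ["biology", "medicine", "genetics"])])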

Grantee reflection

We’d love to hear any thoughts you have on how the experience of being an IEGrantee has been so far. What is one thing that surprised you, or that you particularly enjoyed from the past 3 months?