Grants:IdeaLab/Open Access Reader
This is now proposed as a grant.
What is the problem you're trying to solve?
There's lots of great research being published in good-quality open access journals that isn't cited in Wikipedia. It's peer-reviewed, so it should count as a reliable source. It's available for anyone to read and probably comes with pretty decent metadata too. Can we set up a process that makes it super convenient for editors to find and cite these papers?
What is your solution?
Roughly speaking, my proposed solution works like this:
- Pick a respected major repository, e.g. PLOS, to trial this with. More can be added over time. Ideally, piggyback on another project that is trying to aggregate open access repositories, e.g. CORE.
- Create a notability filter that helps decide whether a given academic paper is likely to be notable, e.g. a minimum number of (academic) citations. There are various sources for this type of stat. We can make this filter more or less strict depending on community ability to cope with the output.
- Create a dictionary somewhere in Wikimedia that matches paper metadata to Wikiprojects, e.g. documents in PLOS with the metadata keyword "Paleontology" will probably be relevant to Wikiproject Paleontology. If the metadata is fine-grained enough, it may even be possible to match keywords to specific article talk pages. This dictionary will be open and editable, so the community can help populate and correct it. If there is no obvious match, the default behaviour will be to skip, i.e. papers with no obvious category will be ignored. This makes it very easy to start with a completely empty dictionary and begin slowly by adding one keyword at a time.
- Set up a process (i.e. a bot) that regularly checks for new papers that pass the notability filter and, using the dictionary, suggests them on the relevant Wikiprojects or talk pages as worth adding to articles. Each suggestion can be a neat template that gives the abstract and a pre-formatted reference tag, making life very easy for editors. Suggestions could be appended to a pre-existing page or talk page (determined by the dictionary), or the bot could automatically create and append to a new page made specifically for the purpose, e.g. https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Medicine/PaperSuggestions or similar. This would in effect be a regular newsletter of "latest research".
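The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a specification: the `Paper` fields, the citation threshold of 10, the dictionary entry and the suggestion page name are all hypothetical, and fetching papers from a repository and posting to the wiki are left out. The reference is formatted with the existing `{{cite journal}}` template.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    """Minimal paper metadata; the field names are assumptions for this sketch."""
    title: str
    journal: str
    year: int
    doi: str
    citation_count: int
    keywords: list[str] = field(default_factory=list)


# Hypothetical community-editable dictionary: metadata keyword -> suggestion page.
# It can start empty; unmatched keywords are simply skipped.
KEYWORD_TO_PAGE = {
    "paleontology": "Wikipedia:WikiProject Paleontology/PaperSuggestions",
}


def passes_notability_filter(paper: Paper, min_citations: int = 10) -> bool:
    """Crude notability proxy: a minimum number of academic citations."""
    return paper.citation_count >= min_citations


def target_pages(paper: Paper, dictionary: dict[str, str]) -> list[str]:
    """Return the pages matched by the paper's keywords (empty list means skip)."""
    pages: list[str] = []
    for keyword in paper.keywords:
        page = dictionary.get(keyword.strip().lower())
        if page is not None and page not in pages:
            pages.append(page)
    return pages


def render_suggestion(paper: Paper) -> str:
    """Build one wikitext list item containing a pre-formatted reference tag."""
    ref = (
        "<ref>{{cite journal"
        f" |title={paper.title} |journal={paper.journal}"
        f" |year={paper.year} |doi={paper.doi}"
        "}}</ref>"
    )
    # <nowiki> shows the tag as copyable text instead of rendering a footnote.
    return f"* ''{paper.title}'' ({paper.year}): <nowiki>{ref}</nowiki>"


def suggest(papers, dictionary=KEYWORD_TO_PAGE, min_citations=10):
    """Group rendered suggestions by target page."""
    suggestions: dict[str, list[str]] = {}
    for paper in papers:
        if not passes_notability_filter(paper, min_citations):
            continue
        for page in target_pages(paper, dictionary):
            suggestions.setdefault(page, []).append(render_suggestion(paper))
    return suggestions
```

Calling `suggest(papers, dictionary={})` returns an empty result, which is the "start with an empty dictionary" behaviour described above. Raising `min_citations` is the knob for making the filter stricter if the community is getting too many suggestions.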
Academic work is some of the best content out there, but even when it's open access, its discoverability is poor. However, one thing the Wikimedian community is great at is distributed categorisation. Let's put this to work and get cutting-edge academic work cited in our encyclopaedia!
This project will be primarily about adding extra content to existing articles, rather than creating new articles based on academic work. Notability is therefore less important than finding articles that are highly relevant to the work, since the existence of an article presupposes the notability of its topic. The key will be sufficiently specific metadata supplied by the repository.
The first step will be checking the feasibility of each of these steps by socialising the proposal. The steps above will need elaboration and refinement, but the basic premise - systematic suggestion of academic materials for citations - is hopefully sound. This would lead to a more detailed specification.
Next, I'd try and get a minimal functioning end-to-end solution:
- One source repository, ideally a friendly one who are aware of and support the initiative.
- A very simple filter using the easiest available metadata, built in a modular way. Begin with emphasis on avoiding false positives.
- A small manually populated dictionary covering just one subject area.
- A single wikiproject on en-wp who are aware of and support the initiative.
Then we can test the workflow in a controlled way.
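One way to read "built in a modular way" with an "emphasis on avoiding false positives" is a list of small, independent checks that must all pass, where missing metadata counts as a failure. The sketch below illustrates that idea only; the metadata keys `citation_count` and `peer_reviewed` are hypothetical and would depend on what the chosen repository actually exposes.

```python
from typing import Callable, Iterable, Mapping

PaperMetadata = Mapping[str, object]
Filter = Callable[[PaperMetadata], bool]


def min_citations(threshold: int) -> Filter:
    """Build a check that requires at least `threshold` citations."""
    def check(meta: PaperMetadata) -> bool:
        count = meta.get("citation_count")
        # Missing or malformed counts fail the check: better to miss a paper
        # than to suggest a doubtful one.
        if isinstance(count, bool) or not isinstance(count, int):
            return False
        return count >= threshold
    return check


def peer_reviewed(meta: PaperMetadata) -> bool:
    """Pass only when the metadata explicitly says the paper is peer reviewed."""
    return meta.get("peer_reviewed") is True


def passes_all(meta: PaperMetadata, filters: Iterable[Filter]) -> bool:
    """A paper is suggested only if every filter in the list accepts it."""
    return all(check(meta) for check in filters)


# Filters can be added, removed or tightened without touching the others.
STRICT_FILTERS = [peer_reviewed, min_citations(20)]
```

Because each check is just a function, volunteers could later contribute better filters by appending to the list rather than rewriting the pipeline.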
I'll publicise this experiment at Wikimania, and try and get volunteers to improve and expand the scope of each step:
- Adding additional source repositories.
- Better filters that
  - more accurately identify notable papers
  - filter out work that's already cited in wp to prevent duplicated suggestions
- A dictionary that usefully copes with
  - Multiple source repositories
  - Different levels of precision of metadata (medicine vs surgery vs angioplasty vs percutaneous coronary intervention)
  - proposing citations to various targets: Wikiprojects, talk pages, other places I haven't considered yet, perhaps to multiple destinations at once.
- collaborations with more Wikiprojects from a variety of topic areas (including humanities & arts as well as sciences, as Open Access in these areas improves).
- Improvements to the suggestion template:
  - Improve workflow for editors to cite suggested papers.
  - Editor feedback buttons built into the template ("This article is not relevant to this topic", "Already cited here", "Too many suggestions"). These could feed back directly into the dictionary as red flags.
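Two of the improvements listed above lend themselves to a small sketch: dropping papers that are already cited, and choosing the narrowest matching target when metadata comes at different levels of precision. The example below assumes papers can be matched by DOI and that each dictionary entry carries a hand-assigned specificity level. Both are assumptions for illustration, and the target page names are placeholders.

```python
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(doi: str) -> str:
    """Lower-case a DOI and strip common URL or 'doi:' prefixes."""
    doi = doi.strip().lower()  # DOIs are case-insensitive
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    return doi


def drop_already_cited(candidate_dois, cited_dois):
    """Keep only candidates whose DOI does not already appear in the cited set."""
    cited = {normalise_doi(d) for d in cited_dois}
    return [d for d in candidate_dois if normalise_doi(d) not in cited]


# Hypothetical entries: keyword -> (target page, specificity; higher is narrower).
DICTIONARY = {
    "medicine": ("Wikipedia:WikiProject Medicine/PaperSuggestions", 0),
    "surgery": ("Talk:Surgery", 1),
    "angioplasty": ("Talk:Angioplasty", 2),
    "percutaneous coronary intervention": ("Talk:Percutaneous coronary intervention", 3),
}


def most_specific_target(keywords, dictionary=DICTIONARY):
    """Return the target of the narrowest matching keyword, or None to skip."""
    matches = [
        dictionary[k.strip().lower()]
        for k in keywords
        if k.strip().lower() in dictionary
    ]
    if not matches:
        return None
    return max(matches, key=lambda entry: entry[1])[0]
```

Suggesting to several destinations at once would mean returning every match instead of only the narrowest one. Editor feedback such as "Already cited here" could be recorded by adding the DOI to the cited set for that page.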
- http://www.consortium.io/ - a group that seem to be working on Open Publication Taxonomies and other useful infrastructures.
- CORE - "CORE harvests, maintains, enriches and makes available metadata and pdf full-text content from many Open Access repositories. This makes it a useful access point for those who would like to develop applications making use of this content." Has an API.
- The Library Project (currently seeking extension) Grants_talk:IEG/The_Wikipedia_Library/Extensionrequest
- https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Open_Access/Signalling_OA-ness OA signalling project