Talk:Open Access Reader

There is a deeper fundamental problem

First, this is a fantastic idea. The implications of it are too much for me to consider, but just on its face, projects of this sort need to be attempted.

Fundamental to the execution of this project is the idea that communities gather at WikiProjects and that WikiProjects can be forums for getting news to people. This project has as a premise that this is so. Unfortunately, WikiProject structure has a lot of problems and historically has had little development, and even though lots of people want to use WikiProjects as community hubs, a lot of time and effort is wasted in the deep inefficiency of the WikiProject interface. I talked a bit about this in a proposal at Grants:IdeaLab/WikiProject management suite. My perception of your project is that it is intended to be an application for increasing access to certain kinds of sources which is overlaid on an existing communication platform, but my criticism is that since the communication platform is underdeveloped, your ideas for an application are hampered.

I have no interest in participating in the management of either the project I proposed or this project, but I would join conversations about either. This project proposal trivializes the idea that there is a way to have messages regularly go to WikiProjects, and just presumes that this happens and that the messaging system can be used for all the applications you describe. Actually, the messaging system is undocumented, problematic, and inaccessible for most people.

If you find success in your project, my view is that you would have to develop basic WikiProject infrastructure. Should you wish to do this, I would join in conversations about the user experience of participating in WikiProjects. In the course of you developing the tools you want, perhaps you would also commit resources to being mindful about how other people could replicate the model you make so that communication could be done in other contexts, and so that WikiProjects can share infrastructure more generally.

I am very grateful for your proposal. I want this to be done and I want lots of other people to have similar projects, because this is replicable and scalable. Beyond this, I am also a big fan of open access ideology so I even like the subject matter of this proposal. I want to support this. Blue Rasberry (talk) 16:24, 10 March 2014 (UTC)

Thank you for your support. I agree that WikiProjects are not all they're cracked up to be, and perhaps are a red herring in this case; however, I do think that editors will tend to gather around topic areas that interest them, and creating a research newsletter may be motivating for editors, both new and old. As for the delivery mechanism specifically, there are various examples of such things around, e.g. The Signpost. I'm sure the documentation is terrible, but I don't expect to have this project up and running in a weekend :) EdSaperia (talk) 09:49, 11 March 2014 (UTC)

I wanted to follow up by saying that I have been talking about your proposal with others. It is provocative and I hope that someone pursues the issues it raises. Blue Rasberry (talk) 19:20, 31 March 2014 (UTC)

Other relevant resources

Hi Ed, just to add another relevant resource or two. The Open Access Button http://blog.openaccessbutton.org/ is a bookmarklet used to indicate when you stumble onto a paper that is not open access; these pins are then mapped to try to increase pressure for OA. Apparently they're querying CORE (possibly as a first line in a sequence) to check whether the record exists there first. Chatting to someone on CORE yesterday, it seems they're prepared to work with people to get data out, provide a list of articles, and then a feed for updates, etc. The Open Citations project http://opencitations.net/explore-the-data/ may also be relevant, although there are other datasets available, and the problem is that there are massive variations, e.g. between Microsoft Academic and Google Scholar (two of the biggest indexers). Citations are obviously one means to identify potential 'notability'. Sjgknight (talk) 09:10, 14 March 2014 (UTC)

I'm aware of the OAButton, in fact one of the early hackathons for it was hosted at Top Office Machines. Certainly it partly inspired this project.
CORE looks like a very good option, but it's probably worth looking around to see if anyone else is doing something similar before we commit.
Citations are definitely the most obvious method of filtering. It doesn't matter that much if the data is a bit noisy - at first we'll be looking for the top few percent of papers with hundreds, perhaps thousands of citations, so we can be pretty sure that these are saying something important.

Thanks for your input :) EdSaperia (talk) 00:20, 16 March 2014 (UTC)
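To make the "top few percent by citations" filter above concrete, here is a minimal sketch. The record layout (a dict with a "citations" key), the function name, and the sample counts are all assumptions made up for illustration; this is not how CORE or any other source actually delivers its data.

```python
import math

def top_fraction_by_citations(papers, fraction=0.02):
    """Return the most-cited `fraction` of papers (at least one), highest first.

    `papers` is a list of dicts with a "citations" count. This layout is a
    hypothetical one chosen for the example, not a real repository schema.
    """
    if not papers:
        return []
    ranked = sorted(papers, key=lambda p: p["citations"], reverse=True)
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]

# Invented sample data: ten papers with arbitrary citation counts.
sample_counts = [3, 1200, 45, 0, 310, 7, 2, 18, 950, 5]
papers = [{"title": f"Paper {i}", "citations": c} for i, c in enumerate(sample_counts)]

# Keep the top 20% of this tiny sample, i.e. two papers.
top = top_fraction_by_citations(papers, fraction=0.2)
```

Because only the ranking matters here, moderate noise in the counts should mostly affect papers near the cut-off rather than the very highly cited ones.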

This grant is also relevant Grants:IEG/find_sources_2.0 Sjgknight (talk) 09:38, 19 April 2014 (UTC)

This may be useful re: assessing impact of OA resources http://www.researchinformation.info/news/news_story.php?news_id=1575

also https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia Sjgknight (talk) 21:49, 29 April 2014 (UTC)

New paper from Mendeley http://ceur-ws.org/Vol-1143/paper6.pdf on filling Wikipedia Citation Needed using altmetrics Sjgknight (talk) 16:32, 27 January 2015 (UTC)

More discussion

I appreciate the big vision of this project and intuitively everything I see here seems right, but also this is a big project and I wonder how to break it into multiple smaller projects which would be easier to understand and manage piecewise.

One longstanding problem on Wikipedia is the technical generation of citations. There are existing tools which work well enough to constitute an attempt at a citation and help someone find a hyperlink to a source, but beyond that, citation tools could be better in various ways if only there were a bit of discussion around them. One proposal is that of Daniel Mietchen, who for a long time has talked about how to indicate that a reference is open access.

Right now at en:WikiProject Medicine there is talk about how citation templates are causing problems when huge numbers of people review the same articles and especially when they are translated. In short, templates which abbreviate a citation with just a doi or PMID make it difficult for editors to handle the content on the back end. It would probably be more motivating for publishers to get behind it if the citation practices on Wikipedia were more developed and standardized. I am referring only to the tools which make citations, and not the citation style itself. If anyone wants to jump into a citation problem with some momentum behind it, now would be a great time to say hello to the people at WikiProject Medicine for this issue. Blue Rasberry (talk) 00:35, 17 March 2014 (UTC)

success metrics

Just putting some obvious ideas here:

1. Something around article states pre and post citation insertion (e.g., the article was tagged requiring citations, etc. and this is no longer the case, no. of citations in article, etc.).
2. Something around additional content (supported by the new citations)
3. Number of "significant" articles moved from list of OA articles in
4. Something to indicate use of the OA list to support other projects (e.g. those looking to mark articles as OA)

Sjgknight (talk) 18:22, 27 March 2014 (UTC)

I'm thinking the easiest thing will be to have a button that marks papers as newly cited, so we can literally see if people are using it or not :). Otherwise, #4 is definitely something I'm considering, how to support other projects building off this work. EdSaperia (talk) 03:34, 28 March 2014 (UTC)

Presenting the idea at conferences

It would be interesting to present the Open Access Reader at conferences that are relevant but outside of the wiki-community, like the 3rd International Workshop on Mining Scientific Publications. - Lawsonstu (talk) 14:59, 8 May 2014 (UTC)

Below are the talk pages from the four overview subpages:

Sourcing Pages

Versions

Something to think about when considering using an OA repository or repository aggregator such as CORE for this project is that traditionally people would only cite the 'version of record', i.e. the print version of an article or the final post-peer review published online version. Much of the content of repositories consists of pre-prints, and other versions of articles which differ from the article of record.

In subjects which now have a strong culture of archiving pre-prints, such as high energy physics, citing those pre-prints rather than a final published version may have become more acceptable within the field (I don't have any data to hand to support that statement). However, in some disciplines it is still not the norm to archive pre-prints, and people may generally be reluctant to cite them.

We'd also have to look closely at the current guidelines for which kinds of sources are considered most suitable (I'm guessing this might vary between different language Wikipedias). - Lawsonstu (talk) 21:42, 17 March 2014 (UTC)

One way of dealing with this is to mark the proposal template with what status the paper is (e.g. pre-print, version of record), so the editor can make the call. EdSaperia (talk) 23:39, 17 March 2014 (UTC)

Now I think about it, repository metadata often contains a URL and/or DOI for the version of record, which could be extracted and displayed if editors did prefer to reference it over a pre-print. - Lawsonstu (talk) 13:37, 18 March 2014 (UTC)

Citation data

Open Citations could be very useful. It now looks to be sustainable. - Lawsonstu (talk) 14:50, 9 April 2014 (UTC)

Wikidata:WikiProject Source MetaData is working on citation data, and will help get large amounts of it into Wikidata. - Lawsonstu (talk) 10:48, 14 August 2014 (UTC)

Prioritising Significant

Judging significance

As an astrophysicist, I usually start by judging papers' significance on whether they appear in one or more review papers. E.g., for solar physics, I might read reviews published in http://solarphysics.livingreviews.org (an open-access review journal), and keep a list of who is cited for a given topic.

Citation number is of course a good first guess. One must be careful of certain biases: papers that are wrong, but controversial, can be cited a lot; citation number can be strongly dependent on which journal the paper was published in; and the benchmark for a 'large' citation number is heavily dependent on the field of research. A citation number of 20-50 is pretty good for solar physics, but a citation number in the hundreds to thousands is good for a materials science or nano-science paper.

Another way to judge whether papers are important might be to consider whether they are cited in decadal surveys. Here is one for astrophysics: http://aas.org/resources/decadal-surveys, but I'm sure other fields have them as well. --Pohuigin (talk) 23:32, 31 March 2014 (UTC)

Thank you for your thoughts. With regards to disciplinary differences, that's something we'll definitely be thinking about when looking at metrics. Your comments about review papers and decadal surveys raise an interesting point, because we absolutely want to include those papers which are shown to be significant by being highly cited over a long period, but I also think it would be best not to leave out the cutting edge of newer research. So perhaps we could build in ways to compensate: to invent a scenario, an article published in 2002 that has been cited 200 times may have a similar level of significance to an article published in 2010 that has been cited 40 times. We'll have to think carefully about any such weightings. - Lawsonstu (talk) 20:04, 1 April 2014 (UTC)
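As a rough illustration of one such weighting, the sketch below divides the citation count by the paper's age in years. This is only one of many possible normalisations, picked here because it is the simplest; the function name and the choice of 2014 as the reference year are assumptions for the example, not anything the project has settled on.

```python
def citations_per_year(citations, published_year, current_year):
    """Citations divided by age in years (age is floored at one year).

    A deliberately simple age normalisation, used only for illustration.
    """
    years = max(1, current_year - published_year)
    return citations / years

# The invented scenario above, evaluated in 2014 (an assumed reference year).
older = citations_per_year(200, 2002, 2014)  # 200 citations over 12 years
newer = citations_per_year(40, 2010, 2014)   # 40 citations over 4 years
```

Under this particular weighting the 2002 article scores about 16.7 citations per year and the 2010 article scores 10, so the two come out closer than their raw counts (200 versus 40) but not equal. A different weighting, or a per-field baseline, would move them again.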

Always notable topics

There are some areas in which every single topic is always notable on the English Wikipedia, and some of these intersect well with scholarly publishing. For instance, all biological species fall in this category, so it would be possible to scan the literature - both old and new - for descriptions of new species (e.g. occurrences of strings like "sp. nov."/ "nov. sp."/ "sp. et gen. nov." etc. in the respective abstracts or titles) and then check whether Wikipedia entries for these taxonomic units already exist (keeping in mind article naming conventions, e.g. no species-level article for monotypic genera).
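The string scan described above could be sketched roughly as follows. The pattern covers only the three markers named in the paragraph; real taxonomic literature uses more variants, so treat this as a starting point. The function name and the sample titles are made up for illustration.

```python
import re

# Matches "sp. nov.", "nov. sp." and "sp. et gen. nov.", case-insensitively,
# allowing flexible whitespace between the abbreviations.
NEW_TAXON = re.compile(
    r"\b(?:sp\.\s*et\s+gen\.\s*nov\.|sp\.\s*nov\.|nov\.\s*sp\.)",
    re.IGNORECASE,
)

def mentions_new_species(text):
    """Return True if a title or abstract contains one of the markers."""
    return NEW_TAXON.search(text) is not None

# Invented example titles, not real publications.
titles = [
    "Aus bus sp. nov. from a lowland forest",
    "Xus yus sp. et gen. nov., a new cave-dwelling beetle",
    "A survey of known species in the region",
]
flagged = [t for t in titles if mentions_new_species(t)]
```

A hit only says that the text probably describes a new taxon; checking whether a Wikipedia entry exists, and under which name, would still be a separate step.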

If the Wikipedia article already exists, it should just be checked whether the scholarly article describing the species is properly cited (many Wikipedia articles on new species are started based on reports in the popular media, without citing the scholarly source).

If it does not exist, then notifications should go to talk pages of articles or WikiProjects on higher taxonomic units. To identify which these units are, one could consult places like ZooBank, IPNI and MycoBank that assist with that in an increasingly automated fashion. If this does not work, then it becomes relevant as to whether the source article is actually open (and thus minable). If it is, a script could go into the nomenclature section (which is fairly standardized) and identify the relevant higher taxonomic units (which are often also in the title) - for which the likelihood is higher that an entry on the English Wikipedia will already exist - in order to post the news on the respective talk pages.

The majority of new species and higher taxa are still being described in non-open articles, even though this is changing. Because of that, editors working in the area would likely prefer to have such a notification system for any relevant publication, open or not. Openness would then kick in in terms of whether a free-to-read copy is available, so that Wikipedia editors can at least read about the new taxa first hand (rather than through news reports, which are available for only a tiny portion of new taxa, and often very inaccurate), or in terms of whether text bits or media can be reused in the Wikipedia article.

I would be interested in prototyping such a workflow for new species for the English Wikipedia, as it intersects well with other activities by WikiProject Open Access, as well as with my interest in biodiversity. -- Daniel Mietchen (talk) 12:35, 10 September 2014 (UTC)

Hi EdSaperia, I wonder if you've given any further thought to prototyping a Wikipedian workflow along these lines, or some other? Daniel Mietchen brought this up to me in conversation today. It makes good sense to me to focus on low-hanging fruit to test an initial workflow; perhaps you can come up with something with further discussion together. Cheers, Siko (WMF) (talk) 00:05, 23 October 2014 (UTC)

Hi Siko (WMF), while this is related in terms of broad approach I don't think our technology, research or methods are at all replicated here, so I don't think it would make sense for us to explore this. However, we've now managed to produce some example output which is quite promising! EdSaperia (talk) 23:31, 3 November 2014 (UTC)

Possibly relevant papers

Scientific impact evaluation and the effect of self-citations: mitigating the bias by discounting h-index http://arxiv.org/pdf/1202.3119.pdf

Recruiting Editors

Talk:OpenAccessReader/RecruitingEditors

Editor Workflow

Talk:OpenAccessReader/EditorWorkflow

UNcited

Ok I was looking at the page at wmflabs - https://tools.wmflabs.org/oar/bestuncited.html

Some of these seem to be cited on-wiki already, and one doesn't look free:

Bowling Alone: The Collapse and Revival of American Community is a book review, $6 to rent from http://onlinelibrary.wiley.com/doi/10.1002/pam.1035/full (I added it to w:Bowling Alone: The Collapse and Revival of American Community as an external link)
Coefficient alpha and the internal structure of tests is cited on w:Cronbach's alpha
Chunking mechanisms in human learning is cited in w:Chunking mechanisms in human learning
How conscious experience and working memory interact is cited in w:Artificial consciousness and w:Stan Franklin
The ERA-40 re-analysis I added to w:ERA-40 as an external link

Rich Farmbrough 21:34 11 November 2014 (GMT).

Also, the DOI for chunking seems to be wrong: http://dx.doi.org/10.1016%2FS1364-6613 Rich Farmbrough 13:32 12 November 2014 (GMT).

It is actually 10.1016/S1364-6613(00)01662-4; see http://dx.doi.org/10.1016/S1364-6613(00)01662-4 Rich Farmbrough 13:34 12 November 2014 (GMT).
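Comparing the two links, the broken one has the slash percent-encoded as %2F and stops before the "(00)01662-4" part. Why the tool produced it that way is not known from this page. The sketch below only shows how a percent-encoded DOI link can be decoded back to a plain DOI; the function name is hypothetical, and it cannot recover a suffix that was already cut off.

```python
from urllib.parse import unquote

DOI_RESOLVER = "http://dx.doi.org/"

def doi_from_url(url):
    """Strip the resolver prefix and undo percent-encoding (e.g. %2F -> /)."""
    if not url.startswith(DOI_RESOLVER):
        raise ValueError("not a dx.doi.org link: " + url)
    return unquote(url[len(DOI_RESOLVER):])

# The link from the tool output: decodes cleanly, but the DOI is incomplete.
truncated = doi_from_url("http://dx.doi.org/10.1016%2FS1364-6613")

# A fully percent-encoded form of the correct DOI decodes to the full DOI.
complete = doi_from_url("http://dx.doi.org/10.1016%2FS1364-6613%2800%2901662-4")
```

Here `truncated` is "10.1016/S1364-6613" and `complete` is "10.1016/S1364-6613(00)01662-4", which matches the corrected DOI given above.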

Re: the paywalled article - CORE searches repositories, rather than publisher websites, so it will include some articles which are paywalled in the published version but free to read in a repository. This could cause issues regarding DOIs because they will resolve to the Version of Record rather than open access repository version. To complicate things further, when I searched for Bowling Alone in CORE, that review didn't come up, so not sure what's going on there! - Lawsonstu (talk) 16:09, 19 November 2014 (UTC)