Wikicite/grant/WikiCite addon for Zotero with citation graph support
- Project Name
- WikiCite addon for Zotero with citation graph support
- Start/End dates
- 01 Nov 2020/30 Apr 2021
- Amount requested (and the currency you wish to receive it in)
- 7,920 USD$
- Amount requested (in US$ equivalent)
- 7,920 USD$
- Contact person name/Wikimedia username
- Diego de la Hera/Diegodlh
- Contact person e-mail address
[Alternatively, confirm that you have "Allow other users to email me" enabled in your account preferences]
- Organisation (optional)
If this grant is for an organisation (for example a Wikimedia Affiliate), name it here
- Project participants
- Who is working on this project. For each member of the team, please describe any project-related skills, experience, or other background you have that might help contribute to making this idea a success.
Diego de la Hera: scientist and developer. Web profile - Resume
I am a scientist myself, so I have long thought of the tool I am proposing here, which I think would be really useful to the scientific community.
I am also a developer committed to libre software, with a focus on web technologies, and I would like to continue working in this area. My experience as a developer ranges from designing and developing software platforms for experimental data collection and analysis, through the development of web-based mobile apps, to engagement in open source projects (such as the Hypothesis web annotation project). It includes work with RESTful APIs and Firefox extensions (relevant to the development of the Zotero plugin proposed here), among others (see Resume for full details).
My commitment to libre software is just one aspect of a more general advocacy for free access to knowledge. In this context, I have experience working with bibliographic data, such as my contributions to the autores.ar Database of Argentine authors and a recent, as yet unpublished, automated bibliographic audit for Creative Commons, among others (again, see Resume for full details).
Describe the project or event.
The goal of the project is to develop a WikiCite plugin for the open source reference management software Zotero.
The plugin will provide citation support for Zotero. It will retrieve this information from WikiData's "cites work" (P2860) property. In addition, for cases where citation data may not be available from WikiData, it will offer users the possibility to add this information, either manually or via automatic extraction from their PDF attachments, and upload it to WikiData, hence enriching both their local and WikiData's citation graph. Finally, the plugin will also provide the user with visualizations of the citation graph for their local collections.
The plugin will adapt and integrate ideas from different FOSS projects to provide a convenient, straightforward way both to leverage WikiData information to aid literature understanding and discovery, and to contribute back to WikiData.
It will comprise separate interacting modules that will be released in stages to engage the community in testing before the first stable release is published. At the end of the project, an online event will be organized to present the plugin. Attendees will be invited to install the plugin, download citation data for their Zotero libraries with it, add missing citation information for some publications, and contribute them back to WikiData.
The code will be released under a libre software license compatible with Wikimedia and MediaWiki practices.
Why is this project needed? What will it solve or improve?
Researchers and writers often use reference management software to organize their literature review. These collections tend to grow fast and it is often unclear how the different items relate to one another, and whether some essential piece may be missing. This is particularly true given the increasing rate of new academic publications.
Citation networks can help writers (including Wikipedia editors) understand how publications connect with one another and discover new works, in line with one of WikiCite's goals of leading to better research discovery. There are projects that provide large scale networks (such as Eigenfactor's Maps of Science), but there is value in limiting this network to a researcher's local collection.
Zotero lacks proper citations support
Zotero is the most popular open source reference management software tool. However, it does not support citations natively.
Items can be related to one another using the relations field. However, this relationship is unlabelled and bidirectional. As mentioned by David Lesieur on the Discussion page, a possible workaround would be to state this information in a Note, as supported by the Kerko tool. Either way, relations would be restricted to items which are both in the Zotero library already.
It has also been discussed that Zotero should fetch citation information from external sources. The Zotero Plugin for Open Citations does import citations from OpenCitations COCI and saves them as a Note attachment. However, it imports incoming citations only. Adding features such as importing from WikiCite and EPMC is on their to-do list, but the project has not been updated since 2018.
Therefore, a plugin that adds citation support to Zotero would be relevant.
WikiData citations coverage is limited
WikiData already collects citation information in its "cites work" (P2860) property. However, coverage is not perfect. WikiData relies on CrossRef and PubMed Central as sources of this information (and there is an application for this grant to support OpenCitations as well), but this information is not complete in these sources either. Other sources, such as Microsoft Academic Graph, Semantic Scholar and CiteSeerX, may help close this gap, but these datasets are available under licenses that may not be compatible with WikiData.
An alternative is to rely on user contributions, promoting the collaborative nature of WikiData. The possibility of having Zotero users contribute data to WikiData has already been mentioned, and it could complement work done by bots (e.g., Citation graph bot 2). In particular, the idea to easily add citation information to Zotero (either manually or automatically) and contribute it to open citation databases has been proposed already, but never implemented.
A similar crowdsourced open citations initiative is OpenCitations' CROCI, but the upload procedure is not straightforward and it is restricted to scholars and publishers with ORCID.
Manually extracting this information can be tedious. Therefore, providing ways to easily input this information, to get it automatically from PDFs, and to easily upload it to external sources, is essential for the success of the endeavor.
Citation graph visualization
To ensure that users see value in having citation information in their collections, an easy way to represent the citation graph should be provided. This tool should use information from the local Zotero library, including both information obtained from external sources, and information entered by the user which has not been uploaded, either because the user rejected the offer, or because the item does not belong to the external source (for example, a draft paper, a personal essay, etc). The following tools are already available, but none of them fits this use case perfectly:
- The Better BibTex plugin provides a Citation Graph export, using citation information in Zotero's extra field. However, this data has to be entered manually, and there is no user-friendly GUI to do so.
- ZotNet plugin creates network maps of Zotero items. However, it uses the relations field which, as stated above, is not ideal for citations.
- jaks6's citation_map tool builds a citation graph for the local collection, but it doesn't rely on external sources. Instead, it simply scans the PDF attachments looking for the titles of the other articles in the collection.
- A similar idea has been discussed as the result of an Open Research Data do-a-thon in 2017, but it seems to have taken a different direction.
- The standalone tools Citation Gecko and Local Citation Network are both very good and each has its own approach (as discussed here). However, neither of them relies on WikiData (they use a combination of CrossRef, OpenCitations, and Microsoft Academic Graph) and their integration with Zotero is not straightforward.
- VOSViewer is a very robust freeware tool. It uses information from WikiData already. It is very complete, but can be overwhelming for newcomers.
A WikiCite plugin for Zotero
Right now there are two ways in which Zotero and WikiData exchange information. On the one hand, Zotero's Wikidata translator imports WikiData information into Zotero, but the user has to know WikiData already and be browsing it in order to import its information. On the other hand, the ZotKat plugin for Zotero exports Zotero information in QuickStatements format, allowing easy batch editing of WikiData.
Therefore, a WikiCite plugin for Zotero that provides citation support, that fetches this information from WikiData, that lets the user easily fill in the gaps (either manually or automatically) and upload this information back to WikiData, and that uses this information to easily show how the items in the user's collection connect to one another, would expand how both projects talk to each other and could be an interesting solution to the scenario described above.
Tell us how you'll carry out your project. What will you and other organizers spend your time doing? What will you have done at the end of your project?
The project will comprise 90% development, and 10% communication and community engagement.
The idea is to adapt and integrate elements from different open source tools already available as much as possible. Code will be released under a FOSS license, and it will be internationalized to enable community translation to other languages. Translation contributions will be accepted using non-technical tools such as MediaWiki's Translate, Weblate or Crowdin.
The plugin will comprise separate interacting modules. Modules will be released in stages to engage the community in testing and providing feedback before the first stable release is published:
Fetcher module
- It will iterate over library items or run automatically upon import or creation of new items.
- Data retrieved will be saved to Zotero's "Extra" field. Saving to a plugin-specific database may also be considered.
- QID and citation count will be shown in two additional columns in Zotero's main pane.
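As a rough illustration of the fetch step described above, the following Python sketch builds a SPARQL query for the "cites work" (P2860) statements of an item and reads a query response in the standard SPARQL JSON results format. It is not the plugin's code (a Zotero plugin would be written in JavaScript). The function names, the use of the DOI property (P356) to match library items, the upper-casing of DOIs, and the "QID:"/"Cites:" line format for the "Extra" field are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

DOI_PROPERTY = "P356"          # Wikidata property "DOI" (assumed matching key)
CITES_WORK_PROPERTY = "P2860"  # Wikidata property "cites work"


def build_cites_query(doi: str) -> str:
    """Return a SPARQL query listing the works cited by the item with this DOI."""
    # Assumption: DOIs are matched in upper case; quotes are escaped defensively.
    literal = doi.upper().replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?item ?cited WHERE {\n"
        f'  ?item wdt:{DOI_PROPERTY} "{literal}" .\n'
        f"  OPTIONAL {{ ?item wdt:{CITES_WORK_PROPERTY} ?cited . }}\n"
        "}"
    )


def qid_from_uri(uri: str) -> str:
    """Turn an entity URI such as http://www.wikidata.org/entity/Q42 into 'Q42'."""
    return uri.rsplit("/", 1)[-1]


def parse_results(data: dict) -> Tuple[Optional[str], List[str]]:
    """Extract the item QID and the cited QIDs from SPARQL JSON results."""
    qid = None
    cited: List[str] = []
    for row in data["results"]["bindings"]:
        qid = qid_from_uri(row["item"]["value"])
        if "cited" in row:
            cited.append(qid_from_uri(row["cited"]["value"]))
    # Remove duplicates while keeping the order in which they were returned.
    return qid, list(dict.fromkeys(cited))


def update_extra(extra: str, qid: str, cited: List[str]) -> str:
    """Rewrite the hypothetical 'QID:' and 'Cites:' lines of an Extra field."""
    kept = [line for line in extra.splitlines()
            if not line.startswith(("QID:", "Cites:"))]
    kept.append(f"QID: {qid}")
    if cited:
        kept.append("Cites: " + ", ".join(cited))
    return "\n".join(kept)
```

The citation count shown in the main pane would then simply be the length of the list returned by parse_results.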
Citation editor module
This module will add an additional "Citations" tab to Zotero's item pane, providing a user-friendly interface to review and edit citations:
- If the item is found in WikiData, its QID will be shown here. Otherwise, it will be offered to create a new WikiData entry (see Data upload module below).
- If a QID is available, it will be offered to populate the citation data from WikiData (if not already done so by the Fetcher module).
- A list of citations will be shown. For each work cited, there will be:
- Work title, authors and publication year.
- Work DOI, if available
- Work QID. If not available, it will be offered to create one.
- Whether the citation already exists in WikiData. If it does not, and if both the item and the cited work have a QID, it will be offered to sync this data to WikiData.
- If cited work is available in Zotero, a link to it will be provided.
- Citations may be added, edited or removed at will from the user's collection.
- A button will be provided to try and extract citation data automatically from PDF attachments.
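The per-citation information listed above could be modelled roughly as follows. This Python sketch is only an illustration of the decision logic (which action the editor would offer for a given citation); the Citation class, its field names and the action labels are hypothetical and do not come from the proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    """One cited work, as it might be listed in the "Citations" tab."""
    title: str
    year: Optional[int] = None
    doi: Optional[str] = None
    qid: Optional[str] = None    # QID of the cited work, if known
    in_wikidata: bool = False    # True if the "cites work" claim already exists
    zotero_key: Optional[str] = None  # set if the cited work is in the library


def offered_action(source_qid: Optional[str], citation: Citation) -> str:
    """Return the (hypothetical) action the editor would offer for a citation."""
    if citation.in_wikidata:
        return "none"  # already synced, nothing to offer
    if source_qid is None:
        return "create entry for citing item"
    if citation.qid is None:
        return "create entry for cited work"
    # Both ends have a QID but the claim is missing on WikiData.
    return "sync citation to WikiData"
```

For example, offered_action("Q1", Citation(title="Some paper", qid="Q2")) returns "sync citation to WikiData", while the same citation without a QID would first prompt the creation of a new entry.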
Citation extraction module
This module will provide automatic extraction of citation information from PDF attachments. Among tools considered (CERMINE, Grobid, LOC-DB, and ParsCit), Grobid was selected for its out-of-the-box performance, popularity, and project activity.
Grobid comprises a service and a client. The plugin will provide communication with the Grobid service via the node.js client. The user will have to run the service on their computer, or connect to an external service. If possible, an instance may be hosted on Wikimedia's Toolforge.
In addition, submission to Scholarcy Reference Extraction API will be offered as well. This service is file size and rate limited.
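To give an idea of what the plugin would receive from Grobid, the sketch below parses a TEI XML reference list into simple records. It assumes that the references come back as TEI biblStruct elements in the http://www.tei-c.org/ns/1.0 namespace, which is how Grobid's reference-processing output is usually structured; the function name and the returned dictionary keys are hypothetical, and the network call to the service is deliberately left out.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}


def parse_references(tei_xml: str) -> List[Dict[str, Optional[str]]]:
    """Extract title, DOI and year from each biblStruct in a TEI document."""
    root = ET.fromstring(tei_xml)
    references = []
    for bibl in root.iterfind(".//tei:biblStruct", TEI):
        # The first title in document order is the article title when an
        # <analytic> section is present, otherwise the monograph title.
        title = bibl.find(".//tei:title", TEI)
        doi = bibl.find(".//tei:idno[@type='DOI']", TEI)
        date = bibl.find(".//tei:date[@when]", TEI)
        references.append({
            "title": title.text if title is not None else None,
            "doi": doi.text if doi is not None else None,
            "year": date.get("when")[:4] if date is not None else None,
        })
    return references
```

Records without a DOI would be the ones most in need of manual review in the Citation editor before any upload is offered.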
Data upload module
This module will upload user contributions to WikiData.
- It will handle user login using OAuth. For this, registration of a new OAuth consumer will be requested.
- For items without QID, it will handle creation of new WikiData entries upon user request, making sure no duplicate items are created.
- For citations not retrieved from WikiData, it will handle upload to WikiData upon user request.
- In addition, it will allow exporting in CROCI format too, enabling submission to the Crowdsourced Open Citations Index.
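The upload of a single citation could look roughly like the sketch below, which selects the citations that were not retrieved from WikiData and prepares the parameters of a wbcreateclaim request to the Wikibase API. Only the request parameters are built here: the OAuth-signed POST, the retrieval of the edit token, and duplicate checks for new entries are left out, and the function names are hypothetical.

```python
import json
from typing import Dict, List

CITES_WORK_PROPERTY = "P2860"  # Wikidata property "cites work"


def pending_uploads(local: List[str], on_wikidata: List[str]) -> List[str]:
    """Cited QIDs known locally but missing from WikiData, in local order."""
    known = set(on_wikidata)
    return [qid for qid in local if qid not in known]


def create_claim_params(source_qid: str, cited_qid: str, token: str) -> Dict[str, str]:
    """Build the parameters for a wbcreateclaim request adding one citation."""
    # Item values are passed as a JSON object holding the numeric part of the QID.
    value = {"entity-type": "item", "numeric-id": int(cited_qid[1:])}
    return {
        "action": "wbcreateclaim",
        "format": "json",
        "entity": source_qid,
        "property": CITES_WORK_PROPERTY,
        "snaktype": "value",
        "value": json.dumps(value),
        "token": token,  # edit (CSRF) token obtained after login
    }
```

Each dictionary returned by create_claim_params would be sent only after the user has confirmed the upload, in line with the "upon user request" behaviour described above.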
Citation graph visualization module
This module will provide visualization of the citation graph in the user's collection.
- Local Citation Network will be adapted to work with offline data from Zotero.
- In addition, an export translator will be provided to export to a file format that can be used in VOSviewer too.
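The data behind such a visualization can be sketched as follows. Given a mapping from each item in the collection to the works it cites, the example restricts the graph to the local collection, counts how often each local item is cited, and lists works outside the collection that are cited by several local items (candidates for relevant works not yet in the collection). This is an illustrative Python sketch, not how Local Citation Network works internally; the function names and the min_citations threshold are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def local_graph(citations: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep only the edges whose cited work is also in the collection."""
    return {source: [c for c in cited if c in citations]
            for source, cited in citations.items()}


def local_citation_counts(citations: Dict[str, List[str]]) -> Dict[str, int]:
    """Number of items in the collection that cite each item in the collection."""
    counts = {item: 0 for item in citations}
    for cited in local_graph(citations).values():
        for work in set(cited):
            counts[work] += 1
    return counts


def missing_candidates(citations: Dict[str, List[str]],
                       min_citations: int = 2) -> List[Tuple[str, int]]:
    """Works outside the collection cited by at least min_citations local items."""
    counts = Counter(work
                     for cited in citations.values()
                     for work in set(cited)
                     if work not in citations)
    return [(work, n) for work, n in counts.most_common() if n >= min_citations]
```

With a collection {"A": ["B", "X", "Y"], "B": ["X"], "C": ["A", "B", "X"]}, item B is cited twice locally, and X, cited by all three items but absent from the collection, is the only candidate returned.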
Translation and documentation
The project also involves:
- Translation of the user interface from English to Spanish.
- Publication of documentation under an open access license.
At the end of the project, an online event will be organized to present the plugin. The event will be targeted to both Zotero and WikiData communities. Before the event, new Zotero users will be asked to install Zotero and to add some publications to it. During the event, attendees will be invited to:
- Install the plugin.
- Use the plugin to fetch citation data for their collections.
- Create a WikiData account if they do not have one already, and to log in to it with the plugin.
- Identify items in their collections without QID and select 5 of them.
- Use the plugin to create a new entry on WikiData for each of the 5 items selected.
- Use the plugin to either manually or automatically extract and input 5 citations for each of these items.
- Use the plugin to upload the citation information just entered to WikiData.
- Use the plugin to obtain a citation graph of their local collection (or subset), and to identify relevant works not yet in their collections.
- If desired, upload their graph and discoveries to their social media accounts, using predefined hashtags such as #WikiCiteZotero or #OpenCitations.
The event will take place twice: once in English and once in Spanish, to involve the Spanish-speaking communities of Zotero and WikiData as well.
Measures of success
What criteria will you use to define success for your project, and how do you intend to measure them? What are your targets for these measurements?
- Development success criteria
- Publish first pre-release of the plugin including the Fetcher module.
- Publish second pre-release adding the Editor module.
- Publish third pre-release adding the Uploader module.
- Publish fourth pre-release adding the Citation graph module.
- Publish fifth pre-release adding the Extraction module.
- Publish first stable release of the plugin.
- Event success criteria
- Involve at least 20 people in the presentation events.
- Creation of at least 20 new WikiData accounts.
Assuming some overlap (30%) among user collections:
- Creation of 20 * 5 * 0.7 = 70 new WikiData entries during the events.
- Creation of 20 * 5 * 5 * 0.7 = 350 new WikiData "cites work" claims during the events.
Who is your target audience for this project? How will you engage the community you’re aiming to serve at various points during your project?
Both the WikiData and Zotero communities are the target audience for this project, as are members of the Wikipedia community who may be interested in using this new plugin to help them write their Wikipedia articles.
Pre-releases of the plugin will be published in stages, one stage per plugin module. Members of the communities involved will be notified of new pre-releases. They will be invited to test them and to provide feedback. This will make sure bugs and suggestions can be addressed before the first stable release is published for the presentation event.
For the presentation event, both WikiData and Zotero communities will be involved.
Community engagement will continue after the project has ended. Once released, the plugin will provide wider visibility of the WikiData project, contributing to the project's goal of giving more people more access to more knowledge. In addition, the plugin may foster engagement of researchers, writers and other Zotero users with the WikiData community, by simplifying and encouraging data contributions. These contributions will benefit WikiData users in general, regardless of whether they use Zotero or not.
In addition, translators from the Zotero and WikiData communities may be engaged in translating the plugin into other languages. Developer communities may be involved too in contributing extra features such as:
- Adding support for citation reasons. These are included in the Citation Ontology (CiTO) and enable characterization of the nature or type of citations, both factually and rhetorically. They would help better describe the connections between works, and they would particularly benefit from the kind of user manual input that this plugin would allow. This has been proposed as a feature addition to the Zotero Open Citations Plugin, but not implemented. The WikiData community would be involved here as well, as it would require the addition of a "reason" qualifier to the "cites work" P2860 property.
- Adding support for "cited by" links. The plugin to be developed in this project mimics WikiData and includes "cites work" links only. Future work may include the inverse "cited by" property as well.
How will you use the funds you are requesting? List bullet points for each expense. (You can create a table or link to a separate (public) document if needed.)
- Development [240h]: 7680 USD$
- Fetcher module [45h]
- Editor module [70h]
- Uploader module [45h]
- Citation graph module [35h]
- Extraction module [35h]
- Internationalization [10h]
- Translation [3h]: 60 USD$
- Documentation [9h]: 180 USD$
- Event organization [4h]: ad-honorem
- Event host [4h]: ad-honorem
COVID risk assessment (for in-person events)
If the project is for an in-person event, you must complete the risk assessment tool and checklist, and provide a link to copies of these documents here. Events must not include any international travel, and must follow all applicable local health guidelines.
You are responsible for notifying relevant communities of your proposal, so that they can help you! Depending on your project, notification may be most appropriate on a Village Pump, talk page, mailing list, etc.
Please provide links below to where relevant communities have been notified of your proposal, and to any other relevant community discussions.
- WikiData project chat page
- Wikicite talk page
- WikiData's "cites work" property talk page
- Relevant Zotero forum threads
- Zotero subreddit
In addition, I have also notified the following people by email:
- Tim Wölfle, developer of Local Citation Network
- Maxime Lathuilière, developer of wikibase-sdk and wikibase-edit
- Aaron Tay, author of the blog Musings about librarianship
- Ivan Heibi, Silvio Peroni and David Shotton, authors of the Crowdsourcing open citations with CROCI paper
- Colleague scientists
Optional: Community members are encouraged to endorse your proposal and leave a rationale here.
- Support I think this is a great idea, particularly because it is based on widely used open source tools. Also, this tool will be really useful for the entire academic community. --Guadarf66 (talk)
- Support Strong support, please fast-track this. The functionality discussed in the proposal will fill major gaps in Zotero at present. This is especially important for scholarly materials from developing countries / underrepresented languages, where any added functionality (huge gap with that "published in") will connect to specific bibliographic outcomes. Prburley (talk) (Northwestern University)
- Support Citation support is key in all academic fields, this plugin will benefit an enormous amount of users in the field. --Drizztango (talk)
- Support Zotero-Wikidata is an integration we should improve. As mentioned by Prburley, this proposal goes in the direction of addressing some pitfalls people who rely on Zotero to feed Wikidata have encountered. Thinking freely here, it'd be nice to map other developments to be worked on to improve the Zotero-Wikidata integration. This project, that plans to bring together the Wikidata and Zotero communities, could eventually spark this broader agenda. --Joalpe (talk) 18:22, 29 September 2020 (UTC)
- Support I’d love to see better integration between Zotero and Wikidata across the board. Strong support. - PKM (talk) 18:32, 29 September 2020 (UTC)
- Support I think a Citation Editor alone would already be very useful to many researchers. Integration with WikiCite would be a huge asset. --David Lesieur (talk) 20:56, 29 September 2020 (UTC)
- Support yeah, citations change and zotero / generation yields social media links in author fields. it would be nice to facilitate zotero update via wikidata. plus we should send some love to our friends at the Roy Rosenzweig Center. Slowking4 (talk) 22:09, 29 September 2020 (UTC)
- Support Sounds great - more new linked items at WikiData, more use for existing items at Wikidata.Jklamo (talk) 12:04, 30 September 2020 (UTC)
- Support Excellent idea that would surely help many users. I'm very interested in getting the citation graphs from my local libraries. --Nicosarbia (talk)
- Support Zotero is a very popular free and open source research tool and tying it and Wikicite together has been a community wish for some years. Give it a go! Blue Rasberry (talk) 13:06, 30 September 2020 (UTC)
- Support Excellent idea and design. --Csisc (talk) 22:06, 30 September 2020 (UTC)
- Support This is a fantastic idea, and something absolutely needed - especially useful for the academic community! --Zoltera (talk)
- Support Very valuable extension of current capabilities. Zotero and Wikidata are an ideal match for the project. T.Shafee(Evo﹠Evo)talk 00:31, 2 October 2020 (UTC)
- Support I've worked with Diego in the past in several projects and he's a very reliable, hard-working person. I endorse his application and I know he'll be able to deliver what he's setting out to do. --Scann (talk) 15:36, 2 October 2020 (UTC)
- Support I strongly support this. This might be the missing link in incentivizing researchers to contribute to free the remaining citations that remain closed. I4OC https://i4oc.org/ have done amazing work but now seems to have mostly stalled at 50% open citations mark due to hold outs from remaining publishers e.g. Elsevier who are unlikely to relent. So incentivizing researchers to free the citations themselves seems to be a good way to close the gap. Even if this doesn't happen at scale, the individual researcher can still benefit from the plugin. --Aarontay (talk) 16:52, 2 October 2020 (UTC)
- Support Happy to support this, you're very welcome to use the Scholarcy Reference Extraction API, it's a great use case. Phil at Scholarcy (talk) 17:58, 2 October 2020 (UTC)
- Support Great proposal, promising for creating new information links and encouraging for researchers
- Support Great feature, love it.
- Support Great idea. In the future I hope you will be able to add CiTO support too. With the Journal of Cheminformatics we're piloting adding citation intentions with CiTO terms and adding annotations to Wikidata. --Egon Willighagen (talk) 06:42, 28 November 2020 (UTC)
- Support --So9q (talk) 05:28, 3 January 2021 (UTC)
Any questions about this proposal and feedback from reviewers should be placed on the associated discussion page.
- Can you increase the budget and the allocation to documentation? Currently there is $180 requested for 9 hours. The wiki community thrives on accessibility and documentation. People may translate or adapt the text you produce now, so it is important to get this right. This project has a multiplatform software component (wiki + zotero), and already has global and multilingual community use, and which has some big technical, social, and ethical issues embedded in it. The WMF should provide some guidance on what kind of documentation it would like to fund and how much time to spend on this, but I see 9 hours as not enough. The proposer is from a Spanish speaking country, can you do English + Spanish documentation from the start? Do you both Zotero and the Wikimedia platform? Can you do documentation of "connecting Zotero to the wiki community" and "Wiki for the Zotero community"? I do not have great or high expectations for documentation, but if this project goes forward, I do not want people to see the tool and not be able to understand its basics. To throw a number out: if there room to increase the budget then I think 80 hours of documentation could be worthwhile. Document the code you develop, how to use it, and in English and Spanish user guides from perspectives of both Zotero and wiki. Truthfully, I care less about the actual tool than finding someone to write about how this should be done and what users need to know. We can always hire a developer to make a tool, but we cannot always find someone who understands the user communities well enough to explain what features matter. Blue Rasberry (talk) 13:18, 30 September 2020 (UTC)
- @Bluerasberry: Thanks for your comment. Please find my reply on the Discussion page. --Diegodlh (talk)
- Global scientific output doubles every nine years : News blog
- 67% of 100k sampled scholarly articles had a "cites work" property
- Heibi, Ivan; Peroni, Silvio; Shotton, David (2019-06-21). "Crowdsourcing open citations with CROCI -- An analysis of the current status of open citations, and a proposal". arXiv:1902.02534 [cs]. Retrieved 2020-09-25.
- Crossref as a new source of citation data
- "More about open citations — Citation Gecko, Citation extraction from PDF & LOC-DB".
- "OpenCitations' CROCI repository in GitHub".