Grants:Programs/Wikimedia Community Fund/Rapid Fund/Zotero to Wikibase bibliographical data export app (ID: 22209513)

status: Funded
Zotero to Wikibase bibliographical data export app
proposed start date: 2023-09-15
proposed end date: 2024-03-15
grant start date: 2023-09-15T00:00:00Z
grant end date: 2024-03-15T00:00:00Z
budget (local currency): 3500 EUR
budget (USD): 3826.9 USD
amount recommended (USD): 3851.51
grant type: Individual
funding region: NWE
decision fiscal year: 2023-24
applicant: DL2204
organization (if applicable): N/A

This is an automatically generated Meta-Wiki page. The page was copied from Fluxx, the grantmaking web service of Wikimedia Foundation where the user has submitted their application. Please do not make any changes to this page because all changes will be removed after the next update. Use the discussion page for your feedback. The page was created by CR-FluxxBot.

Applicant Details

Main Wikimedia username. (required)

DL2204

Organization

N/A

If you are a group or organization leader, board member, president, executive director, or staff member at any Wikimedia group, affiliate, or Wikimedia Foundation, you are required to self-identify and present all roles. (required)

N/A

Describe all relevant roles with the name of the group or organization and description of the role. (required)


Main Proposal

1. Please state the title of your proposal. This will also be the Meta-Wiki page title.

Zotero to Wikibase bibliographical data export app

2. and 3. Proposed start and end dates for the proposal.

2023-09-15 - 2024-03-15

4. Where will this proposal be implemented? (required)

Spain

5. Are your activities part of a Wikimedia movement campaign, project, or event? If so, please select the relevant project or campaign. (required)

Other (please specify): Wikibase, Scholia, WikiCite, Wikibooks

6. What is the change you are trying to bring? What are the main challenges or problems you are trying to solve? Describe this change or challenges, as well as main approaches to achieve it. (required)

Zotero is a widely used cloud-based tool for storing and organizing publication metadata; it also offers full text PDF storage and TXT extraction by default, an API for interaction with Zotero cloud data, and a local GUI working with a synchronized local data storage.

The tool to be developed under this grant is a Python app, documented well enough that non-programmers can run it. It will allow anybody with bibliographical data on Zotero to populate a Wikibase with that content, where it can be prepared for Wikidata.

On the custom Wikibase, literals for author, publisher and place names can then be reconciled using OpenRefine, other necessary manual or script-based curation tasks can be carried out, and the data can be validated against entity schemas. Once this is finished and the entities on the custom Wikibase are aligned to Wikidata entities, the data can be federated with or uploaded to Wikidata.
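As a rough illustration of the preparation for reconciliation, the sketch below collects the distinct creator, publisher and place literals from Zotero item data and serializes them as CSV, which OpenRefine can import. The `creators`, `publisher` and `place` fields follow the item JSON of the Zotero Web API; the function names and the sample records are hypothetical and are not taken from the existing scripts.

```python
import csv
import io


def creator_literal(creator):
    """Return one display string for a Zotero creator entry."""
    # Zotero stores a creator either in a single "name" field
    # or split into "firstName" and "lastName".
    if creator.get("name"):
        return creator["name"].strip()
    parts = [creator.get("lastName", "").strip(), creator.get("firstName", "").strip()]
    return ", ".join(p for p in parts if p)


def collect_literals(items):
    """Collect distinct (kind, literal) pairs that need reconciliation."""
    seen = set()
    for item in items:
        data = item.get("data", {})
        for creator in data.get("creators", []):
            literal = creator_literal(creator)
            if literal:
                seen.add(("creator", literal))
        for field in ("publisher", "place"):
            literal = data.get(field, "").strip()
            if literal:
                seen.add((field, literal))
    return sorted(seen)


def literals_to_csv(rows):
    """Serialize the pairs as CSV text for import into OpenRefine."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["kind", "literal"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample records, shaped like Zotero API item JSON.
sample_items = [
    {
        "key": "ABCD2345",
        "data": {
            "creators": [{"creatorType": "author", "firstName": "Ada", "lastName": "Lovelace"}],
            "publisher": "Example Press",
            "place": "Bilbao",
        },
    },
    {
        "key": "EFGH6789",
        "data": {
            "creators": [{"creatorType": "author", "name": "Example Institute"}],
            "publisher": "Example Press",
            "place": "",
        },
    },
]

rows = collect_literals(sample_items)
csv_text = literals_to_csv(rows)
```

Deduplicating before reconciliation keeps the OpenRefine project small: each distinct literal is matched once, and the result can then be applied to every record that carries it.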

I have successfully implemented a workflow for bulk export into a custom Wikibase (see https://lexbib.elex.is). Now I propose to build a generic tool that exports any Zotero collection to any Wikibase. The Zotero record and the Wikibase entity are linked to each other: the Zotero record URL is stored in the Wikibase, and the Wikibase entity URI is stored as an attachment to the Zotero record. In the first version of the tool, content is synced one way; two-way sync is possible through the Zotero API. A related tool has been developed previously (Philipp Zumstein, https://github.com/UB-Mannheim/zotkat/blob/master/Wikidata%20QuickStatements.js). It is meant for bulk export of literal values directly to Wikidata through QuickStatements, without any further handling of these literals, and has very limited features. For example, unlike the proposed tool, it does not store the Zotero ID and the ID of the newly created Wikidata item in both databases.
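The mutual linking of record and entity could look roughly like the following sketch, which only builds the two request payloads and sends nothing. It assumes that the item JSON carries its web URL under `links.alternate.href`, that the URL is written to the Wikibase with the `wbcreateclaim` API action, and that the entity URI is written back to Zotero as a `linked_url` attachment. The property ID `P5`, the entity ID `Q1`, the example URLs and all function names are placeholders, and the edit token needed for a real Wikibase write is left out.

```python
import json


def zotero_record_url(item):
    """Return the web URL of a Zotero record from its API JSON."""
    # Assumption: the API response includes an "alternate" (HTML) link.
    return item["links"]["alternate"]["href"]


def wikibase_claim_params(entity_id, property_id, zotero_url):
    """Build wbcreateclaim parameters that store the Zotero URL on an entity."""
    return {
        "action": "wbcreateclaim",
        "format": "json",
        "entity": entity_id,
        "property": property_id,  # hypothetical "Zotero record" URL property
        "snaktype": "value",
        "value": json.dumps(zotero_url),  # the API expects a JSON-encoded value
    }


def zotero_link_attachment(item, entity_uri):
    """Build a linked-URL attachment that points back to the Wikibase entity."""
    return {
        "itemType": "attachment",
        "parentItem": item["key"],
        "linkMode": "linked_url",
        "title": "Wikibase entity",
        "url": entity_uri,
    }


# Hypothetical record and entity, for illustration only.
sample_item = {
    "key": "ABCD2345",
    "links": {
        "alternate": {
            "href": "https://www.zotero.org/groups/0000000/example/items/ABCD2345",
            "type": "text/html",
        }
    },
    "data": {"title": "An example record"},
}

claim = wikibase_claim_params("Q1", "P5", zotero_record_url(sample_item))
attachment = zotero_link_attachment(sample_item, "https://example.wikibase.cloud/entity/Q1")
```

Keeping both identifiers on both sides is what makes a later re-run idempotent: a record that already has the attachment can be updated instead of being created a second time.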

Thanks to previous work (see https://github.com/dlindem/wikibase), I am familiar with data models for bibliographical data in general (formats used in libraries, the different LOD models, including FRBR/LRM and BIBO, and entity schemas used on Wikidata), and with the Zotero data model in particular. This will allow me to propose a generic method to map Zotero fields to Wikibase/Wikidata properties, and to resolve datatype and format conversion needs. Thanks to this grant, I will be able to allocate the necessary working time to developing the described app, starting from my own existing Python scripts, which were written ad hoc for a single use case (hard-coded mappings, configurations, etc.), and to its documentation and dissemination in the community.
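A configurable field mapping with datatype conversion might be sketched as follows. This is not the mapping used in the existing scripts: the property IDs `P1` to `P4` are placeholders that every Wikibase would define for itself, and the date handling assumes ISO-like literals (`YYYY`, `YYYY-MM` or `YYYY-MM-DD`). The time value follows the Wikibase JSON format, where precision 9, 10 and 11 stand for year, month and day.

```python
import re

# Hypothetical mapping: Zotero field -> Wikibase property and datatype.
FIELD_MAP = {
    "title": {"property": "P1", "datatype": "monolingualtext"},
    "date": {"property": "P2", "datatype": "time"},
    "url": {"property": "P3", "datatype": "url"},
    "pages": {"property": "P4", "datatype": "string"},
}

# Wikibase identifies the proleptic Gregorian calendar by this URI.
GREGORIAN = "http://www.wikidata.org/entity/Q1985727"


def to_time_value(date_literal):
    """Convert an ISO-like date literal to a Wikibase time value, or None."""
    match = re.fullmatch(r"(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?", date_literal.strip())
    if not match:
        return None
    year, month, day = match.groups()
    precision = 11 if day else 10 if month else 9
    return {
        "time": f"+{year}-{month or '00'}-{day or '00'}T00:00:00Z",
        "timezone": 0,
        "before": 0,
        "after": 0,
        "precision": precision,
        "calendarmodel": GREGORIAN,
    }


def convert_item(data, default_language="en"):
    """Turn Zotero item data into a list of typed statement descriptions."""
    statements = []
    for field, target in FIELD_MAP.items():
        literal = data.get(field, "").strip()
        if not literal:
            continue
        if target["datatype"] == "time":
            value = to_time_value(literal)
            if value is None:
                continue  # unparseable dates are left for manual curation
        elif target["datatype"] == "monolingualtext":
            value = {"text": literal, "language": default_language}
        else:
            value = literal
        statements.append(
            {"property": target["property"], "datatype": target["datatype"], "value": value}
        )
    return statements


statements = convert_item(
    {"title": "An example title", "date": "2021-05", "url": "https://example.org/paper", "pages": ""}
)
```

Moving the mapping into a data structure like `FIELD_MAP` is the step that separates a generic tool from single-use scripts: the conversion code stays the same while each Wikibase supplies its own property IDs.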

7. What are the planned activities? (required) Please provide a list of main activities. You can also add a link to the public page for your project where details about your project can be found. Alternatively, you can upload a timeline document. When the activities include partnerships, include details about your partners and planned partnerships.

September to December 2023

1. Revision of existing own scripts
2. Setup of a test Wikibase
3. Coding of a prototype for a generic Zotero-to-Wikibase export app

January to March 2024

4. Dissemination in the community, collection of feedback, issue tracking on GitHub
5. Fixing issues
6. Documentation writing
7. Release

In more detail:

Regarding (1): my own scripts for interaction with Wikibase, which also include a workflow for export from a Zotero collection (see [1], [2] and [3]), were developed from 2019 to 2022 in an ad hoc, use-case-centered way. Solutions for data normalization and validation, datatype handling, mapping to Wikibase properties, Zotero and Wikibase API connections, etc. are hard-coded; see https://github.com/dlindem/wikibase

(2) We will employ an instance on https://wikibase.cloud (Wikimedia Germany) for test and demo purposes.

(3) The goal is well-documented Python code that installs and runs in as self-explanatory a way as possible. Dennis Priskorn, an expert in all fields relevant to this proposal (see section 8), is committed to reviewing all code and giving support in this phase.

(4) and (5) We will use GitHub for tracking and working on issues. As many people as possible should be able to participate in this phase.

(6) We will set up dedicated documentation pages, first on GitHub, then in a place that the Wikibase user community finds suitable.

References:
[1] LexBib Wikibase: https://lexbib.elex.is
[2] Paper about Zotero to Wikibase export workflow: https://www.zotero.org/groups/1892855/lexbib/collections/E65TS5DR/items/QWMXXCBM/collection
[3] WikiDataCon 2021 contribution about bibliographical data LOD-ification using Wikibase: https://pretalx.com/wdcon21/talk/KPSXKN/


8. Describe your team. Please provide their roles, Wikimedia Usernames and other details. (required) Include more details of the team, including their roles, usernames, Wikimedia group, and whether they are salaried, volunteers, consultants/contractors, etc. Team members involved in the grant application need to be aware of their involvement in the project.

This grant will enable the grantee to allocate sufficient working time on completing the envisaged tasks.

In addition, the following individuals and communities are committed to supporting the project:

1. Dennis Priskorn (https://www.wikidata.org/wiki/Q111016131, Wikimedia user "So9q"), software developer, currently working for the Internet Archive on work related to a Wikibase storing bibliographical data, and an active volunteer in the Wikidata and Wikibase communities (also in relation to lexemes, dictionary linking, and related tooling, see https://github.com/dpriskorn/), will actively support the code development, testing, review, and dissemination in the community on a volunteer basis.
2. The DARIAH Working Group "Bibliodata" (see https://www.dariah.eu/activities/working-groups/bibliographical-data-bibliodata/), of which the grantee is an active member, will support the project in the following way: the proposed tool will be included as the first in a planned series of workflow documentations for LOD-ification of bibliographical data using Wikibase, to be published as an SSHOC workflow (see https://sshopencloud.eu/store-scientific-workflows-data-sshoc-repository-0). This will support the DARIAH Working Group in its effort to raise funding for a joint project, of which "LOD-ification of bibliographical data using free software" shall be one of the prominent work packages. Project proposals are currently being prepared for EU research infrastructure and Digital Humanities grant calls. Such a project will not start until 2025.
3. The Wikibase User Group and the Wikibase Cloud community will be kept posted on the progress of this project and invited to participate.
4. We will reach out to the communities around LOD in libraries and around Wikidata. We will request direct feedback from a range of colleagues whom we know to be interested. They will act as a bridge to their respective communities; all of them are able to reach large numbers of Zotero users with an interest in bibliographical Linked Open Data.
5. I will reach out to the Zotero community, as I have done before (see https://forums.zotero.org/profile/772439/david_lindemann), in order to discuss any emerging technical issues and to call for participation.

9. Who are the target participants and from which community? How will you engage participants before and during the activities? How will you follow up with participants after the activities? (required)

The target group of this proposal is any Zotero user who might want to populate a Wikibase with bibliographical data. Zotero is the most widely used open source and free tool for harvesting, storing, and curating publication metadata and related (full text and other) attachment files. However, all data, apart from dates, are kept as literal string values. Wikibase is the best solution for the LOD-ification task, that is, the conversion of literals to typed literals or to entities identified by URI. The recommended tool for entity linking is OpenRefine, which can be connected to any Wikibase and, of course, to Wikidata. The standard use case for the proposed tool would be the preparation of bibliographical datasets for inclusion in Wikidata.

Participants in the action will be the individuals cited above, and any Zotero and Wikibase users wanting to test the tool and contribute feedback. It is foreseen that the discussions arising in this context will lead, as a continuation of this project, to the collaborative development of a public tool hosted on Toolforge for Zotero and Wikibase synchronization, which will work entirely in the cloud, i.e. by connecting to the Zotero and Wikibase APIs.

10. Does your project involve work with children or youth? (required)

No

10.1. Please provide a link to your Youth Safety Policy. (required) If the proposal indicates direct contact with children or youth, you are required to outline compliance with international and local laws for working with children and youth, and provide a youth safety policy aligned with these laws. Read more here.

N/A

11. How did you discuss the idea of your project with your community members and/or any relevant groups? Please describe steps taken and provide links to any on-wiki community discussion(s) about the proposal. (required) You need to inform the community and/or group, discuss the project with them, and involve them in planning this proposal. You also need to align the activities with other projects happening in the planned area of implementation to ensure collaboration within the community.

There are several on-wiki discussion threads about bibliographical data on Wikidata. We have discussed the need for the proposed tool in the community frameworks described above. In general, open workflows for the LOD-ification of bibliographic data that are accessible to anybody are regarded as an emerging issue, in particular the use of custom Wikibase instances as an intermediate step towards the enrichment of Wikidata.

12. Does your proposal aim to work to bridge any of the content knowledge gaps (Knowledge Inequity)? Select one option that most apply to your work. (required)

Socioeconomic Status

13. Does your proposal include any of these areas or thematic focus? Select one option that most applies to your work. (required)

Culture, heritage or GLAM

14. Will your work focus on involving participants from any underrepresented communities? Select one option that most apply to your work. (required)

Socioeconomic status

15. In what ways do you think your proposal most contributes to the Movement Strategy 2030 recommendations. Select one that most applies. (required)

Innovate in Free Knowledge

Learning and metrics

17. What do you hope to learn from your work in this project or proposal? (required)

I want to use this grant as an opportunity to dedicate working time to the described actions; a research question relevant to this section could be: How can a micro-grant boost community-driven tooling?

18. What are your Wikimedia project targets in numbers (metrics)? (required)

Number of participants, editors, and organizers

Other Metrics | Target | Optional description
Number of participants | 30 | The grantee will be the main person responsible for this project and its developer. The three individuals cited above will be following the project closely. The number of early adopters of the tool who volunteer to test it and report issues may very easily reach 30 persons.
Number of editors | 1 | Documentation writing is part of the planned activities to be carried out by the grantee.
Number of organizers | 1 |
Number of content contributions to Wikimedia projects
Wikimedia project Number of content created or improved
Wikipedia
Wikimedia Commons
Wikidata 1000000
Wiktionary
Wikisource
Wikimedia Incubator
Translatewiki
MediaWiki
Wikiquote
Wikivoyage
Wikibooks
Wikiversity
Wikinews
Wikispecies
Optional description for content contributions.

The tool to be developed is the missing piece for anybody with a bibliographical data collection on Zotero who wants to move it to a Wikibase, curate it there, check for existing Wikidata items, and then send it to Wikidata: it bridges Zotero and Wikibase.

19. Do you have any other project targets in numbers (metrics)? (optional)

No

Main Open Metrics Data
Main Open Metrics Description Target
N/A N/A N/A

20. What tools would you use to measure each metric? Please refer to the guide for a list of tools. You can also write that you are not sure and need support. (required)

The number of Wikibase / Wikidata items created can easily be tracked using SPARQL queries or the edit histories of the respective Wikibase.
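Such a count could be obtained with a query like the one built in the sketch below, which counts the items that carry a statement with the property used for the Zotero record link. The property URI and the endpoint address are placeholders for a custom Wikibase; the sketch only constructs the request URL and does not send it.

```python
from urllib.parse import urlencode


def count_exported_items_query(property_direct_uri):
    """Build a SPARQL query counting items that have a Zotero link statement."""
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {\n"
        f"  ?item <{property_direct_uri}> ?zoteroUrl .\n"
        "}"
    )


def sparql_request_url(endpoint, query):
    """Build the GET URL for a SPARQL endpoint, asking for JSON results."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


# Placeholder URIs for a hypothetical Wikibase instance.
query = count_exported_items_query("https://example.wikibase.cloud/prop/direct/P5")
request_url = sparql_request_url("https://example.wikibase.cloud/query/sparql", query)
```

The full property URI is written out in the query because a custom Wikibase does not share Wikidata's `wdt:` prefix unless it is declared explicitly.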

The number of users involved in the software testing, review and updating cycle can easily be tracked on GitHub.

Financial proposal

21. Please upload your budget for this proposal or indicate the link to it. (required)
22. and 22.1. What is the amount you are requesting for this proposal? Please provide the amount in your local currency. (required)

3500 EUR

22.2. Convert the amount requested into USD using the Oanda converter. This is done only to help you assess the USD equivalent of the requested amount. Your request should be between 500 - 5,000 USD.

3826.9 USD

We/I have read the Application Privacy Statement, WMF Friendly Space Policy and Universal Code of Conduct.

Yes

Endorsements and Feedback

Please add endorsements and feedback to the grant discussion page only. Endorsements added here will be removed automatically.

Community members are invited to share meaningful feedback on the proposal and include reasons why they endorse the proposal. Consider the following:

  • Stating why the proposal is important for the communities involved and why they think the strategies chosen will achieve the results that are expected.
  • Highlighting any aspects they think are particularly well developed: for instance, the strategies and activities proposed, the levels of community engagement, outreach to underrepresented groups, addressing knowledge gaps, partnerships, the overall budget and learning and evaluation section of the proposal, etc.
  • Highlighting if the proposal focuses on any interesting research, learning or innovation, etc. Also if it builds on learning from past proposals developed by the individual or organization, or other Wikimedia communities.
  • Analyzing if the proposal is going to contribute in any way to important developments around specific Wikimedia projects or Movement Strategy.
  • Analysing if the proposal is coherent in terms of the objectives, strategies, budget, and expected results (metrics).
