Web2Cit

From Meta, a Wikimedia project coordination wiki
Web2Cit: Visual Editor for Citoid Web Translators

This page is to share news, updates and information about Web2Cit.

The Web2Cit project is supported by a grant from the Wikimedia Foundation.

The problem

WikiCite 2017 - Citoid performance for news article citations - Research by Fuzheado & Gamaliel showing that, for only 32% of the 90 most popular news sites cited in English Wikipedia, Citoid/Zotero could successfully extract the four basic reference fields: headline, date, author, and publication name.

The Citoid extension in Wikipedia's visual editor uses the Citoid API to resolve a URL, DOI, QID, etc., into a citation template. To do so, the Citoid service relies (in part) on Zotero web translators to get citation metadata from a website.
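As a rough illustration of how a client addresses the Citoid API, the sketch below builds the request URL for the citation endpoint of the Wikimedia REST API. The helper name `citoid_request_url` is hypothetical, and the choice of English Wikipedia as host is an assumption; the sketch only constructs the URL and does not perform the request.

```python
from urllib.parse import quote

# Citation endpoint of the Wikimedia REST API (English Wikipedia host assumed).
CITOID_ENDPOINT = "https://en.wikipedia.org/api/rest_v1/data/citation"


def citoid_request_url(query: str, fmt: str = "mediawiki") -> str:
    """Build the request URL for a Citoid lookup (hypothetical helper).

    `query` is the URL, DOI, etc. to resolve; `fmt` is the output format,
    e.g. "mediawiki" or "zotero".
    """
    # The query travels as a single path segment, so every reserved
    # character (":", "/", "?", "=", ...) has to be percent-encoded.
    return f"{CITOID_ENDPOINT}/{fmt}/{quote(query, safe='')}"


if __name__ == "__main__":
    print(citoid_request_url("https://example.com/news/story?id=1"))
```

Fetching the resulting URL with any HTTP client should return the citation metadata as JSON, which the Citoid extension then maps onto a citation template.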

Websites that embed metadata appropriately are understood by generic translators. However, this is often not the case, and site-specific translators are needed, most of which rely on web scraping techniques. For instance, a 2017 evaluation by Fuzheado and Gamaliel found that even websites popular as sources in English Wikipedia weren't exposing metadata properly.
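To make the idea of "embedded metadata" concrete, here is a minimal sketch of what a generic translator does conceptually: it reads `<meta>` tags from the page and maps them to citation fields. This is not Zotero's implementation; the `FIELD_KEYS` mapping and the function names are assumptions chosen for illustration, covering the four basic fields mentioned above.

```python
from html.parser import HTMLParser

# Assumed mapping from common embedded-metadata keys to basic citation fields.
FIELD_KEYS = {
    "og:title": "title",
    "citation_title": "title",
    "article:published_time": "date",
    "citation_publication_date": "date",
    "author": "author",
    "citation_author": "author",
    "og:site_name": "publication",
}


class MetaTagParser(HTMLParser):
    """Collect citation fields from <meta name=...> / <meta property=...> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        key = attr.get("property") or attr.get("name")
        content = attr.get("content")
        field = FIELD_KEYS.get(key)
        if field and content:
            # Keep the first value found for each field.
            self.fields.setdefault(field, content)


def extract_basic_fields(html: str) -> dict:
    """Return whichever basic fields the page exposes as embedded metadata."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.fields
```

A page without such tags yields an empty result, which is the situation where a site-specific translator (or manual transcription) becomes necessary.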

Most of these site-specific translators seem to be for English sources (see here, here, or here). Contributions to Zotero's translators repository are open, but they require programming skills.

Zotero developers have always shown willingness to help with new translator requests, but the demand may be too high (currently there are ~40 open issues with the "New translator" tag), and sometimes translators break. Although they will hire a dedicated person to work on translators starting in May, which may shorten review times, some translators may pose cultural and language challenges. For example, a translator for a mainstream Argentinean newspaper was recently created by one of Zotero's developers, following a request from a non-technical user in their forums. In spite of the developer's good will, it was built on the wrong cultural assumption that most last names in Argentina have two parts (in addition, the translator already seems to be broken).

Lack of Zotero web translator coverage forces editors to fall back on manually transcribing citation metadata. For the majority of editors using the visual editor, this is a cumbersome process that may deter them from adding references to their contributions, bias references toward sites that expose metadata appropriately, or leave broken citations.

Web2Cit solution

Based on a comment by User:Strainu, the idea is to develop Web2Cit: a visual translator editor that would enable non-technical users to collaboratively create and edit web translators, and define test cases.

Web2Cit would have an API that the Citoid service could use as an additional source (i.e., in addition to official Zotero web translators, Crossref, WorldCat) to resolve URLs provided by Wikipedia editors using community-created translators.

The workflow would be:

  1. A user enters a source URL into the Citoid extension of Wikipedia's visual editor.
  2. The URL can't be resolved, or the user is unhappy with the results (i.e., retrieved metadata has errors).
  3. A separate "community generated" section is shown, with citations formatted from the results returned by the Web2Cit API, using community translators. "Edit" and up/down-vote buttons may be available next to each of these citations (only for logged-in users).
  4. The user can choose one of these community-generated citations, or open Web2Cit to create a new translator or edit existing ones.
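The workflow above can be sketched as a small resolution routine: official sources are tried in order, and community-generated candidates are collected separately and ranked by votes. Everything here is hypothetical (the function name, the shape of a citation, the `votes` field); it is only meant to show how a community source could sit next to the official ones, not how Citoid or Web2Cit are implemented.

```python
from typing import Callable, Dict, List, Optional

Citation = Dict[str, object]


def resolve(
    url: str,
    official_sources: List[Callable[[str], Optional[Citation]]],
    community_source: Callable[[str], List[Citation]],
) -> Dict[str, object]:
    """Return the official result (if any) plus ranked community candidates."""
    official = None
    for source in official_sources:
        # Steps 1-2: try each official source until one resolves the URL.
        official = source(url)
        if official is not None:
            break

    # Step 3: community citations go in their own section, best-voted first.
    candidates = sorted(
        community_source(url), key=lambda c: c["votes"], reverse=True
    )
    # Step 4 is left to the user: pick a candidate or open the editor.
    return {"official": official, "community": candidates}
```

Keeping the community results in a separate key mirrors the separate "community generated" section in step 3, so the user can still compare them with the official result when the latter exists but contains errors.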

Web2Cit would also act as a web proxy server, adding structured metadata to websites using one of the community translators. This way, the proxied website would be available for translation with official generic translators by any service relying on them, including the Citoid service (until it adds Web2Cit as an additional source), Zotero's browser connectors, Zotero's ZBib, etc.
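A minimal sketch of the proxy idea, assuming the community translator has already produced a flat dictionary of metadata: the proxy rewrites the fetched HTML so that the values appear as `<meta>` tags in the page head, where generic translators look for them. The function name and the use of `citation_*` tag names in the usage example are assumptions for illustration.

```python
import re
from html import escape

# Matches the closing head tag regardless of letter case.
HEAD_CLOSE = re.compile(r"</head\s*>", re.IGNORECASE)


def inject_metadata(html_doc: str, metadata: dict) -> str:
    """Insert one <meta> tag per metadata entry just before </head>."""
    tags = "".join(
        '<meta name="{}" content="{}">'.format(
            escape(name, quote=True), escape(str(value), quote=True)
        )
        for name, value in metadata.items()
    )
    match = HEAD_CLOSE.search(html_doc)
    if match is None:
        # No head element to extend; return the page unchanged.
        return html_doc
    return html_doc[: match.start()] + tags + html_doc[match.start():]


if __name__ == "__main__":
    page = "<html><head><title>Story</title></head><body>...</body></html>"
    print(inject_metadata(page, {"citation_title": "Story", "citation_author": "Doe, Jane"}))
```

Escaping the values matters here because translator output comes from scraped page text and may contain quotes or angle brackets.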

Who we are

Diego de la Hera is Web2Cit's Project Manager and Lead Developer. He is a biochemist and psychology PhD student, passionate about information networks and emergent phenomena, from biology to information science, including insect societies, free software ecosystems, the Internet, the web, Wikipedia, and more. Interested in computers and engineering since he was a child, he started programming more seriously for his PhD, then shyly began contributing to open source projects, such as the web annotation platform Hypothesis, and more recently gained further experience with Cita, a Wikidata add-on he developed for Zotero. In love with music, choral singing, biking, and the wind of summer night storms.

Evelin Heidel is Web2Cit's Communications & Community Manager. She is a digital rights activist and community strategist who has worked at the intersection of digital rights, digital heritage and copyright in Latin America for over ten years. Most recently, she revitalized the internationally focused Open GLAM initiative at Creative Commons, currently funded by a $5 million grant from Arcadia. While working on the Open GLAM project she was a Fellow at the Harvard Library Innovation Lab and International Visiting Scholar at Washington College of Law, American University. She is currently a community organizing consultant for several not-for-profit organizations. She holds a BA in Literature (University of Buenos Aires, Argentina) and is currently enrolled in a Master's program in Law & Economics of Climate Change at the Latin American Faculty of Social Sciences (FLACSO).

Research team

Gimena del Rio Riande, Nidia Hernández, and Romina De León are members of Web2Cit's Research Group.

Contribute

  • Right now, we are collecting problematic URLs. You can help us by identifying URLs that don't work in your Wikipedia editing process. (We will eventually set up a proper page on Meta.)

News

  • September 2021: The first meeting of the Advisory Board will be held! There, we will be presenting some of the timeline, the mockup and other key documents that will help the Advisory Board give feedback and comments.
  • July 2021: We are issuing a call for members of an Advisory Board. This Advisory Board will provide feedback and advice on the development of the project and contribute to its long-term sustainability!

Advisory Board

Resources

  • Web2Cit basics (draft). Slides presented at Advisory Board's first meeting.
  • Basic mockup walkthrough. Shows the proposed workflow to (1) define translation goals and a translation recipe for a target URL, (2) define translation goals for a second target URL based on the output of translating it with the recipe defined for the first one, (3) define a separate translation recipe for a third target URL, and (4) define separate translation compartments using URL path patterns. Draft walkthrough notes available here.
  • Advanced mockup. Shows diverse mock screenshots representing complex scenarios.
  • Technical specifications (work in progress).
  • Development plan. See Technical specifications.
  • Minimum Viable Product specifications. See Technical specifications.