This project is funded by a Project Grant

Report under review

This Project Grant report has been submitted by the grantee, and is currently being reviewed by WMF staff. If you would like to add comments, responses, or questions about this grant report, you can create a discussion page at this redlink.

To read the approved grant submission for this project, please visit Grants:Project/Diegodlh/Web2Cit: Visual Editor for Citoid Web Translators.

Review the reporting requirements to better understand the reporting process.
Review all Project Grant reports under review.
Please email projectgrants@wikimedia.org if you have additional questions.

Welcome to this project's final report! This report shares the outcomes, impact and learnings from the grantee's project.

Part 1: The Project

Summary

Wikipedia is an encyclopedia and references are one of its main pillars. Because inserting citations can be tedious, Wikipedia's visual editor includes a tool, Citoid, that automatically generates citations given a URL or other unique identifier. But this tool sometimes does not work as expected.

Citoid relies on third-party open-source software (Zotero) to generate citations automatically. If webpages embed citation metadata appropriately, citations can be generated smoothly. However, many websites do not do so. In these cases, specific algorithms must be defined to extract metadata, requiring programming skills. And webpages may change, sometimes breaking these algorithms, which then must be fixed.

But how often does Citoid fail? And is there a way to fix it without knowing how to program? These are the questions that we have been working on at the Web2Cit project.

On the one hand, our research team created a script that automatically extracts references and their metadata from featured articles in different language Wikipedias, presumed correct after long curation by the community, and compares them against Citoid's responses. This way we were able to estimate that, on average, 60% of fields returned by Citoid for a given URL are correct. In addition, research results were used to automatically generate a series of test cases that Web2Cit collaborators may use to help improve automatic citations.

On the other hand, our development team created a series of tools to collaboratively improve automatic citations in Wikipedia. These tools include a translation server that returns citation metadata for a given URL using collaboratively defined translation procedures.^{[Notes 1]} These procedures are created and maintained using a visual editor that significantly lowers the technical barrier to participation. It is worth noting that these procedures do not discard Citoid results, but rather complement them. In addition, a user script integrates these collaborative automatic citations into Wikipedia; they may be used from other projects as well. Finally, a monitor constantly compares test cases defined by collaborators against actual results, and notifies interested users of any changes worth their attention.

Finally, because Web2Cit is a set of tools to collaboratively improve automatic citations, and because its integration with Wikipedia currently depends on users installing a user script, it would have no value without a community. For this reason, the project included a community and communications branch, which made sure the project was developed with the communities' needs in mind and communicated it as widely as possible via conferences, workshops, documentation, etc.

Web2Cit is already up and running! We encourage you to join the young Web2Cit community in identifying automatic citation flaws and collaboratively fixing them!

Project Goals

By mid-2022, up-to-date figures of the current Citoid coverage gap (i.e., Wikipedia sources not understood by Citoid) will be available.

The project's research team developed a script that:

identifies and extracts references and their presumably correct metadata from high-quality featured articles in different language Wikipedias;
gets Citoid's response for the URLs extracted above and compares them against the extracted metadata to estimate Citoid's performance.
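The comparison step described above can be illustrated with a minimal sketch. It assumes exact matching after simple normalization and uses hypothetical field names and values; the matching rules and the six fields actually used by the research script are not reproduced here.

```python
from typing import Dict, Iterable


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(value.lower().split())


def field_score(expected: Dict[str, str], actual: Dict[str, str],
                fields: Iterable[str]) -> float:
    """Return the fraction of the given fields whose values match."""
    fields = list(fields)
    if not fields:
        raise ValueError("at least one field is required")
    correct = sum(
        1 for name in fields
        if name in expected and name in actual
        and normalize(expected[name]) == normalize(actual[name])
    )
    return correct / len(fields)


# Hypothetical field names and values, for illustration only.
FIELDS = ["title", "author", "date", "source"]
from_wikipedia = {"title": "An Example Title", "author": "Jane Doe",
                  "date": "2020-01-31", "source": "Example News"}
from_citoid = {"title": "an example  title", "author": "J. Doe",
               "date": "2020-01-31"}

score = field_score(from_wikipedia, from_citoid, FIELDS)
print(f"{score:.2f}")  # 0.50: title and date match, author and source do not
```

Averaging such per-URL scores over many references would give an overall performance estimate of the kind reported here.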

The script and the results of these analyses are available publicly from the research project's subpage. Interestingly, the script may be run again any time in the future to re-evaluate Citoid's performance and, with minor modifications, to understand Web2Cit's impact.

Results have also been used to automatically generate Web2Cit tests, which may guide the Web2Cit community in collaboratively defining translation procedures.

By mid-2022, there will be an open source tool, Web2Cit, that enables Wikimedia, Wikipedia, Zotero and other communities to easily, non-programmatically, create and edit web translators, to collaboratively increase website compatibility with the citation metadata retrieval service, Citoid.

A set of open-source software tools has been developed, including:

a translation server that returns citation metadata for a given URL, using translation procedures collaboratively defined by the Web2Cit community. These metadata can be used from Wikipedia (see below) and other projects relying on Zotero translators, such as Zotero Connectors and ZoteroBib;
a custom JSON editor that enables the Web2Cit community to collaboratively set Web2Cit configuration using a form interface requiring far fewer technical skills than those previously needed to change automatic citation results;
a Wikipedia user script that integrates Web2Cit into Wikipedia, complementing Citoid results with those from Web2Cit.

By mid-2022, the tool will be known to and understood by as many Wikimedian communities as possible, across different languages.

To promote a community around the Web2Cit project, we:

Issued a call for and created a diverse Advisory Board, with which we communicated continuously throughout the project for feedback around different aspects of it.
Presented the project at different conferences, including WikiConference North America 2021 and Wiki Workshop 2022.
Organized 5 English and Spanish workshops, and made slides and recordings available, where possible.
Created detailed user and technical documentation.
Created a translation project in Translatewiki, to encourage translation of the software to as many languages as possible.

Project Impact

Important: The Wikimedia Foundation is no longer collecting Global Metrics for Project Grants. We are currently updating our pages to remove legacy references, but please ignore any that you encounter until we finish.

Targets

In the first column of the table below, please copy and paste the measures you selected to help you evaluate your project's success (see the Project Impact section of your proposal). Please use one row for each measure. If you set a numeric target for the measure, please include the number.
In the second column, describe your project's actual results. If you set a numeric target for the measure, please report numerically in this column. Otherwise, write a brief sentence summarizing your output or outcome for this measure.
In the third column, you have the option to provide further explanation as needed. You may also add additional explanation below this table.

Planned measure of success (include numeric target, if applicable)	Actual result	Explanation
Goal 1: up-to-date Citoid gap figures
Output: We will conduct research on different language Wikipedias to understand what the current Citoid coverage gap is; that is, which Wikipedia sources are not understood by Citoid.	91,000 references from 10,500 featured articles in four Wikipedias (English, Spanish, French and Portuguese) were analyzed. Citoid performance was estimated to be 60%: that is, on average, Citoid would return correct metadata for between 3 and 4 of the 6 fields considered in our study. Our results complement and expand on previous research by colleague Wikimedians.	The research team developed a script that automatically identifies and extracts references from featured articles in different language Wikipedias and compares them against the corresponding Citoid responses.
Outcome: This research will provide a series of sources that the community can create site-specific web translators for, using Web2Cit.	The research report provides a general overview of what sources and citation fields the Web2Cit community may focus on. On the other hand, the interactive notebook provides finer detail. Finally, Web2Cit tests have been automatically generated from research results and may be used directly via the Web2Cit monitor and server.
Outcome: In addition, these up-to-date figures will provide a baseline value to compare against in the future, after Web2Cit has been available for some time.	The research script has been developed in such a way that running it again in the future to get up-to-date performance estimates should be trivial.	Minor adaptations may be made to include additional Wikipedia languages, citation templates, parameters, and Web2Cit results.
Goal 2: Web2Cit development
Output: We will develop Web2Cit front-end, API and web proxy, and make the source code available under free (libre) software licenses.	Web2Cit server (formerly API and web proxy) and custom JSON editor (formerly front-end) are now available from https://web2cit.toolforge.org/. These are part of a set of software components that have been developed and made available on Wikimedia's GitLab repositories under the GPL v3 license, including: w2c-core, w2c-server, w2c-gadget, and w2c-monitor.	Note that the Web2Cit front-end is now the Web2Cit custom JSON editor, and that the API and web proxy are now the Web2Cit server.
Output: We will propose Citoid service, API and extension enhancements to more seamlessly support Web2Cit (optional; see Citoid enhancements below).	A Wikipedia user script has been developed that integrates Web2Cit into Wikipedia's visual editor Citoid extension. So far this user script has been installed 33 times. In addition, Web2Cit's main functions are implemented as a JavaScript library (Web2Cit core) available as an npm package, making it easier to include in the Citoid service in the future, if that is ever considered appropriate, or in any other JavaScript project.	Note that user script installations are counted per user and per Wikipedia instance. For example, the same user installing it on the English and the Spanish Wikipedia counts as two separate installations. See installation details here.
Outcome: By enabling non-technical users to create and edit web translators, Web2Cit will help increase the coverage of websites supported by Citoid, hence encouraging the insertion of a higher diversity of references to Wikipedia articles. This would especially benefit non-English Wikipedias, since most site-specific translators currently available are for English sources.	So far, Web2Cit configurations have been defined for 45 domains (as a reference, there currently are 645 Zotero web translators), 19 of which are non-English (including Spanish, Portuguese, Romanian and Bengali). These have been collaboratively defined by 15 Web2Cit collaborators. See corresponding Catanalysis report here.	Note that initial versions of Web2Cit tools were released just 5 months ago, in May 2022, and they have been improved since then. We expect the Web2Cit community to continue contributing Web2Cit configurations from now on.
Outcome: In addition, most popular community translators may be identified and submitted to Zotero's translators repository, hence more widely benefiting all services relying on Zotero translators as well.	Other projects relying on Zotero translators, such as Zotero Connectors and ZoteroBib, may already benefit from Web2Cit collaborative automatic citations because results returned by the Web2Cit server include embedded metadata understood by Zotero generic translators. A converter from Web2Cit JSON configurations to Zotero JavaScript translators should be possible, but has not been developed yet; see T302693.
Outcome: The release of the source code under free (libre) software licenses will enable continued improvements by the Wikimedia developer community.	The Web2Cit project has already attracted attention from other developers. For example, 15 users show as subscribed to one or more tasks tagged as Web2Cit or any of its subproject tags in Phabricator (excluding Web2Cit project members and Phabricator's administrator). In addition, we have provided detailed technical documentation to simplify and encourage participation of potential contributors (e.g., Web2Cit core documentation).	See the output of this PAWS notebook for the list of Phabricator users subscribed to one or more tasks tagged with the "Web2Cit" project tag or one of its subproject tags.
Goal 3: spreading the word
Output: We will continuously communicate with the communities, to provide updates about the status of development, and to get their feedback.	We put together an Advisory Board with 9 people from different community and knowledge backgrounds, who helped us from the beginning of the project providing valuable feedback on all of its aspects: development, research and community. We had 6 online meetings and created a mailing list with 23 subscribers where we have exchanged emails around 35 topics. By the end of the project we created a News page that people can add to their watchlists to stay up to date with Web2Cit's latest news. In addition to our workshops (see below) we presented the Web2Cit project at WikiConference North America 2021, and preliminary research results at Wiki Workshop 2022. We will also give a presentation about the project at WikiConference North America 2022. We met online with key community and Foundation members, including Giovanna Fontenelle, Diego Sáenz-Trumper, and Peter Coombe.
Output: We will create written and video documentation and training materials.	We published a quick start introduction to Web2Cit on the project's homepage. In addition, we published over 10 detailed user and developer documentation pages. In sum, written documentation currently adds up to over 142 kilobytes (as a reference, the Argentina article in the Spanish Wikipedia is 113 kilobytes long). We published documentation videos, such as the Web2Cit ecosystem or the Web2Cit core architecture videos, linked from our written documentation and listed at the Project resources section below. We published slides and recordings (where possible) of our workshops.
Output: We will set up mechanisms to engage the community in translating the tool to other languages.	All software components were developed with internationalization in mind. The server is already internationalized and is currently being translated collaboratively (see below). On the other hand, internationalization of the JSON editor and of the monitor is planned according to T316951 and T321606. Finally the user script was designed such that translation should not be necessary. A Translatewiki translation project was created to collaboratively translate Web2Cit tools. This way the Web2Cit server has already been translated (25% coverage or more) to 14 languages in addition to English and Spanish. Collaborative translations of our early documentation pages were preserved (with the kind help of translation administrators Pppery and Pols12) when part of their contents were moved to the current documentation.	Translation of documentation pages has not been enabled yet because doing so too early caused trouble in the past, as discussed in the Learning section below. We would rather wait until Web2Cit is more widely known and documentation pages have stabilized before enabling collaborative translation on them.
Output: We will organize a set of public workshops to present and explain the tool.	We organized 5 public workshops: (1) an independent English workshop on May 11, 2022 (12 participants); (2) an English workshop at Wikimedia Hackathon 2022 (approximately 17 participants); (3) a Spanish workshop at Wikimedia Argentina's and Wikimedistas de Uruguay's Wikiherramientas (130 views on YouTube to date); (4) a Spanish hybrid workshop (online + in person) co-organized with Wikimedia Colombia (7 participants); (5) an English workshop at the LD4 Wikidata Affinity Group Call (more than 30 participants). Slides, notes and recordings have been openly published (where available) on our Workshops page.
Outcome: Engaging different language Wikipedia communities and providing documentation, training materials, translation tools and workshops, will help ensure wide and continued adoption of Web2Cit.	As mentioned above, so far we know that Web2Cit configuration files have been collaboratively defined by 15 people, and that the Web2Cit user script has been installed 33 times. In addition, 14 people have contributed translations via Translatewiki, and 15 people have subscribed to Web2Cit tasks in Phabricator (note that there may or may not be an overlap among these groups of people).	When considering these adoption and engagement metrics it is worth noting that the first versions of the Web2Cit tools were published 5 or fewer months ago, and that written and video documentation resources are even more recent. We hope that Web2Cit adoption will continue growing in the months to come.
Participation/Content goals
Total [workshop] participants: 10-20 for each workshop, between 30-60 people engaged through the workshops	As noted above, we organized 5 workshops. The average number of participants per workshop was around 15 people, resulting in more than 70 people participating live.	Note that 70 is the approximate number of people participating live. In addition, some workshops were recorded and people may have watched these recordings and continue to do so afterwards. For example, the recording of the Spanish workshop hosted at the Wikiherramientas cycle has been watched 130 times on YouTube so far.
User-contributed [Web2Cit] translators: 15	So far 84 tests, templates or patterns configuration files have been collaboratively defined by 15 Web2Cit contributors, corresponding to 45 website domains.	See the details in the aforementioned Catanalysis report.
Citations added using these community translators: 20	Unknown	Our workshops focused on collaboratively configuring Web2Cit for community relevant domains, including translation tests and templates. Although we did show how to insert citations using Web2Cit, we did not focus on this metric during our workshops. In the future, we may use requests to the Web2Cit server from the Web2Cit user script to quantify Web2Cit citations generated from Wikipedia (see T302696). However, this would be just a rough approximation, as we cannot know if these citations are actually inserted in the end. See T321568 for an alternative approach.
Content pages improved: 20	Unknown	As mentioned above, we did not focus on tracking citations added (nor pages improved) during our workshops. In the future, logging the HTTP referrer on Web2Cit requests from the Web2Cit user script may help us understand from which Wikipedia articles Web2Cit citations are being requested (see T302696). But, as noted above, we cannot be sure if these citations are actually inserted in the end. See T321568 for an alternative approach.
Languages in which the tool will be translated: 5 new languages aside from English and Spanish	The Web2Cit server was made available for collaborative translation on Translatewiki a month ago, on September 28, 2022, and has already been translated to 14 languages with 25% coverage or more, excluding English and Spanish.	See the community translation goal above for more information.

Story

Looking back over your whole project, what did you achieve? Tell us the story of your achievements, your results, your outcomes. Focus on inspiring moments, tough challenges, interesting anecdotes or anything that highlights the outcomes of your project. Imagine that you are sharing with a friend about the achievements that matter most to you in your project.

This should not be a list of what you did. You will be asked to provide that later in the Methods and Activities section.
Consider your original goals as you write your project's story, but don't let them limit you. Your project may have important outcomes you weren't expecting. Please focus on the impact that you believe matters most.

Development

The main goal of the Web2Cit project grant was to develop a set of tools to collaboratively improve automatic citations in Wikipedia.

This development branch involved two sequential stages: (1) research and design of the Web2Cit way to web extraction, and (2) design and implementation of the ecosystem of tools that would implement this Web2Cit way.

The Web2Cit way to web extraction

The Web2Cit way to web extraction: an overview of how Web2Cit works.

Web2Cit is based on a series of principles carefully designed during the first phase of the development process. These principles are aimed at providing automatic citations powered by extraction procedures based on specific webpage samples or templates, collaboratively defined by contributors with varying technical skills. Coming up with a coherent design may be considered one of the project's first achievements.

Among these principles, we decided to base procedures on specific webpage samples or templates to minimize abstraction, which may be hard for some less technically oriented contributors. In addition, template applicability rules and control fields (see Templates and Fields documentation, respectively) are meant to remove the need for conditionals, which would have added complexity to the relatively simple design of sequential selection and transformation steps nested inside field procedures (see Web2Cit Basics documentation).
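To illustrate the idea of a field procedure made of sequential selection and transformation steps with no conditionals, here is a minimal sketch. All names (`regex_selection`, `split_on`, `take_first`, `run_procedure`) are hypothetical and are not the Web2Cit core API; how Web2Cit actually combines step outputs is also an assumption here.

```python
import re
from typing import Callable, List

# A selection step takes the page text and returns candidate strings;
# a transformation step takes a list of strings and returns a new list.
Selection = Callable[[str], List[str]]
Transformation = Callable[[List[str]], List[str]]


def regex_selection(pattern: str) -> Selection:
    """Select every first-group match of `pattern` in the page."""
    compiled = re.compile(pattern)
    return lambda page: compiled.findall(page)


def split_on(separator: str) -> Transformation:
    """Split each value on `separator` and strip the pieces."""
    return lambda values: [
        piece.strip() for value in values for piece in value.split(separator)
    ]


def take_first(values: List[str]) -> List[str]:
    """Keep only the first value, if any."""
    return values[:1]


def run_procedure(page: str, selections: List[Selection],
                  transformations: List[Transformation]) -> List[str]:
    """Concatenate the output of all selections, then apply the
    transformations one after another, with no branching."""
    values: List[str] = []
    for select in selections:
        values.extend(select(page))
    for transform in transformations:
        values = transform(values)
    return values


page = "<h1>Some headline | Example Site</h1>"
title = run_procedure(
    page,
    selections=[regex_selection(r"<h1>(.*?)</h1>")],
    transformations=[split_on("|"), take_first],
)
print(title)  # ['Some headline']
```

Because each step is a plain function applied in order, a contributor only needs to reason about one step at a time, which is the kind of simplicity the design aims for.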

Although this design phase took a bit longer than originally expected, we think it was worth it to make the Web2Cit way a relatively simple, rather intuitive and (now) obvious approach to web extraction.

The Web2Cit tool ecosystem

Web2Cit is now an ecosystem of tools working together to provide community configured automatic citations for Wikipedia and other projects. In addition to the software components shown in the video, the Web2Cit community collaboratively defines configuration files, and Web2Cit research results feed directly or indirectly our translation tests.

Web2Cit is implemented as an ecosystem of interrelated software components. All source code has been available on Wikimedia's GitLab under the GPL v3 license from the beginning, making Web2Cit one of the first projects hosted on Wikimedia's latest software repository.

We first developed Web2Cit core, a JavaScript library and npm package implementing the Web2Cit approach to web extraction. Making this a separate library allows the Web2Cit approach to be available not only from the Web2Cit server, but also from other projects which may need it in the future (see the Next steps and opportunities section below). From a developer's perspective, it is worth noting that this library includes automatic tests to simplify and encourage changes and contributions. Another interesting aspect is the modular nature of its main building blocks: translation fields, and selection and transformation steps. This way, adding new fields or step types in the future should be relatively easy. See the Next steps section for some proposals.

As mentioned before, Web2Cit uses collaboratively defined configuration files. To avoid having to deal with logins, permissions and file histories ourselves, we decided to leverage Meta-Wiki capabilities for Web2Cit storage and keep configuration files publicly under meta:Web2Cit/data/. The same strategy was used for Web2Cit monitor's output, as mentioned below.

Next we developed the Web2Cit server, a web service available from https://web2cit.toolforge.org/ which uses the Web2Cit core library and collaboratively defined configuration files available from the Web2Cit storage to return citation metadata for a given URL. This web service supports different response formats. The HTML-format response (example) provides a human-readable translation summary with links to edit the corresponding configuration files; its interface has been collaboratively translated into over 14 languages. In addition, it includes embedded metadata that allow consuming these results from projects relying on Zotero translators (such as Zotero connectors, ZoteroBib and Citoid itself) by simply prepending the Web2Cit server's address to the target's URL. On the other hand, JSON-format responses are used by other Web2Cit ecosystem components, as explained below.
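As a small illustration of the URL-prepending approach, the following sketch builds the address a Zotero-based tool would be pointed at. The helper name `web2cit_url` is hypothetical, and the example target URL is a placeholder; no request parameters are shown because none are described here.

```python
WEB2CIT_SERVER = "https://web2cit.toolforge.org/"


def web2cit_url(target_url: str) -> str:
    """Prepend the Web2Cit server address to a target URL."""
    if not target_url.startswith(("http://", "https://")):
        raise ValueError("expected an absolute http(s) URL")
    return WEB2CIT_SERVER + target_url


print(web2cit_url("https://example.com/news/article-1"))
# https://web2cit.toolforge.org/https://example.com/news/article-1
```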

Then, because manually editing configuration files is not easy, we created a custom JSON editor that provides a user-friendly form-like interface to edit these files more easily. It is worth noting that this could become a Web2Cit-independent MediaWiki-wide JSON editor, as proposed in T306837.

We soon realized that expecting users to prepend the Web2Cit server's address to their target URLs wouldn’t get us too far. So we reconsidered the priorities in the original proposal and developed the Web2Cit user script, which fully integrates Web2Cit into Wikipedia. Users just have to install this user script to get two citations instead of one when using the usual citation tool: the one from Citoid and the one from Web2Cit. If unhappy with both, they can jump into the corresponding Web2Cit server's translation summary to tweak translation tests or templates as needed.

The collaborative nature of Web2Cit configuration ensures that anyone can help improve automatic citations. But, as with Wikipedia, this opens the door to accidental or intentional disruption. In addition, webpages may change, breaking procedures defined by the community. How can we keep an eye on these changes to fix them as soon as possible? To this end, we finally developed the Web2Cit monitor, the part of the ecosystem that routinely runs collaboratively defined translation tests,^{[Notes 2]} and writes results to Meta-Wiki pages (example) that contributors can add to their watchlists to be notified whenever test results change.
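The core idea behind such monitoring, detecting which test outcomes changed between two runs, can be sketched as follows. This is an illustration only: the data shape (test identifier mapped to pass/fail) and the function name `changed_tests` are assumptions, and the actual monitor relies on Meta-Wiki page histories and watchlists for notifications.

```python
from typing import Dict, List

# Maps a test identifier (here, a URL path) to whether it passed.
Results = Dict[str, bool]


def changed_tests(previous: Results, current: Results) -> List[str]:
    """Return the sorted identifiers whose outcome differs between two runs,
    including tests that were added or removed."""
    keys = set(previous) | set(current)
    return sorted(k for k in keys if previous.get(k) != current.get(k))


previous_run = {"/news/1": True, "/news/2": True, "/opinion/3": False}
current_run = {"/news/1": True, "/news/2": False, "/blog/4": True}

print(changed_tests(previous_run, current_run))
# ['/blog/4', '/news/2', '/opinion/3']
```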

Launching early versions of Web2Cit tools as soon as they were available made sure we could have some help testing them. This helped us identify and address important bugs and feature requests before the project grant ended. For example, we recently added JSON-LD selection support. JSON-LD is a popular way to include citation metadata on webpages, but it isn’t supported by Citoid or Zotero yet. This way, Web2Cit brings JSON-LD support to the automatic citation environment of MediaWiki projects.
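For readers unfamiliar with JSON-LD, the sketch below shows how such embedded metadata can be read from a page using only the Python standard library. It is not the Web2Cit implementation (which is written in JavaScript); the class name `JsonLdCollector` and the sample HTML are made up for illustration.

```python
import json
from html.parser import HTMLParser
from typing import Any, List


class JsonLdCollector(HTMLParser):
    """Collect parsed JSON-LD objects from <script type="application/ld+json">."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: List[str] = []
        self.items: List[Any] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # ignore malformed JSON-LD blocks


html = """
<html><head>
<script type="application/ld+json">
{"@type": "NewsArticle", "headline": "Some headline", "datePublished": "2022-05-11"}
</script>
</head><body></body></html>
"""

collector = JsonLdCollector()
collector.feed(html)
print(collector.items[0]["headline"])  # Some headline
```

Once parsed, values such as `headline` or `datePublished` can be mapped to citation fields like title and date.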

Using and contributing to Web2Cit
Install the Web2Cit user script and get Web2Cit citations in addition to Citoid citations, without leaving Wikipedia. Not happy with either result? Click on the edit link to open the target URL on the Web2Cit server.
From a Web2Cit server translation summary page, click any of the edit links to edit Web2Cit configuration for the corresponding domain.
Use the Web2Cit JSON editor to change Web2Cit configuration files.
Add Web2Cit monitor result pages to your watchlist and stay up to date with changes in translation test results.

All in all, a few months ago if a Wikipedia editor using the automatic citation generator found an error, they had the following three options to fix it:

Manually fix the citation data on the citation template. Fast and relatively easy, but doesn't help other editors in the same situation; and if the source type is wrong (which is often the case) they must start from scratch.
Convince the webmaster of the target webpage to correctly embed metadata. This is the best solution long term, but requires time. In addition, some embedded metadata may still not work (e.g., JSON-LD).
Write or fix Zotero translators to extract metadata from the web source. This is robust, but needs programming knowledge, and time for the Zotero community to accept the changes and for Citoid to include them.

Web2Cit is now a fourth option: immediate, automatic and driven by the Wikimedia community. It lowers technical barriers to participation and complements Citoid where Citoid fails, while at the same time introducing new features such as JSON-LD support.

Research

Citoid does not work 100% correctly for all web sources. We knew this from personal experience and from previous research. But exactly how often and where does Citoid fail? Can we answer this in a way that would let us measure Web2Cit impact in the future? And how can we use these data to inform the Web2Cit community about where and how to focus their efforts? These are the questions that the Web2Cit research team have been focused on for the last year.

One of the challenges was how to get correct metadata for a long list of diverse Wikipedia-relevant URLs. To do this we extracted metadata from over 460,000 citations from 10,500 Wikipedia featured articles, assuming their quality would be high after curation by the Wikipedia community.

Another challenge was how to identify citation templates among the wide variety of templates used by Wikipedia editors, and how to map relevant template parameters to the subset of citation fields we wanted to analyze. We addressed this by collaboratively creating a list of citation templates and parameters.
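A list of citation templates and parameters can be thought of as a lookup table from template parameters to citation fields. The sketch below is illustrative only: the entries in `TEMPLATE_MAP` are a small assumed sample, not the project's collaborative list, and the naive parser handles only flat template calls.

```python
from typing import Dict

# Illustrative mapping only: template name -> {template parameter: citation field}.
TEMPLATE_MAP: Dict[str, Dict[str, str]] = {
    "cite web": {"title": "title", "date": "date", "url": "url"},
    "cita web": {"título": "title", "fecha": "date", "url": "url"},
}


def extract_fields(template_call: str) -> Dict[str, str]:
    """Map the parameters of a flat template call, such as
    '{{cite web |title=... |url=...}}', to citation fields.
    Nested templates and links containing '|' are not handled."""
    inner = template_call.strip()
    if not (inner.startswith("{{") and inner.endswith("}}")):
        return {}
    name, *params = inner[2:-2].split("|")
    mapping = TEMPLATE_MAP.get(name.strip().lower())
    if mapping is None:
        return {}  # not a known citation template
    fields: Dict[str, str] = {}
    for param in params:
        key, sep, value = param.partition("=")
        field = mapping.get(key.strip().lower())
        if sep and field and value.strip():
            fields[field] = value.strip()
    return fields


call = "{{Cita web |título=Un titular |fecha=2022-05-11 |url=https://example.com/a}}"
print(extract_fields(call))
# {'title': 'Un titular', 'date': '2022-05-11', 'url': 'https://example.com/a'}
```

Keeping the mapping as data rather than code means further templates or parameters can be added without changing the extraction logic.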

Our results suggest that, on average, Citoid returns between 3 and 4 correct fields of the 6 fields considered in our study. These and more detailed results are available from our public research report and from our interactive notebook, and greatly expand previous research from 2017 manually analyzing 120 top English news sources.

The script we developed is also publicly available and may be run any time in the future to get up-to-date results. In addition, the collaborative list of citation templates may be updated to consider further templates or parameters, and the automatic citation API used may be easily changed from Citoid to Web2Cit to evaluate the impact of Web2Cit in the future.

Finally, we developed a script that automatically creates Web2Cit translation tests from the citation metadata that we extracted from Wikipedia featured articles. These translation tests may help Web2Cit contributors identify websites having problems with Citoid and fix them with Web2Cit. So far this script was used to generate translation tests for 39 highly-cited low-performing website domains.^{[Notes 3]} More information about this automatic creation of Web2Cit translation tests from our research results can be found in our Research page.

Community and communications

One of our English workshops.

One of our Spanish workshops.

As a set of tools to collaboratively improve automatic citations, Web2Cit is close to nothing without a community. For this reason, the project included a community and communications branch which focused on communicating the project and its developments and on building a sustainable community around it.

The community that grew around the project may be considered in three layers. First, the core team of diverse people working on the project directly as grantees or contractors. This group included smart, curious people committed to their work and the project, many of whom are willing to continue contributing, as volunteer time permits, now that the project grant has ended.

Second, the Advisory Board: a group of volunteers from diverse community and knowledge backgrounds who believed in the project and provided valuable feedback from the beginning. In addition, they have helped us spread the word about Web2Cit among their communities, and we trust they will continue to do so now that the project grant has ended.

Finally, a third more diffuse and potentially wider community layer, including people who may have learned about Web2Cit in one of our presentations and workshops, in Phabricator, Translatewiki, via our written and video documentation or elsewhere.

We are thrilled to have some key Wikimedians among our community, who believed in our project, provided feedback, spread the word, and with whom we could have one-on-one conversations. We are also happy to know that our community building efforts not only contributed to the Web2Cit community but may also have strengthened some links across the wider Wikimedia community. For example, grantee links to Wikimedia Argentina, Wikimedistas de Uruguay and Wikimedia Colombia, research team links to Wikimedia Research and the wider Wikimedia community, and links within the Spanish-speaking technical community, to name just a few.

Community growth has probably been supported by our communication efforts and will likely continue to be. Among these efforts we may highlight:

public and private meetings including Advisory Board meetings, overall project and research presentations in Wikimedia community conferences, and workshops both independent and co-organized with fellow Wikimedians;
written and video documentation, including our research report, user and developer documentation, and other resources listed below;
communication channels including our Advisory Board's mailing list, Phabricator project tags and workboards, and our News page.

Survey(s)

If you used surveys to evaluate the success of your project, please provide a link(s) in this section, then briefly summarize your survey results in your own words. Include three interesting outputs or outcomes that the survey revealed.

Other

Is there another way you would prefer to communicate the actual results of your project, as you understand them? You can do that here!

Among the many project resources listed below, we recommend the following ones for a deeper understanding of what was achieved during the project:

our homepage
our documentation
our research project page and research report
our workshops page
our detailed monthly reports
our news and updates page

Methods and activities

Please provide a list of the main methods and activities through which you completed your project.

From a project management perspective, some activities included:

Agreeing on an estimated timeline that helped us organize and keep track of what had been done and what was pending.
Creating Phabricator umbrella and subproject tags, and respective workboards to keep track of pending tasks, feature requests, etc.
Keeping regular meetings with team members.
Writing monthly and midpoint reports, and this final report.

Development branch main activities included:

One of many very early Web2Cit draft design notes, from back when Web2Cit was just a developing idea.
Further exploration of alternative web extraction tools (e.g., Portia) and of research on the topic (e.g. Zhai & Liu 2007) to better guide the final design of Web2Cit.
During an initial design phase, carefully considering the problem at hand, and coming up with abstract models of possible solutions. Creating software mockups and technical specification draft documents documenting the results of these thought processes.
Creating software code repositories on Wikimedia's GitLab, under the GPL v3 license. Later on, mirroring these repositories to GitHub (for better discoverability), and normalizing their README files to point to our on-wiki documentation.
Creating supporting Toolforge tool accounts for dynamic and static serving of web resources, and for running automated Web2Cit monitor tasks.
Web2Cit core development:
- Implementing Web2Cit translation features and algorithm as a JavaScript library.
- Adding automatic code tests to simplify changes and contributor participation.
- Publishing the library as an npm package.
Defining the format of Web2Cit configuration files and clearly documenting them using JSON schemas. Considering pros and cons of using Meta-Wiki as Web2Cit storage, together with the project's Advisory Board.
Web2Cit server development:
- Developing a web server that exposes Web2Cit core functionalities, using the Express framework.
- Internationalizing the server's HTML-format response, translating it to the Spanish language, and then collaboratively to many more via Translatewiki.
- Configuring the Toolforge account and deploying to Toolforge (see the Server documentation page for the details).
Using the json-editor software library to create a custom JSON editor that uses the JSON schemas defined above to provide a form-like interface that greatly simplifies editing Web2Cit configuration files. More information is available in the JSON editor section of the Editing documentation page.
Web2Cit monitor development:
- Writing a draft specification and a job description, and hiring a second programmer to join the project.
- Creating a wrapper that handles communication with the Web2Cit server.
- Creating a module that writes the monitor's overview, result and log pages to Meta-Wiki, using a bot account created for this purpose, and custom templates to simplify changes and translations in the future.
- Creating a script that monitors Web2Cit configuration files and adds tasks to a check queue.
- Creating a script that reads the queue and runs programmed tasks.
- Configuring both scripts to run automatically from Toolforge.^{[Notes 2]}
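
The two monitor scripts described above (one adding tasks to a check queue, one reading the queue and running them) could be sketched roughly as follows. This is only an illustration of the queue pattern, not the monitor's actual code: the `CheckTask` structure, the revision-number dictionaries and the function names are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckTask:
    """A pending check for one domain (hypothetical structure)."""
    domain: str
    revision: int


def enqueue_changed(known, current, queue):
    """Add a task for every domain whose configuration revision changed.

    `known` and `current` map domain names to revision numbers; both are
    stand-ins for whatever the real monitor reads from Web2Cit storage.
    """
    for domain, revision in sorted(current.items()):
        if known.get(domain) != revision:
            queue.append(CheckTask(domain, revision))
            known[domain] = revision


def run_queue(queue, run_check):
    """Pop tasks in first-in, first-out order and run each check."""
    results = {}
    while queue:
        task = queue.popleft()
        results[task.domain] = run_check(task)
    return results


# Example with made-up domains and a dummy check function.
known_revisions = {"example.org": 3}
current_revisions = {"example.org": 4, "example.com": 1}
pending = deque()
enqueue_changed(known_revisions, current_revisions, pending)
results = run_queue(pending, lambda task: f"checked revision {task.revision}")
```

Keeping the "watch" and "run" steps separate, as in the list above, means either script can be scheduled or restarted independently of the other.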

Main activities by the research team included:

Subproject setup:
- Hiring the research team and agreeing on the subproject's deliverables and timelines.
- Familiarizing with citation practices of Wikipedia editors, including citation templates frequently used. Creating a list documenting some of these findings.
- Creating a software repository on GitHub, including a Jupyter Notebook to code the automatic script to be used.
Collecting curated citation data:
- Thoroughly discussing, internally and with the Advisory Board, the main assumption of our research approach: that of using citations from Wikipedia articles as correct citation metadata.
- Downloading the list of featured articles from 4 Wikipedias and their corresponding contents.
- Creating a collaborative list of citation templates and relevant parameters, to distinguish them among the wide variety of templates used by Wikipedia editors. Using this list to parse articles downloaded above and extract corresponding citation templates. 460k citations were extracted this way.
- At this stage, presenting these preliminary results at Wiki Workshop 2022.
Getting Citoid data:
- Discussing with Citoid maintainers to optimize our large number of requests to the service, minimizing impact and avoiding bans.
- Moving our notebook to PAWS for better performance.
- Fetching 288k citations from Citoid.
Comparing data:
- Defining criteria for cleaning and normalizing metadata from Wikipedia and from Citoid. 91k citations remained after this step.
- Creating a map between Citoid/Zotero and Web2Cit fields.
- Defining comparison strategies, depending on the nature of the data from each citation field.
Research results:
- Creating an interactive notebook and applying filters to understand the data and interpret results.
- Writing the research report.
Extra: Automatic Web2Cit tests:
- Developing a script to automatically create Web2Cit tests (i.e., tests.json configuration files) based on research results. These may be used by Web2Cit contributors to identify domains having problems with Citoid and better understand how to use Web2Cit to fix them.
- Using this script to automatically create such tests for a subset of 39 domains.^{[Notes 3]}
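
To illustrate the "Comparing data" step above, here is a minimal sketch of per-field comparison strategies applied to one curated citation and one automatic citation. It is not the research script itself: the field names, the normalization rules and the scoring formula are assumptions chosen for the example, and the criteria actually used are described in the research report.

```python
import unicodedata


def normalize(text):
    """Lowercase, strip accents and collapse whitespace before comparing."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.lower().split())


def same_text(expected, actual):
    """Compare free-text fields after normalization."""
    return normalize(expected) == normalize(actual)


def same_date(expected, actual):
    """Treat dates as equal when their leading YYYY-MM-DD characters agree."""
    return expected[:10] == actual[:10]


# Hypothetical mapping from field name to comparison strategy.
STRATEGIES = {"title": same_text, "publishedIn": same_text, "date": same_date}


def score(curated, automatic):
    """Return the fraction of curated fields matched by the automatic citation."""
    compared = [field for field in curated if field in STRATEGIES]
    if not compared:
        return 0.0
    hits = sum(
        1
        for field in compared
        if field in automatic and STRATEGIES[field](curated[field], automatic[field])
    )
    return hits / len(compared)


# Made-up citation data for illustration only.
curated = {"title": "El País  Digital", "date": "2021-05-04", "publishedIn": "El País"}
automatic = {"title": "el pais digital", "date": "2021-05-04T10:00:00Z", "publishedIn": "Prisa"}
result = score(curated, automatic)
```

In this made-up example the title and date match after normalization while the publication name does not, so two of the three compared fields count as correct.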

Finally, the community building and communications branch of the project involved activities such as:

Issuing a call for members and putting together an Advisory Board to "help us build sustainability and community involvement for this project". Creating a mailing list for communicating among us, and organizing and holding Board meetings.
Creating a collaborative list of problematic URLs, which helped us at the initial stages of the project with the design, testing and demonstrations.
Creating project pages and written documentation, including a guideline for early adopters to encourage adoption of Web2Cit at its earliest stages.
Producing and publishing documentation videos. This was a faster way to document some aspects of the project before written documentation was available; the videos and the written documentation now complement each other.
Configuring a Translatewiki collaborative translation project.
Having one-on-one meetings with key Wikimedians from diverse backgrounds, who gave us feedback on the community, research and development aspects of the project, and helped make the project more widely known.
Organizing and giving workshops in English and Spanish, independent or in collaboration with other Wikimedians.
Presenting the project at Wikimedia community meetings, such as WikiConference North America 2021.

Project resources

Please provide links to all public, online documents and other artifacts that you created during the course of this project. Even if you have linked to them elsewhere in this report, this section serves as a centralized archive for everything you created during your project. Examples include: meeting notes, participant lists, photos or graphics uploaded to Wikimedia Commons, template messages sent to participants, wiki pages, social media (Facebook groups, Twitter accounts), datasets, surveys, questionnaires, code repositories... If possible, include a brief summary with each link.

Development

Source code repositories, hosted on Wikimedia's GitLab under a GPL v3 license:
- Web2Cit core
- Web2Cit server & JSON editor
- Web2Cit user script
- Web2Cit monitor
- Web2Cit integrated editor (experimental)
Software assets
- Web2Cit core npm package: The JavaScript library implementing the Web2Cit approach to web extraction; used by the Web2Cit server, the (experimental) Web2Cit integrated editor, and potentially by any other project that may want to use it in the future.
- Web2Cit server available from https://web2cit.toolforge.org/: Provides automatic citation services. A beta instance for testing new releases is available from https://w2c-beta.toolforge.org/. More information on the Server documentation page.
- Web2Cit user script: integrates Web2Cit into Wikipedia's visual editor.
- Web2Cit custom JSON editor available from https://web2cit.toolforge.org/edit.html, used by the Web2Cit community to edit Web2Cit configuration files, and conveniently accessed via one of the "edit" links on any of the Web2Cit server's translation summary pages (example).
- Web2Cit configuration JSON schemas, served from the Web2Cit server. They define what configuration files should look like, and are used by the JSON editor to provide a customized form-like interface:
- Very early pre-alpha version of Web2Cit integrated editor, injected via a bookmarklet from https://tools-static.wmflabs.org/web2cit/embed.js. More information on the Integrated editor section of the Editing documentation page.
- Web2Cit monitor pages, automatically generated and maintained by the Web2Cit monitor:^{[Notes 2]}
  - Overview page: a list of Web2Cit-configured domains, including a summary of their latest check results;
  - Result pages (example): includes latest detailed test results for a specific domain. Web2Cit contributors may add these to their watchlists to be notified of test result changes;
  - Log pages (example): a list of all checks run for a specific domain, including a summary of each check results.
Supporting accounts, including:
- Toolforge tool accounts:
  - web2cit: hosts and serves Web2Cit server (including JSON schemas), JSON editor, and (experimental) integrated editor.
  - w2c-beta: same as web2cit, for testing purposes.
  - w2cmon: runs periodical tasks to keep Web2Cit monitor pages up to date.
- Web2Cit monitor's Meta-Wiki bot account: used to automatically update Web2Cit monitor's overview, results and log pages.^{[Notes 2]}
Draft design documentation:^{[Notes 4]}
- Web2Cit basic and advanced mockup diagrams, and walkthrough recording, showing an initial proposal of how Web2Cit may work.
- Web2Cit technical specifications: a draft document describing early design principles and decisions.
- Web2Cit core specifications: a draft document describing an initial architecture proposal for the Web2Cit core library.
- Integrated editor design principles and decisions.
Job description for Web2Cit monitor developer, used to help us find candidates for the Web2Cit monitor developer role.

Community and communications

Web2Cit project pages:
- Homepage: provides a quick overview of what Web2Cit is, how to use and contribute to it, and easy access to further information.
- Workshops page: a collaborative list of Web2Cit workshops and resources, including slides, notes and recordings.
- News & Updates page: a page people may add to their watchlist to be notified of Web2Cit news and updates.
- Research page: landing page of our research subproject, including an overview of its goals and results, and links to more detailed information.
- Advisory Board page: a list of the project's Advisory Board members
Written documentation
- Documentation on how Web2Cit works:
  - Basics: an overview of how Web2Cit works and of the parts that make the Web2Cit ecosystem.
  - Fields: translation field types and details.
  - Templates: what are translation templates and how they work.
  - Tests: what are translation tests and how they work.
  - Patterns: what are URL path patterns and how they work.
  - Editing: how to edit Web2Cit configuration.
- User and developer documentation on Web2Cit ecosystem parts:
  - Core
  - Server
  - Storage
  - Monitor
  - User script
  - JSON editor
  - Integrated editor (placeholder)
- Guidelines for early adopters (now archived and replaced by documentation above)
Video documentation:
- Web2Cit core architecture introduction video
- How Web2Cit works video
- How to use Web2Cit video
- Web2Cit ecosystem video
- A series of YouTube videos for Web2Cit early adopters (parts of which may still be relevant):
Other media:
- Web2Cit category on Wikimedia Commons, including screenshots, research report figures, diagrams, etc.
- Web2Cit logos:
- Pre-recorded lightning talks for WikiConference North America 2021, made available on YouTube:
  - in English
  - in Spanish
Discussion channels:
- Advisory Board mailing list, including meeting agendas, notes and recordings privately shared with list members.
- Phabricator project tags, workboards and tasks:
  - Web2Cit (umbrella project tag)
  - Software component project tags
  - Web2Cit research
  - Web2Cit community and communications
Translatewiki's Web2Cit project: for collaborative translation of software interfaces.
Web2Cit list of tools on Toolhub: to promote discoverability of Web2Cit via Wikimedia's tool repository.
Community curated resources:
- Web2Cit collaboratively maintained domain configuration files on Web2Cit storage.
- Collaborative list of problematic URLs, which helped us design and test Web2Cit at the earliest stages.

Research

Code repository
Jupyter notebooks:
- Main research notebook
- Automatic Web2Cit test generation notebook
Research results
- Wiki Workshop conference paper, including preliminary results
- Research report
- Interactive notebook
- Automatically generated Web2Cit tests
Supporting resources:
- Citation templates spreadsheet
- Zotero to Web2Cit field map:

Learning

The best thing about trying something new is that you learn from it. We want to follow in your footsteps and learn along with you, and we want to know that you took enough risks in your project to have learned something really interesting! Think about what recommendations you have for others who may follow in your footsteps, and use the below sections to describe what worked and what didn’t.

What worked well

What did you try that was successful and you'd recommend others do? To help spread successful strategies so that they can be of use to others in the movement, rather than writing lots of text here, we'd like you to share your finding in the form of a link to a learning pattern.

Betting on GitLab. When we were starting the project we had to decide where we would host our source code. We wanted to use a Wikimedia solution, but we were hesitant about using Gerrit, because we did not have experience with it. By that time, Wikimedia's GitLab instance was being tested. Because we did have experience with GitLab and GitHub, we decided to give it a try and asked to host our projects there. This was a fortunate decision, as just a few months later more and more Wikimedia projects started migrating to GitLab. This probably also prompted Translatewiki integration with Wikimedia's GitLab, which we needed to collaboratively translate our software interfaces. Nonetheless, because GitHub continues to be an important player in the open-source software ecosystem, we decided to mirror our repositories to GitHub automatically (something that can be easily done in GitLab) to increase the visibility of the project and to better acknowledge contributions from our collaborators on their GitHub public profiles.
Tracking tasks on Phabricator. Given that we decided to use Wikimedia's GitLab to host our code, it seemed natural that we used its issue tracker. However, Wikimedia's GitLab does not include an issue tracker, but relies on Wikimedia's Phabricator instead. Although we were relatively familiar with it, we had never used it to track tasks for a project of our own. It seemed intimidating at first to have our tasks tracked alongside those from the large number and diversity of Wikimedia projects out there, compared to the one-tracker-per-project approach we were used to from previous GitLab and GitHub experiences. In the end we got used to it and we are now happy to see our project more seamlessly integrated into the ecosystem of Wikimedia projects.
Reconsidering priorities for better integration with Wikipedia. The original proposal considered Web2Cit integration into Wikipedia as an optional goal. This was because Web2Cit server responses would include citations as embedded metadata, allowing consumption from services relying on Zotero translators, such as Wikipedia's Citoid. That is, users would be able to use Web2Cit citations from Wikipedia by simply prepending the Web2Cit server address to their target URLs. This worked, but we soon realized that not having Web2Cit properly integrated into Wikipedia's visual editor would hinder its use. So we decided to change priorities and spend some time looking into user script documentation and Citoid extension's source code to see if we could come up with a relatively easy integration. And we finally did! We think that the user script we developed greatly benefits Web2Cit, as users just have to install it once and then they get Citoid and Web2Cit citations using the same "insert citation" workflow they were used to before.
Our idea of collaborative, template-based, low-technical approach to web extraction. When we started we were not 100% sure that our idea of having a collaborative, relatively non-technical way to improve automatic citations would work. Of course, we had carefully considered the topic when we wrote our proposal, and we had a plan; but we couldn't be 100% sure until we had it working. We spent a long time at the beginning of the project, further considering previous experiences and research, and designing and redesigning our approach. Now that Web2Cit has been available for some months already, we can say with much more certainty that our approach to web extraction does in fact seem to be a useful collaborative and low-technical way to improve automatic citations in Wikipedia.
Leveraging Meta-Wiki as Web2Cit storage. Web2Cit uses collaboratively defined configuration files to provide automatic citations for web sources. Instead of having to create and maintain a custom storage for these files, including user accounts, permissions and change histories, we considered and discussed pros and cons of using a pre-existing MediaWiki instance instead. We finally decided to use Meta-Wiki as Web2Cit storage, under the Web2Cit/data/ subdirectory. This has proved to be a useful choice so far.
Providing a user-friendly configuration file editor. When we first released the Web2Cit server, configuration files had to be edited manually. Although we included instructions to simplify this task (using external editors or alternative formats) in our guidelines for early adopters, manually editing these files was too complicated and error prone. We knew this would no longer be a problem when we had our planned editor ready, but it was going to take us a long time and we wanted people to start using and testing Web2Cit right away. So we considered changing our priorities once again and finally came up with a custom JSON editor providing a form-like, relatively user-friendly and intuitive interface to edit these files. Although this changed our roadmap (we could not develop an integrated editor in the end), users already found this JSON editor useful, and it gave us valuable time to have people help us test (and improve) Web2Cit, which we wouldn't have had otherwise. In addition, this custom editor has the potential to become a MediaWiki-wide JSON editing tool if the community considers this may be useful, as discussed in T306837.
Using JMESPath for JSON-LD selection. JSON-LD is a popular way to embed metadata on webpages. However, it is not yet supported by either Citoid or Zotero. So, supporting this on Web2Cit was an interesting opportunity. But, how? The JSON-LD selection recently added to Web2Cit uses a simple approach: it creates an array of JSON-LD objects found on a target webpage and uses JMESPath expressions (one of many non-standard query languages for JSON) to return user-selected parts of it. Because this has just recently been added to Web2Cit it has not been thoroughly tested yet, but it seems a promising way to add JSON-LD support to automatic citations in Wikipedia, as shown by this example from www.mediafax.ro, using JSON-LD selection to get the author name.
Clarifying expectations. Based on a research team member's suggestion, one of the first things we did in one of our first meetings was coming up with a concrete list of deliverables of the research subproject. That is, a list of minimum concrete outputs that were expected from the team's work. This document turned out to be very useful, as we used it throughout the development of the project as a reference to stay on track. This strategic clarity of expectations has been previously documented as a learning pattern, and a similar approach was used for the development of the Web2Cit monitor.
Setting aside time to write documentation and just write, collaboratively. Writing documentation was one of the goals of the project. However, this is often tedious and a task that keeps getting postponed. Our project had two main stages of documentation writing. The first one was around the time we made Web2Cit first available, was prompted by the upcoming first workshops, and included our guidelines for early adopters. The second one was by the end of the project, when we deliberately set some time aside from software development for writing user and developer documentation, as similarly documented in a previous learning pattern. This let us write steadily, some days more, some days less, but constantly. Also, especially concerning developer documentation, we did not aim at documenting every single piece, but rather at writing something useful, that would help new contributors understand where and how to start, and even ourselves remember how things had been configured (e.g., Toolforge environments) or the steps for frequent tasks, such as publishing new npm package versions of Web2Cit core. In addition, we decided to write all our documentation on-wiki (instead of on README files on our software repositories), which greatly simplified making changes, and which we hope will encourage participation from others.
Other things we tried and were successful have already been included in the corresponding section of our midpoint report.
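
As an illustration of the JSON-LD selection approach described above, the following sketch collects the JSON-LD objects found on a page into an array and then selects one value from it. It is not Web2Cit's implementation (which is a JavaScript library): the sample page, the class name and the selected path are made up, and plain Python indexing stands in for the JMESPath expression `[0].author.name`.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self.objects = []
        self._buffer = None  # None while outside a JSON-LD script tag

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            try:
                self.objects.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed JSON-LD blocks
            self._buffer = None


# Made-up page used only for this example.
PAGE = """
<html><head>
<script type="application/ld+json">
{"@type": "NewsArticle", "headline": "Sample headline",
 "author": {"@type": "Person", "name": "Jane Doe"}}
</script>
</head><body></body></html>
"""

collector = JsonLdCollector()
collector.feed(PAGE)

# Plain indexing equivalent to the JMESPath expression "[0].author.name".
# With the third-party `jmespath` package this would be:
#   jmespath.search("[0].author.name", collector.objects)
author = collector.objects[0]["author"]["name"]
```

Building one array of all JSON-LD objects first keeps the selection step simple: a single expression can address any object on the page by its position.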

What didn’t work

What did you try that you learned didn't work? What would you think about doing differently in the future? Please list these as short bullet points.

Make project management a separate role. In this project we did not allocate separate time or budget for project management. However, as already noted in our midpoint report, we think this was a mistake, because project management tasks took more time and effort than expected, and had to be covered voluntarily by the grantee. In future projects, we may consider having a separate role and budget for these tasks.
Avoid enabling page translations too early. When we created our first homepage and guidelines for early adopters, we wanted to make them available for as many people as possible. So we asked translation administrators to enable collaborative translation on them. However, it soon became evident that we needed to make changes all the time to keep the documentation up to date, and eventually these pages even became obsolete and were replaced by others. Although we managed to preserve translations contributed by the community, it meant extra effort from us, and from the translation administrators (see for example this thread). We took note of this and we haven't enabled translation yet on our new documentation pages. Although we understand this may discourage people who don't read English, we think this time it's worth waiting until documentation has stabilized and have non-English readers use automatic translation tools in the meantime if needed.
Be more confident about hiring people. As also mentioned in the midpoint report, I took a long time to hire a second programmer for the project. Although this was in part because it took me some time to figure out exactly what tasks would be given to this person, it was mostly because of my lack of experience leading a development team. In the end I managed to do so, and the programmer who joined our team did a great job with developing the Web2Cit monitor in the time that we had left. I hope that this experience will help me in future projects to hire collaborators from an earlier stage.
Be clearer about timelines and expectations. In general, we were able to stick to timelines and expectations. But sometimes we had to rush, and some tasks had to be finished within the month after the project had officially ended, while writing this report. This may have been in part due to inconsistent enforcement of timelines by project management. In future projects where I play a project management role, I hope I will be able to better support team members with staying focused on their milestones and with meeting their timelines, to help ensure better use of their time and provide a more comfortable working environment.
Early releases are good, but they have to be planned carefully. When we had the Web2Cit core library ready, we wanted to make it available immediately. So we rapidly developed a temporary Web2Cit server, to start spreading the word about it. Something similar happened when we realized that we needed a temporary way to edit configuration files and we released the custom JSON editor. These supposedly temporary solutions were great, because they opened the field for others to help us test and improve Web2Cit. But it soon became obvious that they had come to stay longer than originally planned. However, as they had been created as temporary solutions, they weren't as well designed from the beginning as the Web2Cit core library, and patches had to be applied here and there, and in general the code quality was not as high as it could have been (for example, less modularity, and lack of automatic tests). In summary, we do think that early releases were useful, but in future projects I would suggest planning them more carefully, maybe learning about Agile methodologies and release life cycle strategies as well, and in general being suspicious about (self) promises of "temporary" solutions.
Betting on an improved editor did not work as expected. When we had the basics running, we wanted to see if we could develop an improved editor that would integrate the Web2Cit configuration workflow into a single sidebar interface. So with about three months left to end the project, we decided to start working on it. Two months afterwards, we had made advancements, but we still had work ahead. So after consultation with our Advisory Board, we finally decided to focus on improving and fixing bugs on our main components instead (i.e., core, server, and user script). This turned out to be a good decision in the end, but it felt a bit sad that we couldn't finish the improved editor that we had started. Nonetheless, it is worth noting that we did manage to leave development at a reasonable enough point, by releasing a very early pre-alpha version (see the Integrated editor section of the Editing documentation page), that we think might be picked up by ourselves or by somebody else in the future, if Web2Cit is more widely adopted and the community thinks doing so might be worth it.
Better documenting thought processes and decisions. I think we managed to have Web2Cit tools well documented. However, I also think that a better job could have been done in documenting the research, thought processes and decisions that led to Web2Cit being what it is now. I do have some personal notes, but they are not ready to be shared; and some of this got documented in one of our technical specification drafts, but these drafts are incomplete. I acknowledge this may be something difficult, especially because exploration and thought can sometimes be disorganized and trying to document everything may not always be possible. And also because one wants to see progress and writing takes time. But allocating some time to better document thought processes and decisions (instead of just the final product) might be something to take into account in future projects.
Plan time for peer review of research results. The research subproject took us a little longer than expected, and the final report could not be written until just a few days ago. Although we did not plan having time to widely share the results and get feedback as part of the project, future projects may consider this and reserve some time at the end for research sharing and peer review. This would allow us to spot errors or make changes based on others' feedback to improve our work. It is worth noting, however, that we did manage to do this with our preliminary results, which were presented at Wiki Workshop 2022.
Workshops could have been longer. Workshops met our participation expectations and were a great way to spread the word about Web2Cit. However, although we managed to demonstrate how to use Web2Cit, we usually did not have enough time for hands-on participation. Although we made slides and recordings available so participants could work by themselves afterwards, having longer workshops may be considered by people who may give Web2Cit workshops in the future.
More frequent and less technical Advisory Board meetings. No doubt having an Advisory Board and discussing with them was very valuable for the project. But we could have done better. As the project developed, we started meeting less frequently, particularly because we were too busy with other things, including giving workshops, and meetings had become increasingly technical. In future projects involving an Advisory Board, more frequent meetings could be scheduled, making sure that both technical and non-technical subjects are covered, also giving members the chance to give presentations if they are willing to do so.

Other recommendations

If you have additional recommendations or reflections that don’t fit into the above sections, please list them here.

Next steps and opportunities

Are there opportunities for future growth of this project, or new areas you have uncovered in the course of this grant that could be fruitful for more exploration (either by yourself, or others)? What ideas or suggestions do you have for future projects based on the work you’ve completed? Please list these as short bullet points.

We expect that the community of Web2Cit users and contributors will continue growing. The first versions of Web2Cit tools became available only about 6 months ago, and around then we gave our first workshop. Many things have happened since then, including several improvements and bug fixes, more and better workshops, and the availability of written and video documentation. We expect these will fuel the growth of the Web2Cit community in the coming months.
We would be thrilled to see people giving workshops on how to use and contribute to Web2Cit! Feel free to use the slides available from our Workshops page, and please let us know if we may help you in any way.
For now we think it's OK that Web2Cit is integrated into Wikipedia using a user script. Hopefully, more and more people will find it useful and install it. Eventually, if Web2Cit becomes even more widely used, turning the user script into a gadget may be considered. And who knows, maybe one day it may be integrated into the Citoid service directly. The fact that Web2Cit is available as an npm package should make this easier.
New translation fields and selection or transformation step types may be supported. The modular nature of Web2Cit core makes adding these relatively easy. In fact, some tasks have already been created to keep track of such feature requests. See for example this one suggesting that a "page range" field should be supported, or this one suggesting that a "case" transformation step should be added, to mention just a few.
An improved Web2Cit editor has been proposed. This may integrate the configuration editing workflow for a domain into a single interface, shown as a sidebar on the webpage being used as translation template or test. In addition, it may provide real-time previews of configuration effects, before saving them to the Web2Cit storage. Although this integrated editor could not be implemented during the Web2Cit project grant, technical specifications and a very early pre-alpha release (see the Integrated editor section of the Editing documentation page) have been made available. Development may continue in the future, by us or by other contributors, if the Web2Cit community finds the effort worth it.
Enabling collaborative translation of the project subpages and documentation we have written would be very useful. However, enabling translation too soon has caused us problems before (see the Learning section above). For this reason, we have not enabled translation on these pages yet, but it may be done in the future, once they have become relatively stable. In the meantime, readers may translate them using browser extensions, for example.
Web2Cit server user interfaces are already being collaboratively translated on Translatewiki. However, some parts of our software have not been internationalized yet; namely, the JSON editor and the pages written by the Web2Cit monitor. These tasks are being tracked in T316951 and T321606, respectively, including a description of how they could be resolved. It is worth noting that the translation project on Translatewiki has been configured to support additional subprojects when ready. In the meantime, users may again rely on browser extensions to translate these web interfaces.
The Web2Cit JSON editor was created to simplify editing of Web2Cit configuration files on Web2Cit's Meta-Wiki-based storage. But these are not the only JSON files that exist on Wikimedia projects. If the community is interested, it should be relatively easy to make this JSON editor a MediaWiki-wide JSON editor, independent of Web2Cit, as described in T306837.
The research team found that, on average, Citoid returns between 3 and 4 correct fields of the 6 considered in our study for a given URL. However, this is just a sample of what could be described with the data that was collected. For example, different language Wikipedias may be considered separately, or results may be more finely compared against previous research. In addition, the script may be run again at any time in the future to get up-to-date results, either unchanged or extended to further Wikipedias, citation templates, and even Web2Cit results to evaluate the impact of the project.
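The "correct fields per URL" figure in the last point can be illustrated with a minimal sketch. This is not the research team's script: the six field names, the normalization, and the exact-match rule below are assumptions made only for illustration, and the actual comparison method may differ.

```python
from statistics import mean

# Hypothetical field names: the report only states that six fields were considered.
FIELDS = ("itemType", "title", "author", "date", "publishedIn", "publishedBy")


def normalize(value):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(str(value or "").lower().split())


def correct_fields(expected, returned, fields=FIELDS):
    """Count fields whose returned value equals the expected one after normalization.

    Assumption: a field with no expected value is never counted as correct.
    """
    count = 0
    for field in fields:
        want = normalize(expected.get(field))
        if want and want == normalize(returned.get(field)):
            count += 1
    return count


def average_correct(pairs, fields=FIELDS):
    """Mean number of correct fields per URL over (expected, returned) pairs."""
    return mean(correct_fields(exp, ret, fields) for exp, ret in pairs)


# Made-up example data: curated citation metadata vs. an automatic response.
pairs = [
    (
        {"title": "A Title", "author": "Jane Doe", "date": "2020-01-01"},
        {"title": "a  title", "author": "J. Doe", "date": "2020-01-01"},
    ),
    (
        {"title": "B", "itemType": "webpage"},
        {"title": "B", "itemType": "webpage", "date": "2021"},
    ),
]
print(average_correct(pairs))
```

Running the same function separately on pairs grouped by language Wikipedia would give the per-language breakdown suggested above.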

These are just a few ideas of possible opportunities for future growth and exploration. It is worth noting that most of them, among many others, are being tracked and discussed in Phabricator, under the Web2Cit project tag or one of its subprojects.

Part 2: The Grant

Finances

Actual spending

Please copy and paste the completed table from your project finances page. Check that you’ve listed the actual expenditures compared with what was originally planned. If there are differences between the planned and actual use of funds, please use the column provided to explain them.

Expense	Approved amount	Actual funds spent	Difference	Explanation
Research	$3,240.00	$3,240.00	$0.00	
Software development	$43,560.00	$43,560.00	$0.00	
Community engagement	$7,500.00	$7,500.00	$0.00	
Wire fees	$300.00	$228.78	$71.22	We could not plan in advance how much we would have to pay in wire fees.
Workshops logistics costs (Zoom subscription)	$45.00	$0.00	$45.00	We ended up using the Zoom subscription of Wikimedistas de Uruguay.
Additional buffer & contingencies costs	$780.00	$0.00	$780.00	
Total	$55,425.00	$54,528.78	$896.22	

Remaining funds

Do you have any unspent funds from the grant?

Please answer yes or no. If yes, list the amount you did not use and explain why.

Yes. $896.22 was not used, including:
- $71.22 in wire fee funds, because we could not be sure how much we would need and our estimate was a bit higher than what we needed in the end.
- $45 in workshop logistics funds, because in the end we could use the Zoom subscription of Wikimedistas de Uruguay.
- $780 in buffer and contingencies funds, because we had budgeted $2,700 for possible contingencies and ended up using $1,920 for the research project (as documented in the project's discussion page).

If you have unspent funds, they must be returned to WMF. Please see the instructions for returning unspent funds and indicate here if this is still in progress, or if this is already completed:

Unspent funds were returned to WMF on 2022-10-31 and the receipt has been sent to grantsadminwikimedia.org.

Remaining funds from this grant have been returned to WMF in the amount of US$896.22.

Documentation

Did you send documentation of all expenses paid with grant funds to grantsadminwikimedia.org, according to the guidelines here?

Please answer yes or no. If no, include an explanation.

No. I have been informed by a grant administrator that submitting grant receipts is no longer a requirement. I will keep all financial records until September 30, 2026.

Confirmation of project status

Did you comply with the requirements specified by WMF in the grant agreement?

Please answer yes or no.

Yes

Is your project completed?

Please answer yes or no.

Yes

Grantee reflection

We’d love to hear any thoughts you have on what this project has meant to you, or how the experience of being a grantee has gone overall. Is there something that surprised you, or that you particularly enjoyed, or that you’ll do differently going forward as a result of the Project Grant experience? Please share it here!

Overall this was a very enjoyable, challenging and inspiring project to me.

First, I learned a lot as a project manager. This was my first experience leading such an interdisciplinary group of people, our team, working on separate complementary aspects of the project. It was a very interesting and challenging experience, yet a very enjoyable one as well. In this respect I would like to thank my colleague and friend Scann, who in addition to her Community and Communications Lead role helped me as a project management consultant, with her much wider experience managing projects and with the Wikimedia community.

In particular, I really enjoyed supporting our research team. I come from an academic background myself and am just finishing my PhD now. Having the chance to occupy this overseeing role, with a team of enthusiastic, curious and independent researchers, was definitely a new, exciting and rewarding experience. Thank you Nidia, Gimena and Romina for this opportunity!

I also learned a lot as a software developer and designer, and had the chance to recognize how much I enjoy designing and modelling systems and working on software architecture. These are fields where I have almost no formal knowledge and which I would like to continue exploring and learning about in the future.

I am very happy that I finally managed to hire a second programmer for the project. This gave me the chance to take on a software development lead role, which was definitely new to me but which felt very good. I learned new ways of doing and approaching things, and gained more confidence in my own work and knowledge as a programmer. Thank you Dennis for your patience and commitment!

It is worth noting that we were not alone in this project, but supported by a growing Web2Cit community. In particular, I would like to thank our Advisory Board, who supported us from the beginning of the project, contributing their valuable time and knowledge to its successful development, and helping us feel that we were working for a purpose.

In addition, I am glad to feel that the project played its small part in enriching and strengthening links within the Wikimedia community, including the research and technical communities. It was actually an indirect consequence of this project that, in collaboration with Wikimedistas de Uruguay and Wikimedia Colombia, we organized a meeting with the Spanish-speaking technical community, which we hope will further strengthen links and collaborations within it and with the wider community in the region.

I feel much more connected to and knowledgeable about the Wikimedia community and Foundation now, and more strongly linked to regional chapters and groups, such as Wikimedia Argentina, Wikimedistas de Uruguay, Wikimedia Colombia and Wikimedia Chile. It was also in the context of this project, and thanks to the grant, that I could relocate to Córdoba, where I helped start Wikimedistas Calamuchita, a growing user group aimed at engaging the people of the Calamuchita Valley in reflecting their cultural, historical and natural identities on Wikimedia projects. It was thanks to this that I had the honor of being invited to Wikimania Argentina 2022, where I had a great time and met amazing people. I am happy to say I really feel like a Wikimedian now!

But the project wasn't perfect and some things could have been better, as noted in the Learning section above. I hope this experience will help me be more confident about hiring other people and clearer about expectations and timelines in future projects, for the benefit of the projects and of the people working on them. I also hope I will be able to take some things more calmly and avoid unnecessary stress, such as when I was overly worried about being delayed by the end of the first half of this project, and in the end could easily solve it by simply talking about it and asking for a short extension.

All in all, I am very happy with how the project went and ended. I feel that it was generally well thought out, and that it makes sense to have done things the way we did. It feels so good that something that was just an idea at the beginning is something concrete now. And I was very happy the day that I was editing Wikipedia and had the chance to use Web2Cit myself to improve automatic citations for a webpage I wanted to reference! I now hope the Web2Cit community will continue to grow to collaboratively improve automatic citations in Wikipedia, the wiki way.

Notes

1. In Web2Cit we adopt Citoid/Zotero's jargon and use web translation to refer to extraction of metadata from webpages.
2. Automatic writing of Web2Cit monitor results is handled by a bot whose status is pending approval until November 2, 2022.
3. Web2Cit tests automatically generated by the research team will be uploaded to the Web2Cit storage by February 2023, after the Web2Cit monitor has been tested more thoroughly.
4. Although draft design documentation may contain relevant information regarding historical design decisions, we recommend referring to the user and developer documentation for the most up-to-date information.
