Talk:Web2Cit

From Meta, a Wikimedia project coordination wiki

Please leave any message about the project below.

Source code?

Where does the source code for the service (https://web2cit.toolforge.org/) live? Cheers. Mvolz (talk) 13:29, 14 June 2022 (UTC)Reply

Hi, Marielle! Sorry for the delay. For some reason I wasn't notified of this message. The source code is hosted on Wikimedia's GitLab: https://gitlab.wikimedia.org/diegodlh/w2c-server. It started as something temporary, but, as usually happens, it continued to grow, so it's still a bit disorganized. The Phabricator project is https://phabricator.wikimedia.org/tag/web2cit-server/ Diegodlh (talk) 14:23, 21 June 2022 (UTC)

Archive translated page

Dear @Pppery. Further elaborating on what was raised here, I would like to archive this version of the Web2Cit home page for future reference (maybe move it to Web2Cit/Archive/Home) to leave room for moving the page from User:Diegodlh/Web2Cit over to here. Unfortunately I cannot move it myself because it's a page marked for translation. Could you please do that? Thank you! Diegodlh (talk) 18:12, 7 September 2022 (UTC)Reply

Happy to do this, but why Web2Cit/Archive/Home rather than Web2Cit/Archive? * Pppery * it has begun 18:13, 7 September 2022 (UTC)Reply
Web2Cit/Archive sounds better, yeah. Thanks! Diegodlh (talk) 18:19, 7 September 2022 (UTC)Reply
Done * Pppery * it has begun 18:37, 7 September 2022 (UTC)Reply
Wow!! I'm impressed how fast that was!! Thank you!!! The new page is already on site. This time we will wait before requesting translation, but this one should be more stable already. Thanks again! Diegodlh (talk) 18:39, 7 September 2022 (UTC)Reply

Default on azwiki

Hello, just wanted to let you know that the generator has been activated by default on Azerbaijani Wikipedia =) Toghrul R (talk) 10:52, 21 May 2023 (UTC)Reply

I broke it straight away ;)

Sorry Diego! I was trying out Web2Cit for the NZGeo website following along with the demo you did at Wikimania, so that if and when we have a fix for the no title problem for PapersPast, I am ready to configure it. I had it working for NZGeo at least to recognise the correct item type, but then when I tried to go back and add translation for author name and date, I stuffed it up and it doesn't find the translation template, I think. I'm not sure how to fix it! DrThneed (talk) 01:03, 23 August 2023 (UTC)Reply

[edit]

Hello Web2Cit maintainers, contributors, and fans! I wanted to let you know that I highlighted the Web2Cit documentation as a shining example in the new Tool Docs guide that I just published. Thank you for creating lovely tool documentation that can serve as an example to help others create and improve tool docs :-) This guide was created as part of the Doc Your Tool project for the upcoming 2024 Hackathon. If you're interested, please join that project to work on or talk about tool documentation during the hackathon! TBurmeister (WMF) (talk) 16:57, 16 April 2024 (UTC)Reply

Wow, @TBurmeister (WMF)! Thank you so much! I wasn't getting notifications for changes on this page, even though it is in my watchlist; I haven't looked into why this was happening. I've just subscribed to new topics, though. Hopefully this will fix it! Diegodlh (talk) 19:20, 9 August 2024 (UTC)Reply

Author name

Hi! I have been working on making translation templates for some Dutch news websites, and I've decided that splitting author names properly is hard, so I've been putting the full name in the last/full names field. However, when I then try to add a reference on the Dutch wiki, it puts the full name in the last name field. Is there some way to make it use the full name field if the first name field is not filled in? Alternatively, could you maybe explain how to properly split names, especially in edge cases where there are multiple authors and/or people have multiple first names or last names? Lwgph (talk) 10:46, 22 August 2024 (UTC)Reply

Hi @Lwgph! Thanks for your message :)
Yeah, splitting author names into first and last names is a complicated issue that goes well beyond Web2Cit. Not only are there technical considerations, but also cultural.
In Web2Cit we have decided to merge the author last and full-name fields because that's Citoid's behavior (the default service that provides automatic citations for Wikipedia) and Web2Cit is meant to simply patch or augment Citoid's responses.
Based on Citoid/Web2Cit response, the Wikipedia Editor chooses a citation template and populates its fields according to the configuration specified in the citation template's TemplateData. You can read more about this here: https://www.mediawiki.org/wiki/Citoid/Maps_TemplateData
For example, for the "Citeer nieuws" citation template in the Dutch Wikipedia, the Citoid configuration in its TemplateData (https://nl.wikipedia.org/w/index.php?title=Sjabloon:Citeer_nieuws&action=edit&templatedata=edit) indicates the following:
"author": [ [ "voornaam", "achternaam" ] ]
Which means that, for each author, Citoid's first field (Web2Cit's authorFirst) will be mapped to the template's "voornaam" field, and Citoid's last field (Web2Cit's authorLast) will be mapped to the template's "achternaam" field. This is standard behavior.
I don't know what the case is for citation templates in the Dutch Wikipedia, but in the English Wikipedia the "last" and "author" (i.e. full name) fields of a citation template are simply aliases, as mentioned here: https://en.wikipedia.org/wiki/Template:Citation_Style_documentation/author. That is, as far as I understand, their values should be formatted exactly the same; it would be just in the wikitext that one would notice the difference.
Your suggestion that, for each author, the second Citoid field may be mapped to the corresponding "full name" field of the citation template (if the first Citoid field is empty) sounds interesting, but I don't think this is supported. Maybe @Mvolz (WMF) knows better?
Alternatively, it could be possible to change the citation template's configuration to always use the corresponding "full name" field (as mentioned here), but this would mean losing the information for split first and last names in all cases, which is probably undesired. Diegodlh (talk) 01:37, 23 August 2024 (UTC)Reply
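The mapping described in this reply can be sketched in Python. This is only an illustration, assuming that Citoid returns each author as a [first, last] pair and that the template's map lists one [first-parameter, last-parameter] pair per author slot. The function name `map_authors` and the sample author are hypothetical; this is not Citoid's or the Wikipedia editor's actual code.

```python
def map_authors(citoid_authors, template_author_map):
    """Pair each Citoid author with the template parameters for its slot."""
    params = {}
    # zip stops at the shorter list, so authors beyond the configured
    # slots are simply left unmapped in this sketch.
    for (first, last), (first_param, last_param) in zip(
        citoid_authors, template_author_map
    ):
        if first:
            params[first_param] = first
        if last:
            params[last_param] = last
    return params


# Map quoted above for "Citeer nieuws" (one author slot).
citeer_nieuws_map = [["voornaam", "achternaam"]]

# Hypothetical author: a full name stored in the last/full-name field
# ends up in "achternaam", because that is where the second item maps.
full_name_only = map_authors([["", "Jan de Vries"]], citeer_nieuws_map)

# The same author with the name split into first and last parts.
split_name = map_authors([["Jan", "de Vries"]], citeer_nieuws_map)
```

Under these assumptions, a full name placed in the last/full-name field can only land in "achternaam", which matches the behavior reported in the question.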
Thanks for your answer! Good to know that at least I'm not missing anything obvious. I'll just put the names in the correct field to the best of my ability, and it might not always be right, but it's still better than not getting any names at all!
On a somewhat unrelated note, has the server been having some problems recently? I've noticed that sometimes over the past few days when I try to add a reference, the cite tool will just break unless I uncheck Web2Cit, and when I tried the website, it also couldn't translate the url I put in, even when trying urls that worked before. Lwgph (talk) 21:13, 25 August 2024 (UTC)Reply

Keep defaulting to fallback

Hi,

Sorry, I don't know if I'm completely stupid, but I've been trying for three hours without results: why does this URL keep defaulting to fallback? Omnilaika02 (talk) 19:22, 28 August 2024 (UTC)

Hi, @Omnilaika02! Web2Cit can be quite challenging, especially at the beginning. But you were really close! Just a very small fix was needed.
First, in cases like this I recommend clicking "Enable debugging" at the bottom of the results page. This detailed translation output can be a bit difficult to understand at first, but provides useful information such as which templates were tried, whether they were found applicable or not, what the output for each selection and transformation step was, etc. More (technical) information about this debug output here: Web2Cit/Docs/Server#Debugging
In this case I could see that the translation template that you had created was not even being tried. This usually happens when any of the mandatory fields (itemType and title) has not been included in the template. In this case it was the title field that had not been included. Well, actually it was, only that under the wrong (and duplicate) name of itemType :). So I just changed the (duplicate) field name from "itemType" to "title". Now the template was no longer ignored!
However, translation was still defaulting to fallback. As seen in the debugging output, the reason was that the template was marked as not applicable because the output of the "title" field (which is a required field) was empty (therefore invalid). Again checking the debugging output, it was empty because the XPath selection step was slightly misconfigured: there was no element matching the path //meta[@property="og:title"]/@content. The correct path is slightly different: //meta[@name="og:title"]/@content. Anyway, because Citoid was getting the title right, I changed it to use the Citoid output for title instead.
These two changes were enough to get the template working. But still there was no output for the "authorLast" field (note that in this case this didn't make the template not applicable because the field was configured as non-required). The reason here was the same as before: there was no element matching the XPath provided: //meta[@property="parsely-author"]/@content. Changed it very slightly to //meta[@name="parsely-author"]/@content.
Also there was no output in the "language" field, but this was just because it wasn't configured in the template. Added this field to the template and set it to use the Citoid output for language, which already matched the expected output.
In addition, I also removed two extra unnecessary selection steps in the "publishedIn" field. These are added by default to get this information from any of the three Citoid fields that may provide it, but are not needed if not using the Citoid output.
Finally, I took the liberty to add a date field to both the templates and tests files. You should get this data from Web2Cit now too :)
Please let me know if you have any other questions. I'm happy to help! Diegodlh (talk) 22:57, 3 September 2024 (UTC)Reply
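The difference between the two XPath expressions mentioned above can be reproduced with a small Python sketch. The page fragment below is hypothetical and simplified (real pages are HTML, not strict XML), and Python's standard `xml.etree.ElementTree` only supports a subset of XPath, so the `content` attribute is read with `.get()` rather than with `/@content`. The helper name `meta_content` is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page head where the metadata is exposed through the
# "name" attribute rather than "property".
SAMPLE_HEAD = """
<head>
  <meta name="og:title" content="Example headline" />
  <meta name="parsely-author" content="Example Author" />
</head>
"""


def meta_content(root, attribute, key):
    """Return the content of the first <meta> whose attribute equals key."""
    element = root.find(f".//meta[@{attribute}='{key}']")
    return None if element is None else element.get("content")


root = ET.fromstring(SAMPLE_HEAD)

# Selecting by @property finds nothing on this page, so the field is empty.
by_property = meta_content(root, "property", "og:title")

# Selecting by @name matches the element and returns its content.
by_name = meta_content(root, "name", "og:title")
author = meta_content(root, "name", "parsely-author")
```

An empty result for a required field is what makes a template not applicable, so checking which attribute a site actually uses is a quick first step when a selection returns nothing.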
Hi @Diegodlh, thank you SO MUCH for your very detailed explanation and your help. I was able to understand exactly what I was doing wrong and correct it, and managed to do another URL without problem! I'm sure this will help others too.
Could you help me with blick.ch? There is a French version, blick.ch/fr, and a German version, blick.ch, but I cannot work the /patterns out... Thanks again, Omnilaika02 (talk) 06:29, 4 September 2024 (UTC)
Sure, my pleasure! To help you with this, could you please start by defining a couple test cases (expected outputs), one for German-version path, and another for a French-version path? I've seen you've configured a translation template already for path "/fr/news/suisse/ludc-a-de-la-peine-a-y-croire-le-president-du-plr-thierry-burkart-durcit-le-ton-sur-lasile-id20101854.html". I recommend you use this path too for the French-version test case.
Let me know when you are done and I can help you with the patterns and templates :) Diegodlh (talk) 15:40, 4 September 2024 (UTC)Reply
Thank you :) It's done: French version and German version. Omnilaika02 (talk) 17:01, 4 September 2024 (UTC)
Thanks, @Omnilaika02! Based on the expected outputs you specified for both the German and the French examples, I noticed that there don't seem to be differences between the ways we may get these values from either version. That is, item type and "published in" are the same for both versions, title and author name can be found using the same XPath expression, and even language (which differs between versions) can be retrieved using XPath instead of using a fixed value. Therefore, I see no need to use patterns here.
There is a problem with this site, however: they seem to have blocked us, both Citoid and Web2Cit. I haven't looked into the details, but it is probably the same thing that is happening with some other sites, as described here: https://phabricator.wikimedia.org/T362379
This causes both Citoid and Web2Cit to fail because they can't retrieve the webpage to parse it. Unfortunately, as far as I know, we don't have a definitive solution for these cases yet. A couple solutions have been proposed, such as registering Citoid as a friendly bot with CDNs (see T370118) or having Citoid process webpages client-side (see T368980), and some of these may eventually be implemented in Web2Cit too; but we are not there yet.
The only workaround there is right now is having a template that does not include any selection step which relies on actually fetching the webpage. That is, only fixed-value selection steps. This is probably quite useless though, of course, but at least may give users a better citation to start with, rather than no citation at all. A URL selection step may be useful here to offer a better title guess, but it hasn't been implemented (see T304326).
I have done this. Note that because in this case we cannot infer the language from the HTML (since we cannot fetch the webpage) it does make sense to use URL patterns and have two separate templates, one for German and another for French pages. Please try this, check the configuration and let me know if you have any questions or comments!
Finally, we are aware that in cases like this it is not at all clear what's going on. Task T317448 describes a possible solution, but it hasn't been addressed yet. Diegodlh (talk) 17:40, 5 September 2024 (UTC)
Oh, by the way! We have recently created a Web2Cit userbox that Web2Cit contributors can add to their meta-wiki user page to advertise their Web2Cit skills and contributions. This userbox also automatically adds the user page to the Web2Cit contributors category, making it easier for Web2Cit users and other contributors to find and help one another.
If you think adding this userbox to your user page makes sense, you can do so by just adding {{User Web2Cit}} to it :) Diegodlh (talk) 17:56, 5 September 2024 (UTC)Reply
Thank you for your help. As I see, we are quite limited with Blick... I put the userbox on my profile, and will continue to create templates for the news sources I use, and promote the tool in frwiki! Omnilaika02 (talk) 08:30, 6 September 2024 (UTC)

Extend an existing template

Do I have to re-add every single field? Is there no way to just use the fallback template for the fields that are missing? Aaron Liu (talk) 14:51, 20 February 2025 (UTC)

Hi @Aaron Liu! Yes, I'm afraid you have to re-add every single field, even if you simply want to use the default Citoid output for them.
Templates work as a block. This is by design, as there may be reasons to skip a whole template if just one field is found to be non-applicable. Therefore, there's no such thing as using parts of the fallback template for fields missing in another template.
However, I acknowledge that what you suggest does make sense. This has been already discussed elsewhere. It would be possible to change the template editor to default to the fallback template settings when a new template is created. But some users found this confusing and the current middle-ground solution was proposed instead. Diegodlh (talk) 17:32, 20 February 2025 (UTC)Reply
That's a different discussion on whether templates should include all fields by default; that still requires the template to have every field. I'm thinking we should allow some sort of inheritance, where Web2Cit scans the less specific yet still applicable templates for fields that aren't already included. This way, we don't need to change every single template that has made a copy of some fallback behavior when/if we want to change some fallback behavior. Aaron Liu (talk) 17:46, 20 February 2025 (UTC)Reply
Would this alternative approach reflect what you have in mind? https://phabricator.wikimedia.org/T302019 Diegodlh (talk) 17:55, 20 February 2025 (UTC)Reply

Range transformation

I'm trying to fix https://web2cit.toolforge.org/debug/sandbox/Aaron%20Liu/https://www.haaretz.com/israel-news/culture/2016-05-17/ty-article/watch-the-new-israeli-made-coldplay-video/0000017f-f37f-d5bd-a17f-f77f71840000 . But for some reason, my true range transformation is not generating any substring; it just returns the entire string. Aaron Liu (talk) 15:05, 20 February 2025 (UTC)Reply

Hi @Aaron Liu! Thanks for providing the link to the Web2Cit output. That was very useful for diagnosis!
The Range transformation step is to be applied on a list of values, but instead you are applying it to an individual value (i.e., the string "2016-05-17T17:15:00+03:00").
I have adapted your configuration to include a Split transformation step (to split the string into separate items) before the Range step. Note that I had to add a Join step in the end. See here, in my sandbox.
However, I would recommend a different approach: what about using a Split step first, with separator "T"? That would give you two items as output: ["2016-05-17", "17:15:00+03:00"]. Then you can simply use a Range step with configuration "1" to keep "2016-05-17" only.
I hope this helps! Diegodlh (talk) 17:46, 20 February 2025 (UTC)Reply
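The two approaches discussed above can be compared with plain Python string operations. This is only an analogy: it uses Python's own split, slice and join rather than Web2Cit's configuration syntax, and Python's zero-based slicing should not be read as a statement about how Web2Cit's Range step numbers its items.

```python
raw = "2016-05-17T17:15:00+03:00"

# Approach 1: split on the "T" separator and keep the date part.
parts = raw.split("T")
date_via_separator = parts[0]

# Approach 2: split into individual characters, keep the first ten,
# and join them back into a single string.
chars = list(raw)
date_via_range = "".join(chars[:10])
```

Both approaches give the same date here. Splitting on the separator does not depend on the date always being exactly ten characters long, while the character-based approach does not depend on the separator being present.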
Thanks. I feel like it's more performant to split at a specific index. Aaron Liu (talk) 17:49, 20 February 2025 (UTC)Reply
Also, sometimes some Web2Cit service appears down and only produces error messages, and then sometimes it works again. Aaron Liu (talk) 17:50, 20 February 2025 (UTC)Reply
Web2Cit logging is still limited I'm afraid (T302696), but could you please provide some additional context, like what error message you are getting, where (in the Web2Cit page? or in the Wikipedia editor dialog), for what target URL... Maybe this could help us understand what may be going on. Diegodlh (talk) 17:58, 20 February 2025 (UTC)Reply
It just says "No applicable template" sometimes. Maybe citoid is sometimes down?

Target translation error

No applicable translation template found for target webpage

Aaron Liu (talk) 18:08, 20 February 2025 (UTC)Reply
Or websites may be blocking Citoid, as it has been reported for several websites in the past. To be honest, I haven't been editing Wikipedia much lately, so I haven't been using Web2Cit much myself. But I try to see what's going on whenever I get an error myself.
I would recommend that you check what happens in the Wikipedia editor's Add-a-citation dialog in these cases (assuming that you have the Web2Cit user script installed):
  1. Are you getting a citation from Citoid and not from Web2Cit? Or is the citation from Citoid failing too?
  2. Try disabling Web2Cit in the Add-a-citation dialog. Are you still getting an error? If yes, that's probably something with Citoid. If not, that's probably related to Web2Cit.
Diegodlh (talk) 18:14, 20 February 2025 (UTC)Reply
It currently looks like just https://web2cit.toolforge.org/debug/sandbox/Aaron%20Liu/https://www.haaretz.com/israel-news/2025-02-20/ty-article/work-on-arrow-3-missile-defense-system-starts-in-germany-to-be-operational-in-2025/00000195-2398-d293-a1d5-e79f8ae40000 is blocking citoid? The article whose path I put into the template currently works, though. Aaron Liu (talk) 18:19, 20 February 2025 (UTC)
Apparently in this case the reason is indeed that haaretz.com may be blocking Citoid.
The reason why it is working for the first URL, though, seems to be that Citoid is fetching metadata from the Internet Archive's snapshot instead. This seems to be a solution they are testing to work around sites blocking Citoid. I wasn't aware of it; you can read more about it here: https://phabricator.wikimedia.org/T95388.
This would explain why it isn't working for the second URL, which I checked and isn't archived in the Wayback Machine yet. I assume that if you request it to be archived, it may start working again. Diegodlh (talk) 20:36, 21 February 2025 (UTC)Reply
Hey @Diegodlh, thanks for your help last time. I am making a new template today and remembered this thing about the range transformation. The documentation said that when "item-wise=true", the range transformation would take each item of the list independently as a list of characters. That doesn't seem true since I had to split the string first. Aaron Liu (talk) 16:13, 20 April 2025 (UTC)Reply
Also, I wonder if there's a way to select the last item of a list, so that I can generate the author last name? Aaron Liu (talk) 16:17, 20 April 2025 (UTC)Reply
Solved that by using a regex instead. How are multiple procedures for the same field intended to be used? I thought it would find the first procedure with a valid value and return that value, but instead it just sees that the 1st has an invalid value, the 2nd has a valid value, thus the field is invalid instead of using the 2nd procedure's value. Aaron Liu (talk) 17:03, 20 April 2025 (UTC)
Hi @Aaron Liu! Thanks for your questions :)

The documentation said that when "item-wise=true", the range transformation would take each item of the list independently as a list of characters. That doesn't seem true

You seem to have found a bug here. I had forgotten that the Range transformation was supposed to have that behavior when used with itemwise=true and you're right that it doesn't seem to be working as expected. I have created task T392381 in Phabricator and will look into it as soon as possible.

Also, I wonder if there's a way to select the last item of a list

Unfortunately there isn't, I'm afraid. This has been requested previously, but hasn't been added yet; see T305898. I haven't been doing much with the code lately, except fixing bugs and keeping things up and running.

How are multiple procedures for the same field intended to be used? I thought it would find the first procedure with a valid value and return that value, but instead it just sees that the 1st has an invalid value, the 2nd has a valid value, thus the field is invalid

When multiple procedures are defined for the same field, the outputs of all of them are concatenated into a single field output, as explained in the docs here. If you give me the specific example you are working on, maybe I can help you find a way that more closely matches the behavior you were looking for? Diegodlh (talk) 18:28, 21 April 2025 (UTC)
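The difference between the behavior described in this reply and the behavior assumed in the question can be sketched in Python. This is a simplified model, not Web2Cit's code: the function names and the sample values are hypothetical, and validation itself is not modelled.

```python
def concatenate_outputs(procedure_outputs):
    """Combine the outputs of all procedures into one list of values."""
    combined = []
    for output in procedure_outputs:
        combined.extend(output)
    return combined


def first_non_empty(procedure_outputs):
    """The "first usable procedure wins" behavior assumed in the question."""
    for output in procedure_outputs:
        if output:
            return output
    return []


# Hypothetical outputs of two procedures defined for the same field.
outputs = [["not a date"], ["2025-04-20"]]

# The combined field output still contains the first procedure's value.
combined = concatenate_outputs(outputs)

# A fallback-style reading would have stopped at the first procedure.
fallback = first_non_empty(outputs)
```

Because the combined output keeps every procedure's values, one unsuitable value from the first procedure remains part of the field output, which is consistent with the field being reported as invalid.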

Web2Cit

When I used the tool recently, it was not working, and I needed to uncheck it to make the automatic citation work. Has anything changed? Agus Damanik (talk) 17:17, 29 March 2025 (UTC)

@Diegodlh, do you know anything about this? It's a serious error that also breaks Citoid generation. I've reported it at phab:T390373. ponor (talk) 15:10, 1 April 2025 (UTC)Reply
Thank you both for bringing this to my attention. I hadn't seen the report on Phabricator, I'm sorry. I'm checking this right now. I will keep you posted both here and on the Phabricator task. I'm sorry for the inconvenience! Diegodlh (talk) 19:09, 1 April 2025 (UTC)
Fixed. Should be working now. Diegodlh (talk) 21:35, 1 April 2025 (UTC)Reply
Thank you very much, @Diegodlh! ponor (talk) 01:24, 7 April 2025 (UTC)Reply

Additional fields

Hello, I’m investigating Web2Cit support for major websites in New Zealand.

One thing I’ve noticed (and I may have misread the code) is that the tool can only populate quite a limited range of fields in citation templates. For example, some fields I often fill in on citation templates are via, doi, issn, volume, issue, isbn, oclc, journal, publication-place, url-access and pages with multiple authors or editors, as well as some others. In many cases these fields can be easily extracted from a web page's structured data, particularly for scientific journals.

Are there any plans to extend the tool to support additional fields in citation templates? I’m happy to contribute to enhancing the list of available fields to populate. Cloventt (talk) 22:37, 28 April 2025 (UTC)Reply

Hi, @Cloventt! Thanks for writing. I didn't see the notification before. Sorry about that.
You are right: Web2Cit only supports a basic set of citation metadata fields. The full list is available at https://meta.wikimedia.org/wiki/Web2Cit/Docs/Fields
It should be possible to support additional fields, yes. But there are no plans to do so at the moment. Actually, I'm not currently further developing the tool, except fixing critical bugs to make sure it stays up and running.
Which fields would you be interested in? Maybe we could discuss it and we could work on it together? Diegodlh (talk) 16:59, 5 June 2025 (UTC)Reply
Hi, @Cloventt! I will apply for a Rapid Fund grant to make some improvements to Web2Cit (see thread below). I'm still working on the list of tasks that I will propose to work on, but supporting additional (or maybe all?) fields is one of my candidate tasks. Are there any fields in particular that you consider most important to support? Thank you! Diegodlh (talk) 20:43, 30 October 2025 (UTC)Reply

Rapid fund proposal

Hi all! I'm considering applying for a Rapid Fund grant to work on some long-pending tasks to improve user experience, fix annoying bugs and to encourage developer participation.

I'm still deciding on the list of tasks, from the list of open tasks in Phabricator, and some new tasks I may create.

If there is something you would like me to consider, please let me know! If you find your suggestion already tracked in Phabricator, please include the task ID too. And also feel free to open a new task if you want!

Grant applications for the current cycle close this Saturday, November 1st. If you can provide feedback before that, great. But if you can't, I plan to leave room in the proposal for community feedback, so I may tweak the final task list based on your suggestions.

Thank you! Diegodlh (talk) 15:52, 27 October 2025 (UTC)Reply

Fantastic. Good luck, Web2Cit is great and even basic QoL updates would benefit a lot of us :) –SJ talk  16:52, 27 October 2025 (UTC)Reply
Hi Diego, here is what is preventing me from enhancing Web2Cit more: it's incredibly complicated to work on config files with existing tools. That's it and that's all, as Arnold would say.
I know that's not actionable feedback, so here are a few things I think would help. Some might have Phab tasks, haven't checked.
  • simplify and clarify the documentation. I don't know (and TBH as a user I don't care) what is the server or the monitor. I suggest to divide the homepage into sections for users vs casual editors vs hardcore users
  • a browser extension or bookmarklet or Greasemonkey script or whatever that would allow me to pick and click on the relevant page elements and it would just solve the trivial stuff itself (such as the test URL and whatever else can be automatically selected) just like in the browser Inspector, and would then generate a minimal config I can choose to edit further or just save. It should also be able to load the correct config for the current page
  • improved notification on config rot (for instance, suggest that I follow the monitor subpage in the tool above)
  • a notification when Citoid returns more fields than Web2Cit (not sure how that happens, but I've seen it happen)
Strainu (talk) 17:34, 27 October 2025 (UTC)Reply
Hi, @Strainu! Thank you for your feedback. I will consider all of your suggestions and try to include them in the proposal. In the meantime, I have two comments/questions:
  • improved notification on config rot: I plan to fix phab:T329573; this should enable notifications when test results change for a given domain. Hope this helps.
  • a notification when Citoid returns more fields than Web2Cit: Because of the way Web2Cit has been designed, only a small set of basic Citoid fields is supported. But we could consider supporting a few more. Do you have a suggestion of which fields we may consider supporting?
Diegodlh (talk) 13:56, 28 October 2025 (UTC)Reply
Re Citoid, I am not sure exactly what the extra information is. I'll keep an eye out, and when this happens again I'll let you know what the difference is. Strainu (talk) 14:34, 28 October 2025 (UTC)
Just noting that while reviewing currently open tasks I found phab:T321669 suggesting that Pages field be supported. Would you find this useful too? Diegodlh (talk) 16:29, 28 October 2025 (UTC)Reply
Useful? Sure. But if I were you, I would not focus on single fields (or any one-off improvement). You need to be better than Citoid or integrated in it to justify the project. Strainu (talk) 17:07, 28 October 2025 (UTC)Reply
Here is my brief list of potential improvements to enhance the functionality, usability, and adoption of the Web2Cit tool.
  • To increase adoption and reduce user confusion, Web2Cit translators should be integrated directly into Citoid for all users by default. Many users are unaware of Web2Cit's existence. Furthermore, on wikis where it is loaded as a gadget, some users find the two distinct options for generating a citation confusing. Communities would have instant access to Web2Cit translators, instead of waiting on Zotero translator developers, and Zotero translator deployment on WMF wikis.
  • A feature should be added to allow a single translator to apply to a root domain and all or a specified list of its subdomains. For example, this would allow www.enciklopedija.hr and enciklopedija.hr to use the same translator. A single hrt.hr translator could cover sport.hrt.hr, magazin.hrt.hr, vijesti.hrt.hr, and other subdomains specified in the main configuration file.
  • The maintainer workflow should be simplified by adding a direct link from the list of all translators to that translator's editing interface and its corresponding test data. The current process for returning to edit an existing translator is non-intuitive and cumbersome, often requiring manual steps like copying and pasting the exact address from a JSON file.
  • The translator framework should be enhanced by adding a set of chainable text transformation functions: lowercase, titlecase, uppercase first character, etc.
Thanks, and good luck! ponor (talk) 18:59, 27 October 2025 (UTC)Reply
I am certainly supportive of further enhancing Web2Cit. In terms of immediate suggestions, I constantly find the situation where the default title is
Blah blah blah | website name
and so I write web2cit rules to split this into 2 pieces, one to go in the title and one into the publishedIn field (although the latter can usually be done as a constant value rather than split-and-range). It would be nice to have a built-in shorthand for this very common pattern. Kerry Raymond (talk) 08:10, 28 October 2025 (UTC)
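The pattern described in this comment can be sketched in Python as follows. The function name `split_title` and the choice of " | " as separator are assumptions for illustration; this is not an existing Web2Cit feature or its configuration syntax.

```python
def split_title(raw_title, separator=" | "):
    """Split "Headline | Site" into (title, published_in).

    The split happens at the last separator, so a headline that itself
    contains the separator keeps everything before the site name.
    """
    title, found, site = raw_title.rpartition(separator)
    if not found:
        # No separator: keep the whole string as the title.
        return raw_title, None
    return title.strip(), site.strip()


title, published_in = split_title("Blah blah blah | website name")
```

Splitting at the last separator rather than the first is a design choice in this sketch: site names usually come last, while headlines may contain the separator character themselves.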
Hi, @Ponor! Thank you for your feedback. I have a few comments/questions:
  • allow a single translator to apply to a root domain and all or a specified list of its subdomains: This is currently supported by the domain aliases feature. Does this help? Do you think addressing phab:T320771 may be useful?
  • adding a direct link from the list of all translators to that translator's editing interface and its corresponding test data: I plan to fix phab:T317977. Hope this helps.
  • text transformation functions: for case transformations there is phab:T302692. I will consider adding it to the list of planned tasks. Are there any other text transformations you would consider a priority?
Diegodlh (talk) 14:02, 28 October 2025 (UTC)Reply
@Diegodlh:
  • I'm familiar with the domain aliases feature, I've used it for quite a few translators, but I think that for many sites it'd be better to have at least www.domain.xyz and domain.xyz covered by the same configuration file (if they show the same page; if they do not, there should be a configuration switch to enable or disable this). Maybe Web2Cit can fall back to domain.xyz if a sub.domain.xyz translator wasn't found (again, with a switch to disable)? I can't think of any way of telling Web2Cit to "use this translator for ALL subdomains" – what if there are too many?
The rest will help. Can't think of any other text transformation functions atm, so probably the standard ones would suffice. Unless calls to freestyle (user defined) functions can be added ;-] ponor (talk) 15:48, 28 October 2025 (UTC)Reply
Regarding other transformation functions, I'm considering phab:T302691 suggested by @Kerry Raymond to support "replace" transformation steps, and phab:T305898 suggested by @Strainu to support negative indexing in "range" transformation steps. Would you consider these useful?
Regarding freestyle functions, see phab:T305886. Maybe an integration with Wikifunctions? It would be nice to have, but I'm not sure the effort would be worth it. I will consider it for the proposal. Diegodlh (talk) 16:45, 28 October 2025 (UTC)Reply
Support: A great tool for working with citations; I hope the rapid grant will be helpful to improve and simplify it further. Toghrul R (talk) 10:31, 28 October 2025 (UTC)Reply
@Sj @Strainu @Ponor @Kerry Raymond @Toghrul R Thank you all for your feedback! I tried to consider as many of your recommendations as possible to select the tasks for the proposal. It is now available here. You can endorse it or provide further feedback there if you want. Again, thank you all for your contributions! Diegodlh (talk) 16:18, 3 November 2025 (UTC)
  • I think there is a lot to like with this tool. The main thing I would suggest is adding support for additional citation template params. The idea is that for certain sources, it would be great to extend the citoid data with slightly more static stuff. Major ones would be things like via etc for websites that host content from other sources.
However, increasingly websites are putting themselves behind bot-blockers to prevent AI scrapers. WMF ourselves have started doing this. In practice this means that citoid cannot even access websites to check the information. I've decided that this basically makes this tool nearly useless for some major websites, so putting effort into a browser-based tool makes more sense I think. Cloventt (talk) 23:31, 8 November 2025 (UTC)Reply