Community Wishlist Survey 2023/Larger suggestions/Improve speed at which InternetArchiveBot archives links


Improve speed at which InternetArchiveBot archives links

  • Problem: Citations suffer from link rot, and the linked pages aren't archived quickly enough.
  • Proposed solution: Build a bot into the automatic citation feature that archives links as soon as they are added.
  • Who would benefit: Everyone who uses citations.
  • More comments: I know that archiving is eventually done automatically; however, I feel that it is too slow.
  • Phabricator tickets:
  • Proposer: HoHo3143 (talk) 22:28, 23 January 2023 (UTC)

Discussion

  • I think an argument can be made that this would be an outstanding redundancy in times and places where news media and even academia may be treated as an enemy of the state. Such a robust automatic backup could ensure that versions of events and information are retained before a takedown can occur. Not that this is necessarily common, but in an ever-changing geopolitical landscape, there is a certain utility to automatic duplicates. Foxtrot620 (talk) 02:33, 24 January 2023 (UTC)
  • @HoHo3143: I've retitled this to make it clearer and more general. I hope the new title accurately reflects your intention. Note that you can manually submit pages to have the InternetArchiveBot process them (not that that necessarily solves your issue, but it's a useful workaround). There are also other tools, such as the Internet Archive's Wayback Machine browser extension, which allow instant archiving of any page or URL. SWilson (WMF) (talk) 03:50, 24 January 2023 (UTC)
    • Maybe there could be a tool that automatically archives pages to the Internet Archive and adds the archive link to the article. Thingofme (talk) 09:22, 24 January 2023 (UTC)
      @HoHo3143 Pinging again in case the above question was missed. We're wondering if the manual archiving tool meets your needs, since it gives you a way to fetch archive URLs in real time if the bot has not processed the page recently enough.
      I worry "making InternetArchiveBot go faster" may be out of scope. The bot is wholly maintained by a volunteer, and from what we understand it already edits essentially as quickly as it can. MusikAnimal (WMF) (talk) 21:42, 3 February 2023 (UTC)
      OK, thank you for letting me know. There are large numbers of sources which haven't yet been archived, so I thought, why not suggest speeding it up? If this isn't possible because it is volunteer-based, that is OK. HoHo3143 (talk) 04:25, 4 February 2023 (UTC)
      @HoHo3143 I wouldn't say it's impossible because it's volunteer-based; rather, it's out of scope for our team, since we know it to be a massive codebase and the production setup is quite complex. We'd rely wholly on the volunteer assisting us. I'm pretty sure making it faster isn't an infrastructural issue (which seemingly is something we could help with), but I could be wrong. I just know from reviewing the contributions that the bot already seems to go pretty dang fast. Maybe it could be run as a second bot to go even faster. Let's just ping the maintainer and ask: @Cyberpower678 Do you have any thoughts on this? MusikAnimal (WMF) (talk) 03:05, 6 February 2023 (UTC)
      Actually, there is an infrastructure issue. Cloud VPS was recently removed from the rate limit exceptions, and the bot is now being throttled by a webservice rate limit, not to be confused with the API rate limit. We've reached a scalability limit here. Of course, we are working on optimizations to make it more efficient with the production servers, but ultimately IABot 3 will be the solution to scaling and speed. IABot 3 is not around the corner, though; it is still in the planning and design stages. I agree, the bot is too slow as it stands right now. CYBERPOWER (Chat) 17:09, 17 February 2023 (UTC)

Voting