User talk:InternetArchiveBot

From Meta, a Wikimedia project coordination wiki



Connect with the developers and other users

Telegram IRC (irc.libera.chat #iabot)

Operation status

For the most up-to-date information, see the run pages or the Wiki Operations Summary on Airtable.

  • 🟢 InternetArchiveBot is currently running on 300+ Wikimedia wikis.
  • 🟢 We have moved the management interface to a new server. Please start using iabot.wmcloud.org instead of iabot.toolforge.org. Please let us know if anything broke during this process.
  • 🟡 Testing is stalled on Alemannisch Wikipedia (als), Asturian Wikipedia (ast), and Japanese Wikipedia (ja).
  • 🔴 Bot is approved but disabled indefinitely pending software improvements on French Wikipedia (fr), MediaWiki.org, Norwegian Nynorsk Wikipedia (nn), Polish Wikipedia (pl), and Portuguese Wikipedia (pt).

Last updated: 20:54, 5 November 2023 (UTC)

How this page works

  1. Ask your question in any language. Questions in English or German will receive the fastest responses.
  2. Our team will try to respond within seven days.
  3. Seven days after our response we will mark the thread as resolved. This queues the thread for archiving.
    If our response does not answer your question, you are welcome to remove the "section resolved" tag and write an additional comment.
  4. Seven days after the thread is marked as resolved, it will be archived. Once a thread is archived, it should not be un-archived. Instead, create a new thread and link to the old one.


SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 7 days.


Inaccessible link

IABot “fixed” a link that it reported as inaccessible here: https://nl.wikipedia.org/w/index.php?title=Chora_%28Patmos%29&diff=66511300&oldid=66454976

However, the link works fine with http on my end. Now I do agree that https is safer (although in this case it was hardly an improvement), but that's no reason to treat a link as “inaccessible”. Mondo (talk) 18:15, 13 December 2023 (UTC)

Hello Mondo. The bot did not necessarily declare the link inaccessible, though the edit summary would indicate that because the bot's edit summaries are very imprecise. The bot upgrades HTTP links to HTTPS where possible, separately from its process of fixing dead links. Harej (talk) 18:20, 13 December 2023 (UTC)
Hello Harej, in that case, it's against the guidelines of the Dutch Wikipedia. We have the guideline “bij twijfel niet inhalen”, which is similar to the one on EN:WP called If it ain't broke, don't fix it, except that ours is much more detailed. The link was not broken and https hardly made a difference with this specific link, therefore it was against the guideline. I have reverted IABot and added the article to the deny list, but I hope this can be fixed, because this will happen again on other pages. Mondo (talk) 18:23, 13 December 2023 (UTC)
So I'm guessing you won't do anything about this. This is not the first time you're not open for communication. In that case, I'll be requesting a temporary halt of the bot on NL:WP, partly because of what the bot did here against our guidelines and partly because of your constant lack of communication. Mondo (talk) 10:00, 17 December 2023 (UTC)

It's not resolved. I explained what the issue was back in December and nothing has changed. Mondo (talk) 19:27, 3 April 2024 (UTC)

Mondo, as explained above, it is our practice to replace HTTP with HTTPS on all wikis, and we are not changing that. Continuing to remove the "section resolved" template will not change this. If changing HTTP to HTTPS is in fact against policy, please cite the policy. Harej (talk) 20:09, 3 April 2024 (UTC)
If you wanted me to cite the policy, it would've been nice to know that when I posted my last comment instead of not responding to me for months. But here you go:

https://nl.wikipedia.org/wiki/Wikipedia:Bij_twijfel_niet_inhalen
“De ene goede variant door de andere goede variant vervangen is geen verbetering of verslechtering, maar een neutrale bewerking. Dergelijke bewerkingen zijn ongewenst”

Which translates to: “Replacing one good variant with another is not an improvement nor the opposite. It's a neutral edit. Such edits are undesirable.”

Replacing http with https is exactly that: http works fine, i.e. it's a good variant, which makes it against policy. Now I could see it being somewhat useful if it's a URL where security is of the utmost importance (or if the http link was dead and replaced with https), but in this case it's a link to a spreadsheet file. There's nothing that https will do to protect the user in this case. Mondo (talk) 20:18, 3 April 2024 (UTC)
And that goes for a lot of URLs, btw. Not all of them need https. http is fine in a lot of cases. Mondo (talk) 20:20, 3 April 2024 (UTC)

Adding strange, non-functional archive links to en:2024 Haneda Airport runway collision

Please see this diff. I'm not sure what's going on, but InternetArchiveBot keeps adding incorrect archive links pointing to a googleads.g.doubleclick.net page that doesn't seem to exist rather than to the kyodonews.net link that's actually present in the reference. (It's also edit-warring with Citation Bot, which correctly removes the bad archive links.) This appears to be a bot problem rather than an Internet Archive problem, as the proper link does exist in the Internet Archive. Jay8g (talk) 00:20, 7 January 2024 (UTC)

Jay8g, this should now be resolved. Please let us know if it happens again. Harej (talk) 20:11, 3 April 2024 (UTC)
Checkmark This section is resolved and can be archived. If you disagree, replace this template with your comment. Harej (talk) 20:21, 10 April 2024 (UTC)

Useless bot edits

Tracked in Phabricator:
Task T361746

Hi! What is the point of these two changes?

Ideawipik (talk) 02:02, 25 January 2024 (UTC)

Hi, I was wondering the same, that is, why the bot keeps replacing .is links with .today ones, even if the only ones working are the .is links.
I've corrected the same page twice now, so I was wondering how to make it stop. Astubudustu (talk) 10:39, 16 March 2024 (UTC)
Ideawipik, Astubudustu, while "archive.today" is the standard domain and we tend to standardize this domain, you are right that if this is the only content of the edit, the edit should not be made. I have prepared a bug report. Harej (talk) 20:21, 3 April 2024 (UTC)
Thank you so much! Astubudustu (talk) 20:54, 3 April 2024 (UTC)
Checkmark This section is resolved and can be archived. If you disagree, replace this template with your comment. Harej (talk) 20:21, 10 April 2024 (UTC)

cbignore

Why didn't cbignore work? Proeksad (talk) 20:20, 28 January 2024 (UTC)

Proeksad, for whatever reason the "Cbignore" template was not configured as a setting for Russian Wikipedia. This setting has now been changed. Harej (talk) 20:26, 3 April 2024 (UTC)
Checkmark This section is resolved and can be archived. If you disagree, replace this template with your comment. Harej (talk) 20:21, 10 April 2024 (UTC)

Archive.ph → Archive.today

https://nl.wikipedia.org/w/index.php?title=Patreon&diff=next&oldid=66920330

and

https://nl.wikipedia.org/w/index.php?title=Prog_(tijdschrift)&diff=next&oldid=66920158

But archive.ph is the same service and the link with ph works fine. This is again a clear violation of the Dutch version of the “if it ain't broke, don't fix it” guideline, just like the most recent time we spoke. Mondo (talk) 20:11, 1 February 2024 (UTC)

Mondo, bug report has been filed. Harej (talk) 20:29, 3 April 2024 (UTC)
Thank you. 🙂 Mondo (talk) 20:38, 3 April 2024 (UTC)
I replied in the Phab task giving the technical reason: it's done for functional reasons, not cosmetic ones. archive.today is a special domain that is functionally more reliable than the other ones, and it's also the domain the owners of archive.today requested we use on Wikipedia as a safeguard against potential future outages. -- GreenC (talk) 14:42, 4 April 2024 (UTC)
They can request whatever they want, but at least on the Dutch Wikipedia, changes at the request of owners are seen as an unwanted change and even without their request it's seen as an unwanted change, so something still needs to be done about it. Mondo (talk) 14:57, 4 April 2024 (UTC)
Besides, it looks like the bot doesn't even care for archive.today that much anyway, as it just changed an archive.today URL to archive.is: https://nl.wikipedia.org/w/index.php?diff=prev&oldid=67337586 (the second highlighted reference). I used IABot for this. Mondo (talk) 19:56, 7 April 2024 (UTC)
Checkmark This section is resolved and can be archived. If you disagree, replace this template with your comment. Harej (talk) 20:21, 10 April 2024 (UTC)

The bot keeps adding an archive link where it isn't required

The bot always tries to add this link, but it isn't needed. It has happened about 3 times and I had to revert the change every time.

https://web.archive.org/web/20211012034604/https://incubator.wikimedia.org/w/index.php?hidebots=1&translations=filter&hidecategorization=1&hideWikibase=1&limit=50&days=3&title=Special%3ARecentChanges&testwiki=wp%2Fryu&urlversion=2 Patronus95 (talk) 12:51, 2 February 2024 (UTC)

Patronus95, where is this link being added? Harej (talk) 20:53, 3 April 2024 (UTC)
Checkmark This section is resolved and can be archived. If you disagree, replace this template with your comment. Harej (talk) 20:21, 10 April 2024 (UTC)

stalled out job?

https://iabot.wmcloud.org/index.php?page=viewjob&id=17011 I didn't notice this had stalled out 2 days ago. Akaibu (talk) 18:14, 7 February 2024 (UTC)

Akaibu, looks like it is now done. Sometimes it can take a while. Harej (talk) 20:54, 3 April 2024 (UTC)
Checkmark This section is resolved and can be archived. If you disagree, replace this template with your comment. Harej (talk) 20:21, 10 April 2024 (UTC)

Finlex.fi URLs aren't dead

Bot's edits: [1], [2], [3]. Some URLs it tagged as dead but are actually working: [4], [5], [6]. 85.76.13.79 18:33, 10 February 2024 (UTC)

The site has an "Are you human?" check box and that is probably the cause. I set the domain to Subscription for now. It will stop the bot from changing it to dead. It also means that the bot won't be fixing dead links for this domain. -- GreenC (talk) 15:01, 17 March 2024 (UTC)
Checkmark This section is resolved and can be archived. If you disagree, replace this template with your comment. Harej (talk) 20:21, 10 April 2024 (UTC)

urldatachangestate

Hi!

I'm translating InternetArchiveBot user interface into Hebrew, and I have a question.

The message urldatachangestate says "from <b>{{logfrom}}</b> to <b>{{logto}}</b>". I guess that "{{logfrom}}" and "{{logto}}" are something like "live", "dead", etc., but can you please explain more specifically what are the possible values?

And are they always in English, or can they be translated?

I'll update the documentation for translators after you reply.

Thanks! Amir E. Aharoni (talk) 03:00, 29 February 2024 (UTC)

Dead, Dying, Alive, Unknown, Subscription, Permadead, and Permalive are the statuses and yes, they are translatable. —CYBERPOWER (Chat) 21:09, 3 April 2024 (UTC)
Thanks! I updated the documentation accordingly. Amir E. Aharoni (talk) 15:07, 4 April 2024 (UTC)
Checkmark This section is resolved and can be archived. If you disagree, replace this template with your comment. Harej (talk) 20:21, 10 April 2024 (UTC)

Issue with Billboard short links

I've run into this a bit when going through the url=value CS1 pages. So, this bot was just run on https://en.wikipedia.org/w/index.php?title=Tony_Martin_(British_singer). If you look at the comparison between 9 January 2024 and 2 March 2024 (04:13), you'll see that one of the changes made was to the shortened Billboard link used by previous editors. I'm fixing it with the long links, but it seems IABot wants to change the symbols used to shorten URLs on Wikipedia into the code used in URLs? I've been fixing these for a while, but they aren't the only issues I come across in the CS1 pages, so it's the first time I've noticed which bot is doing this particular function.

I thought someone should know. OIM20 (talk) 09:51, 2 March 2024 (UTC)

OIM20, thank you for letting us know. I have filed a bug report. Harej (talk) 21:13, 3 April 2024 (UTC)
Checkmark This section is resolved and can be archived. If you disagree, replace this template with your comment. Harej (talk) 20:21, 10 April 2024 (UTC)

Bot citing dead link on talk pages

On talk pages where the bot leaves a description of its edits (example), it links to a dead page where we are supposed to report errors. Ubh (talk) 15:36, 7 March 2024 (UTC)

The URL changed to https://iabot.wmcloud.org. Please don't report errors from 6+ year old edits. They are far too old to be meaningful in improving the bot.—CYBERPOWER (Chat) 21:14, 3 April 2024 (UTC)
Checkmark This section is resolved and can be archived. If you disagree, replace this template with your comment. Harej (talk) 20:21, 10 April 2024 (UTC)

IABot for Gagauz language

Can you please authorize me to use IABot for the Gagauz language on gagwiki (Gagauz Wikipedia)? I can currently use it for English (enwiki) and Russian (ruwiki), but not the Gagauz one.

When I try to use the bot on a gagwiki (Gagauz Wikipedia) page, I get "Permission error" and "The action you are trying to perform requires the analyzepage permission." and "This permission is obtainable with the following groups: basicuser, user, admin, root, bot".

My Wikipedia userpage is https://en.wikipedia.org/wiki/User:Maxim_Masiutin Maxim Masiutin (talk) 03:34, 10 March 2024 (UTC)

Maxim Masiutin, you need a minimum of ten edits on that wiki in order to use InternetArchiveBot. You currently have four. Harej (talk) 21:22, 3 April 2024 (UTC)
Checkmark This section is resolved and can be archived. If you disagree, replace this template with your comment. Harej (talk) 20:21, 10 April 2024 (UTC)

Month names on ary

Hello! Is there a way that InternetArchiveBot can use Moroccan Darija month names instead of English ones on arywiki? If there's a configuration page where I can translate the month names, please let me know. Thanks! Ideophagous (talk) 09:53, 13 March 2024 (UTC)

Ideophagous, month names are handled within the code, so if you could give us the 12 months in Moroccan Darija we can update the code. Harej (talk) 21:30, 3 April 2024 (UTC)
Hello @Harej. Please check this json file on arywiki; it has the month names in English (en_name) and their Moroccan Darija equivalents (ary_name). You can ignore the alt_name. Ideophagous (talk) 23:19, 3 April 2024 (UTC)

Bot (innocently) allowing itself to look rude and arrogant/condescending/entitled

In case this has already been fixed, I apologize for being behind.
I have no way of knowing whether it has, and/or have not found a place where I would have had.
I guess something might be in the docs, but it has not been obvious or easy to find for me, sorry.

tldr: This could IMHO be fixed without any fuss and for good with a flick of the wrist by just adding a few words at the start of the first paragraph of the bot's message, making it begin with "Internet Archive Bot [Link] here." /tldr

I came across a place (here: https://en.wikipedia.org/wiki/Talk:Aerospace_engineering, and in fact many more) where there is a section, created by this bot, titled "External links modified", followed by an IMHO appropriate greeting, "Hello fellow Wikipedians", followed by a number of very appropriate factual statements, BUT THEN followed by,

"When you have finished reviewing my changes, you may follow the instructions..."

It seems to me that for a reader who, to this point of reading the section (and onwards), is not aware (as content may well be read from top to bottom rather commonly) that they are reading a message generated by a bot, being told rather bluntly that

  • "When you have finished reviewing my changes",
may appear to that reader to have been written by an author with a rather entitled personality and/or behavior, such as to assume that the reader "will" or "has to" review that author's changes, as though the author were (feeling) entitled to the reader doing so.
It seems to me that this means running a risk of causing a casual reader to
  • be upset
    by what they may well perceive as "this kind of language and behaviour towards" [themselves and the "fellow Wikipedians"],
  • respond badly, such as
  • feeling treated condescendingly and/or
  • now feeling specifically disinclined to "review ... the changes"
thus producing a disservice to
  • the objective of having the changes reviewed by a person
  • peace, quiet and style on WP
  • "... you may follow the instructions ..."
It seems to me that this looks and feels like more of the same, and even more strongly so.
(I know the wording may sound innocent by itself, but it seems to me that it's the context that makes the difference.)
Remark: That part of the wording was not found on the page given above (seems something had been improved in the meantime) (but on a page I don't wish to link to.)

JFTR, that is for sure how I just felt when I had read that passage to that point without realizing I was reading something written by a Bot.

About followup (after a fix has been done) on older pages that still reflect the previous presence of the problem: Would it be historical misrepresentation or maybe just a nice idea to have the Bot occasionally fix (update) the wording it left there when it did, maybe in its free time :) ?

$02c FWIW, HTH -- 93.232.230.13 13:17, 13 March 2024 (UTC)

Thank you for the feedback. I'd like to note that the bot has largely stopped posting these messages, especially on English Wikipedia. Harej (talk) 21:31, 3 April 2024 (UTC)
Checkmark This section is resolved and can be archived. If you disagree, replace this template with your comment. Harej (talk) 20:21, 10 April 2024 (UTC)

Page size limit

Greetings. I read you plan to increase the limit on the single page tool. I need this to run on a page with about 800 links. When do you plan to increase the limit? SusanLesch (talk) 17:59, 20 March 2024 (UTC)

Well I figured out a workaround for now. I copied the article to a sandbox in parts, and ran the bot on the parts. SusanLesch (talk) 20:44, 20 March 2024 (UTC)
There shouldn't be a page size limit on the bot anymore. Are you getting an error? —CYBERPOWER (Chat) 21:33, 3 April 2024 (UTC)
Checkmark This section is resolved and can be archived. If you disagree, replace this template with your comment. Harej (talk) 20:21, 10 April 2024 (UTC)

DOI

For some reason, the bot reported an error in a DOI link (here). The link is http://dx.doi.org/10.2307/597203 and currently works fine (it redirects me to JSTOR). פעמי-עליון (talk) 11:34, 30 March 2024 (UTC)

Sorry, but I don't see what you are referring to. Please give me a link to a faulty edit that I can review. —CYBERPOWER (Chat) 21:36, 3 April 2024 (UTC)
The bot reported in this edit that the DOI link has an error (this is obviously not true; DOI links are very stable). I thought you might want to know about it and find the source of this mistake. פעמי-עליון (talk) 19:58, 4 April 2024 (UTC)
פעמי-עליון, thank you for the report. The URL in question couldn't be found in our URL database (where links that are checked would be found), so I suppose this was a one-off situation. Please let me know if you see anything like this anywhere else. Harej (talk) 20:35, 10 April 2024 (UTC)
Here, as well, two links that are fine. Maybe the problem is with academic papers that are not open-access? פעמי-עליון (talk) 17:19, 14 April 2024 (UTC)

Encode subject lines of emails from InternetArchiveBot

When there are non-ASCII characters in an email subject line, the entire subject should be encoded as UTF-8 so that it will display properly for the recipient. I received email from InternetArchiveBot about a submission for the Turkish Wikipedia with "Subject: Bot iÅŸiniz 18485 tamamlandı!" and about one for the Italian Wikipedia with "Subject: La tua attività di bot 16464 è stata completata!" The corresponding text in the body of the message displayed properly, with all the diacritical marks where they should be: La tua attività di bot 16464 è stata completata! I use Gmail, so it's possible that Gmail is doing something wrong.

This page explains what to do: https://www.telemessage.com/developer/faq/how-do-i-encode-non-ascii-characters-in-an-email-subject-line/ and the service at https://www.sendblaster.com/utf8-email-subject-encoder/ will encode a subject line, one line at a time, so that "Subject: La tua attività di bot 16464 è stata completata!" would become Subject: =?UTF-8?B?TGEgdHVhIGF0dGl2aXTDoCBkaSBib3QgMTY0NjQgw6ggc3RhdGEgY29tcGxldGF0YSEg?= Eastmain (talk) 20:56, 30 March 2024 (UTC)

Thank you for your report, Eastmain. I have filed a bug report. Harej (talk) 21:43, 3 April 2024 (UTC)
Checkmark This section is resolved and can be archived. If you disagree, replace this template with your comment. Harej (talk) 20:21, 10 April 2024 (UTC)
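
For readers who want to see what the requested subject-line encoding looks like in practice, here is a minimal sketch using Python's standard `email.header` module. It is only an illustration of the general technique (RFC 2047 encoded words) and is not InternetArchiveBot's own mail code, which this thread does not show. The helper names `encode_subject` and `decode_subject` are hypothetical.

```python
from email.header import Header, decode_header, make_header


def encode_subject(subject: str) -> str:
    """Return an ASCII-only header value for a subject that may contain non-ASCII text.

    Hypothetical helper: wraps the text in one or more RFC 2047 encoded words
    that declare UTF-8 as the character set.
    """
    return Header(subject, charset="utf-8").encode()


def decode_subject(raw: str) -> str:
    """Reverse encode_subject(), the way a mail client would before display."""
    return str(make_header(decode_header(raw)))


if __name__ == "__main__":
    # Subject text taken from the report above.
    subject = "La tua attività di bot 16464 è stata completata!"
    wire_form = encode_subject(subject)
    print("Subject:", wire_form)                # ASCII-only encoded form
    print("Displayed:", decode_subject(wire_form))
```

The library chooses between the Q and B (base64) encodings on its own, so the exact encoded string may differ from the base64 form quoted in the report while decoding to the same subject text.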

False positives and reporting

The bot appears to mark https://ochem.eu/* pages as dead links. These are not dead: when I visit http://ochem.eu/article/99826, the page redirects and asks me to login, but I can login as a guest and get redirected back to the page I'm looking for. This elaborate double-redirection process may be blocking the site to crawlers and causing the false positives.

I would report this problem through the "report false positive" link, but that appears broken: it says I don't have the "reportfp" privilege, even though that should be available to all users.

Thanks, Bernanke's Crossbow (talk) 05:58, 2 April 2024 (UTC)

Bernanke's Crossbow, usually when this happens it's because of geo-restrictions affecting our link checker. However, I visited that website with a VPN and the site would not load then either. So the website appears to at least be inaccessible to much of the Internet. Harej (talk) 21:51, 3 April 2024 (UTC)
Ah. In fact, I just discovered it's even weirder than that: until today, I've only ever visited the site in Firefox's InPrivate mode. I just tried it without InPrivate, and it fails to load then too (but works fine in InPrivate still). They must be doing something very strange with cookies.
Thanks and sorry to have bothered you, Bernanke's Crossbow (talk) 22:24, 3 April 2024 (UTC)
I set the domain to Subscription so the bot will skip it. -- GreenC (talk) 14:23, 4 April 2024 (UTC)
Checkmark This section is resolved and can be archived. If you disagree, replace this template with your comment. Harej (talk) 20:21, 10 April 2024 (UTC)

Azwiki translation

For Azwiki, en:User:Nemoralis requested the following translation (I can't find it in the translation tables):

"Reformat 1 URL" should be "1 URL yenidən formatlaşdırıldı"

-- GreenC (talk) 14:18, 4 April 2024 (UTC)

Small fixes in translatable messages

Hi! I've sent a few trivial message fixes for IABot: https://github.com/internetarchive/internetarchivebot/pulls . Can anyone please review them?

Thanks :) Amir E. Aharoni (talk) 14:59, 4 April 2024 (UTC)