Talk:WebCite

From Meta, a Wikimedia project coordination wiki

Rephrasing

I have done some rewording of the proposal. I have tried not to change the meaning of what was said, because of course those people who previously expressed support did so for the proposal as it then stood, and not for the slightly different proposal that I might wish were in its place. Still, anyone wishing to revert any or all of the minor changes I made is of course welcome to do so. -- Hoary Returns (talk) 04:34, 10 February 2013 (UTC)

There is a rude, hostile feel to this whole thing, with the implied assumption in the proposal and most comments that WebCite will welcome, or have no say in, being taken over by Wikimedia. --2A01:4F8:140:1046:0:0:0:2 22:27, 28 February 2013 (UTC)

WMF and Internet Archive

It seems unlikely that the WMF would set up a sort of competitor to the Internet Archive: has someone asked WMF people to contact them to see what can be done with archive-it, or whether they're going to do anything about WebCite? There are some differences in policies (for instance, IA hides material from public view if the domain owner asks in their robots.txt, even if it's not the owner of the content being hidden), but they are the most qualified people to explain them anyway. --Nemo 14:27, 17 March 2013 (UTC)

Resource allocation and sister project proposals

I looked at WebCite and was surprised to find no mention of Erik Möller or Sue Gardner. I looked at User talk:Eloquence and User talk:Sue Gardner and was surprised to find no mention of WebCite. This is a little strange. Taking on a new sister project is a lot of work. That costs time and money, both of which are already pretty constrained by the twelve current Wikimedia projects (and by twelve current projects, I mean Wikipedia). And of course with adoption of a project, the upfront cost (getting everything transitioned, up-to-date, stable) is likely to be higher than the maintenance cost (keeping the site up and running, basically), so that adds to the uncertainty and anxiety of adding another project. You have to either convince one of these two people to adopt a new project (so that they can assign the necessary engineering resources; after all, these are all big Web sites we're talking about) or convince the Board to pass a resolution mandating a new project. There may be some other avenues to pursue, but as it is, I can't see this proposal going anywhere. Maybe a grant. --MZMcBride (talk) 08:00, 24 March 2013 (UTC)

Well, the WMF claimed that new sister projects have been considered by the WMF because they had some consensus traction,[1] so people are trying to get traction. WMF also claimed that the costs for setting up Wikivoyage were negligible.[2] --Nemo 10:22, 24 March 2013 (UTC)
I imagine the cost of adding an additional set of MediaWiki wikis is negligible compared to setting up and maintaining the infrastructure of this proposed service. :-) --MZMcBride (talk) 07:31, 25 March 2013 (UTC)

Proposal superseded

More on #WMF and Internet Archive: this proposal appears to be superseded; see Fixing Broken Links on the Internet (posted on October 25, 2013 by Alexis Rossi):

We have started crawling the outlinks for every new article and update as they are made – about 5 million new URLs are archived every day. Now we have to figure out how to get archived pages back in to Wikipedia to fix some of those dead links.

--Nemo 14:21, 28 October 2013 (UTC)

Ping to User:Sj. Biosthmors (talk) 14:21, 29 October 2013 (UTC)
How about links that ARE ALREADY DEAD and ALREADY saved on WebCite? Will those links be lost, or what? (Idot (talk) 14:22, 29 October 2013 (UTC))
Update: I can't open your link http://blog.archive.org/2013/10/25/fixing-broken-links so I'm unsure that the solution you proposed will work at all. (Idot (talk) 14:26, 29 October 2013 (UTC))
That would destroy the brand value of WebCite, so I wouldn't worry about that. I would anticipate that they will be back soon if they're still not visible. Biosthmors (talk) 14:23, 29 October 2013 (UTC)
The link works for me, for whatever it's worth. Biosthmors (talk) 14:28, 29 October 2013 (UTC)
The link worked for me, too. The four problems with Wayback compared to WebCite are:
  • no on-demand processing -- editors have to search, check whether a snapshot exists in a state comparable to the live link, and if it does not, they simply can't use it
  • unreliable -- pages often don't appear as they originally were, and pages go offline for a while, usually with DB errors
  • slow -- loading links often takes a minute or so
  • limited metadata -- the database does not contain searchable matter such as title, author, publication date
I find it surprising, given that they are reasonably well funded, whereas WebCite runs on a shoestring. If the above problems could be resolved, then Wayback would work well as a go-to source for WP editors' archiveurl info. Note that Jimbo went on record as also supporting a grant, as discussed above, rather than making it an outright WMF project, also as discussed above. Dovid (talk) 18:23, 29 October 2013 (UTC)
Idot, oh, sure, there might be some. Again, Legoktm's bot mentioned in the Internet Archive blog could also help identify how many such links (dead, not archived by IA but available on WebCite) there are. Edit: I just found a weeks-old discussion about RotlinkBot finding copies of dead links on IA in 3/4 of cases on en.wiki; there were some problems with it linking yet another archival service, but now IA has full coverage, so no more such headaches; the good news is that they say such a bot takes only a few lines of code.
Dovid, your comment seems out of date: there is on-demand archiving now. On "unreliable" and "slow", I'm unable to comment; I'd need some concrete data.
The rest seems OT. As for the metadata, it's true, but it's not among the features we really need for Wikimedia projects. Finally, on funding: true, it turns out that crawling the whole web is an expensive thing, and making it full-text searchable even more so[3] (I'm sure you know how big Google is compared to the Internet Archive). To me, this seems one more reason to avoid wasting money on scattered initiatives with unclear cost-effectiveness, and rather partner with the established and stable Internet Archive. --Nemo 08:46, 30 October 2013 (UTC)
A decade later, the reliability issues of the Wayback Machine appear to be under control. Pages are usually visible immediately after archival or with a delay of a few minutes. But searchability is still limited. URL prefixes of a domain can be searched, but not titles. Also, query strings (?&) cannot be searched by prefix as of writing.
However, the Wayback Machine's ability to handle pages heavily dependent on JavaScript, such as YouTube watch pages since 2017, when they switched from static HTML to the Polymer JS framework, has improved.
If we took over WebCite, a possibility would be renaming it to "WikiWayback", which would be more descriptive, since "Wayback" is a more familiar term to Internet users. It could be interpreted as a "wayback machine for use in wikis". However, it could also cause confusion, since "WikiWayback" would not be a wiki itself, but made for wikis.
An existing MediaWiki skin could be repurposed for the site layout, like Ghost Archive did with the "timeless" skin. The top buttons could be turned into "previous" and "next" buttons ("← previous (date, time)"; "next (date, time) →"; "View history"), and the layout of the revision history could be reused. Elominius (talk) 12:02, 12 March 2023 (UTC) - last modified 14:20, 12 March 2023 (UTC)

You can see some of IA's thoughts on future linkrot protection at mw:Archived_Pages. user:Legoktm has been working on a bot that can help automatically convert dead links to archival links. IA's service has been upgraded over the past year to address most of the issues Dovid mentions, at scale. If we have a specific proposal for a metadata API, and examples of how it is useful, I suspect they could implement one as well.
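
For illustration, the lookup step of such a dead-link bot might look roughly like the following. This is only a minimal sketch, assuming the Internet Archive's public Wayback availability endpoint (https://archive.org/wayback/available) and its JSON response shape; the function names are hypothetical, and this is not the code of Legoktm's bot or of RotlinkBot.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public Wayback availability endpoint of the Internet Archive.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(dead_url, timestamp=None):
    """Build the lookup URL for a dead link (hypothetical helper).

    timestamp is an optional YYYYMMDD[hhmmss] string; the service is
    expected to return the snapshot closest to it.
    """
    params = {"url": dead_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response):
    """Return the archived URL from a decoded JSON response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archive(dead_url, timestamp=None):
    """Query the endpoint over the network and return a snapshot URL or None."""
    with urlopen(build_query(dead_url, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

A bot would call something like find_archive("http://example.com/page", "20131025") for each dead link and, when a snapshot URL comes back, write it into the citation's archiveurl field; links with no snapshot would be the ones to check against WebCite.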

Medium term: I would like to see WebCite adopted by IA - it is directly in line with their mission; they can give it some recognition and preserve its current URL schema, as they have done with many other sites whose maintainers have transferred them over. (Keeping old URLs from breaking, not just archiving the web, is something that IA actively promotes.) This is something a small WM grant could help enable, as an active user of their service. SJ talk 20:08, 30 October 2013 (UTC)

According to w:WebCite, they already feed their data to IA. On the stats I asked about above: for Wayback we have https://archive.org/stats/wb.php?tz=UTC#60d and it doesn't seem so bad; do we have open stats for WebCite too? --Nemo 07:59, 1 November 2013 (UTC)

update?

Where can I find the latest discussions of this issue? Agradman (talk) 06:57, 18 January 2014 (UTC)

It's all here. --Nemo 17:33, 18 January 2014 (UTC)
The immediate issue seems to have been resolved, seeing that it's 2014 and WebCite is still functioning and without any shutdown/fundraising banners. Fredlyfish4 (talk) 04:54, 19 January 2014 (UTC)