Talk:Community Wishlist Survey 2015/Status report 1

From Meta, a Wikimedia project coordination wiki

Task #1, dead links and wayback machine[edit]

Please DON'T. The Wayback Machine is a dead end for content. The vast majority of dead links are dead only because the host reorganized their website, and the content is still somewhere else online. Over the last weeks, I repaired far more than a hundred "dead links" on deWP, and the huge majority could be updated to a new URL. Once a dead link is changed to the Wayback Machine, no one will ever bother to check for the content on the live web. --h-stt !? 13:17, 22 January 2016 (UTC)[reply]

One of the things we're thinking about is how to have human and automated link-repairing work together. It is good when a person finds where the link has moved to, but there are a few reasons why that may be difficult to depend on.
For one thing, if the host reorganized their website once, they can do it again. The link that you repair today could go dead a year from now, and it would have to be fixed again. A Wayback Machine snapshot is a permanent link that will always have the page content.
Also, as a site changes, the information on the pages might change. Citation templates list the date that the page was accessed, but you can't actually go back and check what was on that page on that date -- except with the Wayback Machine, where you can.
But most importantly, there are just too many links for humans to check. There are currently more than 113,000 pages on English Wikipedia that are marked with the Dead link template. A few weeks ago, that was over 130,000, and Cyberbot is responsible for fixing the majority of the ~17,000 pages that were fixed. And that's just the articles where an editor's noticed that the link is dead -- there are many more that we don't know about yet.
So when we think about scaling up -- fixing every link on every project in every language, both now and a month from now -- I don't think it's feasible to expect human editors to be able to do that entirely by hand.
That being said -- I'm still thinking and learning about this, and I'm sure there are things I don't know yet. What do you think? -- DannyH (WMF) (talk) 20:43, 22 January 2016 (UTC)[reply]
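To make the access-date point above concrete, here is a minimal sketch (my illustration, not the actual bot code) of how a bot could look up an archived snapshot through the Internet Archive's public availability API, passing the citation's access date so the snapshot closest to the date the editor actually saw the page is returned. The function names are mine.

```python
# Sketch: looking up an archived copy of a (possibly dead) link via the
# Internet Archive's "availability" API. The access date from the citation
# template is passed as a timestamp, so the snapshot returned is the one
# closest to when the page was originally cited.
import json
import urllib.parse
import urllib.request

API = "https://archive.org/wayback/available"

def availability_url(url, access_date=None):
    """Build the API request URL; access_date is YYYYMMDD from the citation."""
    params = {"url": url}
    if access_date:
        params["timestamp"] = access_date
    return API + "?" + urllib.parse.urlencode(params)

def closest_snapshot(payload):
    """Extract the closest available snapshot URL from the API's JSON reply,
    or None if the page was never archived."""
    snap = payload.get("archived_snapshots", {}).get("closest")
    if snap and snap.get("available"):
        return snap["url"]
    return None

def lookup(url, access_date=None):
    """Fetch and parse the availability answer for one link."""
    with urllib.request.urlopen(availability_url(url, access_date)) as resp:
        return closest_snapshot(json.load(resp))
```

A human editor would still be free to replace the snapshot with a live URL later; the sketch only covers the "find something rather than nothing" step.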
I wonder if such a thing can be posted in Phabricator somewhere. Anyhow, I do sometimes repair dead links, and then I look at the non-URL content of the citation template - citation templates typically contain more than the URL, such as the author, press agency, and title of the article. These can be helpful both when gauging whether a URL is truly broken and when looking for replacements. Jo-Jo Eumerus (talk) 20:52, 22 January 2016 (UTC)[reply]
Jo-Jo Eumerus: The tracking ticket for this project is T120433, and the current investigation ticket is T120850. -- DannyH (WMF) (talk) 19:25, 25 January 2016 (UTC)[reply]
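As an aside on the citation-metadata point above: extracting those non-URL fields is mechanically simple, which suggests a bot could at least surface them to the human doing the repair. A rough sketch (mine, and deliberately simplified - real templates can nest other templates inside parameter values):

```python
# Sketch: pulling the |name=value parameters out of a {{cite ...}} template,
# so the title/author/agency can be used to search for where content moved.
import re

def cite_fields(wikitext):
    """Return the parameters of the first cite template found, as a dict."""
    match = re.search(r"\{\{\s*cite \w+\s*\|(.*?)\}\}", wikitext, re.S | re.I)
    if not match:
        return {}
    fields = {}
    for part in match.group(1).split("|"):
        if "=" in part:
            name, _, value = part.partition("=")
            fields[name.strip()] = value.strip()
    return fields
```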
While the dead link bot is a good thing for small wikis, where the community does not act on dead links, on German Wikipedia there is already a bot (GiftBot) that does something different: dead links are marked on the talk page and an archive link is proposed, so that a human editor can decide what to do (use a new link, delete the link, use the proposed archive link, or use a different archived version). If the international bot gets activated, there will be a conflict between the two approaches. How will this be solved? Also, this is limited to one archive provider; there are other archives! --° (Gradzeichen) 06:56, 23 January 2016 (UTC)[reply]
I can't speak for small projects with a small workforce. And we will not be able to repair each and every dead link at deWP. But I strongly oppose a bot replacing them with links to the Wayback Machine. Oh, and I'm certainly not alone with that. Please expect resistance at deWP if you plan to roll out your bot there without extensive discussion. --h-stt !? 13:30, 23 January 2016 (UTC)[reply]
We're certainly not here to impose unwanted things on the communities. There's a reason why we work on community requests. If we have a bot doing a thing and German Wikipedia doesn't want it, there's no reason for that bot to be active on German Wikipedia. Several of the largest Wikimedia wikis already have bots doing this to some extent, though, and a lot of the support for this came from users active on language versions with a lot of editors, so I'd argue this is not a case of big versus small wikis. /Johan (WMF) (talk) 21:52, 24 January 2016 (UTC)[reply]
Yeah, just to chime in on Johan's point -- here's the proposal that people voted for. There were some people objecting to the idea, or suggesting that we do something different. But the overall tone was very positive, and this got the most support votes in the survey.
As Johan said, if a wiki decides not to use this, then we don't have to enable it on that wiki, and we're interested in finding out about other possible solutions to the problem. Thanks for telling us about GiftBot, we hadn't looked at that yet. I'll go learn more about what it does. -- DannyH (WMF) (talk) 19:21, 25 January 2016 (UTC)[reply]
All right. My main issue is with an indiscriminate bot run across projects, without community consultation. If you are aware of that, I see no problem with the mere development of such a bot (not that it should take more than an afternoon for anyone with at least some skills). --h-stt !? 16:25, 26 January 2016 (UTC)[reply]
As a person not working on the technical side of this (so I'm not defending myself here), I wonder if it isn't possible that you might slightly underestimate the difficulties involved in some parts of this, such as having a bot recognize whether a page has been moved on sites that don't put up a standard 404, or making sure it's easy to adapt to all languages, with their different editing cultures and templates. (:
But if German Wikipedia takes a look at a bot we've developed and decides it doesn't want to run it, we're certainly not going to try to force you. The point of the Community Tech team is to help the communities by working on the things they've requested. /Johan (WMF) (talk) 13:44, 28 January 2016 (UTC)[reply]
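The "sites that don't put up a standard 404" problem is real: many sites answer HTTP 200 for removed pages ("soft 404s"). A hedged sketch of the kind of heuristics involved - these rules are illustrative only, not the actual detection code the Internet Archive or any bot uses:

```python
# Sketch: deciding whether a fetched page is really dead, beyond checking
# for an HTTP error status. Heuristic and illustrative only.

def _path(url):
    # crude path extraction, enough for the sketch
    rest = url.split("://", 1)[-1]
    slash = rest.find("/")
    return "" if slash == -1 else rest[slash:]

def looks_dead(status, original_url, final_url, page_title):
    """Guess whether a link is dead or a 'soft 404'."""
    if status >= 400:                      # hard error: plainly dead
        return True
    # Many sites answer 200 but redirect removed pages to the front page.
    if _path(final_url) in ("", "/") and _path(original_url) not in ("", "/"):
        return True
    # Some serve a 200 error page whose title gives it away.
    markers = ("not found", "page unavailable", "404", "error")
    return any(m in page_title.lower() for m in markers)
```

A real bot would also need retries (to avoid flagging a link during a temporary outage) and per-site tuning, which is part of why this is harder than it first looks.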
User:Johan (WMF): It's not like you have to reinvent the wheel. Try checking out GiftBot, which does that task at deWP with the plugins dwl*.{sh,tcl} --h-stt !? 16:04, 28 January 2016 (UTC)[reply]

As far as I know, the current development uses one archive. While this may be the best one, it is not the only one available. As any such service can go out of service at any time, for any reason, without prior notice, I think it would be a good idea not to rely on a single service but to use several of them. I am aware that the author of the bot has connections to the archive in question, but this shouldn't prevent the additional use of other services.

Here my question: What is the reason to use only one archive?

--° (Gradzeichen) 18:25, 16 February 2016 (UTC)[reply]

Our work on this project is essentially supporting and extending the work that Cyberpower678 is doing with Cyberbot II on English WP. The Internet Archive has been really involved in the project, working on their APIs and tracking the results. IA has also developed some advanced dead link detection code that we're going to start building into the bot. In general, Internet Archive is a good fit with Wikimedia, because it's a non-profit with a very similar mission and values. They've been around for twenty years, and I don't think they're going anywhere.
That being said -- French WP has a gadget that's on by default for logged-in users, which shows links to Wikiwix, another archiving service. We've been talking to Wikiwix this week, to learn more about their service.
One of our important goals for this project is to help wiki communities outside of English WP to set up dead link detection and archiving bots, using the archiving service that each community wants. We're setting up a centralized logging service on Tool Labs to help dead link archiving bots keep track of what they've fixed, and it's going to be available to any bot, using any archive service. We're planning to provide modular code and documentation to bot writers on all WPs, which can be modified and extended to use the templates, policies and preferred archives for each language.
Is there another archive that you think we should look at, or talk to? -- DannyH (WMF) (talk) 19:05, 17 February 2016 (UTC)[reply]
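Regarding the centralized logging service mentioned above: the discussion doesn't describe its schema, but a shared record for link-archiving bots would presumably need to carry at least the wiki, the page, the dead URL, and which archive service supplied the replacement. A purely hypothetical sketch of such a record:

```python
# Sketch of what a shared log record for link-archiving bots might contain.
# This format is hypothetical -- the actual Tool Labs service's schema
# is not described in this discussion.
import json
import time

def log_record(wiki, page, old_url, archive_url, archive_service):
    """Serialize one repair event as JSON for a shared, bot-agnostic log."""
    return json.dumps({
        "wiki": wiki,                        # e.g. "enwiki", "dewiki"
        "page": page,
        "old_url": old_url,
        "archive_url": archive_url,
        "archive_service": archive_service,  # e.g. "wayback", "wikiwix"
        "fixed_at": int(time.time()),
    })
```

Keeping the archive service as a field, rather than assuming one provider, is what would let Wayback-based and Wikiwix-based bots share the same log.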

Task #3: central global repository for templates, gadgets and Lua modules[edit]

Hi everybody. Just to give you an example of what we can do: with Molarus, we are working on d:Module:Cycling race, a single program that provides several functions. One displays a table listing the stages of a race; another lists the palmarès of a cycling race, once the data has been entered on Wikidata. For the past month we have been using this program on around ten Wikipedias, and we are preparing a common infobox on the same model. This project also builds bridges between users who don't speak the same language.

The solution is not to try to adapt the former infoboxes; we must create new infoboxes that take their data from Wikidata and have good, translated documentation. Jérémy-Günther-Heinz Jähnick (talk) 13:01, 27 January 2016 (UTC)[reply]
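The Cycling race module itself is written in Lua, but the underlying idea - every wiki rendering the same structured data from a Wikidata entity instead of maintaining per-wiki infobox data - can be sketched in Python against the entity JSON that the Wikibase `wbgetentities` API returns (the property ID and values in the test below are illustrative):

```python
# Sketch: reading labels and claims from a Wikidata entity dict, the way a
# shared infobox module does, so each wiki shows the same data in its own
# language. Illustrative only; the real module is Lua.

def label(entity, lang, fallback="en"):
    """Pick a label in the reader's language, falling back to English."""
    labels = entity.get("labels", {})
    for code in (lang, fallback):
        if code in labels:
            return labels[code]["value"]
    return entity.get("id", "")

def claim_values(entity, prop):
    """Return the datavalues for one property, e.g. the stages of a race."""
    out = []
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement.get("mainsnak", {})
        if "datavalue" in snak:
            out.append(snak["datavalue"]["value"])
    return out
```

The translation burden then shrinks to labels and documentation; the data itself is entered once, on Wikidata, for all languages.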