Talk:Community Tech/Migrate dead external links to archives

From Meta, a Wikimedia project coordination wiki
Latest comment: 8 years ago by DannyH (WMF) in topic WikiCache

WikiCache

Martix made a similar proposal at WikiCache. Perhaps the ideas they drafted there could be used to guide the development of this project. Blue Rasberry (talk) 15:45, 16 May 2016 (UTC)

Oh, thanks for pointing that out. I'll go and reply to him. -- DannyH (WMF) (talk) 20:42, 17 May 2016 (UTC)

Adding a query against alternative archives if a link is not present in the Wayback Machine due to robots.txt or other reasons

Websites with robots.txt restrictions will not be captured by the Internet Archive's global Wayback crawls, and even content captured in the past from a given host will not be displayed once robots.txt restrictions are added. For this project, how often do dead links lack a corresponding version in the Wayback Machine? If this happens in a non-trivial number of cases, it could be good to subsequently check against Memento (http://timetravel.mementoweb.org/) and/or, more narrowly, Archive-It (wayback.archive-it.org); these archives may contain captures irrespective of robots.txt restrictions.
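To make the suggested fallback concrete, here is a rough Python sketch of the lookup order: ask the Wayback Machine first, and only query the Memento Time Travel aggregator if nothing comes back. It is an illustration under stated assumptions, not the project's actual implementation. The endpoint paths and JSON field names reflect the publicly documented Wayback availability API and Time Travel JSON API as I understand them and should be verified before use; the function names, the injectable `fetch` parameter, and the default timestamp are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoints; check against the current API documentation.
WAYBACK_API = "https://archive.org/wayback/available"
MEMENTO_API = "http://timetravel.mementoweb.org/api/json"


def fetch_json(url):
    """Fetch a URL and decode its JSON body; return None on any failure."""
    try:
        with urlopen(url, timeout=10) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # Covers network errors, HTTP errors (e.g. 404) and invalid JSON.
        return None


def wayback_snapshot(data):
    """Extract the closest snapshot URL from a Wayback availability response."""
    closest = (data or {}).get("archived_snapshots", {}).get("closest", {})
    return closest.get("url") if closest.get("available") else None


def memento_snapshot(data):
    """Extract the closest memento URL from a Time Travel JSON response."""
    uris = (data or {}).get("mementos", {}).get("closest", {}).get("uri", [])
    if isinstance(uris, str):
        return uris
    return uris[0] if uris else None


def find_archive(dead_url, timestamp="20160517", fetch=fetch_json):
    """Try the Wayback Machine first, then fall back to Memento.

    `timestamp` (YYYYMMDD) is a hypothetical default; in practice it would
    be the date the link was added or last known to work.
    """
    query = urlencode({"url": dead_url, "timestamp": timestamp})
    found = wayback_snapshot(fetch(f"{WAYBACK_API}?{query}"))
    if found:
        return found
    # Fallback: the aggregator may know of captures in other archives.
    return memento_snapshot(fetch(f"{MEMENTO_API}/{timestamp}/{dead_url}"))
```

Calling `find_archive("http://example.com/page")` would return an archive URL or `None`. Passing a custom `fetch` makes it possible to count how often the fallback is actually needed without changing the lookup logic, which speaks to the "how often" question above.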