How do people download a website from the Google cache?
A friend accidentally deleted his forum database. That wouldn't normally be a huge issue, except that he neglected to perform backups. Two years of content is just plain gone. Obviously, he's learned his lesson.
The good news, however, is that Google keeps backups, even if individual site owners are idiots. The bad news is that traditional crawling robots would choke on the Google Cache version of the website.
Is there anything existing that would help trawl the Google Cache, or how would I go about rolling my own?
Comments (2)
You may also want to consider crawling the archive.org cache. If the site is in there, it's generally better structured.
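If you go the archive.org route, one way to start rolling your own might look like the rough Python sketch below. It assumes the Wayback Machine's public CDX search endpoint and its `id_` URL modifier (which should return the page as originally captured, without the archive toolbar) behave as publicly documented. The function names and the `example.com/forum/` prefix are made up for illustration, so treat this as a starting point rather than a finished crawler.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Wayback Machine CDX search API.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(site_prefix):
    """Build a CDX query URL listing captures under a URL prefix."""
    params = {
        "url": site_prefix,
        "matchType": "prefix",        # everything below the prefix
        "output": "json",             # first row is a header row
        "fl": "timestamp,original",   # only the fields we need
        "filter": "statuscode:200",   # skip redirects and errors
        "collapse": "urlkey",         # one capture per distinct URL
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def snapshot_urls(cdx_json_text):
    """Turn a CDX JSON response into a list of raw snapshot URLs."""
    rows = json.loads(cdx_json_text) if cdx_json_text.strip() else []
    if not rows:
        return []
    header, records = rows[0], rows[1:]
    ts = header.index("timestamp")
    orig = header.index("original")
    return [
        f"https://web.archive.org/web/{rec[ts]}id_/{rec[orig]}"
        for rec in records
    ]


def list_snapshots(site_prefix):
    """Fetch the capture list over the network (not exercised offline)."""
    with urlopen(build_cdx_query(site_prefix), timeout=30) as resp:
        return snapshot_urls(resp.read().decode("utf-8"))
```

Calling `list_snapshots("example.com/forum/")` should give you a list of URLs you can hand to any ordinary downloader. Put a delay between requests so you don't hammer the archive.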
If the website is small enough that you can crawl it manually, this userscript for seamlessly navigating Google's cache is very useful.