How do people download a website from Google Cache?

Posted 2024-07-10 14:53:28

A friend accidentally deleted his forum database. That wouldn't normally be a huge issue, except that he neglected to perform backups. Two years of content is just plain gone. Obviously, he's learned his lesson.

The good news, however, is that Google keeps backups, even if individual site owners are idiots. The bad news is that traditional crawling robots would choke on the Google Cache version of the website.

Is there an existing tool that would help trawl the Google Cache, or how would I go about rolling my own?

Comments (2)

随梦而飞 2024-07-17 14:53:28

You may want to consider crawling the archive.org cache as well. If the site is in there, the archived copy is generally better structured.
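As a starting point for rolling your own, here is a minimal Python sketch that lists the archive.org snapshots for a domain through the Wayback Machine's CDX query endpoint. The domain `forum.example.com` and the function names are hypothetical, and the code assumes the endpoint's JSON output keeps a header row as its first element, so treat it as a sketch rather than a finished crawler.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Wayback Machine CDX query endpoint.
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def build_cdx_query(domain):
    """Build a CDX query URL for every capture under the given domain."""
    params = {
        "url": domain + "/*",          # prefix match: all paths under the domain
        "output": "json",              # JSON array of rows instead of plain text
        "filter": "statuscode:200",    # skip redirects and error captures
        "collapse": "urlkey",          # keep one capture per distinct URL
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def snapshot_urls(rows):
    """Turn parsed CDX rows into Wayback snapshot URLs.

    Assumes the first row is the header naming each column.
    """
    if not rows:
        return []
    header, *records = rows
    ts = header.index("timestamp")
    orig = header.index("original")
    return ["http://web.archive.org/web/%s/%s" % (r[ts], r[orig]) for r in records]


def list_snapshots(domain):
    """Query the CDX endpoint (network access required) and return snapshot URLs."""
    with urlopen(build_cdx_query(domain)) as resp:
        return snapshot_urls(json.load(resp))
```

Calling `list_snapshots("forum.example.com")` should return one snapshot URL per archived page, which you can then fetch at a polite rate and parse back into threads and posts.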

司马昭之心 2024-07-17 14:53:28

If the website is small enough that you can crawl it manually, this userscript to seamlessly navigate Google's cache is very useful.
