How do you archive an entire website for offline viewing?
We have actually burned static/archived copies of our ASP.NET websites for customers many times. We have used WebZip until now, but we have had endless problems with crashes, downloaded pages not being re-linked correctly, etc.
We basically need an application that crawls and downloads static copies of everything on our ASP.NET website (pages, images, documents, CSS, etc.) and then processes the downloaded pages so that they can be browsed locally without an internet connection (getting rid of absolute URLs in links, etc.). The more idiot-proof the better. This seems like a pretty common and (relatively) simple process, but I have tried a few other applications and have been really unimpressed.
Does anyone have archive software they would recommend? Does anyone have a really simple process they would share?
9 Answers
You could use wget:
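The command itself isn't spelled out in the answer; a minimal sketch of the kind of invocation it points at (the exact flag combination here is an illustration, not the answerer's own) would be:
wget --mirror --convert-links --page-requisites --no-parent http://example.com/
# --mirror            recurse through the site and keep timestamps
# --convert-links     rewrite links so the copy can be browsed offline
# --page-requisites   also fetch the images, CSS and other assets each page needs
# --no-parent         don't wander above the starting directory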
In Windows, you can look at HTTrack. It's very configurable, allowing you to set the download speed, but you can also just point it at a website and run it with no configuration at all.
In my experience it's been a really good tool and it works well; among the things I like about HTTrack are how configurable it is and how little setup it needs to get going.
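On Windows this is usually driven through the WinHTTrack GUI, but HTTrack also ships a command-line client; a rough sketch of the zero-configuration case described above (the output folder name is just an example) looks like:
httrack "http://example.com/" -O ./example-mirror
# -O sets the local folder the mirror (plus HTTrack's cache and logs) is written to;
# with no further options HTTrack crawls the site and rewrites links for offline browsing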
The Wayback Machine Downloader by hartator is simple and fast.
Install via Ruby, then run with the desired domain and optional timestamp from the Internet Archive.
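A minimal sketch of that workflow (example.com is a placeholder, and the exact flag for picking a snapshot timestamp isn't shown in the answer, so check the gem's --help for it):
gem install wayback_machine_downloader
wayback_machine_downloader http://example.com
# pulls the archived copy of example.com out of the Internet Archive into a local folder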
I use Blue Crab on OS X and WebCopier on Windows.
wget -r -k
... and investigate the rest of the options. I hope you've followed these guidelines: http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html so that all of your resources are safe to fetch with GET requests.
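For reference, what those two flags do, plus a couple of the other options worth investigating (-p and -np are extra suggestions rather than part of the quoted command):
wget -r -k http://example.com/
# -r   recurse through the links on the site
# -k   convert links in the downloaded pages so they work locally
# see also: -p (page requisites such as images and CSS) and -np (don't ascend to the parent directory)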
I just use:
wget -m <url>
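As a side note on that flag (this is wget's documented behaviour rather than something the answer spells out): -m turns on infinite-depth recursion plus timestamping, but it does not convert links, so a sketch closer to the offline-browsing goal in the question would be:
wget -m -k <url>
# -m  mirror: infinite-depth recursion plus timestamping
# -k  convert links in the saved pages so they work offline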
If your customers are archiving for compliance issues, you want to ensure that the content can be authenticated. The options listed are fine for simple viewing, but they aren't legally admissible. In that case, you're looking for timestamps and digital signatures. Much more complicated if you're doing it yourself. I'd suggest a service such as PageFreezer.
For OS X users, I've found that the SiteSucker application works well without configuring anything other than how deep it follows links.
I've been using HTTrack for several years now. It handles all of the inter-page linking, etc. just fine. My only complaint is that I haven't found a good way to keep it limited to a sub-site. For instance, if there is a site www.foo.com/steve that I want to archive, it will likely follow links to www.foo.com/rowe and archive that too. Otherwise it's great. Highly configurable and reliable.
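For what it's worth, HTTrack's scan rules can usually be bent to this; a sketch using the answer's own example URLs (standard +/- filter patterns, though the rule precedence is worth verifying against HTTrack's documentation):
httrack "http://www.foo.com/steve/" -O ./foo-steve "-www.foo.com/*" "+www.foo.com/steve/*"
# the -pattern blocks the rest of the domain and the later +pattern re-allows everything under /steve,
# so links over to www.foo.com/rowe should not be followed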