What is a good web crawler for downloading HTML pages?

I am looking for a web crawler/spider to download individual pages. What is a good (preferably free) product that supports this?
Comments (4)
wget or curl come to mind. What exactly are your requirements? Do you need to recursively crawl pages, or just download specific URLs? wget can do both.

I'd go for WGET www.gnu.org/s/wget/
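For reference, a minimal sketch of both use cases (the URLs below are placeholders; the flags are from the GNU wget and curl manuals):

```shell
# Fetch a single page with wget
wget https://example.com/page.html

# Recursively crawl a site: stay below the start directory (--no-parent),
# limit the depth, and wait between requests to be polite to the server
wget --recursive --level=2 --no-parent --wait=1 https://example.com/docs/

# Fetch a single URL with curl, writing it to a local file
curl -o page.html https://example.com/page.html
```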
If you want to download a whole website, give wget a try. It has features for downloading recursively. If you need to manipulate headers and only download a few small files, try curl (or wget).

Should you need features like parallel downloading of huge files, I would suggest aria2.
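That last suggestion can be sketched as follows (the URL is a placeholder; `--split` and `--max-connection-per-server` are documented aria2c options):

```shell
# Download one large file over 4 parallel connections with aria2
aria2c --split=4 --max-connection-per-server=4 https://example.com/big.iso
```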
A list of open source crawlers: http://en.wikipedia.org/wiki/Web_crawler#Open-source_crawlers