Performance comparison of HtmlUnit vs. HttpUnit
I would like to write a crawler that supports cookie storage and sessions. There are two different Java headless browser implementations. HtmlUnit has better support for JavaScript and perhaps HTML parsing. But is there any reason to use HttpUnit for the sake of crawler performance?
Comments (1)
There is a relevant article here, from one of the HtmlUnit developers.
It basically says that, apart from JavaScript support, HtmlUnit is higher-level than HttpUnit. HtmlUnit also seems to be more actively developed (2 releases in 2014, while HttpUnit has not been updated since 2008).