Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 10 years ago.
There are dozens of screen-scraping libraries written in Java. Just to cite a few:
There are many more at HTML Screen Scraping Tools written in Java, but these are IMO the best at dealing with any kind of content (read: all kinds of crap), as I mentioned in this previous answer. This might not be an issue for you, though.
Just in case, maybe check out the thread Nokogiri pure Java status.
Update: A new project, jsoup, has been released (2010-01-31); it offers a selector syntax for finding elements. See its website for more details and/or this answer from its author.
You could use Hpricot through JRuby. See this SO question for more details.