What are some good Java libraries for searching and scraping data from web pages?

Posted on 2024-11-27 06:47:40

What are some good open-source Java libraries for searching and scraping data out of a web page and storing it in a database? For example, suppose I had a page such as:

<tr><td><b>Address:</b></td>
<td colspan=3>123 My Street        </td></tr>

"Address:" is the key, but what I'm actually trying to get is the value "123 My Street", which is separated from the key by a bunch of HTML tags and whitespace. Ideally I want to get the contents of the td that follows the td containing the string "Address:". It seems like JSoup can do the find, but I didn't see a good example of how to do the offset to the following cell (I may have missed it). Is there a library that handles key/value pairs like this?

I'd also be interested in learning about any open source (MIT/Apache) initiatives for UI scripting similar to the Kapow Extraction Browser.

Thanks.
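
For reference, here is a minimal sketch of the lookup described above. It assumes the jsoup library is on the classpath. The class name LabelValueScraper and the helper valueAfterLabel are hypothetical names made up for this illustration, and the snippet is wrapped in a table element on the assumption that the rows sit inside a table on the real page. The "offset" is done with nextElementSibling(), which steps from the label cell to the cell after it, and text() returns the cell contents with whitespace collapsed and trimmed.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LabelValueScraper {

    /**
     * Returns the text of the td that immediately follows the td whose
     * text equals the given label, or null if there is no such pair.
     * (Hypothetical helper for illustration, not part of jsoup.)
     */
    public static String valueAfterLabel(String html, String label) {
        Document doc = Jsoup.parse(html);
        for (Element cell : doc.select("td")) {
            // text() ignores nested tags such as <b> and normalizes whitespace
            if (cell.text().equals(label)) {
                // Step over to the next element at the same level (the value cell)
                Element next = cell.nextElementSibling();
                if (next != null) {
                    return next.text();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumption: the rows from the question live inside a <table>
        String html = "<table>"
                + "<tr><td><b>Address:</b></td>\n"
                + "<td colspan=3>123 My Street        </td></tr>"
                + "</table>";
        System.out.println(valueAfterLabel(html, "Address:"));
    }
}
```

Matching on the exact cell text keeps the rule explicit. jsoup's selector syntax also has a :contains(text) pseudo-selector and the + sibling combinator, which may allow the same lookup to be written as a single select call.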
