Java equivalent of HTML::TableExtract
Can anyone suggest a Java library similar to the Perl module HTML::TableExtract? One notable feature of that module is that it lets the user identify nested tables easily by depth and count. I have tried libraries such as JSoup, HTML Parser, and HtmlUnit, but so far I have not found anything close to HTML::TableExtract. Is there an equivalent in Java? What I am trying to do is search every table for a keyword and, if it is present, extract that table. For nested tables, I want to extract only the table that contains the keyword, not its parent table. Thanks in advance.
@Łukasz Rżanek, please consider the HTML code below:
<html>
<table border=3 cellpadding=10>
<tr>
<td valign=top>
Content 1
</td><td>
Content 2<p>
<table border=1>
<tr>
<td>Content 3</td>
<td>Content 4</td>
</tr><tr>
<td>Content 5</td>
<td>Content 6</td>
</tr>
</table><p>
Content 7
</td>
</tr>
</table>
</html>
Here I want to extract only the table which contains Content 3. How can I do that using JSoup?
You might need to add some additional logic.
Check and let me know whether this is what you wanted.
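For illustration, here is a minimal sketch of one way the keyword search from the question could be done with jsoup, assuming jsoup is on the classpath. The class `TableKeywordSearch` and its methods `innermostTablesContaining` and `depthOf` are hypothetical helpers written for this example, not part of jsoup. The idea is to keep a table only if its text contains the keyword and none of its nested tables does. The depth value is computed by counting enclosing `<table>` ancestors, which loosely mirrors the zero-based depth used by HTML::TableExtract.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class for this example; not part of jsoup.
class TableKeywordSearch {

    // Returns every table whose text contains the keyword, skipping any
    // table that has a nested table which also contains the keyword.
    static List<Element> innermostTablesContaining(Document doc, String keyword) {
        List<Element> matches = new ArrayList<>();
        for (Element table : doc.select("table")) {
            if (!table.text().contains(keyword)) {
                continue;
            }
            boolean nestedMatch = false;
            // Element.select can return the element itself, so skip it.
            for (Element inner : table.select("table")) {
                if (inner != table && inner.text().contains(keyword)) {
                    nestedMatch = true;
                    break;
                }
            }
            if (!nestedMatch) {
                matches.add(table);
            }
        }
        return matches;
    }

    // Nesting depth: number of enclosing <table> ancestors (0 = top level).
    static int depthOf(Element table) {
        int depth = 0;
        for (Element parent : table.parents()) {
            if (parent.tagName().equals("table")) {
                depth++;
            }
        }
        return depth;
    }

    public static void main(String[] args) {
        // The sample markup from the question, joined into one string.
        String html = "<html><table border=3 cellpadding=10><tr>"
            + "<td valign=top>Content 1</td><td>Content 2<p>"
            + "<table border=1><tr><td>Content 3</td><td>Content 4</td></tr>"
            + "<tr><td>Content 5</td><td>Content 6</td></tr></table><p>"
            + "Content 7</td></tr></table></html>";
        Document doc = Jsoup.parse(html);
        for (Element table : innermostTablesContaining(doc, "Content 3")) {
            System.out.println("depth " + depthOf(table) + ": " + table.text());
        }
    }
}
```

Searching for "Content 3" should select only the inner table, since the outer table is skipped once a nested match is found. Plain substring matching on `text()` is a simplification; case-insensitive or per-cell matching would need extra logic.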