Java equivalent of HTML::TableExtract

Posted on 2025-01-05 18:21:10

Can anyone suggest a Java library similar to the Perl module HTML::TableExtract? One notable feature of that module is that it helps the user identify nested tables easily by using depth and count. I have tried some libraries such as JSoup, HTML Parser and HtmlUnit, but so far I haven't found anything close to HTML::TableExtract. So can anyone suggest a Java equivalent of this module, if there is one? What I am trying to do is search for a keyword in every table and, if it is present, extract that table. In the case of nested tables, I want to extract only the table that has the keyword, not the parent table as well. Thanks in advance.

@Łukasz Rżanek, please consider the HTML code below:

<html>
<table border=3 cellpadding=10>
<tr>
  <td valign=top>
    Content 1
  </td><td>
    Content 2<p>

    <table border=1>
    <tr>
      <td>Content 3</td>
      <td>Content 4</td>
    </tr><tr>
      <td>Content 5</td>
      <td>Content 6</td>
    </tr>
    </table><p>

    Content 7
  </td>
</tr>
</table>
</html>

Here I want to extract only the table which contains Content 3. How can I do that using JSoup?

能否归途做我良人 2025-01-12 18:21:10

You might need to add some additional logic.

Check this and let me know if it is what you wanted:

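import java.io.File;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// The class name is only a placeholder so that the snippet compiles.
public class TableFinder {
    public static void main(String[] args) throws IOException {
        Document doc = Jsoup.parse(new File("**//HTML File Path//**"), "utf-8");
        // Elements whose own text (not that of their descendants) contains the keyword
        Elements elements = doc.getElementsContainingOwnText("**//Text to Search//**");
        for (Element element : elements) {
            Element table = getImmediateParentTable(element);
            if (table != null) { // the match may sit outside any table
                System.out.println(table.outerHtml());
            }
        }
    }

    // Walks up from the element to the nearest enclosing <table>; returns null if there is none.
    public static Element getImmediateParentTable(Element element) {
        while (element != null && !element.tagName().equals("table")) {
            element = element.parent();
        }
        return element;
    }
}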
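
For comparison with the depth and count coordinates mentioned in the question, here is a rough, standard-library-only sketch of how such coordinates could be computed while looking for the innermost table that holds a keyword. Everything in it is an assumption made for illustration: the names `NestedTables`, `Table`, `Frame` and `findTablesWithOwnText` are hypothetical, the zero-based depth and per-depth count follow my understanding of how HTML::TableExtract numbers tables, and the regex tag scanner is a stand-in for a real HTML parser (it ignores comments, scripts and unclosed tables), so for real pages a parser such as jsoup is the safer choice.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jsoup or HTML::TableExtract.
class NestedTables {
    /** A matched table plus its depth/count coordinates. */
    static final class Table {
        final int depth;   // 0 = top-level table, 1 = nested once, ...
        final int count;   // 0-based index among tables seen at this depth
        final String html; // markup from <table ...> through its </table>

        Table(int depth, int count, String html) {
            this.depth = depth;
            this.count = count;
            this.html = html;
        }
    }

    /** Bookkeeping for a table whose closing tag has not been seen yet. */
    private static final class Frame {
        final int depth, count, start;
        final StringBuilder ownMarkup = new StringBuilder(); // nested tables excluded

        Frame(int depth, int count, int start) {
            this.depth = depth;
            this.count = count;
            this.start = start;
        }
    }

    private static final Pattern TABLE_TAG =
            Pattern.compile("<(/?)table\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]*>");

    /** Returns the tables whose own text (nested tables excluded) contains the keyword. */
    static List<Table> findTablesWithOwnText(String html, String keyword) {
        List<Table> hits = new ArrayList<>();
        Deque<Frame> open = new ArrayDeque<>();              // currently open tables
        Map<Integer, Integer> seenAtDepth = new HashMap<>(); // tables seen per depth
        Matcher m = TABLE_TAG.matcher(html);
        int pos = 0;
        while (m.find()) {
            // Markup between two table tags belongs to the innermost open table.
            if (!open.isEmpty()) {
                open.peek().ownMarkup.append(html, pos, m.start());
            }
            pos = m.end();
            if (m.group(1).isEmpty()) {                      // opening <table ...>
                int depth = open.size();
                int count = seenAtDepth.merge(depth, 1, Integer::sum) - 1;
                open.push(new Frame(depth, count, m.start()));
            } else if (!open.isEmpty()) {                    // closing </table>
                Frame f = open.pop();
                String ownText = ANY_TAG.matcher(f.ownMarkup).replaceAll("");
                if (ownText.contains(keyword)) {
                    hits.add(new Table(f.depth, f.count, html.substring(f.start, m.end())));
                }
            }
        }
        return hits; // tables that never close are ignored
    }

    public static void main(String[] args) {
        String html = "<table border=3><tr><td>Content 1</td><td>Content 2"
                + "<table border=1><tr><td>Content 3</td></tr></table>"
                + "Content 7</td></tr></table>";
        for (Table t : findTablesWithOwnText(html, "Content 3")) {
            System.out.println("depth=" + t.depth + " count=" + t.count);
            System.out.println(t.html);
        }
    }
}
```

With the sample in `main`, only the inner table is reported (depth 1, count 0), because the keyword is attributed to the innermost open table rather than to every ancestor. Searching for "Content 7" instead reports the outer table, since that text sits in the outer table's own cell.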
