What is the Jsoup selector for TD elements with no child elements and no COLSPAN attribute?

Posted 2025-01-06 03:06:44

I'm trying to parse a relatively messy web page. It contains several key-value pairs that I would like to extract. What these pairs have in common is that their cells are non-empty, have no child elements, and have no COLSPAN attribute. Here's what I tried; it seems logically sound, but it returns no results.

    Elements tds = document.select("td:not([colspan]):not(:has(*))");

So I want TDs that:

  1. Do not have a COLSPAN attribute
  2. Do not have any child elements

It seems like I must be close, but I'm not having any luck. Any thoughts?

Answer by 星星的轨迹, 2025-01-13 03:06:44

I came up with an answer that selects every TD and then uses a loop to remove the ones you don't want from the parsed document, so only the wanted cells remain.

http://jsoup.org/apidocs/org/jsoup/select/Selector.html

I mocked up a table that covers the two cases you want to exclude from your selection.

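    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    // Mock table: rows 2 and 4 contain colspan cells, rows 3 and 4 contain cells with a child element.
    String html =
    "<table>" +
            "<thead><tr><th>Col1</th><th>Col2</th><th>Col3</th></tr></thead>" +
            "<tbody>" +
                "<tr><td>row1col1</td><td>row1col2</td><td>row1col3</td></tr>" +
                "<tr><td colspan='3'>row2fullrow</td></tr>" +
                "<tr><td></td><td>row3col2</td><td><strong>row3col3</strong></td></tr>" +
                "<tr><td>row4col1</td><td colspan='2'><strong>row4col2and3</strong></td></tr>" +
            "</tbody>" +
    "</table>";

    Document doc = Jsoup.parse(html);
    // select() returns its own list, so removing nodes from the document while iterating is safe.
    for (Element td : doc.select("td")) {
        // Drop cells that contain child elements or carry a colspan attribute.
        // Note: the empty <td> in row 3 is kept, since it matches neither condition.
        if (!td.children().isEmpty() || td.hasAttr("colspan")) {
            td.remove();
        }
    }
    // Prints the document with the unwanted cells removed.
    System.out.println(doc);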
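
If mutating the document is undesirable, a non-destructive variant of the loop above can collect the matching cells instead. This is only a sketch: the class name `LeafCellFilter` and the method `leafCellTexts` are made up for illustration, and the extra `hasText()` check is an assumption based on the question's "non-empty" requirement.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class LeafCellFilter {

    // Hypothetical helper: returns the text of every <td> that has no child
    // elements, no colspan attribute, and some non-blank text.
    static List<String> leafCellTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element td : doc.select("td")) {
            boolean hasChildElements = !td.children().isEmpty();
            if (hasChildElements || td.hasAttr("colspan") || !td.hasText()) {
                continue; // skip unwanted cells, leave the document untouched
            }
            texts.add(td.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        String html =
            "<table><tbody>" +
                "<tr><td>row1col1</td><td>row1col2</td><td>row1col3</td></tr>" +
                "<tr><td colspan='3'>row2fullrow</td></tr>" +
                "<tr><td></td><td>row3col2</td><td><strong>row3col3</strong></td></tr>" +
                "<tr><td>row4col1</td><td colspan='2'><strong>row4col2and3</strong></td></tr>" +
            "</tbody></table>";
        // Prints [row1col1, row1col2, row1col3, row3col2, row4col1]
        System.out.println(leafCellTexts(html));
    }
}
```

Unlike the removal loop, this version also skips the empty cell in row 3, and the parsed document stays intact for further queries.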

+++++++++++++++++++++++
UPDATE
+++++++++++++++++++++++
I played around with it a little more and came up with this, which shows that your selector does work. Your HTML must contain something that my mock-up doesn't reproduce.

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    // Same mock table as before; the only change is id='x' on the first cell of row 4.
    String html =
    "<table>" +
            "<thead><tr><th>Col1</th><th>Col2</th><th>Col3</th></tr></thead>" +
            "<tbody>" +
                "<tr><td>row1col1</td><td>row1col2</td><td>row1col3</td></tr>" +
                "<tr><td colspan='3'>row2fullrow</td></tr>" +
                "<tr><td></td><td>row3col2</td><td><strong>row3col3</strong></td></tr>" +
                "<tr><td id='x'>row4col1</td><td colspan='2'><strong>row4col2and3</strong></td></tr>" +
            "</tbody>" +
    "</table>";

    Document doc = Jsoup.parse(html);
    // Prints every <td> that has neither a colspan attribute nor a child element.
    System.out.println(doc.select("td:not([colspan]):not(:has(*))"));
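
One way to find out which condition is filtering everything out on the real page is to count the matches for each part of the selector separately. The following is a rough sketch, not a definitive diagnosis tool: the class name `SelectorDiagnostics` and the method `report` are hypothetical, and the mock table stands in for the real page's HTML.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.util.LinkedHashMap;
import java.util.Map;

public class SelectorDiagnostics {

    // Hypothetical helper: maps each selector to the number of elements it matches.
    static Map<String, Integer> report(String html) {
        Document doc = Jsoup.parse(html);
        String[] selectors = {
            "td",                               // all cells
            "td[colspan]",                      // cells excluded by the colspan test
            "td:has(*)",                        // cells excluded by the child-element test
            "td:not([colspan]):not(:has(*))"    // the selector from the question
        };
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String css : selectors) {
            counts.put(css, doc.select(css).size());
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumption: replace this mock table with the HTML of the real page.
        String html =
            "<table><tbody>" +
                "<tr><td>row1col1</td><td>row1col2</td><td>row1col3</td></tr>" +
                "<tr><td colspan='3'>row2fullrow</td></tr>" +
                "<tr><td></td><td>row3col2</td><td><strong>row3col3</strong></td></tr>" +
                "<tr><td id='x'>row4col1</td><td colspan='2'><strong>row4col2and3</strong></td></tr>" +
            "</tbody></table>";
        report(html).forEach((css, n) -> System.out.println(n + "  " + css));
    }
}
```

If the count for plain `td` is already 0, the cells are not in the HTML that jsoup received (for instance, because the page builds the table with JavaScript), and no selector will find them. If `td:has(*)` matches nearly every cell, the values are probably wrapped in child elements such as `<font>` or `<span>`.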