GPath: finding the table whose header contains a matching string

Posted 2025-01-05 09:01:26 · 1195 characters · 0 views · 0 comments

I'm parsing an HTML file into a well-formed XML document using the NekoHTML parser. However, I can't quite figure out the GPath expression that identifies the table containing the "Settings" string.

def parser = new org.cyberneko.html.parsers.SAXParser()
parser.setFeature('http://xml.org/sax/features/namespaces', false)

def html =
'''
    <html>
        <title>Hiya!</title>
    </html>
    <body>
        <table>
            <tr>
                <th colspan='3'>Settings</th>
                <td>First cell r1</td>
                <td>Second cell r1</td>
            </tr>
        </table>
        <table>
            <tr>
                <th colspan='3'>Other Settings</th>
                <td>First cell r2</td>
                <td>Second cell r2</td>
            </tr>
        </table>
'''

def slurper = new XmlSlurper(parser)
def page = slurper.parseText(html)

In this sample, the first table should be selected so that I can iterate over other row values in it. Can someone help me with this GPath please?

EDIT: Side question - why does

println page.HTML.HEAD.TITLE

print an empty string? Shouldn't it return the title?

橪书 2025-01-12 09:01:26

To get the table with 'Settings' in the header, you should be able to do:

def settingsTableNode = page.BODY.TABLE.find { table ->
  table.TBODY.TR.TH.text() == 'Settings'
}
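
The lookup above can be tried without NekoHTML on the classpath. The following is a minimal sketch, not the answerer's code: it assumes the parsed tree has upper-case element names and a `TBODY` under each `TABLE` (as the GPath in the answer implies), and it feeds hand-written XML of that shape to a plain `XmlSlurper`. The variable names `xml`, `cells`, and `containing` are illustrative only.

```groovy
import groovy.xml.XmlSlurper  // Groovy 3+; on Groovy 2.x drop this import (groovy.util.XmlSlurper is a default import)

// Assumption: this hand-written XML stands in for the tree NekoHTML would produce
// (upper-case element names, TBODY inserted under each TABLE).
def xml = '''
<HTML>
  <HEAD><TITLE>Hiya!</TITLE></HEAD>
  <BODY>
    <TABLE><TBODY><TR>
      <TH colspan="3">Settings</TH>
      <TD>First cell r1</TD>
      <TD>Second cell r1</TD>
    </TR></TBODY></TABLE>
    <TABLE><TBODY><TR>
      <TH colspan="3">Other Settings</TH>
      <TD>First cell r2</TD>
      <TD>Second cell r2</TD>
    </TR></TBODY></TABLE>
  </BODY>
</HTML>
'''

def page = new XmlSlurper().parseText(xml)

// Exact match on the header text: only the first table qualifies
def settingsTable = page.BODY.TABLE.find { table ->
    table.TBODY.TR.TH.text() == 'Settings'
}

// Iterate over the data cells of the matched table
def cells = settingsTable.TBODY.TR.TD.collect { it.text() }
println cells

// Substring match instead: 'Other Settings' also contains 'Settings',
// so both tables are returned
def containing = page.BODY.TABLE.findAll { table ->
    table.TBODY.TR.TH.text().contains('Settings')
}
println containing.size()
```

The exact comparison with `==` is what singles out the first table here; a `contains` check would also match the "Other Settings" table, so which one to use depends on how distinct the header strings are.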

page already points to the root element of the document (the HTML element itself), so HTML should not appear in the path. All you should need to do is:
```
println page.HEAD.TITLE
```
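
To see why the extra `HTML` step yields an empty string, here is a small sketch using a plain `XmlSlurper` on hand-written XML. It assumes the same upper-case element names as above and a `TITLE` nested under `HEAD`; the question's real NekoHTML output is not reproduced here.

```groovy
import groovy.xml.XmlSlurper  // Groovy 3+; on Groovy 2.x drop this import

// Assumption: a simplified stand-in for the parsed page
def page = new XmlSlurper().parseText(
    '<HTML><HEAD><TITLE>Hiya!</TITLE></HEAD><BODY/></HTML>')

// The result of parseText is the root element itself
def rootName = page.name()

// Navigating from the root finds the title
def title = page.HEAD.TITLE.text()

// page.HTML looks for an HTML child *inside* the root; there is none,
// so the path resolves to an empty result whose text is ''
def missing = page.HTML.HEAD.TITLE.text()

println rootName
println title
println "[${missing}]"
```

A GPath expression that names a non-existent child does not throw; it quietly evaluates to an empty result, which is why the original `println` printed an empty string rather than failing.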

About the author: 盗琴音
