GPath: find whether a table header contains a matching string
I'm parsing an HTML file into a well-formed XML document using the NekoHTML parser. However, I can't quite figure out the GPath expression that identifies the table whose header contains the string "Settings".
```groovy
def parser = new org.cyberneko.html.parsers.SAXParser()
parser.setFeature('http://xml.org/sax/features/namespaces', false)
def html =
'''
<html>
<title>Hiya!</title>
</html>
<body>
<table>
<tr>
<th colspan='3'>Settings</th>
<td>First cell r1</td>
<td>Second cell r1</td>
</tr>
</table>
<table>
<tr>
<th colspan='3'>Other Settings</th>
<td>First cell r2</td>
<td>Second cell r2</td>
</tr>
</table>
'''
def slurper = new XmlSlurper(parser)
def page = slurper.parseText(html)
```
In this sample, the first table should be selected so that I can iterate over other row values in it. Can someone help me with this GPath please?
EDIT: Side question: why does `println page.HTML.HEAD.TITLE` print an empty string? Shouldn't it return the title?
To get the table with 'Settings' in the header, you should be able to do:
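The snippet that originally followed here was lost when the page was scraped. As a minimal sketch (not necessarily the answerer's exact code), assuming Groovy 3 or later (where `XmlSlurper` lives in `groovy.xml`), NekoHTML 1.9.22 fetched with `@Grab`, and a tidied, well-formed version of the sample HTML, one approach is a depth-first search for the `TABLE` whose `TH` text equals `Settings` exactly. The names `settingsTable` and `rowValues` are illustrative only.

```groovy
// Sketch: assumes Groovy 3+ and that Grape can download NekoHTML 1.9.22.
@Grab('net.sourceforge.nekohtml:nekohtml:1.9.22')
import org.cyberneko.html.parsers.SAXParser
import groovy.xml.XmlSlurper

def parser = new SAXParser()
parser.setFeature('http://xml.org/sax/features/namespaces', false)

// Tidied version of the question's HTML (HEAD and BODY in the usual places).
def html = '''
<html>
<head><title>Hiya!</title></head>
<body>
<table>
<tr><th colspan='3'>Settings</th><td>First cell r1</td><td>Second cell r1</td></tr>
</table>
<table>
<tr><th colspan='3'>Other Settings</th><td>First cell r2</td><td>Second cell r2</td></tr>
</table>
</body>
</html>
'''

def page = new XmlSlurper(parser).parseText(html)

// NekoHTML upper-cases element names by default, hence 'TABLE' and 'TH'.
// '**' walks the tree depth-first; == (not contains) skips 'Other Settings'.
def settingsTable = page.'**'.find { node ->
    node.name() == 'TABLE' &&
        node.'**'.find { it.name() == 'TH' }?.text() == 'Settings'
}

// Collect the text of the TD cells in each row of the selected table.
def rowValues = settingsTable.'**'
        .findAll { it.name() == 'TR' }
        .collect { row -> row.TD*.text() }

rowValues.each { println it }
```

Exact equality matters here: a `contains('Settings')` test would also match the second table's `Other Settings` header. The depth-first `'**'` search is used instead of a fixed `page.BODY.TABLE.TR` path so the sketch does not depend on whether the parser inserts a `TBODY` between `TABLE` and `TR`.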
`page` points to the root of the document, so you don't need the `HTML` step in the path. All you should need to do is:
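The code that followed here was also lost. Assuming the same setup as above (Groovy 3+, NekoHTML 1.9.22 via `@Grab`), this small sketch shows the difference: NekoHTML upper-cases element names by default, and the slurped result already *is* the `HTML` element, so `page.HEAD.TITLE` finds the title, while `page.HTML.HEAD.TITLE` looks for an `HTML` child that doesn't exist and yields an empty result. The variable names `title` and `missing` are illustrative.

```groovy
// Sketch: assumes Groovy 3+ and that Grape can download NekoHTML 1.9.22.
@Grab('net.sourceforge.nekohtml:nekohtml:1.9.22')
import org.cyberneko.html.parsers.SAXParser
import groovy.xml.XmlSlurper

def parser = new SAXParser()
parser.setFeature('http://xml.org/sax/features/namespaces', false)
def page = new XmlSlurper(parser).parseText(
        '<html><head><title>Hiya!</title></head><body></body></html>')

// The slurped result is the root element itself, so paths start from its children.
println page.name()
def title = page.HEAD.TITLE.text()
println title

// page.HTML asks for an HTML *child* of the root; there is none, so the path is empty.
def missing = page.HTML.HEAD.TITLE.text()
println "'${missing}'"
```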