XPath expression //element returns nothing, but //* returns a count
I'm using XOM with the following sample data:
Element root = cleanDoc.getRootElement();
//find all the bold elements, as those mark institution and clinic.
Nodes nodes = root.query("//*");
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">
<head>
<title>Patient Information</title>
</head>
</html>
The following expression returns many elements (from real data):
//*
but something like
//head
returns nothing. If I iterate over the children of the root, the counts match up, and if I print the element names, everything looks correct.
I'm taking HTML, parsing it with TagSoup, and then building a XOM Document from the resulting string. What part of this could go so horribly wrong? I feel there's some weird encoding issue going on here, but I'm just not seeing it. Java Strings are Strings, right?
Comments (1)
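Your document has a default namespace, which means that in the XPath model all the elements are in that namespace. An unprefixed name such as head in an XPath 1.0 expression matches only elements in no namespace, which is why //head finds nothing while //* still matches everything.
The query should be
//html:head
You will have to supply the namespace mapping to the XPath query. Note that while the XPath expression uses a namespace prefix, it is the namespace URI that must match.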
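To make the point about prefixes and URIs concrete, here is a minimal sketch that runs on the JDK alone. It uses `javax.xml.xpath` rather than XOM so that it needs no extra libraries; the class name `NamespaceXPathDemo` and the helper `count` are invented for this illustration, and the sample document is the one from the question.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespaceXPathDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Sample document from the question: everything sits in the default namespace.
    static final String SAMPLE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\""
        + " xmlns:html=\"http://www.w3.org/1999/xhtml\">"
        + "<head><title>Patient Information</title></head>"
        + "</html>";

    // Hypothetical helper: counts the nodes an expression selects when
    // the given prefix is bound to the XHTML namespace URI.
    static int count(String expression, final String prefix) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // JAXP parsers ignore namespaces by default
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String p) {
                    return prefix.equals(p) ? XHTML_NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) {
                    return XHTML_NS.equals(uri) ? prefix : null;
                }
                public Iterator<String> getPrefixes(String uri) {
                    return XHTML_NS.equals(uri)
                        ? Collections.singletonList(prefix).iterator()
                        : Collections.<String>emptyIterator();
                }
            });
            NodeList nodes =
                (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            // Collapsed to an unchecked exception to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//*", "html"));         // 3: html, head, title
        System.out.println(count("//head", "html"));      // 0: unprefixed name means "no namespace"
        System.out.println(count("//html:head", "html")); // 1
        System.out.println(count("//x:head", "x"));       // 1: the prefix is arbitrary, the URI matters
    }
}
```

The last call shows that the prefix in the expression does not have to be the one used in the document. In XOM, the same mapping is supplied through `nu.xom.XPathContext`, for example `root.query("//html:head", new XPathContext("html", "http://www.w3.org/1999/xhtml"))`.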