dom4j XPath not working when parsing an xhtml document
I'm trying to use dom4j to parse an xhtml document. If I simply print out the document I can see the entire document so I know it is being loaded correctly. The two divs that I'm trying to select are at the exact same level in the document.
html
  body
    div
      table
        tbody
          tr
            td
              table
                tbody
                  tr
                    td
                      div class="definition"
                      div class="example"
My code is
List<Element> list = document.selectNodes("//html/body/div/table/tbody/tr/td/table/tbody/tr/td");
but the list is empty when I do System.out.println(list);
If I only do
List<Element> list = document.selectNodes("//html");
it does return a list with one element in it. So I'm confused about what's wrong with my XPath and why it won't find those divs.
Comments (3)
Try declaring the xhtml namespace to the XPath, e.g. bind it to the prefix x and use //x:html/x:body... as the XPath expression (see also this article, which is however for Groovy, not for plain Java). Probably something like the following should do it in Java (untested):
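To make the prefix idea concrete, here is a minimal sketch. Note that it uses the JDK's own javax.xml.xpath API rather than dom4j, so it runs without extra libraries; in dom4j the corresponding calls would, as far as I know, be DocumentHelper.createXPath(...) followed by XPath.setNamespaceURIs(...) with a map from "x" to the XHTML namespace URI. The class name, the sample page and the countMatches helper are all made up for illustration, and the sample assumes the document really declares the XHTML default namespace.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XhtmlNamespaceDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical sample page (not the asker's document): a namespaced XHTML fragment.
    static final String SAMPLE =
        "<html xmlns=\"" + XHTML_NS + "\"><body><div><table><tbody><tr><td>"
        + "<div class=\"definition\">d</div><div class=\"example\">e</div>"
        + "</td></tr></tbody></table></div></body></html>";

    // Binds the prefix "x" to the XHTML namespace for use inside XPath expressions.
    static final NamespaceContext XHTML_CONTEXT = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            return "x".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            return XHTML_NS.equals(uri) ? "x" : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            String prefix = getPrefix(uri);
            return prefix == null
                ? Collections.<String>emptyIterator()
                : Collections.singletonList(prefix).iterator();
        }
    };

    // Parses the XML namespace-aware and returns how many nodes the expression selects.
    static int countMatches(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(XHTML_CONTEXT);
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Unprefixed names only match elements that are in no namespace.
        System.out.println(countMatches(SAMPLE, "//html/body/div"));
        // Prefixed names match the namespaced XHTML elements.
        System.out.println(countMatches(SAMPLE,
            "//x:html/x:body/x:div/x:table/x:tbody/x:tr/x:td/x:div"));
    }
}
```

In XPath 1.0 an unprefixed name in a path only matches elements that are in no namespace, which is why the prefix binding matters whenever the page carries an xmlns declaration.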
What about just "//div"? Or "//html/body/div/table/tbody"? I've found long literal XPath expressions hard to debug, as it's easy for my eyes to get tricked... so I break them down until it DOES work and then build back up again.
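The break-it-down approach can also be automated. The sketch below grows the path one step at a time and reports the first prefix that selects nothing. It uses the JDK's javax.xml.xpath API rather than dom4j so that it runs standalone; the class name, the firstEmptyPrefix helper and the sample page (a table written without a tbody) are hypothetical and only serve as illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathPrefixDebugger {
    // Hypothetical sample: a table written without a <tbody> element.
    static final String SAMPLE =
        "<html><body><div><table><tr><td>cell</td></tr></table></div></body></html>";

    // Builds //step1, //step1/step2, ... and returns the first path that
    // selects no nodes, or null if every prefix selects at least one node.
    static String firstEmptyPrefix(String xml, String... steps) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            StringBuilder path = new StringBuilder("/");
            for (String step : steps) {
                path.append('/').append(step);
                NodeList nodes = (NodeList) xpath.evaluate(
                    path.toString(), doc, XPathConstants.NODESET);
                if (nodes.getLength() == 0) {
                    return path.toString();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstEmptyPrefix(SAMPLE,
            "html", "body", "div", "table", "tbody", "tr", "td"));
    }
}
```

The returned path points at the step to look at first, which saves re-reading the whole expression by eye.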
An alternative could be:
This searches for "div" elements anywhere in the document whose "class" attribute value equals "definition" or "example".
I find this approach more clearly illustrates what you are trying to retrieve from the page. An added benefit is if the structure of the page changes, but the div classes stay the same, then your xpath doesn't need to be updated.
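As a sketch of what such an expression might look like, assume something along the lines of //div[@class='definition' or @class='example'] (an assumption that fits the description, not necessarily the exact expression meant). The example below runs it against two made-up pages with different structure, using the JDK's javax.xml.xpath API rather than dom4j so it is runnable on its own; with dom4j the same string could be passed to document.selectNodes(...). All class, field and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ClassAttributeXPathDemo {
    // Assumed expression: any div whose class attribute is exactly one of the two values.
    static final String EXPRESSION = "//div[@class='definition' or @class='example']";

    // Hypothetical page with the divs buried inside a table.
    static final String NESTED =
        "<html><body><div><table><tbody><tr><td>"
        + "<div class=\"definition\">d</div><div class=\"example\">e</div>"
        + "<div class=\"other\">o</div>"
        + "</td></tr></tbody></table></div></body></html>";

    // Hypothetical page with the same divs directly under body.
    static final String FLAT =
        "<html><body><div class=\"definition\">d</div>"
        + "<div class=\"example\">e</div></body></html>";

    // Returns how many nodes the expression selects in the given XML text.
    static int countMatches(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countMatches(NESTED, EXPRESSION));
        System.out.println(countMatches(FLAT, EXPRESSION));
    }
}
```

Both pages give the same two matches even though the nesting differs. If the page declares the XHTML namespace, the div name would presumably need a bound prefix here as well.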
You can also check that your XPath works against an HTML document using the following Firefox plugin, which is very useful.
Firefox Plugin - XPath Checker 0.4.4