如何匹配 XPath (lxml) 中元素的内容?
我想使用 XPath 表达式通过 lxml 解析 HTML。我的问题是匹配标签的内容:
例如,给定元素,
<a href="http://something">Example</a>
我可以使用匹配 href 属性,
.//a[@href='http://something']
但给定表达式
.//a[.='Example']
甚至
.//a[contains(.,'Example')]
lxml 会抛出“无效节点谓词”异常。
我做错了什么?
编辑:
示例代码:
from lxml import etree
from cStringIO import StringIO
html = '<a href="http://something">Example</a>'
parser = etree.HTMLParser()
tree = etree.parse(StringIO(html), parser)
print tree.find(".//a[text()='Example']").tag
预期输出为“a”。我收到“语法错误:节点谓词无效”
I want to parse HTML with lxml using XPath expressions. My problem is matching for the contents of a tag:
For example given the
<a href="http://something">Example</a>
element I can match the href attribute using
.//a[@href='http://something']
but the given the expression
.//a[.='Example']
or even
.//a[contains(.,'Example')]
lxml throws the 'invalid node predicate' exception.
What am I doing wrong?
EDIT:
Example code:
from lxml import etree
from cStringIO import StringIO
html = '<a href="http://something">Example</a>'
parser = etree.HTMLParser()
tree = etree.parse(StringIO(html), parser)
print tree.find(".//a[text()='Example']").tag
Expected output is 'a'. I get 'SyntaxError: invalid node predicate'
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我会尝试使用:
.//a[text()='Example']
使用 xpath() 方法:
如果您想使用 iterfind()、findall()、find(), findtext(),请记住,ElementPath。
I would try with:
.//a[text()='Example']
using xpath() method:
If case you would like to use iterfind(), findall(), find(), findtext(), keep in mind that advanced features like value comparison and functions are not available in ElementPath.
目前接受的答案存在性能缺陷。考虑使用:
代替。
这在原始 xml.ElementTree 实现的查找语法文档中进行了记录: https://docs.python.org/3/library/xml.etree.elementtree.html#supported-xpath-syntax
lxml.etree 表示它的 find 方法使用相同的受限语法。
The currently accepted answer has performance drawbacks. Consider using:
instead.
This is documented in the find syntax docs here for the original xml.ElementTree implementation: https://docs.python.org/3/library/xml.etree.elementtree.html#supported-xpath-syntax
and lxml.etree says it uses the same restricted syntax for its find methods.