lxml: get element with a particular child element?
Working in lxml, I want to get the href attribute of all links with an img child that has title="Go to next page".
So in the following snippet:
<a class="noborder" href="StdResults.aspx">
<img src="arrowr.gif" title="Go to next page"></img>
</a>
I'd like to get StdResults.aspx back.
I've got this far:
next_link = doc.xpath("//a/img[@title='Go to next page']")
print(next_link[0].attrib['href'])
But next_link is the img, not the a tag - how can I get the a tag?
Thanks.
Comments (2)
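Just change a/img... to a[img...] (the brackets sort of mean "such that"):
//a[img[@title='Go to next page']]
Or, you could go even farther and append /@href to that expression, giving
//a[img[@title='Go to next page']]/@href
to retrieve the values of the href attributes.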
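As a rough illustration of the predicate form described above, here is a minimal sketch. It assumes the page is parsed with lxml.html.fromstring; the sample markup, including the extra "previous page" link with Previous.aspx and arrowl.gif, is hypothetical and only there to show that the predicate filters out non-matching links.

```python
import lxml.html

# Hypothetical sample markup, modelled on the snippet in the question.
# The "previous page" link is an assumption added for contrast.
html = """
<div>
  <a class="noborder" href="Previous.aspx">
    <img src="arrowl.gif" title="Go to previous page">
  </a>
  <a class="noborder" href="StdResults.aspx">
    <img src="arrowr.gif" title="Go to next page">
  </a>
</div>
"""

doc = lxml.html.fromstring(html)

# Predicate form: select <a> elements such that they have a matching <img> child.
links = doc.xpath("//a[img[@title='Go to next page']]")
hrefs_from_elements = [a.get("href") for a in links]

# Attribute step: let XPath return the href values directly.
hrefs = doc.xpath("//a[img[@title='Go to next page']]/@href")

print(hrefs_from_elements)
print(hrefs)
```

Both queries should pick out the same link; the second simply skips the intermediate element objects when only the attribute values are needed.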
You can also select the parent node or arbitrary ancestors by using
//a/img[@title='Go to next page']/parent::a
or
//a/img[@title='Go to next page']/ancestor::a
respectively as XPath expressions.
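The two axes mentioned above might be compared with a sketch like the following. Note the assumptions: the queries start from //img rather than //a/img so that the difference between parent:: and ancestor:: becomes visible, and the second link (Wrapped.aspx, with the img nested inside a span) is hypothetical markup invented for the comparison.

```python
import lxml.html

# Hypothetical markup: the first <img> is a direct child of its <a>,
# the second is wrapped in a <span>, so its <a> is an ancestor but not its parent.
html = """
<div>
  <a class="noborder" href="StdResults.aspx">
    <img src="arrowr.gif" title="Go to next page">
  </a>
  <a href="Wrapped.aspx"><span><img src="arrowr.gif" title="Go to next page"></span></a>
</div>
"""

doc = lxml.html.fromstring(html)

# parent::a matches only when the immediate parent of the <img> is an <a>.
parent_links = doc.xpath("//img[@title='Go to next page']/parent::a")

# ancestor::a matches any enclosing <a>, however deeply the <img> is nested.
ancestor_links = doc.xpath("//img[@title='Go to next page']/ancestor::a")

parent_hrefs = [a.get("href") for a in parent_links]
ancestor_hrefs = [a.get("href") for a in ancestor_links]

# Outside XPath, lxml elements also expose getparent() for the same one-step climb.
first_img = doc.xpath("//img[@title='Go to next page']")[0]
href_via_getparent = first_img.getparent().get("href")

print(parent_hrefs)
print(ancestor_hrefs)
print(href_via_getparent)
```

With the original //a/img prefix, the img is already required to be a direct child of an a element, so both axes would normally select the same link.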