Python: Using xpath locally / on a specific element
I'm trying to get the links from a page with xpath. The problem is that I only want the links inside a table, but if I apply the xpath expression on the whole page I'll capture links which I don't want.
For example:
tree = lxml.html.parse(some_response)
links = tree.xpath("//a[contains(@href, 'http://www.example.com/filter/')]")
The problem is that this applies the expression to the whole document. I located the element I want, for example:
tree = lxml.html.parse(some_response)
root = tree.getroot()
table = root[1][5] #for example
links = table.xpath("//a[contains(@href, 'http://www.example.com/filter/')]")
But that seems to run the query against the whole document as well, as I am still capturing links outside of the table. The lxml documentation says that "When xpath() is used on an Element, the XPath expression is evaluated against the element (if relative) or against the root tree (if absolute):". So, what I am using is an absolute expression and I need to make it relative? Is that it?
Basically, how can I go about filtering only elements that exist inside of this table?
Comments (2)
Your xpath starts with a slash (/) and is therefore absolute. Add a dot (.) in front to make it relative to the current element, i.e. start the expression with .//a instead of //a.
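To make the difference concrete, here is a minimal sketch that runs both forms of the expression against the same table element. The HTML snippet, the `results` id, and the variable names are assumptions made up for illustration; only the `xpath()` call and the link filter come from the question.

```python
import lxml.html

# Hypothetical page: one matching link outside the table, one inside it.
PAGE = """
<html><body>
  <a href="http://www.example.com/filter/outside">outside</a>
  <table id="results">
    <tr><td><a href="http://www.example.com/filter/inside">inside</a></td></tr>
  </table>
</body></html>
"""

root = lxml.html.fromstring(PAGE)
table = root.xpath("//table")[0]

# Shared tail of the expression from the question.
expr = "a[contains(@href, 'http://www.example.com/filter/')]"

# Absolute: "//" starts at the document root, even when called on the table.
absolute_links = table.xpath("//" + expr)
# Relative: ".//" starts at the context element, i.e. the table itself.
relative_links = table.xpath(".//" + expr)

absolute_hrefs = [a.get("href") for a in absolute_links]
relative_hrefs = [a.get("href") for a in relative_links]
print(absolute_hrefs)  # both links
print(relative_hrefs)  # only the link inside the table
```

With the leading dot, the search is limited to descendants of the element that `xpath()` was called on, which is what the question is after.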
Another option would be to ask directly for elements inside your table, for instance by putting a table step such as //table[**criteria**] in front of the link expression.
Here **criteria** is necessary if there are many tables in the page. Some possible criteria would be to filter based on the table id or class, for instance with a predicate on @id or @class.
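As a rough sketch of this second approach, the following selects the table by id or by class in the same expression that selects the links. The page content, the `results` id, and the `data` class are hypothetical; substitute whatever actually identifies your table.

```python
import lxml.html

# Hypothetical page with two tables; only the second one is wanted.
PAGE = """
<html><body>
  <table id="nav">
    <tr><td><a href="http://www.example.com/filter/nav">nav</a></td></tr>
  </table>
  <table id="results" class="data">
    <tr><td><a href="http://www.example.com/filter/wanted">wanted</a></td></tr>
  </table>
</body></html>
"""

tree = lxml.html.fromstring(PAGE)

# Link filter from the question, appended after the table step.
LINK = "//a[contains(@href, 'http://www.example.com/filter/')]"

# Criteria on the table id (assumed id: "results").
by_id = tree.xpath("//table[@id='results']" + LINK)
# Criteria on the table class (assumed class: "data"); note that this
# compares the whole attribute value, so class="data wide" would not match.
by_class = tree.xpath("//table[@class='data']" + LINK)

id_hrefs = [a.get("href") for a in by_id]
class_hrefs = [a.get("href") for a in by_class]
print(id_hrefs)
print(class_hrefs)
```

Because the table is selected inside the expression itself, this works from the document root and does not depend on positional indexing such as root[1][5].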