Get second element text with XPath?

Posted on 2024-09-30 20:15:36

<span class='python'>
  <a>google</a>
  <a>chrome</a>
</span>

I want to get chrome, and I already have it working like this:

q = item.findall('.//span[@class="python"]//a')
t = q[1].text # first element = 0

I'd like to combine it into a single XPath expression and just get one item instead of a list.
I tried this but it doesn't work.

t = item.findtext('.//span[@class="python"]//a[2]') # first element = 1

And the actual, not simplified, HTML is like this.

<span class='python'>
  <span>
    <span>
      <img></img>
      <a>google</a>
    </span>
    <a>chrome</a>
  </span>
</span>

假情假意假温柔 2024-10-07 20:15:36

I tried this but it doesn't work.
t = item.findtext('.//span[@class="python"]//a[2]')

This is a FAQ about the // abbreviation.

.//a[2] means: select all a descendants of the current node that are the second a child of their parent. So this may select more than one element, or none at all, depending on the concrete XML document.

To put it more simply, the [] operator has higher precedence than //.
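
A minimal sketch of that behaviour, using the standard-library xml.etree.ElementTree (whose path subset supports `.//a[2]`). The three-paragraph document and the `<div>` wrapper around the question's HTML are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: three parents, each with its own <a> children.
doc = ET.fromstring(
    "<div>"
    "<p><a>one</a><a>two</a></p>"
    "<p><a>three</a><a>four</a></p>"
    "<p><a>five</a></p>"
    "</div>"
)

# [2] is applied per parent: every <a> that is the second <a> child
# of its own parent matches, so several elements can come back.
per_parent_second = [a.text for a in doc.findall(".//a[2]")]
print(per_parent_second)  # ['two', 'four']

# The nested HTML from the question: each <a> is the FIRST <a> child
# of its parent, so the same kind of expression matches nothing.
nested = ET.fromstring(
    "<div><span class='python'><span><span><img></img><a>google</a></span>"
    "<a>chrome</a></span></span></div>"
)
nested_second = nested.findall(".//span[@class='python']//a[2]")
print(nested_second)  # []
```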

If you want just one node (the second) out of all those returned, you have to use parentheses to force the precedence you want:

(.//a)[2]

This really does select the second a descendant of the current node.

For the actual expression used in the question, change it to:

(.//span[@class="python"]//a)[2]

or change it to:

(.//span[@class="python"]//a)[2]/text()
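
One caveat worth checking in your own setup: the parenthesised form is full XPath 1.0, which lxml evaluates through its `xpath()` method. As far as I know, the `find()` / `findall()` / `findtext()` family uses a smaller path subset without grouping parentheses. If you are limited to that subset, a rough equivalent is to take the second match in document order yourself. The sketch below uses the standard-library ElementTree; the `<div>` wrapper and the helper name `nth_match_text` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from itertools import islice

# The question's nested HTML, wrapped in a <div> so the outer span
# is a descendant of the element we search from.
item = ET.fromstring(
    "<div><span class='python'><span><span><img></img><a>google</a></span>"
    "<a>chrome</a></span></span></div>"
)

def nth_match_text(root, path, n):
    """Return the text of the n-th match (1-based, document order), or None."""
    matches = root.iterfind(path)  # iterator over matches in document order
    elem = next(islice(matches, n - 1, None), None)
    return elem.text if elem is not None else None

t = nth_match_text(item, ".//span[@class='python']//a", 2)
print(t)  # chrome
```

This returns a single string rather than a list, and `None` when there are fewer than `n` matches.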

神仙妹妹 2024-10-07 20:15:36

I'm not sure what the problem is...

>>> d = """<span class='python'>
...   <a>google</a>
...   <a>chrome</a>
... </span>"""
>>> from lxml import etree
>>> d = etree.HTML(d)
>>> d.xpath('.//span[@class="python"]/a[2]/text()')
['chrome']
>>>

会发光的星星闪亮亮i 2024-10-07 20:15:36

From Comments:

or the simplification of the actual
HTML I posted is too simple

You are right. What does .//span[@class="python"]//a[2] mean? It expands to:

self::node()
 /descendant-or-self::node()
  /child::span[attribute::class="python"]
   /descendant-or-self::node()
    /child::a[position()=2]

This ultimately selects each a element that is the second a child of its parent (position() is counted along the child axis). So nothing will be selected if your document looks like this:

<span class='python'> 
  <span> 
    <span> 
      <img></img> 
      <a>google</a><!-- This is the first "a" child of its parent --> 
    </span> 
    <a>chrome</a><!-- This is also the first "a" child of its parent --> 
  </span> 
</span>

If you want the second of all descendants, use:

descendant::span[@class="python"]/descendant::a[2]
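
Note that `descendant::a[2]` counts positions along the descendant axis of each matching span, so it yields the second a under each such span rather than the second match overall. The expression itself needs a full XPath engine. The sketch below only imitates its meaning with the standard-library ElementTree; the `<div>` wrapper and the function name `second_a_per_span` are hypothetical.

```python
import xml.etree.ElementTree as ET

item = ET.fromstring(
    "<div><span class='python'><span><span><img></img><a>google</a></span>"
    "<a>chrome</a></span></span></div>"
)

def second_a_per_span(root):
    """Imitate descendant::span[@class="python"]/descendant::a[2]."""
    found = []
    # root.iter() also visits root itself; harmless here because root is a <div>.
    for span in root.iter("span"):
        if span.get("class") != "python":
            continue
        anchors = list(span.iter("a"))  # all <a> descendants, document order
        if len(anchors) >= 2:
            found.append(anchors[1])    # position 2 on the descendant axis
    return found

texts = [a.text for a in second_a_per_span(item)]
print(texts)  # ['chrome']
```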
