Accessing the first element of the output in lxml.html

Posted on 2024-09-15 21:50:23, 615 characters, 6 views, 0 comments

With lxml.html, how do I access single elements without using a for loop?

This is the HTML:

<tr class="headlineRow">
  <td>
    <span class="headline">This is some awesome text</span>
  </td>
</tr>

For example, this will fail with IndexError:

 for row in doc.cssselect('tr.headlineRow'):
     headline = row.cssselect('td span.headline')
     print(headline[0])

This will pass:

 for row in doc.cssselect('tr.headlineRow'):
     headline = row.cssselect('td span.headline')
     for first_thing in headline:
         print(headline[0].text_content())

Comments (4)

忱杏 2024-09-22 21:50:23

I usually use the xpath method for things like this.
It returns a list of matching elements.

>>> spans = doc.xpath('//tr[@class="headlineRow"]/td/span[@class="headline"]')
>>> spans[0].text
'This is some awesome text'
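
For an element query like this one the result is a plain list, so `spans[0]` raises IndexError when nothing matches. A minimal sketch of a guard, assuming a hypothetical helper named `first` (it is not part of lxml). Plain lists stand in for the result of `doc.xpath(...)` or `row.cssselect(...)`:

```python
def first(matches, default=None):
    """Return the first item of `matches`, or `default` if it is empty.

    `first` is a hypothetical helper, not an lxml function.
    """
    return next(iter(matches), default)


# Plain lists stand in for what doc.xpath(...) or row.cssselect(...) returns.
print(first(["span-1", "span-2"]))  # first match
print(first([]))                    # no match: None instead of IndexError
```

With a helper like this, the caller checks for `None` once instead of wrapping the access in a loop.
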
绿光 2024-09-22 21:50:23

I tried out your example using CSSSelector and headline[0] worked fine. See below:

>>> html = """<tr class="headlineRow">
...   <td>
...     <span class="headline">This is some awesome text</span>
...   </td>
... </tr>"""
>>> from lxml import etree
>>> from lxml.cssselect import CSSSelector
>>> doc = etree.fromstring(html)
>>> sel1 = CSSSelector('tr.headlineRow')
>>> sel2 = CSSSelector('td span.headline')
>>> for row in sel1(doc):
...     headline = sel2(row)
...     print(headline[0])
...
<Element span at 8f31e3c>
挽梦忆笙歌 2024-09-22 21:50:23

Your "failing" example works perfectly for me. Either you made a mistake when trying it out, or you are using an older version of lxml with a bug that has since been fixed (I tried 2.2.6 and 2.1.1, the oldest version I had around, and both worked).
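
One way the two loops in the question can behave differently, assuming (the question does not say so) that the real page has a `tr.headlineRow` without a `span.headline`: the inner selector then returns an empty list. `headline[0]` raises IndexError on it, while a `for` over the empty list silently does nothing. A minimal sketch with made-up markup follows; `findall` stands in for `cssselect`, and the stdlib fallback import is only there so the snippet also runs without lxml:

```python
try:
    from lxml import etree  # preferred when lxml is installed
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with the same find/findall API

# Hypothetical markup: the second headlineRow has no span.headline.
html = """<table>
  <tr class="headlineRow"><td><span class="headline">This is some awesome text</span></td></tr>
  <tr class="headlineRow"><td>no headline here</td></tr>
</table>"""

doc = etree.fromstring(html)

texts = []
for row in doc.findall('.//tr[@class="headlineRow"]'):
    headline = row.findall('.//span[@class="headline"]')
    if headline:  # an empty list means this row has no match
        texts.append(headline[0].text)
    else:
        texts.append(None)

print(texts)
```

Checking `if headline:` before indexing makes the missing case explicit instead of hiding it inside an inner loop.
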

爱你不解释 2024-09-22 21:50:23

Elements are accessed the same way you access nested lists:

>>> doc[0][0]
<Element span at ...>

Or via CSS selectors:

doc.cssselect('td span.headline')[0]
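
Another option that avoids indexing altogether is the ElementTree-style `find()` / `findtext()` API, which lxml elements also provide: it returns the first match, or `None` when there is none. A small sketch on the markup from the question; the fallback import is an assumption added so the snippet also runs without lxml installed:

```python
try:
    from lxml import etree  # preferred when lxml is installed
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with the same API

html = """<tr class="headlineRow">
  <td>
    <span class="headline">This is some awesome text</span>
  </td>
</tr>"""

doc = etree.fromstring(html)

# find() returns the first matching element, or None if nothing matches.
span = doc.find('.//span[@class="headline"]')
# findtext() returns the text of the first match, or None.
text = doc.findtext('.//span[@class="headline"]')
# A query with no match yields None rather than raising IndexError.
missing = doc.find('.//span[@class="missing"]')

print(span.tag, text, missing)
```
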