Accessing the first element of lxml.html output
With lxml.html, how do I access single elements without using a for loop?
This is the HTML:
<tr class="headlineRow">
<td>
<span class="headline">This is some awesome text</span>
</td>
</tr>
For example, this will fail with IndexError:
for row in doc.cssselect('tr.headlineRow'):
    headline = row.cssselect('td span.headline')
    # Indexing raises IndexError when a row has no matching span
    print(headline[0])
This will pass:
for row in doc.cssselect('tr.headlineRow'):
    headline = row.cssselect('td span.headline')
    # The inner loop body never runs when headline is empty
    for first_thing in headline:
        print(headline[0].text_content())
Comments (4)
I usually use the xpath method for things like this.
It returns a list of matching elements.
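A minimal sketch of the xpath approach, not the answerer's original code. It assumes the row from the question sits inside a `<table>` (added here so the fragment parses predictably), and the names `doc`, `matches`, and `headline` are illustrative.

```python
from lxml import html

# Markup from the question, wrapped in a <table> (an assumption for this sketch).
doc = html.fromstring(
    '<table><tr class="headlineRow"><td>'
    '<span class="headline">This is some awesome text</span>'
    '</td></tr></table>'
)

# xpath() returns a list of matching elements; the list is empty if nothing matches.
matches = doc.xpath('//tr[@class="headlineRow"]/td/span[@class="headline"]')

# Guard the index so an empty result yields None instead of raising IndexError.
headline = matches[0].text_content() if matches else None
print(headline)
```

The conditional expression is what replaces the for loop: it picks the first match when there is one and falls back to `None` otherwise.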
I tried out your example using CSSSelector, and headline[0] worked fine. See below:
Your "failing" example works perfectly for me. Either you made a mistake when trying it out, or you are using an older version of lxml with a bug that has since been fixed (I tried 2.2.6 and 2.1.1, the oldest I had around, and both worked).
Elements are accessed the same way you access nested lists:
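A small illustration of list-style access, assuming the markup from the question wrapped in a `<table>`. The variable names are hypothetical, and `find()` is used to locate the row first so the sketch does not depend on how the parser nests table rows.

```python
from lxml import html

# Markup from the question, wrapped in a <table> (an assumption for this sketch).
table = html.fromstring(
    '<table><tr class="headlineRow"><td>'
    '<span class="headline">This is some awesome text</span>'
    '</td></tr></table>'
)

# Locate the first <tr>, then index into its children like a nested list.
row = table.find('.//tr')
cell = row[0]    # first child of the row: the <td>
span = cell[0]   # first child of the cell: the <span>

print(span.text)
print(row[0][0].text)  # same element reached with chained indexes
```

Indexing past the last child raises IndexError, just as it does with a regular list.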
Or via CSS selectors: