Beautiful Soup - how to get href
I can't seem to extract the href (there is only one <strong>Website:</strong> on the page) from the following soup of html:
<div id='id_Website'>
<strong>Website:</strong>
<a href='http://google.com' target='_blank' rel='nofollow'>www.google.com</a>
</div></div><div>
This is what I thought should work:
href = soup.find("strong" ,text=re.compile(r'Website')).next["href"]
Comments (1)
.next in this case is a NavigableString containing the whitespace between the <strong> tag and the <a> tag. Also, the text= attribute is for matching NavigableStrings, rather than elements. The following does what you want, I think:
... but that isn't very robust. If the enclosing div has a predictable ID, then it would be better to find that, and then find the first <a> element within it.
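
As a minimal sketch of the two approaches described above, assuming the modern `bs4` package (`beautifulsoup4`) and its `string=` argument rather than the older `text=` spelling used in the question. The helper names `href_via_label` and `href_via_id` are illustrative and do not come from the original answer, and the HTML is the question's fragment trimmed to its balanced part:

```python
import re
from bs4 import BeautifulSoup, NavigableString

# HTML fragment from the question, trimmed to the balanced part.
HTML = """
<div id='id_Website'>
<strong>Website:</strong>
<a href='http://google.com' target='_blank' rel='nofollow'>www.google.com</a>
</div>
"""


def href_via_label(soup):
    """Hypothetical helper: start from the 'Website:' text and walk forward."""
    # string= matches the text node (a NavigableString), not the <strong> tag.
    label = soup.find(string=re.compile(r"Website"))
    # Go up to the enclosing <strong>, then forward to the next <a> tag.
    return label.parent.find_next("a")["href"]


def href_via_id(soup):
    """Hypothetical helper: rely on the predictable id of the enclosing div."""
    container = soup.find("div", id="id_Website")
    # The first <a> inside the container carries the link.
    return container.find("a")["href"]


soup = BeautifulSoup(HTML, "html.parser")

# The node right after the label text is whitespace, not the <a> tag,
# which is why indexing it with ["href"] fails.
label = soup.find(string=re.compile(r"Website"))
print(isinstance(label.next_element, NavigableString))

print(href_via_label(soup))
print(href_via_id(soup))
```

The ID-based lookup is probably the safer of the two, since it does not depend on the label text or on the order of the sibling nodes.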