Capturing an a href
I'm using BeautifulSoup to parse some HTML. Here is the content:
<tr>
<th>Your provider:</th>
<td>
<img src="/isp_logos/la-la-la.ico" alt=""/>
<a href="/isp/SomeProvider">
Provider name </a>
<a href="http://*/isp-comparer/?isp=000000">
</a>
</td>
</tr>
I need to get the SomeProvider text from the link's href. My code is:
import re
from BeautifulSoup import BeautifulSoup  # assumed: BeautifulSoup 3.x module name
contentSoup = BeautifulSoup(ThatHtml)
print contentSoup.findAll('a', href=re.compile('/isp/(.*)'))
The result is an empty array. Why? Is there another way to do this?
With your posted code and input, I get a non-empty array back. Are you using the newest 3.1.x version of BeautifulSoup? I had the same problem myself, and it turned out I had downloaded the 2.x version of BeautifulSoup, thinking that 2.x meant it was compatible with Python 2.x.
Assuming that the first match is the one containing SomeProvider, you could just use:
to extract that tag.
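As for other ways: here is a minimal sketch that pulls the provider name out of the href using only Python's standard library (`html.parser` and `re`) instead of BeautifulSoup, so it does not depend on which BeautifulSoup version is installed. The class name `IspLinkParser`, the constant names, and the anchored pattern are illustrative assumptions, not part of the original code, and the sketch targets Python 3.

```python
import re
from html.parser import HTMLParser

# Same sample markup as in the question.
HTML = """<tr>
<th>Your provider:</th>
<td>
<img src="/isp_logos/la-la-la.ico" alt=""/>
<a href="/isp/SomeProvider">
Provider name </a>
<a href="http://*/isp-comparer/?isp=000000">
</a>
</td>
</tr>"""

# Anchored so only hrefs that start with /isp/ match; group 1 is the provider.
ISP_HREF = re.compile(r'^/isp/(.+)$')


class IspLinkParser(HTMLParser):
    """Hypothetical helper: collects the provider part of every /isp/... link."""

    def __init__(self):
        super().__init__()
        self.providers = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        # attrs is a list of (name, value) pairs; value may be None.
        href = dict(attrs).get('href') or ''
        match = ISP_HREF.match(href)
        if match:
            self.providers.append(match.group(1))


parser = IspLinkParser()
parser.feed(HTML)
print(parser.providers)
```

If you stay with BeautifulSoup, the same `ISP_HREF` pattern can be applied to the `href` value of the tag it returns, since the capture group is what isolates the SomeProvider part.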