A href capture

Posted on 2024-09-10 03:43:40

I'm using BeautifulSoup to parse some HTML. Here is the content:

<tr> 
<th>Your provider:</th> 
<td> 

<img src="/isp_logos/la-la-la.ico" alt=""/> 
 <a href="/isp/SomeProvider"> 
 Provider name </a> 
 &nbsp;
 <a href="http://*/isp-comparer/?isp=000000"> 
 </a> 
</td> 
</tr>

I need to get the SomeProvider text from the link. My code is:

import re
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3.x import path (assumed)
contentSoup = BeautifulSoup(ThatHtml)
print contentSoup.findAll('a', href=re.compile('/isp/(.*)'))

The result is an empty list. Why? Is there another way to do this?



烟沫凡尘 2024-09-17 03:43:40

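With the code and input you posted, I get:

[<a href="/isp/SomeProvider">   Provider name </a>]

as the returned list. Are you using the newest 3.1.x version of BeautifulSoup? I actually had the same problem, but it turned out that I had downloaded the 2.x version of BeautifulSoup, thinking that 2.x meant it was compatible with Python 2.x.

Assuming that the first link is the one containing SomeProvider, you could just use:

contentSoup.a

to extract that tag.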
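
Note that contentSoup.a returns the whole tag, while the question asks for the SomeProvider part of the href. As one possible "other way", here is a minimal sketch that uses only the Python 3 standard library (html.parser and re) instead of BeautifulSoup. The names HrefCollector and provider_from_html are made up for illustration and are not part of any library:

```python
import re
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def provider_from_html(html):
    """Return the part after /isp/ from the first matching link, or None."""
    collector = HrefCollector()
    collector.feed(html)
    for href in collector.hrefs:
        # re.match anchors at the start, so only hrefs beginning with /isp/ qualify.
        match = re.match(r"/isp/(.+)", href)
        if match:
            return match.group(1)
    return None


# The HTML fragment from the question.
sample = """
<tr>
<th>Your provider:</th>
<td>
<img src="/isp_logos/la-la-la.ico" alt=""/>
 <a href="/isp/SomeProvider">
 Provider name </a>
 &nbsp;
 <a href="http://*/isp-comparer/?isp=000000">
 </a>
</td>
</tr>
"""

print(provider_from_html(sample))  # SomeProvider
```

If BeautifulSoup itself is working, the same re.match call can be applied to the href value of the tag it returns (for example tag['href']).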
