Parsing HTML tags with Beautiful Soup based on class and href attribute

Posted on 2024-12-06 07:34:59

I am trying to parse HTML with BeautifulSoup.

The content I want is like this:

<a class="yil-biz-ttl" id="yil_biz_ttl-2" href="http://some-web-url/" title="some title">Title</a>

I tried this and got the following error:

maxx = soup.findAll("href", {"class: "yil-biz-ttl"})
------------------------------------------------------------
   File "<ipython console>", line 1
     maxx = soup.findAll("href", {"class: "yil-biz-ttl"})
                                             ^
SyntaxError: invalid syntax

What I want is the string: http://some-web-url/

横笛休吹塞上声 2024-12-13 07:34:59

soup.findAll('a', {'class': 'yil-biz-ttl'})[0]['href']

To find all such links:

for link in soup.findAll('a', {'class': 'yil-biz-ttl'}):
    try:
        print(link['href'])
    except KeyError:
        pass
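
As a minimal, self-contained sketch of the loop above (not the answerer's own code): the sample markup and the names SAMPLE_HTML and hrefs are assumptions made up for illustration, and it uses the bs4 spelling find_all plus Tag.get, which returns None for a missing attribute, so the try/except is not needed.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Hypothetical sample markup; the second link deliberately has no href.
SAMPLE_HTML = """
<a class="yil-biz-ttl" id="yil_biz_ttl-2" href="http://some-web-url/" title="some title">Title</a>
<a class="yil-biz-ttl" id="yil_biz_ttl-3" title="no link here">No href</a>
<a class="other" href="http://unrelated/">Other</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Collect the href of every <a> with the wanted class, skipping tags without one.
hrefs = []
for link in soup.find_all("a", {"class": "yil-biz-ttl"}):
    href = link.get("href")  # None when the attribute is absent
    if href is not None:
        hrefs.append(href)

print(hrefs)
```

With this sample input, only the first link should end up in hrefs, since the second has no href and the third has a different class.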

最偏执的依靠 2024-12-13 07:34:59

You're missing a closing quote after "class:

 maxx = soup.findAll("href", {"class: "yil-biz-ttl"})

It should be:

 maxx = soup.findAll("href", {"class": "yil-biz-ttl"})

Also, I don't think you can search for an attribute like href that way. The first argument to findAll is a tag name, so I think you need to search for the a tag and then read its href:

 maxx = [link['href'] for link in soup.findAll("a", {"class": "yil-biz-ttl"})]
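
To illustrate the point about tag names versus attributes, here is a small sketch. The SAMPLE_HTML string and the variable names by_wrong_name and by_tag_name are assumptions for the example; the markup is the snippet from the question.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# The single link from the question, used as hypothetical input.
SAMPLE_HTML = (
    '<a class="yil-biz-ttl" id="yil_biz_ttl-2" '
    'href="http://some-web-url/" title="some title">Title</a>'
)
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# "href" is treated as a tag name here; there is no <href> element, so nothing matches.
by_wrong_name = soup.find_all("href", {"class": "yil-biz-ttl"})

# Searching for the <a> tag finds the link; the attribute is then read by key.
by_tag_name = soup.find_all("a", {"class": "yil-biz-ttl"})
maxx = [link["href"] for link in by_tag_name]

print(len(by_wrong_name))
print(maxx)
```

So the corrected quoting alone makes the line valid Python, but the search still comes back empty until the tag name is changed to "a".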

烟雨凡馨 2024-12-13 07:34:59

To find all <a> elements with the CSS class "yil-biz-ttl" that have an href attribute, whatever its value:

from bs4 import BeautifulSoup  # $ pip install beautifulsoup4

soup = BeautifulSoup(HTML, "html.parser")  # HTML is the page source as a string
for link in soup("a", "yil-biz-ttl", href=True):
    print(link['href'])

At the moment, none of the other answers satisfy the above requirements.
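
For comparison, the same filter can probably also be written with the class_ keyword or with a CSS selector. The following is a sketch under assumptions: SAMPLE_HTML and the variable names are invented for illustration, and select relies on the CSS selector support that ships with recent beautifulsoup4 releases.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Hypothetical sample markup: one matching link, one without href, one with another class.
SAMPLE_HTML = """
<a class="yil-biz-ttl" href="http://some-web-url/">Title</a>
<a class="yil-biz-ttl">No href</a>
<a class="other" href="http://unrelated/">Other</a>
"""
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# href=True keeps only tags that carry an href attribute at all.
with_filter = [a["href"] for a in soup.find_all("a", class_="yil-biz-ttl", href=True)]

# CSS selector form: <a> elements with the class and an href attribute.
with_selector = [a["href"] for a in soup.select("a.yil-biz-ttl[href]")]

print(with_filter)
print(with_selector)
```

Both forms should yield the same single URL for this input; which one to prefer is mostly a matter of taste.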
