Parsing HTML tags with Beautiful Soup based on class and href

Posted 2024-12-06 07:34:59

I am trying to parse HTML with BeautifulSoup.

The content I want is like this:

<a class="yil-biz-ttl" id="yil_biz_ttl-2" href="http://some-web-url/" title="some title">Title</a> 

I tried this and got the following error:

maxx = soup.findAll("href", {"class: "yil-biz-ttl"})
------------------------------------------------------------
   File "<ipython console>", line 1
     maxx = soup.findAll("href", {"class: "yil-biz-ttl"})
                                             ^
SyntaxError: invalid syntax

What I want is the string: http://some-web-url/


Comments (4)

横笛休吹塞上声 2024-12-13 07:34:59
soup.findAll('a', {'class': 'yil-biz-ttl'})[0]['href']

To find all such links:

for link in soup.findAll('a', {'class': 'yil-biz-ttl'}):
    try:
        print(link['href'])
    except KeyError:
        pass
最偏执的依靠 2024-12-13 07:34:59

You're missing a close-quote after "class:

 maxx = soup.findAll("href", {"class: "yil-biz-ttl"})

should be

 maxx = soup.findAll("href", {"class": "yil-biz-ttl"})

Also, I don't think you can search for an attribute such as href that way; I think you need to search for the tag instead:

 maxx = [link['href'] for link in soup.findAll("a", {"class": "yil-biz-ttl"})]
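
As a minimal end-to-end sketch of the approach above: the sample markup below is hypothetical (the second and third anchors, including the id yil_biz_ttl-3 and the URL http://unrelated-url/, are invented for illustration), and find_all is the bs4 spelling of findAll. It matches <a> tags by class and reads href with Tag.get(), which returns None rather than raising KeyError when the attribute is missing.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Hypothetical sample markup modelled on the snippet in the question.
html = """
<a class="yil-biz-ttl" id="yil_biz_ttl-2" href="http://some-web-url/" title="some title">Title</a>
<a class="yil-biz-ttl" id="yil_biz_ttl-3">No link here</a>
<a class="other" href="http://unrelated-url/">Other</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Match <a> tags by class, then read the href attribute from each one.
# Tag.get() returns None instead of raising KeyError when href is absent.
links = [a.get("href") for a in soup.find_all("a", {"class": "yil-biz-ttl"})]

# Drop the anchors that had no href at all.
urls = [href for href in links if href is not None]

print(urls)
```

Using link['href'] directly, as in the list comprehension above, is fine when every matching tag is known to carry an href; get() is only a safer choice when some might not.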
烟雨凡馨 2024-12-13 07:34:59

To find all <a> elements with the CSS class "yil-biz-ttl" that have an href attribute (with any value):

from bs4 import BeautifulSoup  # $ pip install beautifulsoup4

soup = BeautifulSoup(HTML, "html.parser")  # HTML is a string holding the markup
for link in soup("a", "yil-biz-ttl", href=True):
    print(link['href'])

At the moment, none of the other answers satisfy the above requirements.
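
The same filter can probably also be written as a CSS selector. The sketch below assumes a bs4 version that ships select() with attribute-selector support (4.7 or later, via soupsieve) and uses hypothetical two-line sample markup; it compares the selector form with the call form shown above.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Hypothetical sample markup: one anchor with href, one without.
html = """
<a class="yil-biz-ttl" href="http://some-web-url/">Title</a>
<a class="yil-biz-ttl">No href</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Call form: tag name, CSS class, and "href must be present".
called = [a["href"] for a in soup("a", "yil-biz-ttl", href=True)]

# Selector form: <a> with class yil-biz-ttl and an href attribute.
selected = [a["href"] for a in soup.select("a.yil-biz-ttl[href]")]

print(called == selected)
```

Both forms skip the anchor that has no href, so indexing with a["href"] cannot raise KeyError here.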

娇俏 2024-12-13 07:34:59

Well, first of all, you have a syntax error: the quotes in the class part are wrong.

Try:

maxx = soup.findAll("href", {"class": "yil-biz-ttl"})
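
Note that this line fixes only the syntax error. As a quick check, assuming the markup from the question, the first argument is treated as a tag name, so "href" looks for <href> elements and matches nothing, while "a" finds the link. The sketch uses find_all, the bs4 spelling of findAll.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# The anchor from the question.
html = ('<a class="yil-biz-ttl" id="yil_biz_ttl-2" '
        'href="http://some-web-url/" title="some title">Title</a>')

soup = BeautifulSoup(html, "html.parser")

# "href" is interpreted as a tag name, and there is no <href> tag.
by_href = soup.find_all("href", {"class": "yil-biz-ttl"})

# Searching for the <a> tag finds the element; href is then read from it.
by_a = soup.find_all("a", {"class": "yil-biz-ttl"})
first_url = by_a[0]["href"]

print(len(by_href), len(by_a), first_url)
```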
