How do I extract the href and title from this HTML?

Posted on 2025-01-22 19:42:17

My bs4.element.ResultSet has this format:

    [<h3 class="foo1">
    <a href="someLink" title="someTitle">SomeTitle</a>
    </h3>,
    <h3 class="foo1">
    <a href="OtherLink" title="OtherTitle">OtherTitle</a>
    </h3>]

I want to extract the title and href of each link and save them as a list of tuples, [(title, href), (title2, href2)], but I can't seem to do so.

My closest attempt was:

    link = soup.find('h3', class_='foo1').find('a').get('title')
    print(link)

but that only returns the first of the two (or more) elements. How can I extract every href and title?
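The behaviour described above is expected: find() returns only the first matching tag, while find_all() returns every match as a ResultSet. A minimal sketch, assuming the sample HTML from the question and the built-in html.parser, that contrasts the two and loops over all h3.foo1 headings:

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question
html = '''
<h3 class="foo1">
<a href="someLink" title="someTitle">SomeTitle</a>
</h3>
<h3 class="foo1">
<a href="OtherLink" title="OtherTitle">OtherTitle</a>
</h3>
'''

soup = BeautifulSoup(html, 'html.parser')

# find() stops at the first match, so only one title comes back
first_title = soup.find('h3', class_='foo1').find('a').get('title')

# find_all() returns every matching <h3>; collect (title, href) from each link
pairs = []
for h3 in soup.find_all('h3', class_='foo1'):
    a = h3.find('a')
    if a is not None:  # skip headings that contain no link
        pairs.append((a.get('title'), a.get('href')))

print(first_title)  # someTitle
print(pairs)        # [('someTitle', 'someLink'), ('OtherTitle', 'OtherLink')]
```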


千笙结 2025-01-29 19:42:17

Select your elements more specifically, e.g. with CSS selectors, and iterate over the ResultSet to get the attributes of each one as a list of tuples:

    [(a.get('title'), a.get('href')) for a in soup.select('h3 a[href][title]')]

Example

    from bs4 import BeautifulSoup

    html = '''
    <h3 class="foo1">
        <a href="someLink" title="someTitle">SomeTitle</a>
    </h3>
    <h3 class="foo1">
        <a href="OtherLink" title="OtherTitle">OtherTitle</a>
    </h3>
    '''
    # Name the parser explicitly; otherwise bs4 guesses one and emits a warning
    soup = BeautifulSoup(html, 'html.parser')

    print([(a.get('title'), a.get('href')) for a in soup.select('h3 a[href]')])

Output

    [('someTitle', 'someLink'), ('OtherTitle', 'OtherLink')]
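The one-liner and the example use slightly different selectors. a[href][title] also requires a title attribute, while a[href] keeps links without one, and a.get('title') then returns None for them. A small sketch showing the difference, assuming the answer's markup plus a hypothetical third link that has no title:

```python
from bs4 import BeautifulSoup

# The answer's markup plus a hypothetical third link without a title attribute
html = '''
<h3 class="foo1">
    <a href="someLink" title="someTitle">SomeTitle</a>
</h3>
<h3 class="foo1">
    <a href="OtherLink" title="OtherTitle">OtherTitle</a>
</h3>
<h3 class="foo1">
    <a href="NoTitleLink">NoTitle</a>
</h3>
'''
soup = BeautifulSoup(html, 'html.parser')

# a[href] matches every link with an href; a missing title becomes None
loose = [(a.get('title'), a.get('href')) for a in soup.select('h3 a[href]')]

# a[href][title] also requires a title, so the third link is filtered out
strict = [(a.get('title'), a.get('href')) for a in soup.select('h3 a[href][title]')]

print(loose)   # [('someTitle', 'someLink'), ('OtherTitle', 'OtherLink'), (None, 'NoTitleLink')]
print(strict)  # [('someTitle', 'someLink'), ('OtherTitle', 'OtherLink')]
```

Which one to use depends on whether a link without a title should appear with None or be skipped entirely.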


穿透光 2025-01-29 19:42:17

Code:

    list(map(lambda link: (link.get("href"), link.get("title")), soup.select('h3.foo1>a[href][title]')))

Explanation:

    soup.select('h3.foo1>a[href][title]')

Selects all a elements that have both an href and a title attribute and are direct children of an h3 element with the class foo1.

    map(lambda link: ..., ...)

Applies the lambda to each of those a elements. list() collects the results, because a ResultSet is a plain list subclass and has no .map() method of its own.

    (link.get("href"), link.get("title"))

Makes a tuple containing the link's href and title.
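The > in this selector is the CSS child combinator, so it only matches a elements sitting directly inside the h3, whereas the space used in the other answer (descendant combinator) matches links at any depth. A short sketch illustrating this, assuming hypothetical markup in which the second link is wrapped in a span; the helper name href_title_pairs is also hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second link is wrapped in a <span>
html = '''
<h3 class="foo1"><a href="someLink" title="someTitle">SomeTitle</a></h3>
<h3 class="foo1"><span><a href="OtherLink" title="OtherTitle">OtherTitle</a></span></h3>
'''
soup = BeautifulSoup(html, 'html.parser')

def href_title_pairs(selector):
    # Build (href, title) tuples, in the same order as the answer above
    return list(map(lambda link: (link.get('href'), link.get('title')),
                    soup.select(selector)))

# '>' (child combinator) only matches <a> elements directly inside the <h3>
print(href_title_pairs('h3.foo1>a[href][title]'))
# [('someLink', 'someTitle')]

# A space (descendant combinator) matches <a> at any depth inside the <h3>
print(href_title_pairs('h3.foo1 a[href][title]'))
# [('someLink', 'someTitle'), ('OtherLink', 'OtherTitle')]
```

Note that this answer produces (href, title) tuples; to get the (title, href) order the question asks for, swap the two get() calls.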


About the author: 缱绻入梦
