
Tags: python, tags, text, beautifulsoup, extract

Extracting the text between link tags in Python using BeautifulSoup

Posted on 2024-11-13 07:38:31

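I have HTML code like this:

`<h2 class="title"><a href="http://www.gurletins.com">My HomePage</a></h2>`

`<h2 class="title"><a href="http://www.gurletins.com/sections">Sections</a></h2>`

I need to extract the text between the "a" tags (the link descriptions) and store it in an array, for example:

a[0] = "My HomePage"

a[1] = "Sections"

I need to do this in Python using BeautifulSoup.

Please help, thank you!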


Comments (3)

前事休说 2024-11-20 07:38:31

You can do something like this:

from bs4 import BeautifulSoup  # BeautifulSoup 4 package

html = """
<html><head></head>
<body>
<h2 class='title'><a href='http://www.gurletins.com'>My HomePage</a></h2>
<h2 class='title'><a href='http://www.gurletins.com/sections'>Sections</a></h2>
</body>
</html>
"""

# Name the parser explicitly to avoid bs4's "no parser specified" warning.
soup = BeautifulSoup(html, "html.parser")

# Collect the text of the <a> inside each <h2 class="title">.
print([elm.a.text for elm in soup.find_all('h2', {'class': 'title'})])
# Output: ['My HomePage', 'Sections']


高冷爸爸 2024-11-20 07:38:31

# Each entry is the list of text nodes found inside one <a> tag.
print([a.find_all(string=True) for a in soup.find_all('a')])
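
Note that the one-liner above produces a list of lists, one inner list of text nodes per link, rather than the flat array the question asks for. A minimal sketch of the difference, assuming the `bs4` package is installed and reusing the markup from the question (the names `nested` and `link_texts` are only illustrative):

```python
from bs4 import BeautifulSoup

# HTML taken from the question.
html = """
<h2 class="title"><a href="http://www.gurletins.com">My HomePage</a></h2>
<h2 class="title"><a href="http://www.gurletins.com/sections">Sections</a></h2>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all(string=True) returns every text node under a tag,
# so the result is nested: one inner list per <a>.
nested = [a.find_all(string=True) for a in soup.find_all("a")]

# get_text() joins a tag's text nodes into a single string,
# which gives one plain string per link.
link_texts = [a.get_text() for a in soup.find_all("a")]

print(nested)
print(link_texts)
```

Here `link_texts[0]` is the first link description and `link_texts[1]` the second, which matches the `a[0]` / `a[1]` layout requested in the question.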


嗼ふ静 2024-11-20 07:38:31

The following code extracts the text between the "a" tags (the link descriptions) and stores it in an array.

>>> from bs4 import BeautifulSoup
>>> data = """<h2 class="title"><a href="http://www.gurletins.com">My HomePage</a></h2>
...
... <h2 class="title"><a href="http://www.gurletins.com/sections">Sections</a></h2>"""
>>> soup = BeautifulSoup(data, "html.parser")
>>> reqTxt = soup.find_all("h2", {"class":"title"})
>>> a = []
>>> for i in reqTxt:
...     a.append(i.get_text())
...
>>> a
['My HomePage', 'Sections']
>>> a[0]
'My HomePage'
>>> a[1]
'Sections'
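
This answer calls `get_text()` on the `h2` elements, which works here because each heading contains nothing but the link. As a hedged alternative, the sketch below targets the `a` tags directly with a CSS selector and collapses stray whitespace. It assumes a `bs4` version that supports `select()`, and the line-wrapped markup is an assumption made for illustration, not part of the question.

```python
from bs4 import BeautifulSoup

# Same markup as the question, but with the link text wrapped across lines
# (an assumption for illustration; real pages are often formatted this way).
data = """<h2 class="title"><a href="http://www.gurletins.com">My
HomePage</a></h2>
<h2 class="title"><a href="http://www.gurletins.com/sections">Sections</a>
</h2>"""

soup = BeautifulSoup(data, "html.parser")

# "h2.title a" matches only <a> tags inside <h2 class="title"> elements.
links = soup.select("h2.title a")

# split() followed by join() collapses any run of whitespace,
# including newlines, into a single space.
a = [" ".join(link.get_text().split()) for link in links]

print(a)  # ['My HomePage', 'Sections']
```

Selecting the links themselves keeps the result correct even if a heading later gains other text next to the link.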
