Retrieve all names from HTML tags using BeautifulSoup

Posted on 2025-01-28 17:11:57

I managed to set up Beautiful Soup and find the tags that I needed. How do I extract all the names in the tags?

tags = soup.find_all("a")
print(tags)

After running the above code, I got the following output:

[<a href="/wiki/Alfred_the_Great" title="Alfred the Great">Alfred the Great</a>, <a class="mw-redirect" href="/wiki/Elizabeth_I_of_England" title="Elizabeth I of England">Queen Elizabeth I</a>, <a href="/wiki/Family_tree_of_Scottish_monarchs" title="Family tree of Scottish monarchs">Family tree of Scottish monarchs</a>, <a href="/wiki/Kenneth_MacAlpin" title="Kenneth MacAlpin">Kenneth MacAlpin</a>]

How do I retrieve the names Alfred the Great, Queen Elizabeth I, Kenneth MacAlpin, etc.? Do I need to use a regular expression? Using .string gave me an error.
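A likely reason .string fails here, as a minimal sketch: this assumes the error came from calling .string on the whole result of find_all() (the actual error message was not posted). find_all() returns a ResultSet, which is a list of tags, while .string is defined on an individual Tag. The snippet uses two of the links from the output above and the built-in html.parser, so lxml is not required.

```python
from bs4 import BeautifulSoup

# Two of the links from the question's output
html = (
    '<a href="/wiki/Alfred_the_Great" title="Alfred the Great">Alfred the Great</a>'
    '<a class="mw-redirect" href="/wiki/Elizabeth_I_of_England" '
    'title="Elizabeth I of England">Queen Elizabeth I</a>'
)
soup = BeautifulSoup(html, "html.parser")
tags = soup.find_all("a")  # a ResultSet, i.e. a list of Tag objects

# Asking the list itself for .string raises AttributeError
try:
    tags.string
except AttributeError as exc:
    print("AttributeError:", exc)

# .string works on each individual Tag; str() turns the NavigableString into a plain str
names = [str(tag.string) for tag in tags]
print(names)  # ['Alfred the Great', 'Queen Elizabeth I']
```

Keep in mind that .string returns None when a tag has more than one child, so get_text() is usually the more robust choice for link text.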

沦落红尘 2025-02-04 17:11:57

You can iterate over the tags and use tag.get('title') to get the title value.

Some other ways to do the same are covered in the Beautiful Soup documentation:
https://www.crummy.com/ /#attributes
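A minimal sketch of the tag.get('title') approach from this answer. The third link has its title removed on purpose (a hypothetical change, not part of the question's HTML) to show that get() returns a default instead of failing. Note that the title attribute is not always the same as the visible link text: the second link's title is "Elizabeth I of England", not "Queen Elizabeth I".

```python
from bs4 import BeautifulSoup

html = (
    '<a href="/wiki/Alfred_the_Great" title="Alfred the Great">Alfred the Great</a>'
    '<a class="mw-redirect" href="/wiki/Elizabeth_I_of_England" '
    'title="Elizabeth I of England">Queen Elizabeth I</a>'
    # Hypothetical: title attribute removed to show the missing-attribute case
    '<a href="/wiki/Kenneth_MacAlpin">Kenneth MacAlpin</a>'
)
soup = BeautifulSoup(html, "html.parser")

titles = []
for tag in soup.find_all("a"):
    # tag.get() returns the given default when the attribute is missing,
    # whereas tag["title"] would raise a KeyError
    titles.append(tag.get("title", "<no title>"))

print(titles)  # ['Alfred the Great', 'Elizabeth I of England', '<no title>']

# Alternatively, only match <a> tags that actually carry a title attribute
titled = [tag["title"] for tag in soup.find_all("a", title=True)]
print(titled)  # ['Alfred the Great', 'Elizabeth I of England']
```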

原野 2025-02-04 17:11:57

There is no need to use re (regular expressions). You can easily grab all the names by iterating over all the a tags and then reading the title attribute, calling get_text(), or using .find(text=True):

html='''
<html>
 <body>
  <a href="/wiki/Alfred_the_Great" title="Alfred the Great">
   Alfred the Great
  </a>
  ,
  <a class="mw-redirect" href="/wiki/Elizabeth_I_of_England" title="Elizabeth I of England">
   Queen Elizabeth I
  </a>
  ,
  <a href="/wiki/Family_tree_of_Scottish_monarchs" title="Family tree of Scottish monarchs">
   Family tree of Scottish monarchs
  </a>
  ,
  <a href="/wiki/Kenneth_MacAlpin" title="Kenneth MacAlpin">
   Kenneth MacAlpin
  </a>
 </body>
</html>

'''

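from bs4 import BeautifulSoup

# 'lxml' needs the lxml package installed; the built-in 'html.parser' also works here
soup = BeautifulSoup(html,'lxml')

#print(soup.prettify())

# Iterate over every <a> tag and pull out its name
for name in soup.find_all('a'):
    # Visible link text, with the surrounding newlines and spaces stripped
    txt = name.get_text(strip=True)
    #OR read the title attribute (gives 'Elizabeth I of England' for the second link)
    #txt = name.get('title')
    print(txt)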

Output:

Alfred the Great
Queen Elizabeth I
Family tree of Scottish monarchs
Kenneth MacAlpin
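A side-by-side sketch of the three options mentioned above, run on a shortened copy of the answer's markup (html.parser is used instead of lxml so no extra package is needed). The answer's .find(text=True) is written here as .find(string=True): string is the argument name used since Beautiful Soup 4.4.0, and text is the older name for the same thing. Because find(string=True) returns the raw text node, it still carries the surrounding newlines and needs .strip().

```python
from bs4 import BeautifulSoup

# Shortened copy of the answer's markup: the link text is padded with whitespace
html = """
<a href="/wiki/Alfred_the_Great" title="Alfred the Great">
   Alfred the Great
</a>
<a class="mw-redirect" href="/wiki/Elizabeth_I_of_England" title="Elizabeth I of England">
   Queen Elizabeth I
</a>
"""
soup = BeautifulSoup(html, "html.parser")
links = soup.find_all("a")

# 1. The title attribute (may differ from what the page shows)
titles = [a.get("title") for a in links]
# 2. All visible text inside the tag, whitespace stripped
texts = [a.get_text(strip=True) for a in links]
# 3. The first text node inside the tag, stripped by hand
firsts = [a.find(string=True).strip() for a in links]

print(titles)  # ['Alfred the Great', 'Elizabeth I of England']
print(texts)   # ['Alfred the Great', 'Queen Elizabeth I']
print(firsts)  # ['Alfred the Great', 'Queen Elizabeth I']
```

If the goal is the name as displayed on the page ("Queen Elizabeth I"), the two text-based options are the ones to use.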

About the author: 梦情居士