Retrieving all names from HTML tags using Beautiful Soup
I managed to set up Beautiful Soup and find the tags that I needed. How do I extract all the names in the tags?
tags = soup.find_all("a")
print(tags)
After running the above code, I got the following output:
[<a href="/wiki/Alfred_the_Great" title="Alfred the Great">Alfred the Great</a>, <a class="mw-redirect" href="/wiki/Elizabeth_I_of_England" title="Elizabeth I of England">Queen Elizabeth I</a>, <a href="/wiki/Family_tree_of_Scottish_monarchs" title="Family tree of Scottish monarchs">Family tree of Scottish monarchs</a>, <a href="/wiki/Kenneth_MacAlpin" title="Kenneth MacAlpin">Kenneth MacAlpin</a>]
How do I retrieve the names (Alfred the Great, Queen Elizabeth I, Kenneth MacAlpin, etc.)? Do I need to use a regular expression? Using .string gave me an error.
Comments (2)
You can iterate over the tags and use
tag.get('title')
to get the title value. Other ways to do the same are described here:
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#attributes
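A minimal sketch of that loop, assuming the markup looks like the output shown in the question (the `html` string and the `titles` variable are illustrative names, not part of the original post). Note that `title` holds the attribute value, which can differ from the visible link text, as it does for the second link:

```python
from bs4 import BeautifulSoup

# Sample markup copied from the output in the question (illustrative only).
html = (
    '<a href="/wiki/Alfred_the_Great" title="Alfred the Great">Alfred the Great</a>'
    '<a class="mw-redirect" href="/wiki/Elizabeth_I_of_England" '
    'title="Elizabeth I of England">Queen Elizabeth I</a>'
    '<a href="/wiki/Kenneth_MacAlpin" title="Kenneth MacAlpin">Kenneth MacAlpin</a>'
)

soup = BeautifulSoup(html, "html.parser")
tags = soup.find_all("a")

# tag.get() returns None instead of raising KeyError when the attribute is missing.
titles = [tag.get("title") for tag in tags]
print(titles)
```

Using `tag.get("title")` rather than `tag["title"]` keeps the loop from failing on links that have no `title` attribute.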
No need to use re. You can easily grab all the names by iterating over all the a tags and then reading the title attribute, or calling get_text() or .find(text=True).
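A minimal sketch of the get_text() approach, assuming the same markup as in the question (the `html` string and the `names` variable are illustrative). The error with .string most likely came from calling it on the result of find_all(), which is a list-like collection of tags rather than a single tag; that is an assumption, since the question does not show the failing line:

```python
from bs4 import BeautifulSoup

# Sample markup copied from the output in the question (illustrative only).
html = (
    '<a href="/wiki/Alfred_the_Great" title="Alfred the Great">Alfred the Great</a>'
    '<a class="mw-redirect" href="/wiki/Elizabeth_I_of_England" '
    'title="Elizabeth I of England">Queen Elizabeth I</a>'
    '<a href="/wiki/Kenneth_MacAlpin" title="Kenneth MacAlpin">Kenneth MacAlpin</a>'
)

soup = BeautifulSoup(html, "html.parser")
tags = soup.find_all("a")

# find_all() returns a collection, so read the text from each tag in turn.
names = [tag.get_text(strip=True) for tag in tags]
print(names)

# .string also works when applied to an individual tag with a single text child.
strings = [str(tag.string) for tag in tags]
```

Here get_text() returns the visible link text ("Queen Elizabeth I"), whereas the title attribute returns "Elizabeth I of England", so the choice depends on which of the two is wanted.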
Output: