Web scraping: I get the desired results, but get_text raises an error when it reads an empty row. Any ideas?

Posted on 2025-01-12 00:14:10

import requests

from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/List_of_Presidents_of_the_United_States"

page = requests.get(url)

#print(page.status_code)

#print(page.content)

soup = BeautifulSoup(page.content, 'html.parser')

#print(soup.prettify())


tb = soup.find('table', class_='wikitable')

"""for link in tb.find_all('b'):
    name = link.find('a')
    print(name)"""

for link in tb.find_all('b'):
    name = link.find('a')
    print(name.get_text('title'))

Picture of code and result

I believe it is reading the table, and when it reaches an empty row I get an error.

冷弦 2025-01-19 00:14:10

The find method returns None if the requested tag is not found, so you have to check for that:

for link in tb.find_all('b'):
    name = link.find('a')
    if name is not None:
        print(name.get_text())
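
To see why the check matters without fetching the live page, here is a minimal sketch on a hypothetical inline HTML fragment. The markup below is made up for illustration and is not the real Wikipedia table: one `<b>` wraps a link and the other does not, so `find('a')` returns None for the second one.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment, NOT the real Wikipedia markup:
# the second <b> has no <a> inside it.
html = """
<table class="wikitable">
  <tr><td><b><a href="/wiki/George_Washington" title="George Washington">George Washington</a></b></td></tr>
  <tr><td><b>Vacant</b></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
tb = soup.find("table", class_="wikitable")

names = []
for bold in tb.find_all("b"):
    link = bold.find("a")   # None when this <b> contains no <a>
    if link is None:
        continue            # skip it instead of calling a method on None
    names.append(link.get_text())

print(names)
```

Without the `is None` check, the second iteration would call `get_text` on None and raise an AttributeError, which is likely the error described in the question.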

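Also, the .get_text method simply strips every tag and returns the text inside, so the argument you pass probably does not do what you expect: Beautiful Soup just treats it as a separator. Here is the method's signature: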

def get_text(self, separator="", strip=False,
             types=default):

Documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#get-text
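
The separator behaviour can be sketched on a small hypothetical fragment (the markup and ids are invented for this example). It also shows `Tag.get("title")`, which may be what `get_text('title')` was meant to do, assuming the goal was to read the link's `title` attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment for illustration only.
soup = BeautifulSoup(
    '<a id="one" title="John Adams">John Adams</a>'
    '<a id="two" title="Second president"><span>2nd</span><span>president</span></a>',
    "html.parser",
)
one = soup.find("a", id="one")
two = soup.find("a", id="two")

# With a single text node there is nothing to join, so the
# separator has no visible effect.
single = one.get_text("title")

# With several text nodes, the first argument is placed between them.
joined = two.get_text("title")

# A more useful separator, with surrounding whitespace stripped.
spaced = two.get_text(" ", strip=True)

# Reading the attribute itself; .get returns None if it is absent.
attr = two.get("title")

print(single, joined, spaced, attr, sep=" | ")
```

This would explain why the original code appeared to work: each president link holds a single text node, so passing 'title' changed nothing in the output.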
