Web scraping: I get the result I need, but get_text raises an error when it reads an empty line. Any ideas?
import requests
from bs4 import BeautifulSoup
url = "https://en.wikipedia.org/wiki/List_of_Presidents_of_the_United_States"
page = requests.get(url)
#print(page.status_code)
#print(page.content)
soup = BeautifulSoup(page.content, 'html.parser')
#print(soup.prettify())
tb = soup.find('table', class_='wikitable')
"""for link in tb.find_all('b'):
    name = link.find('a')
    print(name)"""
for link in tb.find_all('b'):
    name = link.find('a')
    print(name.get_text('title'))
I believe it is reading the table, and when it gets to an empty line I get an error.
The find method returns None if the requested tag is not found, so you have to check for that before calling get_text on the result.

Also, the get_text method just strips every tag and returns the text inside, so the argument you pass might not do what you expect it to do. Beautiful Soup just treats it as a separator: the method's first parameter is named separator.

Documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#get-text
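A minimal sketch of both points, assuming the goal was to read each link's title attribute (the question does not say so explicitly). SAMPLE_HTML, FRAGMENT and link_titles are hypothetical names invented for this illustration, and the markup is a made-up miniature of the table rather than the real Wikipedia page.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the table: the second <b> contains no <a>.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><td><b><a href="/wiki/George_Washington" title="George Washington">George Washington</a></b></td></tr>
  <tr><td><b>Notes</b></td></tr>
  <tr><td><b><a href="/wiki/John_Adams" title="John Adams">John Adams</a></b></td></tr>
</table>
"""


def link_titles(html):
    """Collect the title attribute of the first <a> inside each <b> of the first wikitable."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    if table is None:  # find returns None when nothing matches
        return []
    titles = []
    for bold in table.find_all("b"):
        anchor = bold.find("a")
        if anchor is None:  # this <b> has no link inside, so skip it
            continue
        title = anchor.get("title")  # reads the attribute; None if it is absent
        if title is not None:
            titles.append(title)
    return titles


print(link_titles(SAMPLE_HTML))

# The positional argument of get_text is a separator placed between text pieces,
# not the name of an attribute to read.
FRAGMENT = BeautifulSoup("<a title='John Adams'><i>John</i><i>Adams</i></a>", "html.parser").a
print(FRAGMENT.get_text())         # JohnAdams
print(FRAGMENT.get_text("title"))  # JohntitleAdams
print(FRAGMENT.get_text(" "))      # John Adams
```

When a link holds a single piece of text, there is nothing to separate, which is probably why passing 'title' seemed to work in the question until a bold cell without a link came up.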