Python: href tag TypeError
I tried running my web scraping code. Sometimes it works fine, but sometimes it fails with a TypeError traceback. What is causing the error?
Here is the error message:
Traceback (most recent call last):
  File "D:\python-learning\listings.py", line 22, in <module>
    pageLink='https://www.vancouverforsale.ca'+getData(pageLink)
  File "D:\python-learning\listings.py", line 17, in getData
    return nextLink['href']
TypeError: 'NoneType' object is not subscriptable
Here is my code:
from bs4 import BeautifulSoup
import lxml
import requests


def getData(url):
    html_text = requests.get(url).text
    soup = BeautifulSoup(html_text,'lxml')
    listings = soup.find_all('div', class_ = 'row property results')
    for listing in listings:
        address = listing.find('a', class_ = 'address').text
        price = listing.find('a', class_ = 'price').text
        print(address)
        print(price)
    #find next page
    nextLink=soup.find('a', string='Next »')
    return nextLink['href']

pageLink='https://www.vancouverforsale.ca/search/results/?city=Langley&region=all&list_price_min=50000&list_price_max=all&beds_min=all&baths_min=all&type=con'
count=0
while count<3:
    pageLink='https://www.vancouverforsale.ca'+getData(pageLink)
    count+=1
Comments (1)
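You have to check that nextLink is not None before you try to get ['href']. soup.find() returns None when the page has no matching 'Next »' link, and subscripting None is what raises the TypeError. When nextLink is None, getData() can return None, and you then have to check for that in the main loop before building the next URL.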
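The check described above could look roughly like the following. This is only a minimal sketch under the assumptions of the question's code (same site, same class names, same 'Next »' link text), not the original working listing. The names get_data, crawl, fetch, max_pages and BASE_URL are hypothetical and introduced here for illustration; requests and bs4 are imported inside get_data only so that the loop can be tried with a stand-in fetch function.

```python
from urllib.parse import urljoin

# Assumption: same site root as in the question.
BASE_URL = 'https://www.vancouverforsale.ca'


def get_data(url):
    """Print the listings on one results page; return the next page's href or None."""
    # Third-party imports kept local so crawl() can be exercised with a fake
    # fetch function (a choice made for this sketch only).
    import requests
    from bs4 import BeautifulSoup

    html_text = requests.get(url).text
    soup = BeautifulSoup(html_text, 'lxml')

    for listing in soup.find_all('div', class_='row property results'):
        address = listing.find('a', class_='address')
        price = listing.find('a', class_='price')
        # find() returns None when nothing matches, so guard before using .text
        if address is not None and price is not None:
            print(address.text)
            print(price.text)

    # Find the next page; there may be none (last page or unexpected layout).
    next_link = soup.find('a', string='Next »')
    if next_link is None:
        return None
    return next_link.get('href')  # .get() gives None if the tag has no href


def crawl(start_url, fetch=get_data, max_pages=3):
    """Follow 'Next' links for at most max_pages pages; return the URLs visited."""
    visited = []
    page_link = start_url
    for _ in range(max_pages):
        visited.append(page_link)
        href = fetch(page_link)
        if href is None:
            break  # no further page: stop instead of concatenating None
        page_link = urljoin(BASE_URL, href)
    return visited
```

With this shape, the main loop from the question becomes a single call such as crawl(pageLink), and a page without a 'Next »' link ends the loop instead of raising the TypeError.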
PEP 8 -- Style Guide for Python Code