Python: TypeError when reading the href of a tag

Posted 2025-01-23 10:30:27

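I tried running my web scraping code. Sometimes it works fine, but sometimes it fails with a TypeError and the traceback below. What is causing the error?

Here is the error message: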

Traceback (most recent call last):
  File "D:\python-learning\listings.py", line 22, in <module>
    pageLink='https://www.vancouverforsale.ca'+getData(pageLink)
  File "D:\python-learning\listings.py", line 17, in getData
    return nextLink['href']
TypeError: 'NoneType' object is not subscriptable

from bs4 import BeautifulSoup
import lxml
import requests

def getData(url):
    html_text = requests.get(url).text
    soup = BeautifulSoup(html_text,'lxml')
    listings = soup.find_all('div', class_ = 'row property results')
    for listing in listings:
        address = listing.find('a', class_ = 'address').text
        price = listing.find('a', class_ = 'price').text
        print(address)
        print(price)

    #find next page
    nextLink=soup.find('a', string='Next »')
    return nextLink['href']

pageLink='https://www.vancouverforsale.ca/search/results/?city=Langley&region=all&list_price_min=50000&list_price_max=all&beds_min=all&baths_min=all&type=con'

count=0
while count<3:
    pageLink='https://www.vancouverforsale.ca'+getData(pageLink)
    count+=1



王权女流氓 2025-01-30 10:30:27

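You have to check that nextLink is not None before you try to read ['href']. soup.find() returns None when no matching element exists, which is presumably what happens on a page without a 'Next »' link, such as the last page of results.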

next_link = soup.find('a', string='Next »')
if next_link:
    return 'https://www.vancouverforsale.ca' + next_link['href']
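To see where the message in the traceback comes from, here is a minimal sketch that reproduces it without any network access. It is only an illustration: a plain dict stands in for a found `<a>` tag, `get_href` is a hypothetical helper name, and the href value is made up.

```python
def get_href(tag):
    """Return tag['href'], or None when the search found nothing."""
    if tag is None:
        return None
    return tag['href']

# Stand-ins for the two possible results of soup.find(...):
# a dict plays the role of a found <a> tag (assumption for illustration),
# None is what find() returns when nothing matches.
found = {'href': '/search/results/?page=2'}  # hypothetical href
missing = None

try:
    missing['href']  # same operation that fails in getData
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(get_href(found))
print(get_href(missing))
```

Subscripting None is what raises the error; the guard simply turns that case into a None return value that the caller can test for.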

When next_link is None, the function falls through and implicitly returns None, so you have to check for that in the main loop:

for count in range(3):
    page_link = get_data(page_link)
    if not page_link:
        break
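The loop's stopping behavior can be checked offline with a stand-in for the scraper. In this sketch `FAKE_NEXT` and its paths are invented for illustration, and `get_data` only mimics the return value of the real function (the next page URL, or None on the last page).

```python
# Hypothetical two-page site: each page maps to the next page,
# or to None when there is no 'Next' link.
FAKE_NEXT = {'/page1': '/page2', '/page2': None}

def get_data(url):
    """Stand-in for the real scraper: return the next page URL or None."""
    return FAKE_NEXT.get(url)

visited = []
page_link = '/page1'
for count in range(3):
    visited.append(page_link)
    page_link = get_data(page_link)
    if not page_link:
        # No next page: stop instead of requesting None
        break

print(visited)
```

Although the loop allows up to three iterations, it ends after the second page because `get_data` returned None there.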

Full working code

import requests
from bs4 import BeautifulSoup
#import urllib.parse

# PEP8: `lower_case_names` for functions and variables

def get_data(url):   
    
    response = requests.get(url)
    #print(response.status_code)
    soup = BeautifulSoup(response.text, 'lxml')
    
    listings = soup.find_all('div', class_='row property results')
    for listing in listings:
        address = listing.find('a', class_='address').text.strip()  # PEP8: `=` without spaces inside `()`
        price = listing.find('a', class_='price').text.replace('▲', '').replace('▼', '').strip()
        print('address:', address)
        print('price  :', price)
        print('---')

    # find next page
    next_link = soup.find('a', string='Next »')
    if next_link:
        #return urllib.parse.urljoin('https://www.vancouverforsale.ca', next_link['href'])
        return 'https://www.vancouverforsale.ca' + next_link['href']
    
# --- main ---

page_link = 'https://www.vancouverforsale.ca/search/results/?city=Langley&region=all&list_price_min=50000&list_price_max=all&beds_min=all&baths_min=all&type=con'

#while True:
for count in range(3):
    page_link = get_data(page_link)
    if not page_link:
        break

Result:

address: 19681 75 Avenue, Langley
price  : $1,695,000
---
address: 20806 52a Avenue, Langley
price  : $1,649,900
---
address: 20804 52a Avenue, Langley
price  : $1,649,900
---
address: 7138 210 Street Unit 43, Langley
price  : $1,638,000
---
address: 8567 204 Street Unit 13, Langley
price  : $1,624,999
---
address: 19842 75b Avenue, Langley
price  : $1,599,000
---
address: 8567 204 Street Unit 1, Langley
price  : $1,598,000
---
address: 8258 202 Street, Langley
price  : $1,588,800
---
address: 7138 210 Street Unit 59, Langley
price  : $1,579,000
---
address: 8567 204 Street Unit 3, Langley
price  : $1,499,900
---
address: 7429 197 Street, Langley
price  : $1,489,900
---
address: 22981 Billy Brown Road, Langley
price  : $1,399,000
---
address: 23168 Billy Brown Road, Langley
price  : $1,399,000
---
address: 26718 32 Avenue, Langley
price  : $1,399,000
---
address: 20327 82 Avenue, Langley
price  : $1,395,000
---
address: 8567 204 Street Unit 7, Langley
price  : $1,390,000
---
address: 20873 71b Avenue, Langley
price  : $1,388,000
---
address: 20321 80 Avenue Unit 27, Langley
price  : $1,370,000
---
address: 20924 80a Avenue, Langley
price  : $1,350,000
---
address: 20463 70 Avenue Unit 2, Langley
price  : $1,349,900
---
address: 23189 Francis Avenue Unit 203, Langley
price  : $1,349,000
---
address: 20576 84a Avenue, Langley
price  : $1,349,000
---
address: 20451 84 Avenue Unit 10, Langley
price  : $1,348,000
---
address: 7138 210 Street Unit 85, Langley
price  : $1,348,000
---
address: 19897 75a Avenue Unit 46, Langley
price  : $1,325,000
---
address: 9567 217a Street Unit 3, Langley
price  : $1,299,900
---
address: 20321 80 Avenue Unit 45, Langley
price  : $1,299,900
---
address: 9762 182a Street Unit 21, Langley
price  : $1,298,888
---
address: 8450 204 Street Unit 29, Langley
price  : $1,258,000
---
address: 20770 97b Avenue Unit 3, Langley
price  : $1,250,000
---

PEP 8 -- Style Guide for Python Code
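The full code keeps a commented-out `urllib.parse.urljoin` call as an alternative to string concatenation. A small sketch of how the two differ, assuming the site might return either a root-relative or an absolute href (the href values below are made up for illustration):

```python
from urllib.parse import urljoin

BASE = 'https://www.vancouverforsale.ca'

# Hypothetical href values a 'Next' link could carry
relative_href = '/search/results/?page=2'
absolute_href = 'https://www.vancouverforsale.ca/search/results/?page=2'

# Concatenation only works when the href is root-relative
concat_relative = BASE + relative_href
concat_absolute = BASE + absolute_href  # malformed: the host appears twice

# urljoin resolves both forms to the same absolute URL
joined_relative = urljoin(BASE, relative_href)
joined_absolute = urljoin(BASE, absolute_href)

print(concat_absolute)
print(joined_relative)
print(joined_absolute)
```

If the hrefs on the site are always root-relative, concatenation is enough; `urljoin` is the safer choice when that is not guaranteed.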

