I can't parse ad titles. I want to parse pages from OLX (it's like eBay, only in the CIS countries) and write them to a CSV file. I already wrote two functions: one gets the page, the other looks for the names. I wanted to test it, but I get some kind of error; I would be grateful if you could help me.
import requests
from bs4 import BeautifulSoup
import csv

HOST = 'https://www.olx.ua/'
URL = 'https://www.olx.ua/d/zhivotnye/sobaki/'
HEADERS = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'
}

def get_html(url, params=''):
    r = requests.get(url, headers=HEADERS, params=params)
    return r

def get_content(html):
    soup = BeautifulSoup(html, 'html.parser')
    items = soup.find_all('div', class_='css-19ucd76')  # Parsing the entire ad.
    animal = []
    for item in items:
        animal.append(
            {
                'Title': item.find('div', class__='css-u2ayx9').get_text(strip=True)  # Name parsing.
            }
        )
    return animal

html = get_html(URL)
print(get_content(html.text))
css-19ucd76 - this is an ad card.
css-u2ayx9 - this is the title.

This is the error:

    'Title': item.find('div', class__='css-u2ayx9').get_text(strip=True)
AttributeError: 'NoneType' object has no attribute 'get_text'
Answers (1)
Try to avoid selecting on dynamic parts of the HTML, such as these generated class names, and change your strategy to select more static parts, like tags or ids.

Use CSS selectors instead: get the cards with soup.select('div[data-cy="l-card"]') and select the title by its h6 tag name, as in the sketch below.
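A minimal sketch of that approach, assuming the data-cy="l-card" attribute and the h6 title tag the answer names (OLX's markup may have changed since; the dogs.csv filename and the CSV-writing step are illustrative additions, since the question wants the titles written to a CSV file):

import csv

import requests
from bs4 import BeautifulSoup

URL = 'https://www.olx.ua/d/zhivotnye/sobaki/'
HEADERS = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'
}

def get_content(html):
    soup = BeautifulSoup(html, 'html.parser')
    animals = []
    # data-cy="l-card" is a stable data attribute on each listing card,
    # unlike the generated css-* class names.
    for card in soup.select('div[data-cy="l-card"]'):
        title = card.find('h6')  # The title sits in an <h6> tag inside the card.
        if title is not None:  # Skip cards without a title instead of crashing on None.
            animals.append({'Title': title.get_text(strip=True)})
    return animals

html = requests.get(URL, headers=HEADERS).text
animals = get_content(html)
print(animals)

# Write the collected titles to a CSV file, as the question intends.
with open('dogs.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['Title'])
    writer.writeheader()
    writer.writerows(animals)

The if title is not None guard is also what the original code is missing: when find() matches nothing, it returns None, and calling get_text() on None is exactly where the AttributeError comes from.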