Scraping information from a website
I am trying to scrape information from a website and save it in CSV format. However, even though I get a response from the website, no data shows up when I open the file in Excel: all I get back is my header row with empty columns. What could be wrong with my code? Any help is appreciated.
from csv import writer

from bs4 import BeautifulSoup
import requests

url = "https://www.sephora.com/beauty/new-beauty-products"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/102.0.5005.63 Safari/537.36'
}

page = requests.get(url, headers=headers)
print(page.status_code)

soup = BeautifulSoup(page.content, 'html.parser')
lists = soup.find_all('div', class_="s-card-container")

with open('Competition Pricing Sepho.csv', 'w+', encoding='utf8', newline='') as f:
    thewriter = writer(f)
    header = ['Title', 'Price']
    thewriter.writerow(header)
    # One CSV row per product card found in the parsed HTML
    for card in lists:
        Title = card.find('span', class_="css-12vkztw").text
        Price = card.find('span', class_="css-0").text
        info = [Title, Price]
        thewriter.writerow(info)
Comments (1)
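The page is dynamic, so the product data is not in the static HTML that requests.get returns. find_all therefore matches nothing, the loop body never runs, and only the header row is written.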
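One way to confirm this diagnosis is to count how many elements in the downloaded HTML actually carry the class the scraper looks for. The following is a minimal sketch using only the standard library; ClassCounter, count_class, and the two HTML strings are hypothetical stand-ins made up for illustration, not content taken from the real site. In the script above, the equivalent check is simply print(len(lists)).

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html, class_name):
    """Return how many elements in the html string carry class_name."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical samples: an empty JavaScript shell versus fully rendered markup.
static_html = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
rendered_html = (
    '<div class="s-card-container"><span class="css-12vkztw">Lip Balm</span></div>'
    '<div class="s-card-container extra"><span class="css-12vkztw">Serum</span></div>'
)

static_count = count_class(static_html, "s-card-container")
rendered_count = count_class(rendered_html, "s-card-container")
print(static_count, rendered_count)
```

If the count for the real response is zero, the selectors are not at fault. The cards are most likely added by JavaScript after the page loads, so the data has to come from a rendered DOM (for example via a browser-automation tool) or from whatever data source the page itself loads.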