Scraping with Selenium doesn't show all the data (possible duplicate)
I was trying to write a simple script to scrape a dynamic website (I'm a newbie with Selenium). The data I intended to scrape is the product names and prices. I ran the code and it worked, but it only showed 10 entries, while there are 60 entries per page. Here is the code:
import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://www.tokopedia.com/p/komputer-laptop/media-penyimpanan-data') # the link
product_name = driver.find_elements(By.CSS_SELECTOR, value='span.css-1bjwylw')
product_price = driver.find_elements(By.CSS_SELECTOR, value='span.css-o5uqvq')
list_product = []
list_price = []
for i in range(len(product_name)):
    list_product.append(product_name[i].text)
for j in range(len(product_price)):
    list_price.append(product_price[j].text)
driver.quit()
df = pd.DataFrame(columns=['product', 'price'])
df['product'] = list_product
df['price'] = list_price
print(df)
I used the chromedriver installer instead of downloading the driver first and then locating it, because I just thought it was a simpler way. Also, I used Service instead of Options (many tutorials use Options) because I got some errors with Options, and with Service it worked out fine. Oh, and I used PyCharm, if that matters at all.
Any help or suggestions will be very much appreciated, thank you!
You need to scroll down to the bottom of the page first so that all 60 entries get loaded. Since the website is dynamic, more data is loaded as you scroll down. You can scroll via the WebDriver using a JavaScript snippet as follows:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
Add this after driver.get() and before find_elements(). Don't forget to sleep after the scroll, since the newly loaded content needs time to render.
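On sites that load products incrementally, a single scrollTo call may not be enough: each scroll can trigger another batch. A common pattern is to keep scrolling until the page height stops growing. Here is a minimal sketch of that idea; the helper name `scroll_until_stable` and the `pause`/`max_rounds` parameters are my own choices, not part of the original answer:

```python
import time

def scroll_until_stable(driver, pause=2.0, max_rounds=10):
    """Scroll to the bottom repeatedly until the page height stops growing."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazy-loaded products time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # no new content appeared; we are at the real bottom
        last_height = new_height
```

You would call `scroll_until_stable(driver)` between `driver.get(...)` and the `find_elements(...)` calls, then collect the product names and prices as before.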