Scraping with Selenium doesn't show all the data (possible duplicate)
I was trying to write a simple script to scrape a dynamic website (I'm a newbie with Selenium). The data I intended to scrape is the product names and prices. I ran the code and it worked, but it only showed 10 entries, while there are 60 entries per page. Here is the code:
import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://www.tokopedia.com/p/komputer-laptop/media-penyimpanan-data') # the link
product_name = driver.find_elements(By.CSS_SELECTOR, value='span.css-1bjwylw')
product_price = driver.find_elements(By.CSS_SELECTOR, value='span.css-o5uqvq')
list_product = []
list_price = []
for i in range(len(product_name)):
    list_product.append(product_name[i].text)
for j in range(len(product_price)):
    list_price.append(product_price[j].text)
driver.quit()
df = pd.DataFrame(columns=['product', 'price'])
df['product'] = list_product
df['price'] = list_price
print(df)
I used the chromedriver installer instead of downloading the driver first and then locating it, because I just thought it was a simpler way. Also, I used Service instead of Options (many tutorials use Options) because I got some errors with Options, and with Service it worked out fine. Oh, and I used PyCharm, if that matters at all.
Any help or suggestions will be very much appreciated, thank you!
You need to scroll down to the bottom of the page first so that all 60 entries get loaded. Since the website is dynamic, more data is loaded as you scroll down. You can scroll via the WebDriver using a JavaScript snippet as follows:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
Add this after driver.get() and before find_elements(). Don't forget to sleep after the scroll, since the newly loaded content needs time to render.
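On sites that load products incrementally, a single scrollTo call may not be enough: each scroll can trigger another batch. A common pattern is to keep scrolling until the page height stops growing. Here is a minimal sketch of that idea; the helper name `scroll_until_stable` and the `pause`/`max_rounds` parameters are my own choices, not part of the original answer:

```python
import time

def scroll_until_stable(driver, pause=2.0, max_rounds=10):
    """Scroll to the bottom repeatedly until the page height stops growing."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazy-loaded products time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # no new content appeared; we are at the real bottom
        last_height = new_height
```

You would call `scroll_until_stable(driver)` between `driver.get(...)` and the `find_elements(...)` calls, then collect the product names and prices as before.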