While loop returns only the first iteration's results when appending elements to a list in Python

Posted on 2025-02-12 18:04:13

I'm learning web scraping in Python with Selenium. I want to scrape the names and prices of goods from Amazon and store them in lists. I do this in a while loop that keeps clicking through to the next page until that is no longer possible, at which point a TimeoutException is thrown. When I debug, I can clearly see that everything works: my lists get longer and longer. But when the loop breaks and I print the lists, I see that my program saved only the first loop iteration. I don't really understand what is going on. Here is my code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep

# paste url that you want to scrape
url = "https://www.amazon.se/-/en/s?k=mirror+sticker&language=en_GB&crid=3LCT7C6GU8FUS&qid=1656847509&sprefix=mirror+sticker%2Caps%2C91&ref=sr_pg_1"
# this will open up new window with the url provided above
# put the path to the driver.exe file in the brackets
driver = webdriver.Chrome("chromedriver.exe")
driver.get(url)
sleep(3)  # wait 3 seconds
driver.find_element(By.ID, "sp-cc-accept").click() # cookies

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


def get_text_store(web_elements_lst, storage_lst):  # text (names and prices) of webelements
    for element in web_elements_lst:
        if element.get_attribute("textContent") != "":
            storage_lst.append(element.get_attribute("textContent"))  # if not empty, append
        else:
            storage_lst.append("No data")  # if empty str


names_txt = [] # here I'll store str names
prices_txt = [] # here I store str prices
while True:
    try:
        web_elements_names = driver.find_elements(By.CLASS_NAME,
                                                  "a-size-base-plus.a-color-base.a-text-normal")  # names (webelems)
        web_elements_prices = driver.find_elements(By.CLASS_NAME, "a-price-whole")  # prices (webelems)
        get_text_store(web_elements_names, names_txt)  # text from webelems names
        get_text_store(web_elements_prices, prices_txt)  # text from webelems prices
        WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.XPATH, "//a[text()='Next']"))).click()  # go to the next page
    except TimeoutException:
        print("Timeout Exception")
        break
print(names_txt)
print(prices_txt)

Comments (1)

烟若柳尘 2025-02-19 18:04:13

Add a sleep after the click, as shown below, so that the next page has time to load before the loop reads it again:

while True:
    try:
        web_elements_names = driver.find_elements(By.CLASS_NAME,
                                                  "a-size-base-plus.a-color-base.a-text-normal")  # names (webelems)
        web_elements_prices = driver.find_elements(By.CLASS_NAME, "a-price-whole")  # prices (webelems)
        get_text_store(web_elements_names, names_txt)  # text from webelems names
        get_text_store(web_elements_prices, prices_txt)  # text from webelems prices
        WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.XPATH, "//a[text()='Next']"))).click()  # go to the next page
        sleep(5)
    except TimeoutException:
        print("Timeout Exception")
        break
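
A fixed `sleep(5)` works, but it always costs five seconds and can still be too short on a slow connection. As an alternative, here is a minimal sketch of waiting for a condition instead of a fixed time. Everything in it is hypothetical and for illustration only: `wait_until`, `FakeSite` and `scrape_all` are not part of Selenium, and `FakeSite` merely imitates a page that needs a moment to change after a click, so the example runs without a browser.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.1):
    """Poll condition() until it returns something truthy, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


class FakeSite:
    """Hypothetical stand-in for the browser: a page change takes load_delay seconds."""

    def __init__(self, pages, load_delay=0.3):
        self._pages = pages
        self._load_delay = load_delay
        self._current = 0
        self._pending = None  # (target page index, time at which it becomes visible)

    def _refresh(self):
        # Switch to the pending page once its simulated load time has passed.
        if self._pending and time.monotonic() >= self._pending[1]:
            self._current = self._pending[0]
            self._pending = None

    def page_number(self):
        self._refresh()
        return self._current

    def items(self):
        self._refresh()
        return list(self._pages[self._current])

    def has_next(self):
        self._refresh()
        return self._current + 1 < len(self._pages)

    def click_next(self):
        self._refresh()
        self._pending = (self._current + 1, time.monotonic() + self._load_delay)


def scrape_all(site):
    """Collect the items of every page, waiting for each page change to finish."""
    collected = []
    while True:
        collected.extend(site.items())
        if not site.has_next():
            break
        before = site.page_number()
        site.click_next()
        # Wait for evidence that the new page is showing, not for a fixed time.
        wait_until(lambda: site.page_number() != before, timeout=5.0)
    return collected


site = FakeSite([["mirror A", "mirror B"], ["mirror C"], ["mirror D", "mirror E"]])
print(scrape_all(site))
```

In the real script, the closest Selenium counterpart is probably `WebDriverWait(driver, 10).until(EC.staleness_of(old_element))`, where `old_element` is one of the elements found before the click. This has not been tested against the Amazon page in the question; if the site updates the results in place without replacing the elements, the staleness condition never becomes true and a different condition would be needed.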