Python Selenium xpath web-scraping

Selenium / Using pagination on a website?

Posted on 2025-01-17 03:35:39

I want to trigger the pagination on this site:
https://www.kicker.de/bundesliga/topspieler/2008-09

I found the element with this XPath in the Chrome inspector:

driver.find_element(By.XPATH,"//a[@class='kick__pagination__button kick__icon-Pfeil04 kick__pagination--icon']").click()

Now I want to click this element to go one page further, but I get an error.

This is my code:

import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from sys import platform
import os, sys
from datetime import datetime, timedelta
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from webdriver_manager.chrome import ChromeDriverManager
from fake_useragent import UserAgent

if __name__ == '__main__':   
  print(f"Checking chromedriver...")
  os.environ['WDM_LOG_LEVEL'] = '0' 
  ua = UserAgent()
  userAgent = ua.random
  options = Options()
  options.add_argument('--headless')
  options.add_experimental_option ('excludeSwitches', ['enable-logging'])
  options.add_experimental_option("prefs", {"profile.default_content_setting_values.notifications": 1})    
  options.add_argument("--disable-infobars")
  options.add_argument("--disable-extensions")  
  options.add_argument("start-maximized")
  options.add_argument('window-size=1920x1080')                               
  options.add_argument('--no-sandbox')
  options.add_argument('--disable-gpu')  
  options.add_argument(f'user-agent={userAgent}')   
  srv=Service(ChromeDriverManager().install())
  driver = webdriver.Chrome (service=srv, options=options)    
  waitWebDriver = WebDriverWait (driver, 10)
  WAIT = 3  # pause in seconds after each page load; not defined in the posted excerpt, value assumed

  seasonList = ["2008-09","2009-10","2010-11","2011-12","2012-13","2013-14","2014-15",
                "2015-16","2016-17","2017-18","2018-19","2020-21", "2021-22"]
  for season in seasonList:
    tmpSeason = f"{season[:4]}/20{season[5:]}"
    link = f"https://www.kicker.de/bundesliga/topspieler/{season}" 
    print(f"Working for link {link}...")        
    driver.get (link)       
    time.sleep(WAIT) 
    
    while True:
      soup = BeautifulSoup (driver.page_source, 'html.parser')  
      tmpTABLE = soup.find("table")
      tmpTR = tmpTABLE.find_all("tr")      

      driver.find_element(By.XPATH,"//a[@class='kick__pagination__button kick__icon-Pfeil04 kick__pagination--icon']").click()            
      time.sleep(WAIT)

But I get this error:

Traceback (most recent call last):
  File "C:\Users\Polzi\Documents\DEV\Fiverr\ORDER\fireworkenter\collGrades.py", line 116, in <module>
    driver.find_element(By.XPATH,"//a[@class='kick__pagination__button kick__icon-Pfeil04 kick__pagination--icon']").click()
  File "C:\Users\Polzi\Documents\DEV\.venv\NormalScraping\lib\site-packages\selenium\webdriver\remote\webelement.py", line 80, in click
    self._execute(Command.CLICK_ELEMENT)
  File "C:\Users\Polzi\Documents\DEV\.venv\NormalScraping\lib\site-packages\selenium\webdriver\remote\webelement.py", line 693, in _execute
    return self._parent.execute(command, params)
  File "C:\Users\Polzi\Documents\DEV\.venv\NormalScraping\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 400, in execute
    self.error_handler.check_response(response)
  File "C:\Users\Polzi\Documents\DEV\.venv\NormalScraping\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 236, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.ElementNotInteractableException: Message: element not interactable
  (Session info: headless chrome=99.0.4844.82)

How can I go to the next page using Selenium?

Comments (1)

白云悠悠 2025-01-24 03:35:39

To go to the next page, click the next-page element after inducing WebDriverWait for element_to_be_clickable(). You can use either of the following locator strategies:

Using CSS_SELECTOR:

WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a.kick__pagination__button--active +a"))).click()

Using XPATH:

WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[contains(@class, 'kick__pagination__button--active')]//following::a[1]"))).click()

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
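
As a rough sketch of how the wait-then-click call above could drive the whole `while True` loop from the question: the loop below keeps reading the page source and asking for the next page until no next link becomes clickable. The names `iter_pages`, `selenium_go_next` and `max_pages` are hypothetical helpers introduced here for illustration, and treating a timeout as "this was the last page" is an assumption about how the site behaves, not something confirmed in the thread.

```python
from typing import Callable, Iterator


def iter_pages(get_source: Callable[[], str],
               go_next: Callable[[], bool],
               max_pages: int = 100) -> Iterator[str]:
    """Yield the HTML of each page until go_next() reports no further page.

    max_pages is a safety cap (assumed value) so a misbehaving next-link
    cannot keep the loop running forever.
    """
    for _ in range(max_pages):
        yield get_source()
        if not go_next():
            break


def selenium_go_next(driver, timeout: float = 20) -> bool:
    """Click the link that follows the active pagination button.

    Returns False when no such link becomes clickable within `timeout`
    seconds, which this sketch interprets as having reached the last page.
    """
    # Imported lazily so iter_pages() can be exercised without a browser.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Same CSS locator as in the answer above.
    locator = (By.CSS_SELECTOR, "a.kick__pagination__button--active + a")
    try:
        WebDriverWait(driver, timeout).until(
            EC.element_to_be_clickable(locator)
        ).click()
    except TimeoutException:
        return False
    return True
```

With the `driver` from the question, this might be used as `for html in iter_pages(lambda: driver.page_source, lambda: selenium_go_next(driver)):` followed by the existing `BeautifulSoup(html, 'html.parser')` parsing. Depending on how quickly the table refreshes after the click, an additional wait for the new rows may still be needed before reading `page_source`.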
