How to get Selenium to web scrape the second page of results in a pop-up window
I am trying to web scrape multiple pages of results. The first page works fine, but when I switch to the next page, it unfortunately just scrapes the first page of results again. The results don't load under a new URL (the list appears in a window on top of the current page), so navigating by URL doesn't work. I also can't figure out how to append the results of the second page to those of the first; they come out as separate lists. Below is the code I have.
from selenium import webdriver
import time
import requests
from bs4 import BeautifulSoup
from selenium.webdriver.common.keys import Keys
#original webscraping code to get the names of locations from page 1
url = r'https://autochek.africa/en/ng/fix-your-car/service/scheduled-car-service'
driver = webdriver.Chrome()
driver.get(url)
xpath_get_locations = r'/html/body/div[1]/div/div[2]/div/div[1]/div/div[2]/div[2]/div/div/form/div[7]/div/label'
driver.find_element_by_xpath(xpath_get_locations).click()
soup = BeautifulSoup(driver.page_source, 'html.parser')
location_results = [i.text for i in soup.find_all('div', {'class': 'jsx-1642469937 state'})]
print(location_results)
time.sleep(3)
#finished page 1, finding the next button to go to page 2
xpath_find_next_button = r'/html/body/div[1]/div/div[2]/div/div[1]/div/div[2]/div[2]/div[2]/div[2]/div/div/div[3]/ul/li[13]'
driver.find_element_by_xpath(xpath_find_next_button).click()
#getting the locations from page 2
second_page_results = [i.text for i in soup.find_all('div', {'class': 'jsx-1642469937 state'})]
print(second_page_results)
time.sleep(2)
1 Answer
After loading a new page or running some JavaScript code on the page, you have to run soup = BeautifulSoup(driver.page_source, 'html.parser') again to work with the new HTML. Or skip BeautifulSoup and do it all in Selenium, using find_elements_... (with the char s in the word elements).
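For example, a minimal sketch of that fix, reusing the variables and class name from the question's code:

from selenium.webdriver.common.by import By

driver.find_element(By.XPATH, xpath_find_next_button).click()
time.sleep(2)  # crude wait for the pop-up to update
soup = BeautifulSoup(driver.page_source, 'html.parser')  # re-parse the new HTML
second_page_results = [i.text for i in soup.find_all('div', {'class': 'jsx-1642469937 state'})]
location_results.extend(second_page_results)  # append page 2 to the page-1 list instead of keeping separate lists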
By the way: the xpath doesn't need the prefix r because it doesn't use \. A shorter and more readable xpath would also help. And it would be simpler to use the Next > button instead of searching for buttons 2, 3, etc.
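For example, a hypothetical shorter xpath for the Next > button; the exact text and structure of the pagination list are assumptions, not read from the page:

next_button = driver.find_element(By.XPATH, '//ul/li[contains(text(), "Next")]')
next_button.click()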
EDIT: Full working code which uses a while-loop to visit all pages. I added the module webdriver_manager, which automatically downloads a (fresh) driver for the browser. I use find_elements(By.XPATH, ...) because find_elements_by_xpath(...) is deprecated.
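The answer's full code is not reproduced here; below is a sketch of what it describes (a while-loop over the pages, webdriver_manager, and find_elements(By.XPATH, ...)), not the author's original code. The xpaths for the pop-up label and the Next > button, and the stop condition, are assumptions:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.common.exceptions import NoSuchElementException
from webdriver_manager.chrome import ChromeDriverManager
import time

url = 'https://autochek.africa/en/ng/fix-your-car/service/scheduled-car-service'

# webdriver_manager fetches a chromedriver that matches the installed browser
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get(url)

# open the locations pop-up (shortened from the question's absolute xpath; an assumption)
driver.find_element(By.XPATH, '//form//div[7]/div/label').click()
time.sleep(1)

all_results = []

while True:
    # collect the locations shown on the current page of the pop-up
    items = driver.find_elements(By.XPATH, '//div[@class="jsx-1642469937 state"]')
    all_results.extend(i.text for i in items)

    # assumption: the pagination shows a clickable "Next >" item until the last page
    try:
        driver.find_element(By.XPATH, '//ul/li[contains(text(), "Next")]').click()
    except NoSuchElementException:
        break
    time.sleep(1)

print(all_results)
driver.quit()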