Web scraper does not finish the whole page
I want to scrape and display the names of all the cars from this dealership webpage:
https://www.herbchambers.com/used-inventory/index.htm?geoZip=02108&geoRadius=0
I located the corresponding XPath and figured out the pattern within it, to find the XPaths of every single car name on the page.
x = 1
while True:
    the_xpath = f"/html/body/div[2]/div/div/div[8]/div/div[2]/div[1]/div/ul/li[{x}]/div[1]/div[2]/h2/a"
    car_name = driver.find_element(By.XPATH, the_xpath)
    car_name.location_once_scrolled_into_view
    print(car_name.text)
    x += 1
It works perfectly fine and prints the names of the first 7-9 cars (the number varies every time). However, it then always terminates with a NoSuchElementException, without finishing the entire page.
I was wondering if anyone could help me solve this issue and figure out why it only gets halfway through.
Comments (1)
Generally, your way of defining the path is risky when working with Selenium: if any element inside the box that contains a car is absent, the positions shift and Selenium returns that error. For example, suppose each division represents a car and contains three elements (generated by JavaScript only if present): one for the name, one for the color, and one for the price. If you collect the price by taking the 3rd element of each division, then for a car whose color is missing and whose color element was never generated, you will get a crash, because that car now has only 2 elements.
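The failure mode described above can be reproduced without a browser. The sketch below uses Python's standard `xml.etree.ElementTree` on a made-up fragment (the markup, class names, car names, and prices are all hypothetical, not taken from the dealership page) to contrast a positional lookup with an attribute lookup. ElementTree returns `None` for a missing element, whereas Selenium's `find_element` would raise NoSuchElementException in the same situation.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the second card has no color element at all.
PAGE = """
<ul>
  <li class="car">
    <span class="name">Car A</span>
    <span class="color">Red</span>
    <span class="price">$10,000</span>
  </li>
  <li class="car">
    <span class="name">Car B</span>
    <span class="price">$12,000</span>
  </li>
</ul>
"""

root = ET.fromstring(PAGE)
cards = root.findall("li")

# Positional lookup: "the price is the third span in the card".
by_position = [card.find("span[3]") for card in cards]
positional_prices = [el.text if el is not None else None for el in by_position]

# Attribute lookup: "the price is the span whose class is 'price'".
by_class = [card.find("span[@class='price']") for card in cards]
class_prices = [el.text for el in by_class]

print(positional_prices)  # the second card has no third span
print(class_prices)       # both prices are found
```

The positional path silently depends on every optional sibling being present; the attribute path only depends on the element it actually wants.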
The best approach is to find elements by attributes whose values describe the information they hold. In your case, first find the div with the class 'vehicle-card-details-container'; this div contains an h2, which in turn contains the a element with the car name.
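A minimal sketch of that idea with Selenium 4 might look like the following. The class name comes from the answer above; the helper names `clean_names` and `scrape_car_names`, the CSS selector built from that class, and the 15-second timeout are assumptions made for illustration. Whether the site renders all cards at once or loads more while scrolling has not been verified here, so treat this as a starting point rather than a confirmed fix.

```python
# Selector derived from the answer: the card div, then its h2, then the link.
CAR_LINK_SELECTOR = "div.vehicle-card-details-container h2 a"


def clean_names(elements):
    """Return the stripped, non-empty .text of each element, in page order."""
    names = []
    for element in elements:
        text = element.text.strip()
        if text:
            names.append(text)
    return names


def scrape_car_names(driver, timeout=15):
    """Wait for the car links to appear, then return their names.

    `driver` is assumed to be a Selenium WebDriver that has already
    loaded the inventory page.
    """
    # Imported here so clean_names() can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Explicit wait: raises TimeoutException if no matching link shows up.
    links = WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, CAR_LINK_SELECTOR))
    )
    return clean_names(links)
```

With a live driver, `for name in scrape_car_names(driver): print(name)` would replace the `while True` loop from the question. Because the selector matches every card in one call, there is no index to run past, and a card with a different internal layout no longer shifts the path of the cards after it.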