I'm trying to download a bunch of images and categorize them into folders using Selenium. To do so, I need to grab two IDs associated with each image from its URL. However, I'm having trouble scraping the image link from the src attribute. Whether I try to grab it by tag, XPath, or another method, the result is always "None".
Here's an example of an inspected image page:
<html style="height: 100%;">
<head><meta name="viewport" content="width=device-width, minimum-scale=0.1">
<title>index.php (2448×3264)</title>
</head>
<body style="margin: 0px; background: #0e0e0e; height: 100%">
<img style="-webkit-user-select: none;margin: auto;cursor: zoom-in;background-color: hsl(0, 0%, 90%);transition: background-color 300ms;" src="https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=LQCMY&fieldname=DT006_picture&p=show" width="444" height="593">
</body>
</html>
For this example, I would need to grab "LQCMY" and "DT006_picture" as strings from the URL above. The code below shows my attempt at scraping the URL link (edited down since prior screens I click through are locked behind passwords that I can't give out).
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Image = '/html/body/div[1]/div[2]/div/table/tbody/tr[1]/td[1]/a'
driver.find_element_by_xpath(Image).click()
Image_URL = WebDriverWait(driver, 100).until(EC.element_to_be_clickable((By.XPATH, Image))).get_attribute('src')
print(Image_URL)
Are there certain src values that can't be scraped, or am I scraping the wrong tag?
I've tried grabbing by tag, but that also returns "None".
Image_URL = driver.find_element_by_xpath(Image).get_attribute('src')
Other posts said WebDriverWait would help, but I've tried adjusting the wait time and still get "None".
Comments (1)
To print the value of the src attribute you can use either of the following locator strategies:
Using css_selector:
Using xpath:
Ideally, wait for the image to become visible by using WebDriverWait with visibility_of_element_located(), together with either of the following locator strategies:
Using CSS_SELECTOR:
Using XPATH:
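Note: the WebDriverWait approach needs the imports for WebDriverWait, By, and expected_conditions (as EC), the same three that already appear at the top of the code in the question.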
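Once the `src` value has been retrieved, the two IDs the question asks for can be read from its query string. This is a small sketch using only the standard library; the function name `image_ids` is made up for illustration, and it assumes every image URL carries `id` and `fieldname` parameters like the example in the question.

```python
from urllib.parse import parse_qs, urlparse


def image_ids(src):
    """Return the (id, fieldname) pair from an image URL's query string."""
    query = parse_qs(urlparse(src).query)
    # parse_qs maps each parameter name to a list of values; take the first.
    return query["id"][0], query["fieldname"][0]


# Example URL copied from the question.
src = (
    "https://haalsi.net/haalsi_pride2/custom/picture/index.php"
    "?id=LQCMY&fieldname=DT006_picture&p=show"
)
record_id, field_name = image_ids(src)
print(record_id, field_name)  # LQCMY DT006_picture
```

A URL that lacks either parameter would raise a `KeyError` here, which may be preferable to silently filing an image under the wrong folder.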