Python Selenium - How to scrape a URL from the src attribute using Selenium and Python

Posted 2025-01-19 09:24:47

I'm trying to download a bunch of images and sort them into folders using Selenium. To do so, I need to grab two IDs associated with each image from its URL. However, I'm having trouble scraping the image link from the src attribute. Whether I try to grab it by tag, XPath, or another method, the result is always "None".

Here's an example of an inspected image page:

<html style="height: 100%;">
    <head>
        <meta name="viewport" content="width=device-width, minimum-scale=0.1">
        <title>index.php (2448×3264)</title>
    </head>
    <body style="margin: 0px; background: #0e0e0e; height: 100%">
        <img style="-webkit-user-select: none;margin: auto;cursor: zoom-in;background-color: hsl(0, 0%, 90%);transition: background-color 300ms;" src="https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=LQCMY&fieldname=DT006_picture&p=show" width="444" height="593">
    </body>
</html>

For this example, I would need to grab "LQCMY" and "DT006_picture" as strings from the URL above. The code below shows my attempt at scraping the URL link (edited down since prior screens I click through are locked behind passwords that I can't give out).

from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Image = '/html/body/div[1]/div[2]/div/table/tbody/tr[1]/td[1]/a'
driver.find_element_by_xpath(Image).click()
Image_URL = WebDriverWait(driver, 100).until(EC.element_to_be_clickable((By.XPATH, Image))).get_attribute('src')
print(Image_URL)

Are there certain src values that can't be scraped, or am I scraping the wrong tag?

I've tried grabbing by tag, but that also returns "None".

Image_URL = driver.find_element_by_xpath(Image).get_attribute('src')

Other posts said WebDriverWait would help, but adjusting the wait time still gives me "None".

爱你是孤单的心事 2025-01-26 09:24:47

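The XPath in the question ends in /a, and an anchor element normally carries href rather than src, which is likely why get_attribute('src') returns None. To print the value of the src attribute, target the img element with either of the following locator strategies: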

Using css_selector:

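# The find_element_by_* helpers were removed in Selenium 4.3; find_element(By..., ...) works in Selenium 3 and 4
print(driver.find_element(By.CSS_SELECTOR, "body img[style*='webkit-user-select'][src^='https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=']").get_attribute("src"))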

Using xpath:

print(driver.find_element(By.XPATH, "//body//img[contains(@style, 'webkit-user-select') and starts-with(@src, 'https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=')]").get_attribute("src"))

Ideally, you should induce WebDriverWait for visibility_of_element_located() so the image is rendered before its attribute is read. Either of the following locator strategies works:

Using CSS_SELECTOR:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "body img[style*='webkit-user-select'][src^='https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=']"))).get_attribute("src"))

Using XPATH:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//body//img[contains(@style, 'webkit-user-select') and starts-with(@src, 'https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=')]"))).get_attribute("src"))

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

You can find a relevant discussion in Python Selenium - get href value
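
Once the src value has been retrieved, the two IDs the question asks for ("LQCMY" and "DT006_picture") can be read from the URL's query string with the standard library. The following is a minimal sketch, not part of the original answer: the helper names extract_ids and target_path, the "images" root folder, and the ".jpg" extension are all assumptions made for illustration.

```python
from pathlib import Path
from urllib.parse import parse_qs, urlsplit


def extract_ids(src_url):
    """Return the (id, fieldname) query parameters of an image URL."""
    query = parse_qs(urlsplit(src_url).query)
    # parse_qs maps each key to a list of values; take the first of each.
    return query["id"][0], query["fieldname"][0]


def target_path(src_url, root="images"):
    """Hypothetical folder layout: <root>/<fieldname>/<id>.jpg (extension assumed)."""
    image_id, fieldname = extract_ids(src_url)
    return Path(root) / fieldname / f"{image_id}.jpg"


# The src value from the example page in the question.
src = "https://haalsi.net/haalsi_pride2/custom/picture/index.php?id=LQCMY&fieldname=DT006_picture&p=show"
image_id, fieldname = extract_ids(src)
print(image_id, fieldname)
```

In a real run, src would be the string returned by get_attribute("src") rather than a hard-coded URL, and the folder layout would follow whatever categorization the images actually need.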
