How to extract href information from a specific class using Selenium and Python

Posted on 2025-01-18 12:40:11

I'm currently working on some web scraping using Python and Selenium, and I can't seem to pull the link from the href of an anchor tag with a specific class. For reference, the page is from Zillow (specifically, this URL: https://www.zillow.com/homes/for_rent/San-Francisco,-CA_rb/).

I've tried a few different options to select the anchor tag, but none of them returns the information I need:

links = driver.find_elements(By.CLASS_NAME, "list-card-info")
print(links[0].get_attribute('href'))
 -- returns 
None

I also tried:

links = driver.find_elements(By.CLASS_NAME, "list-card-top")
print(links[0].get_attribute('href'))
 -- returns 
None

links = driver.find_elements(By.CLASS_NAME, "list-card-link list-card-link-top-margin")
print(links[0].get_attribute('href'))
 -- returns 
None

And lastly:

links = driver.find_elements(By.CSS_SELECTOR, "list-card-info.a")
print(links[0].get_attribute('href'))

I know I can pull all the anchor tags, but surely there is a step I'm missing to get the nested anchor tag's value? Or am I targeting the wrong class? I'm not sure where I'm going wrong.

尬尬 2025-01-25 12:40:11

To print the value of the href attribute, you have to induce WebDriverWait for visibility_of_all_elements_located() and, using a list comprehension, you can use either of the following locator strategies:

Using CSS_SELECTOR:

driver.get('https://www.zillow.com/san-francisco-ca/rentals/?searchQueryState=%7B%22pagination%22%3A%7B%7D%2C%22usersSearchTerm%22%3A%22San%20Francisco%2C%20CA%22%2C%22mapBounds%22%3A%7B%22west%22%3A-122.62421695117187%2C%22east%22%3A-122.24244204882812%2C%22south%22%3A37.70334422496088%2C%22north%22%3A37.84716973355808%7D%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A20330%2C%22regionType%22%3A6%7D%5D%2C%22isMapVisible%22%3Atrue%2C%22filterState%22%3A%7B%22fsba%22%3A%7B%22value%22%3Afalse%7D%2C%22nc%22%3A%7B%22value%22%3Afalse%7D%2C%22fore%22%3A%7B%22value%22%3Afalse%7D%2C%22cmsn%22%3A%7B%22value%22%3Afalse%7D%2C%22fr%22%3A%7B%22value%22%3Atrue%7D%2C%22ah%22%3A%7B%22value%22%3Atrue%7D%7D%2C%22isListVisible%22%3Atrue%2C%22mapZoom%22%3A11%7D')
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[class='list-card-top'] > a[href]")))])

Using XPATH in a single line:

driver.get('https://www.zillow.com/san-francisco-ca/rentals/?searchQueryState=%7B%22pagination%22%3A%7B%7D%2C%22usersSearchTerm%22%3A%22San%20Francisco%2C%20CA%22%2C%22mapBounds%22%3A%7B%22west%22%3A-122.62421695117187%2C%22east%22%3A-122.24244204882812%2C%22south%22%3A37.70334422496088%2C%22north%22%3A37.84716973355808%7D%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A20330%2C%22regionType%22%3A6%7D%5D%2C%22isMapVisible%22%3Atrue%2C%22filterState%22%3A%7B%22fsba%22%3A%7B%22value%22%3Afalse%7D%2C%22nc%22%3A%7B%22value%22%3Afalse%7D%2C%22fore%22%3A%7B%22value%22%3Afalse%7D%2C%22cmsn%22%3A%7B%22value%22%3Afalse%7D%2C%22fr%22%3A%7B%22value%22%3Atrue%7D%2C%22ah%22%3A%7B%22value%22%3Atrue%7D%7D%2C%22isListVisible%22%3Atrue%2C%22mapZoom%22%3A11%7D')
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='list-card-top']/a[@href]")))])

Console Output:

['https://www.zillow.com/homedetails/San-Francisco-CA-94134/15166498_zpid/', 'https://www.zillow.com/b/avery-450-san-francisco-ca-BTfktx/', 'https://www.zillow.com/b/solaire-san-francisco-ca-65g7KK/', 'https://www.zillow.com/homedetails/117-Saint-Charles-Ave-San-Francisco-CA-94132/15195262_zpid/', 'https://www.zillow.com/homedetails/433-40th-Ave-San-Francisco-CA-94121/15092586_zpid/', 'https://www.zillow.com/homedetails/123-Carl-St-San-Francisco-CA-94117/2078490576_zpid/', 'https://www.zillow.com/b/fifteen-fifty-san-francisco-ca-BdnYPc/', 'https://www.zillow.com/b/l-seven-san-francisco-ca-9NJtD7/', 'https://www.zillow.com/homedetails/4642-18th-St-San-Francisco-CA-94114/332858409_zpid/']

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
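
As a rough illustration of what the wait in the snippets above does: WebDriverWait keeps calling the expected condition until it returns a truthy value or the timeout expires, which matters when the listing cards are not yet in the page at the moment driver.get() returns. The sketch below is a simplified stand-in written with the standard library only, not Selenium's actual implementation. The name poll_until and its parameters are hypothetical; the 20 second timeout mirrors the answer's code and the 0.5 second interval mirrors WebDriverWait's default poll frequency.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(
    condition: Callable[[], Optional[T]],
    timeout: float = 20.0,
    interval: float = 0.5,
) -> T:
    """Call `condition` repeatedly until it returns a truthy value.

    Hypothetical helper that only mimics the polling idea behind
    WebDriverWait(driver, timeout).until(...). Raises the built-in
    TimeoutError if nothing truthy comes back within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # e.g. a non-empty list of elements: stop waiting and return it
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

With a real driver, the condition could be something like lambda: driver.find_elements(By.CSS_SELECTOR, "div[class='list-card-top'] > a[href]"), which returns an empty (falsy) list until the anchors exist. Note that this checks presence only, whereas visibility_of_all_elements_located also requires the elements to be displayed.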

青春有你 2025-01-25 12:40:11

You could use XPATH to find the link (a tag) and use get_attribute('href') to get the link from the tag.

Like this:

href = driver.find_element(By.XPATH, '//div[@class="list-card-top"]/a').get_attribute('href')
print(href)

Another example:

href = driver.find_element(By.XPATH, '//div[@class="list-card-info"]/a').get_attribute('href')
print(href)

If you want to use By.CLASS_NAME, you could do it like this:

link = driver.find_element(By.CLASS_NAME, "list-card-top")
href = link.find_element(By.TAG_NAME, 'a').get_attribute('href')
print(href)

In your case:

links = driver.find_elements(By.CLASS_NAME, "list-card-info")
print(links[0].get_attribute('href'))

You're trying to read an attribute named 'href' from the div element with class list-card-info, and that div has no such attribute. We actually want the 'href' of the a tag inside that div.
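
The same idea extends to every card on the page. The sketch below builds selectors that target the anchor rather than its container and then collects all hrefs. The helper names are hypothetical, and the class names are the ones from the question, which may no longer match Zillow's markup. It also touches two of the other attempts: as a CSS selector, "list-card-info.a" means a tag named list-card-info with class a, and By.CLASS_NAME expects a single class name, so a space-separated pair will not match the element as intended. To keep the snippet runnable without a browser, it passes the string "css selector", which is the value of By.CSS_SELECTOR, instead of importing By.

```python
from typing import List

# String value of selenium.webdriver.common.by.By.CSS_SELECTOR
CSS = "css selector"


def nested_anchor_selector(container_class: str) -> str:
    """Selector for every <a href> nested anywhere inside the container."""
    return f".{container_class} a[href]"


def compound_class_selector(tag: str, class_attr: str) -> str:
    """Turn a space-separated class attribute into tag.class1.class2."""
    return tag + "".join(f".{name}" for name in class_attr.split())


def collect_hrefs(driver, selector: str) -> List[str]:
    """Return the href of every element matching `selector`.

    `driver` is assumed to be a Selenium WebDriver, or anything else that
    offers find_elements(by, value). Elements without an href are skipped.
    """
    elements = driver.find_elements(CSS, selector)
    hrefs = (element.get_attribute("href") for element in elements)
    return [href for href in hrefs if href]
```

For example, collect_hrefs(driver, nested_anchor_selector("list-card-info")) queries ".list-card-info a[href]", and compound_class_selector("a", "list-card-link list-card-link-top-margin") yields "a.list-card-link.list-card-link-top-margin", which can be passed to collect_hrefs in the same way.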
