How to extract href information from a specific class using Selenium and Python

Posted on 2025-01-18 12:40:11

I'm currently working on some web scraping using Python and Selenium, and I can't seem to pull the link from the href of an anchor tag inside a specific class. For reference, it's from Zillow (specifically, this URL: https://www.zillow.com/homes/for_rent/San-Francisco,-CA_rb/).

I've tried a few different ways of selecting the anchor tag, but none of them returns the information I need:

links = driver.find_elements(By.CLASS_NAME, "list-card-info")
print(links[0].get_attribute('href'))
 -- returns 
None

I also tried:

links = driver.find_elements(By.CLASS_NAME, "list-card-top")
print(links[0].get_attribute('href'))
 -- returns 
None

links = driver.find_elements(By.CLASS_NAME, "list-card-link list-card-link-top-margin")
print(links[0].get_attribute('href'))
 -- returns 
None

And lastly:

links = driver.find_elements(By.CSS_SELECTOR, "list-card-info.a")
print(links[0].get_attribute('href'))

I know I can pull all the anchor tags, but surely there is a step I'm missing to get the value from the nested anchor tag? Or am I targeting the wrong class? I'm not sure where I'm going wrong.

Comments (2)

尬尬 2025-01-25 12:40:11

To print the value of the href attribute, you have to induce WebDriverWait for visibility_of_all_elements_located() and, using a list comprehension, you can use either of the following locator strategies:

  • Using CSS_SELECTOR:

    driver.get('https://www.zillow.com/san-francisco-ca/rentals/?searchQueryState=%7B%22pagination%22%3A%7B%7D%2C%22usersSearchTerm%22%3A%22San%20Francisco%2C%20CA%22%2C%22mapBounds%22%3A%7B%22west%22%3A-122.62421695117187%2C%22east%22%3A-122.24244204882812%2C%22south%22%3A37.70334422496088%2C%22north%22%3A37.84716973355808%7D%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A20330%2C%22regionType%22%3A6%7D%5D%2C%22isMapVisible%22%3Atrue%2C%22filterState%22%3A%7B%22fsba%22%3A%7B%22value%22%3Afalse%7D%2C%22nc%22%3A%7B%22value%22%3Afalse%7D%2C%22fore%22%3A%7B%22value%22%3Afalse%7D%2C%22cmsn%22%3A%7B%22value%22%3Afalse%7D%2C%22fr%22%3A%7B%22value%22%3Atrue%7D%2C%22ah%22%3A%7B%22value%22%3Atrue%7D%7D%2C%22isListVisible%22%3Atrue%2C%22mapZoom%22%3A11%7D')
    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[class='list-card-top'] > a[href]")))])
    
  • Using XPATH in a single line:

    driver.get('https://www.zillow.com/san-francisco-ca/rentals/?searchQueryState=%7B%22pagination%22%3A%7B%7D%2C%22usersSearchTerm%22%3A%22San%20Francisco%2C%20CA%22%2C%22mapBounds%22%3A%7B%22west%22%3A-122.62421695117187%2C%22east%22%3A-122.24244204882812%2C%22south%22%3A37.70334422496088%2C%22north%22%3A37.84716973355808%7D%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A20330%2C%22regionType%22%3A6%7D%5D%2C%22isMapVisible%22%3Atrue%2C%22filterState%22%3A%7B%22fsba%22%3A%7B%22value%22%3Afalse%7D%2C%22nc%22%3A%7B%22value%22%3Afalse%7D%2C%22fore%22%3A%7B%22value%22%3Afalse%7D%2C%22cmsn%22%3A%7B%22value%22%3Afalse%7D%2C%22fr%22%3A%7B%22value%22%3Atrue%7D%2C%22ah%22%3A%7B%22value%22%3Atrue%7D%7D%2C%22isListVisible%22%3Atrue%2C%22mapZoom%22%3A11%7D')
    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='list-card-top']/a[@href]")))])
    
  • Console Output:

    ['https://www.zillow.com/homedetails/San-Francisco-CA-94134/15166498_zpid/', 'https://www.zillow.com/b/avery-450-san-francisco-ca-BTfktx/', 'https://www.zillow.com/b/solaire-san-francisco-ca-65g7KK/', 'https://www.zillow.com/homedetails/117-Saint-Charles-Ave-San-Francisco-CA-94132/15195262_zpid/', 'https://www.zillow.com/homedetails/433-40th-Ave-San-Francisco-CA-94121/15092586_zpid/', 'https://www.zillow.com/homedetails/123-Carl-St-San-Francisco-CA-94117/2078490576_zpid/', 'https://www.zillow.com/b/fifteen-fifty-san-francisco-ca-BdnYPc/', 'https://www.zillow.com/b/l-seven-san-francisco-ca-9NJtD7/', 'https://www.zillow.com/homedetails/4642-18th-St-San-Francisco-CA-94114/332858409_zpid/']
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
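The list comprehension in this answer can also be wrapped in a small helper. The sketch below is not part of the original answer: `collect_hrefs` and `FakeElement` are hypothetical names, and `FakeElement` only stands in for Selenium's WebElement (which exposes `get_attribute`) so that the example runs without a browser. It skips elements without an href and keeps each link once, in case the same listing is matched more than once.

```python
from typing import Iterable, List, Optional


def collect_hrefs(elements: Iterable) -> List[str]:
    """Return each element's href once, in order, skipping elements without one."""
    seen = set()
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")  # None when the attribute is absent
        if href and href not in seen:
            seen.add(href)
            hrefs.append(href)
    return hrefs


class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (illustration only)."""

    def __init__(self, **attributes: str) -> None:
        self._attributes = attributes

    def get_attribute(self, name: str) -> Optional[str]:
        return self._attributes.get(name)


if __name__ == "__main__":
    # Made-up URLs; a real run would pass the elements returned by the wait.
    elements = [
        FakeElement(href="https://example.com/listing/1"),
        FakeElement(),  # e.g. a div, which has no href
        FakeElement(href="https://example.com/listing/1"),
        FakeElement(href="https://example.com/listing/2"),
    ]
    print(collect_hrefs(elements))
```

With a live driver, the same helper could presumably be fed the list returned by `WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located(...))` from the snippets above.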
青春有你 2025-01-25 12:40:11

You could use XPath to find the link (the a tag) and then call get_attribute('href') on it to get the URL.

Like this:

href = driver.find_element(By.XPATH, '//div[@class="list-card-top"]/a').get_attribute('href')
print(href)

Another example:

href = driver.find_element(By.XPATH, '//div[@class="list-card-info"]/a').get_attribute('href')
print(href)

If you want to use By.CLASS_NAME, you could do it like this:

link = driver.find_element(By.CLASS_NAME, "list-card-top")
href = link.find_element(By.TAG_NAME, 'a').get_attribute('href')
print(href)
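A related point: By.CLASS_NAME expects a single class name, so a space-separated value such as "list-card-link list-card-link-top-margin" from the question is not matched as intended. One common alternative is a CSS selector with the classes dot-chained. The helper below is a hypothetical convenience for building such a selector (it is not a Selenium API), shown as a minimal sketch.

```python
def css_for_classes(class_attr: str, tag: str = "") -> str:
    """Build a CSS selector that requires every class in a space-separated class attribute."""
    classes = class_attr.split()
    if not classes:
        raise ValueError("expected at least one class name")
    return tag + "".join(f".{name}" for name in classes)


if __name__ == "__main__":
    # Anchor carrying both classes from the question:
    print(css_for_classes("list-card-link list-card-link-top-margin", tag="a"))
    # Note: "list-card-info.a" means <list-card-info class="a">, which matches nothing here.
    # An anchor nested inside the div would instead be ".list-card-info a".
    print(css_for_classes("list-card-info") + " a")
```

The resulting string could then be passed as `driver.find_elements(By.CSS_SELECTOR, selector)`.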

In your case:

links = driver.find_elements(By.CLASS_NAME, "list-card-info")
print(links[0].get_attribute('href'))

You're trying to read an attribute named 'href' from the div element with class list-card-info, but that div has no href, so you get None. What we actually want is the 'href' of the a tag inside that div.
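To make the difference concrete, the following browser-free sketch lists the class and href of each tag in a small fragment using Python's standard html.parser. The markup and URL are assumptions modeled on the class names in the question, not Zillow's actual HTML, and `HrefReport` and `report` are hypothetical names.

```python
from html.parser import HTMLParser

# Hypothetical fragment: the href lives on the <a>, not on the surrounding <div>.
SAMPLE = """
<div class="list-card-info">
  <a class="list-card-link list-card-link-top-margin" href="https://example.com/homedetails/1_zpid/">
    <address>123 Example St</address>
  </a>
</div>
"""


class HrefReport(HTMLParser):
    """Record (tag, class, href) for every start tag encountered."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        self.rows.append((tag, attributes.get("class"), attributes.get("href")))


def report(html):
    """Parse the HTML and return the recorded rows."""
    parser = HrefReport()
    parser.feed(html)
    return parser.rows


if __name__ == "__main__":
    for tag, css_class, href in report(SAMPLE):
        print(f"{tag:8} class={css_class!r} href={href!r}")
```

In this fragment the div row has no href while the nested a row does, which mirrors why get_attribute('href') on the div gives None.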
