How to extract href information from a specific class using Selenium and Python
I'm currently working on some web scraping using Python and Selenium, and I can't seem to pull the link from the href of an anchor tag with a specific class. For reference, it's from Zillow (specifically, this URL: https://www.zillow.com/homes/for_rent/San-Francisco,-CA_rb/ ).
I've tried a few different options to select the anchor tag, but none of them returns the information I need:
links = driver.find_elements(By.CLASS_NAME, "list-card-info")
print(links[0].get_attribute('href'))
-- returns
None
I also tried:
links = driver.find_elements(By.CLASS_NAME, "list-card-top")
print(links[0].get_attribute('href'))
-- returns
None
links = driver.find_elements(By.CLASS_NAME, "list-card-link list-card-link-top-margin")
print(links[0].get_attribute('href'))
-- returns
None
And lastly:
links = driver.find_elements(By.CSS_SELECTOR, "list-card-info.a")
print(links[0].get_attribute('href'))
I know I can pull all the anchor tags, but surely there is a step I'm missing to get the nested anchor tag's value? Or am I pulling the wrong class? I'm not sure where I'm going wrong.
Comments (2)
To print the value of the href attribute, induce WebDriverWait for visibility_of_all_elements_located() and then use list slicing to limit the results. Either locator strategy works: a CSS_SELECTOR or an XPATH, each of which fits on a single line.
Note: this requires importing WebDriverWait from selenium.webdriver.support.ui, By from selenium.webdriver.common.by, and expected_conditions (conventionally aliased as EC) from selenium.webdriver.support.
You could use XPATH to find the link (the a tag) and call get_attribute('href') to get the link from that tag.
If you want to use By.CLASS_NAME, that works too, as long as the element you call get_attribute('href') on is the a tag itself.
In your case, you're trying to read an attribute named 'href' from the div element with class list-card-info, and that div has no href. What we actually want is the 'href' from the a tag inside that div.