Scraping shadow root (open) tags using Selenium in Python

Posted on 2025-02-08 15:05:56

I have to scrape a web page for its last updated date using Python. A snippet of the HTML page is given below.

<dx bookmark="Save" copy="Copy URL" downvote="Vote down" edit="Edit this page" feedback="Log an issue" helpful="Is this page helpful?" locale="en" message="Contribute to this guide" lastupdate="Last Updated: {{value}}" share="Share" upvote="Vote up">

#shadow root (open)
<dx-container>
  <div data-id="bookmark" class="is-margin-bottom"></div>
  <div data-id="share" class="is-margin-bottom"><a class="copy" title="Share"><figure class="image 16x16 is-inline-block is-marginless is-margin-right"><img src="/assets/img/share.svg"></figure><span>Share</span></a></div>
  <div data-id="vote" class=""></div>
  <div data-id="updated" class="">Last Updated: June 13, 2022</div>
</dx-container>

<render slot="render"></render>

</dx>

Full XPath of the dx tag: /html/body/div[2]/div[2]/div[2]/div[3]/div/div[2]/dx

So far I have written the following code using Selenium:

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())
url = ' ' #To be read from a .txt file

driver.get(url)
def expand_shadow_element(element):
  shadow_root = driver.execute_script('return arguments[0].shadowRoot', element)
  return shadow_root

root1 = driver.find_element_by_tag_name('dx')
shadow_root1 = expand_shadow_element(root1)

root2 = shadow_root1.find_element_by_xpath('dx-container/div')    
print(root2) 

I am running into the following error:

    root2 = shadow_root1.find_element_by_xpath('dx-container/div')        
AttributeError: 'ShadowRoot' object has no attribute 'find_element_by_xpath'

Any idea on how to get the last updated date?

Comments (1)

深爱不及久伴 2025-02-15 15:05:56

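You need to get the shadow host first, i.e. the element directly above #shadow-root (open), which here is the dx element with bookmark="Save". From there, take its shadow root and locate the child element inside it. The ShadowRoot object only exposes find_element and find_elements, which is why the find_element_by_xpath call fails. Also make sure to use driver.find_element(By.XPATH,"") on the driver, as the find_element_by_* methods are deprecated (and removed in later Selenium 4 releases). Newer versions of Selenium also have a built-in shadow_root property for shadow roots.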

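Selenium 4.1+

from selenium.webdriver.common.by import By
root1 = driver.find_element(By.XPATH, "//dx[@bookmark='Save']").shadow_root
# Chromium-based drivers generally reject XPath inside a shadow root, so use a CSS selector here.
root2 = root1.find_element(By.CSS_SELECTOR, "dx-container > div[data-id='updated']")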
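
To go from the located element to an actual date, something along these lines might work. It is only a sketch: the function names `parse_last_updated` and `fetch_last_updated` are made up for illustration, the `div[data-id='updated']` selector and the "Last Updated: June 13, 2022" text format are taken from the HTML snippet in the question, and it assumes Selenium 4.1+ and an English month name.

```python
from datetime import date, datetime

PREFIX = "Last Updated:"


def parse_last_updated(text: str) -> date:
    """Turn a string such as 'Last Updated: June 13, 2022' into a date."""
    cleaned = text.strip()
    if cleaned.startswith(PREFIX):
        cleaned = cleaned[len(PREFIX):].strip()
    # %B matches the full month name (English under the default C locale).
    return datetime.strptime(cleaned, "%B %d, %Y").date()


def fetch_last_updated(driver) -> date:
    """Read the date from the open shadow root of the <dx> host element.

    Hypothetical helper; assumes Selenium 4.1+ (shadow_root property).
    """
    # Imported lazily so parse_last_updated() works without Selenium installed.
    from selenium.webdriver.common.by import By

    host = driver.find_element(By.CSS_SELECTOR, "dx")
    # CSS selector, because XPath is generally not accepted inside a shadow root.
    node = host.shadow_root.find_element(By.CSS_SELECTOR, "div[data-id='updated']")
    return parse_last_updated(node.text)
```

After `driver.get(url)`, calling `fetch_last_updated(driver)` should return a `datetime.date`. If the component fills in the date only after the page has loaded, an explicit wait before reading `node.text` may be needed.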

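Older versions

root1 = driver.find_element(By.XPATH, "//dx[@bookmark='Save']")
# expand_shadow_element() is the helper from the question (it returns arguments[0].shadowRoot).
shadow_root1 = expand_shadow_element(root1)
# CSS selector here as well, since XPath is generally not supported inside a shadow root.
root2 = shadow_root1.find_element(By.CSS_SELECTOR, "dx-container > div[data-id='updated']")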
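
If what execute_script returns for a shadow root differs between Selenium and browser versions, one alternative is to do the whole lookup in JavaScript and return only the text. The sketch below assumes that approach; `read_shadow_text` is a hypothetical helper name, and the selectors passed to it come from the snippet in the question.

```python
from typing import Optional

# JavaScript run in the page: find the host, step into its open shadow root,
# and return the text of the first node matching the inner selector.
_SHADOW_TEXT_JS = """
const host = document.querySelector(arguments[0]);
if (!host || !host.shadowRoot) { return null; }
const node = host.shadowRoot.querySelector(arguments[1]);
return node ? node.textContent : null;
"""


def read_shadow_text(driver, host_css: str, inner_css: str) -> Optional[str]:
    """Return the stripped text of inner_css inside host_css's open shadow root.

    Returns None when the host, its shadow root, or the inner node is missing.
    """
    text = driver.execute_script(_SHADOW_TEXT_JS, host_css, inner_css)
    return text.strip() if text is not None else None
```

For the page in the question, `read_shadow_text(driver, "dx", "div[data-id='updated']")` would be expected to give "Last Updated: June 13, 2022". This only works for open shadow roots, since `shadowRoot` is null for closed ones.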