Scraping a shadow root (open) tag using Selenium in Python
I have to scrape a web page for the last updated date using Python. A snippet of the HTML page is given below.
<dx bookmark="Save" copy="Copy URL" downvote="Vote down" edit="Edit this page" feedback="Log an issue" helpful="Is this page helpful?" locale="en" message="Contribute to this guide" lastupdate="Last Updated: {{value}}" share="Share" upvote="Vote up">
#shadow root (open)
<dx-container>
<div data-id="bookmark" class="is-margin-bottom"></div>
<div data-id="share" class="is-margin-bottom"><a class="copy" title="Share"><figure class="image 16x16 is-inline-block is-marginless is-margin-right"><img src="/assets/img/share.svg"></figure><span>Share</span></a></div>
<div data-id="vote" class=""></div>
<div data-id="updated" class="">Last Updated: June 13, 2022</div>
</dx-container>
<render slot="render"></render>
</dx>
Entire XPath of the dx tag: /html/body/div[2]/div[2]/div[2]/div[3]/div/div[2]/dx
So far I have written the following code using Selenium:
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(ChromeDriverManager().install())
url = ' ' #To be read from a .txt file
driver.get(url)
def expand_shadow_element(element):
    shadow_root = driver.execute_script('return arguments[0].shadowRoot', element)
    return shadow_root
root1 = driver.find_element_by_tag_name('dx')
shadow_root1 = expand_shadow_element(root1)
root2 = shadow_root1.find_element_by_xpath('dx-container/div')
print(root2)
I am running into the following error:
root2 = shadow_root1.find_element_by_xpath('dx-container/div')
AttributeError: 'ShadowRoot' object has no attribute 'find_element_by_xpath'
Any idea on how to get the last updated date?
Comments (1)
You need to get the host element above the #shadow-root (open), which is the <dx> element, and then work from its shadow root. Also make sure to use driver.find_element(By.XPATH, "") since the find_element_by_* methods are deprecated. In higher versions of Selenium (4.1+) there is a built-in shadow_root property for this as well. Note that inside a shadow root only CSS selectors are accepted, not XPath, as sketched below.
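A minimal sketch of that approach, assuming Selenium 4.1+ with Chrome, and taking the XPath of the <dx> host and the div[data-id="updated"] element from the snippet in the question:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Selenium 4 style: the driver path goes through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

url = ' '  # to be read from a .txt file, as in the question
driver.get(url)

# Locate the shadow host (the <dx> element) with the non-deprecated find_element API
host = driver.find_element(By.XPATH, '/html/body/div[2]/div[2]/div[2]/div[3]/div/div[2]/dx')

# Selenium 4.1+ exposes the open shadow root directly
shadow = host.shadow_root

# Inside a shadow root only CSS selectors are accepted, not XPath
updated = shadow.find_element(By.CSS_SELECTOR, 'div[data-id="updated"]')
print(updated.text)  # e.g. "Last Updated: June 13, 2022"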
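If you prefer to keep the execute_script workaround from the question, the fix for the AttributeError is the same idea: the returned ShadowRoot object does have find_element, but it only accepts a CSS selector (the find_element_by_* helpers and XPath are not available on it). A sketch, reusing the driver from the question:

from selenium.webdriver.common.by import By

root1 = driver.find_element(By.TAG_NAME, 'dx')
shadow_root1 = driver.execute_script('return arguments[0].shadowRoot', root1)

# ShadowRoot objects only accept CSS selectors
root2 = shadow_root1.find_element(By.CSS_SELECTOR, 'div[data-id="updated"]')
print(root2.text)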