Selenium: get all elements after specific text in a WebElement

Posted on 2025-02-13 21:39:31

We have the following Html:

<div>
  <img alt="Guest" >
  Bobby gave:   
  <img>
  <img>
  <img>
  <img>
   and took   
  <img>
  <img>
</div>

I want to get all img elements between the first text and the second text, and then, separately, all the img elements after the second text.

The number of img elements varies, so the following Selenium code won't work:

# hard-coded slices only fit a div with exactly 4 + 2 images after the first one
message = driver.find_element(By.TAG_NAME, 'div')
imgs_1 = message.find_elements(By.TAG_NAME, 'img')[1:5]
imgs_2 = message.find_elements(By.TAG_NAME, 'img')[5:]

Any suggestions using XPath or something else?


Comments (2)

糖粟与秋泊 2025-02-20 21:39:32

You should be able to shorten this code or customize it for your needs. Explaining every step in prose would take a while, so here is the code:

# Use JavaScript to get all child nodes (elements and #text nodes) of the div.
# `driver` is an already-initialised Selenium WebDriver with the page loaded.
raw_nodes = driver.execute_script('return document.querySelector("div").childNodes')

# Build a cleaned list rather than calling pop() while iterating, which
# would skip the item that follows every removed node:
# (1) #text nodes arrive as dicts (as in the run below); store their stripped text as a str
# (2) drop #text nodes that contain only whitespace
# (3) keep WebElement objects unchanged
nodes = []
for node in raw_nodes:
    if isinstance(node, dict):
        text = node['textContent'].strip()
        if text:
            nodes.append(text)
    else:
        nodes.append(node)

# indexes of the non-empty #text nodes within `nodes`
index_texts = [i for i, node in enumerate(nodes) if isinstance(node, str)]

# for display only: label each item as text or WebElement
def node_type(o):
    if isinstance(o, str):
        return "String     :"
    return "Web_element:"

# show the div's child nodes as they now appear in the Python list
print("##Print HTML child nodes replica as a Python list##\n")
for node in nodes:
    print(node_type(node), node)

print("------------")

# elements between the first and second text nodes
print("##Print all elements between first two string/text element##\n")
for v in nodes[index_texts[0] + 1:index_texts[1]]:
    print(node_type(v), v)

print("------------")

# elements after the second text node
print("##Print all elements after second string/text element##\n")
for v in nodes[index_texts[1] + 1:]:
    print(node_type(v), v)

Output:

##Print HTML child nodes replica as a Python list##

Web_element: <selenium.webdriver.remote.webelement.WebElement (session="db7d464bace423bc697cebd182a51f4d", element="b4ea1739-3f9c-4ff2-8750-8caf7b30aad5")>
String     : Bobby gave:
Web_element: <selenium.webdriver.remote.webelement.WebElement (session="db7d464bace423bc697cebd182a51f4d", element="6f45369d-7c92-4cf3-ae64-1d70f8576708")>
Web_element: <selenium.webdriver.remote.webelement.WebElement (session="db7d464bace423bc697cebd182a51f4d", element="4e73eda0-eae0-4915-b8a8-d846e93d5552")>
Web_element: <selenium.webdriver.remote.webelement.WebElement (session="db7d464bace423bc697cebd182a51f4d", element="bf5a0f24-8772-468a-9a68-cdb183ba23bd")>
Web_element: <selenium.webdriver.remote.webelement.WebElement (session="db7d464bace423bc697cebd182a51f4d", element="cb3eb86f-e53d-4089-8a2c-0deb96d0eff2")>
String     : and took
Web_element: <selenium.webdriver.remote.webelement.WebElement (session="db7d464bace423bc697cebd182a51f4d", element="992b0798-9738-4b72-a706-56d8b74b0065")>
Web_element: <selenium.webdriver.remote.webelement.WebElement (session="db7d464bace423bc697cebd182a51f4d", element="21fea20b-51fd-4926-bd5e-e78b781f2850")>
------------
##Print all elements between first two string/text element##

Web_element: <selenium.webdriver.remote.webelement.WebElement (session="db7d464bace423bc697cebd182a51f4d", element="6f45369d-7c92-4cf3-ae64-1d70f8576708")>
Web_element: <selenium.webdriver.remote.webelement.WebElement (session="db7d464bace423bc697cebd182a51f4d", element="4e73eda0-eae0-4915-b8a8-d846e93d5552")>
Web_element: <selenium.webdriver.remote.webelement.WebElement (session="db7d464bace423bc697cebd182a51f4d", element="bf5a0f24-8772-468a-9a68-cdb183ba23bd")>
Web_element: <selenium.webdriver.remote.webelement.WebElement (session="db7d464bace423bc697cebd182a51f4d", element="cb3eb86f-e53d-4089-8a2c-0deb96d0eff2")>
------------
##Print all elements after second string/text element##

Web_element: <selenium.webdriver.remote.webelement.WebElement (session="db7d464bace423bc697cebd182a51f4d", element="992b0798-9738-4b72-a706-56d8b74b0065")>
Web_element: <selenium.webdriver.remote.webelement.WebElement (session="db7d464bace423bc697cebd182a51f4d", element="21fea20b-51fd-4926-bd5e-e78b781f2850")>

That gives you the nodes list. If your requirements get more complex, say your HTML has three text values and you want the img (or other) elements between the second and third, you should be able to pull them from this Python list with a little extra logic.
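One possible shape for that "little bit of logic" is sketched below. It assumes the cleaned list holds plain str items for text nodes and arbitrary objects for elements; `group_by_text` and `FakeElement` are hypothetical names used only for illustration and are not part of Selenium.

```python
def group_by_text(nodes):
    """Map each text label to the non-text items that follow it, up to the next label."""
    groups = {}
    current = None  # label of the most recently seen text node
    for node in nodes:
        if isinstance(node, str):
            current = node
            groups.setdefault(current, [])
        elif current is not None:
            groups[current].append(node)
        # items before the first text node (the "Guest" image here) are ignored
    return groups


class FakeElement:
    """Stand-in for a WebElement so the sketch runs without a browser."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"<img {self.name}>"


sample = [
    FakeElement("guest"),
    "Bobby gave:",
    FakeElement("a"), FakeElement("b"), FakeElement("c"), FakeElement("d"),
    "and took",
    FakeElement("e"), FakeElement("f"),
]

groups = group_by_text(sample)
print({label: len(items) for label, items in groups.items()})
# {'Bobby gave:': 4, 'and took': 2}
```

With a real page you would pass the cleaned `nodes` list instead of `sample`. Note that a repeated label would have its items merged under one key, so this only fits pages where each text value appears once.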

会傲 2025-02-20 21:39:31

If you know both text values, you can select the images between 'Bobby gave:' and 'and took' with this XPath:

//div/text()[normalize-space()='Bobby gave:']/following-sibling::img[following-sibling::text()[normalize-space(.)='and took']]

You can select the images after the second text node with:

//div/text()[normalize-space()='and took']/following-sibling::img
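As a rough sketch, the two expressions could be wired into Selenium like this. The helper names (`xpath_between`, `xpath_after`, `split_images`) are made up for illustration, the expressions are written relative to the container element (`./` instead of `//div/`), and the text values are assumed to contain no single quotes. The locator strategy is passed as the plain string "xpath", which is the value of Selenium's `By.XPATH`, so the block itself has no imports.

```python
def xpath_between(first_text, second_text):
    """XPath, relative to the parent element, for img siblings between two text nodes."""
    return (
        f"./text()[normalize-space()='{first_text}']"
        f"/following-sibling::img"
        f"[following-sibling::text()[normalize-space()='{second_text}']]"
    )


def xpath_after(text):
    """XPath, relative to the parent element, for img siblings after a text node."""
    return f"./text()[normalize-space()='{text}']/following-sibling::img"


def split_images(container, first_text, second_text):
    """Return (images between the two texts, images after the second text)."""
    # "xpath" is the string value of selenium's By.XPATH
    between = container.find_elements("xpath", xpath_between(first_text, second_text))
    after = container.find_elements("xpath", xpath_after(second_text))
    return between, after


# Usage with a live driver (not executed here):
# message = driver.find_element("tag name", "div")  # "tag name" == By.TAG_NAME
# imgs_1, imgs_2 = split_images(message, "Bobby gave:", "and took")
```

Because both expressions end on img elements rather than text nodes, `find_elements` can return them as ordinary WebElement objects, and the result no longer depends on how many images each section holds.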