Extracting data from a div class with Python Selenium

Posted 2025-01-17 08:00:28 · 1227 characters · 0 views · 0 comments

I'm trying to pull a specific number out of a div in Python Selenium but can't figure out how to do it. I want to get the "post_parent" ID 947630, provided it belongs to the "post_name" number that starts with 09007.

I need to do this for multiple "post_name" values, so I'd feed it something like search_text = "0900766b80090cb6". There will be more of them in the future, so it has to read the "post_name" first and then pull the matching "post_parent", if that makes sense.

I'd appreciate any advice.

    <div class="hidden" id="inline_947631">
    <div class="post_title">Interface Converter</div>
    <div class="post_name">0900766b80090cb6</div>
    <div class="post_author">28</div>
    <div class="comment_status">closed</div>
    <div class="ping_status">closed</div>
    <div class="_status">inherit</div>
    <div class="jj">06</div>
    <div class="mm">07</div>
    <div class="aa">2001</div>
    <div class="hh">15</div>
    <div class="mn">44</div>
    <div class="ss">17</div>
    <div class="post_password"></div>
    <div class="post_parent">947630</div>
    <div class="page_template">default</div>
    <div class="tags_input" id="rs-language-code_947631">de</div>
    </div>

Comments (3)

下壹個目標 2025-01-24 08:00:29

Note that <div class="post_name">0900766b80090cb6</div> and <div class="post_parent">947630</div> are sibling nodes.

You can use the XPath following-sibling axis like this:

Code:

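search_text = "0900766b80090cb6"
# Single "/" before following-sibling: the sibling axis starts at the matched post_name div.
xpath = f"//div[@class='post_name' and text()='{search_text}']/following-sibling::div[@class='post_parent']"
# The wrapper div has class="hidden"; .text is empty for non-displayed elements, so read textContent.
post_parent_num = driver.find_element(By.XPATH, xpath).get_attribute("textContent")
print(post_parent_num)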

Or, using an explicit wait:

search_text = "0900766b80090cb6"
xpath = f"//div[@class='post_name' and text()='{search_text}']/following-sibling::div[@class='post_parent']"
# Wait for presence (not visibility): the wrapper div has class="hidden" and may never become visible.
post_parent_num = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, xpath))).get_attribute('innerText')
print(post_parent_num)

Imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
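
Since the question mentions several post_name values, the sibling lookup above could be wrapped in a loop. The following is only a sketch under assumptions: the helper names post_parent_xpath and collect_post_parents are hypothetical, the driver is assumed to be already on the right page, and the search values are assumed not to contain quote characters. The ImportError fallback exists only so the helpers can be tried with a stub driver when Selenium is not installed.

```python
try:
    from selenium.webdriver.common.by import By
    XPATH = By.XPATH
except ImportError:
    # By.XPATH is the plain string "xpath"; this fallback only lets the
    # helpers below be exercised with a stub driver when Selenium is absent.
    XPATH = "xpath"


def post_parent_xpath(search_text):
    """Build the sibling XPath for one exact post_name value (hypothetical helper)."""
    return (
        f"//div[@class='post_name' and text()='{search_text}']"
        "/following-sibling::div[@class='post_parent']"
    )


def collect_post_parents(driver, search_texts):
    """Map each post_name value to its post_parent text, or None if not found."""
    results = {}
    for search_text in search_texts:
        # find_elements returns an empty list instead of raising when nothing matches.
        matches = driver.find_elements(XPATH, post_parent_xpath(search_text))
        # textContent can be read even when the wrapper div is not displayed.
        results[search_text] = (
            matches[0].get_attribute("textContent").strip() if matches else None
        )
    return results
```

It would be called as collect_post_parents(driver, ["0900766b80090cb6"]), with further post_name values appended to the list as they come up. Using find_elements keeps one missing value from aborting the whole run.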

Update:

NoSuchElementException:

First check in Chrome DevTools whether the XPath matches exactly one node in the HTML DOM.

XPath to check:

//div[@class='post_name' and text()='0900766b80090cb6']/following-sibling::div[@class='post_parent']

Steps to check:

Press F12 in Chrome -> go to the Elements panel -> press CTRL + F -> paste the XPath and see whether the desired element is highlighted as 1 of 1 matching node.

If the XPath is unique in DevTools but Selenium still raises NoSuchElementException, check the following conditions as well.

  1. Check whether the element is inside an iframe/frame/frameset.

    Solution: switch to the iframe/frame/frameset first, then interact with the web element.

  2. Check whether it is inside a shadow root.

    Solution: use driver.execute_script('return document.querySelector...') to get the web element back, then operate on it.

  3. Make sure the element is rendered before interacting with it. Add a hardcoded delay or an explicit wait and try again.

    Solution: time.sleep(5) or

    WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='post_name' and text()='0900766b80090cb6']/following-sibling::div[@class='post_parent']"))).text

  4. If the page opened a new tab or window and you have not switched to it, you will likely get a NoSuchElementException.

    Solution: switch to the relevant window/tab first.

  5. If you have switched into an iframe and the desired element is not in that iframe's context, switch back to the default content before interacting with it.

    Solution: switch to the default content, then switch to the respective iframe.
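
The iframe checks in the list above (items 1 and 5) can be sketched as a small helper. This is an illustration only: find_in_page_or_frames is a hypothetical name, it looks only at the top document and its top-level iframe elements, and it does not cover nested frames, shadow roots, or other windows. The ImportError fallback is there solely so the helper can be tried with a stub driver.

```python
try:
    from selenium.webdriver.common.by import By
    XPATH, TAG_NAME = By.XPATH, By.TAG_NAME
except ImportError:
    # These By constants are plain strings; the fallback only supports stub drivers.
    XPATH, TAG_NAME = "xpath", "tag name"


def find_in_page_or_frames(driver, xpath):
    """Return the first match in the top document or in a top-level iframe, else None."""
    driver.switch_to.default_content()
    matches = driver.find_elements(XPATH, xpath)
    if matches:
        return matches[0]

    # Collect the iframe elements while still in the top-level context.
    frames = driver.find_elements(TAG_NAME, "iframe")
    for frame in frames:
        # Always return to the top document before entering the next iframe.
        driver.switch_to.default_content()
        driver.switch_to.frame(frame)
        matches = driver.find_elements(XPATH, xpath)
        if matches:
            return matches[0]  # the driver stays switched into this iframe

    driver.switch_to.default_content()
    return None
```

When a match is found inside an iframe, the driver is deliberately left in that frame so the returned element stays usable; call driver.switch_to.default_content() once you are done with it.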

且行且努力 2025-01-24 08:00:29

I don't see any specific relation between the "post_parent" ID 947630 and the "post_name" number starting with 09007. Moreover, the parent <div> has class="hidden", so if it is not displayed, .text returns an empty string and you would need get_attribute("textContent") instead.

However, to pull that specific number you can use either of the following locator strategies:

  • Using css_selector:

    print(driver.find_element(By.CSS_SELECTOR, "div[id^='inline'] div.post_parent").text)
    
  • Using xpath:

    print(driver.find_element(By.XPATH, "//div[starts-with(@id, 'inline_')]//div[@class='post_parent']").text)
    

Ideally, wrap the lookup in a WebDriverWait on presence_of_element_located(), using either of the following locator strategies:

  • Using CSS_SELECTOR:

    print(WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.CSS_SELECTOR, "div[id^='inline'] div.post_parent"))).text)
    
  • Using XPATH:

    print(WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, "//div[starts-with(@id, 'inline_')]//div[@class='post_parent']"))).text)
    
  • Note: you need the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
二智少女猫性小仙女 2025-01-24 08:00:29

You can create a function that uses the following XPath to get the post_parent text based on the post_name text.

def get_post_parent(postname):
    # starts-with() matches any post_name whose text begins with the given prefix.
    element = driver.find_element(By.XPATH, "//div[@class='post_name' and starts-with(text(),'{}')]/following-sibling::div[@class='post_parent']".format(postname))
    # textContent is readable even though the wrapper div is hidden.
    return element.get_attribute("textContent")

print(get_post_parent('09007'))

This returns the post_parent value of the first post_name whose text starts with '09007'.

Because the parent div appears to be hidden, you need textContent (rather than .text) to read the value.
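
To sanity-check the prefix-matching logic without a browser, the same idea can be tried against a trimmed copy of the markup from the question using only the standard library. This is a sketch, not part of any answer above: post_parent_for is a hypothetical helper, SAMPLE is shortened to the two relevant child divs, and the snippet is assumed to be well-formed enough for xml.etree.ElementTree to parse.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the wrapper div from the question.
SAMPLE = """<div class="hidden" id="inline_947631">
<div class="post_title">Interface Converter</div>
<div class="post_name">0900766b80090cb6</div>
<div class="post_parent">947630</div>
</div>"""


def post_parent_for(wrapper, prefix):
    """Return the post_parent text if the wrapper's post_name starts with prefix."""
    name = wrapper.findtext("div[@class='post_name']", default="")
    if name.startswith(prefix):
        return wrapper.findtext("div[@class='post_parent']")
    return None


wrapper = ET.fromstring(SAMPLE)
print(post_parent_for(wrapper, "09007"))  # prints 947630
```

Because both child divs are read from the same wrapper element, a post_parent is only ever paired with the post_name from its own inline_ block.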
