Get the HTML source of a WebElement in Selenium WebDriver using Python

Posted on 2024-12-02 09:25:49

I'm using the Python bindings to run Selenium WebDriver:

from selenium import webdriver
wd = webdriver.Firefox()

I know I can grab a webelement like so:

elem = wd.find_element_by_css_selector('#my-id')

And I know I can get the full page source with...

wd.page_source

But is there a way to get the "element source"?

elem.source   # <-- returns the HTML as a string

The Selenium WebDriver documentation for Python is basically non-existent, and I don't see anything in the code that seems to enable that functionality.

What is the best way to access the HTML of an element (and its children)?

Comments (19)

芸娘子的小脾气 2024-12-09 09:25:50

innerHTML returns the markup inside the selected element, while outerHTML returns that same inner markup together with the selected element itself.

Example:

Suppose your element is:

<tr id="myRow"><td>A</td><td>B</td></tr>

innerHTML output:

<td>A</td><td>B</td>

outerHTML output:

<tr id="myRow"><td>A</td><td>B</td></tr>
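As a minimal sketch, the choice between the two properties can be wrapped in one Python function. The helper name `element_markup` is hypothetical (it is not part of Selenium); it only assumes the element object has Selenium's `get_attribute()` method:

```python
def element_markup(element, include_self=False):
    """Return an element's HTML through Selenium's get_attribute().

    include_self=False -> innerHTML: only the markup of the element's children
    include_self=True  -> outerHTML: the element's own tags plus its children
    """
    prop = "outerHTML" if include_self else "innerHTML"
    return element.get_attribute(prop)
```

For the table row above you would call, for example, `element_markup(driver.find_element("id", "myRow"))` for the cells only, and pass `include_self=True` to get the `<tr>` as well.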

Live Example:

http://www.java2s.com/Tutorials/JavascriptDemo/f/find_out_the_difference_between_innerhtml_and_outerhtml_in_javascript_example.htm

Below is the syntax for the different bindings. Change innerHTML to outerHTML as required.

Python:

element.get_attribute('innerHTML')

Java:

elem.getAttribute("innerHTML");

If you want the whole page's HTML, use the following (Java; in Python it is driver.page_source):

driver.getPageSource();
梦里人 2024-12-09 09:25:50

Updated 2022 Selenium Retrieving HTML

To start with, install the Python bindings for Selenium WebDriver.

  • You can download them from the PyPI page for the selenium package.
  • Alternatively, you can install the package with pip (pip install selenium); pip is bundled with Python 3.4 and later.

Method 1

Read the innerHTML attribute to get the source of the element’s content. innerHTML is a property of a DOM element whose value is the HTML between the opening tag and ending tag.

For example, the innerHTML property of the <p> element below carries the value “a text” (including the surrounding newlines):

<p>
a text
</p>
element.get_attribute('innerHTML')

Method 2

Read the outerHTML to get the source with the current element. outerHTML is an element property whose value is the HTML between the opening and closing tags and the HTML of the selected element itself.

For example, the outerHTML property of the div below contains the div itself along with the span inside it:

<div>
<span>Hello there!</span>
</div>
ele.get_attribute("outerHTML")
殤城〤 2024-12-09 09:25:50

I hope this helps:
http://selenium.googlecode.com/svn/trunk/docs/api/java/org/openqa/selenium/WebElement.html

The Java method described there is:

java.lang.String    getText() 

Note that getText() returns only the element's visible text, not its HTML; in Python the equivalent is the element's text property. In general, you can translate the method names from Java to Python and build your logic on the existing methods, without fetching the whole page source...

E.g.

 my_id = elem[0].get_attribute('my-id')
日裸衫吸 2024-12-09 09:25:50

This works seamlessly for me.

element.get_attribute('innerHTML')
你是年少的欢喜 2024-12-09 09:25:50

My preferred way to get the rendered content is the following:

from selenium.webdriver.common.by import By

driver.get("http://www.google.com")
body_html = driver.find_element(By.XPATH, "/html/body")
print(body_html.text)

However, the method above removes all the tags (yes, the nested tags as well) and returns only the text content. If you are interested in getting the HTML markup as well, use the following:

print(body_html.get_attribute("innerHTML"))
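A small sketch contrasting the two calls above (`.text` versus the innerHTML attribute). `text_and_markup` is a hypothetical helper; it assumes `element` is a Selenium WebElement, though anything with a `.text` property and a `get_attribute()` method works:

```python
def text_and_markup(element):
    """Return (visible text, inner HTML) for one element."""
    # .text is the rendered text only: every tag, nested ones included, is stripped
    visible_text = element.text
    # innerHTML keeps the markup of the element's children
    markup = element.get_attribute("innerHTML")
    return visible_text, markup
```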

If you are interested in a solution for Selenium Remote Control in Python, here is how to get innerHTML:

innerHTML = sel.get_eval("window.document.getElementById('prodid').innerHTML")
花之痕靓丽 2024-12-09 09:25:50

Use execute_script to get the HTML.

BeautifulSoup (bs4) can then access the HTML tags quickly:

from bs4 import BeautifulSoup
html = driver.execute_script("return document.documentElement.outerHTML")
bs4_onepage_object = BeautifulSoup(html, "html.parser")
# "atag" and "attribute" are placeholders: use your own tag name and class
bs4_div_object = bs4_onepage_object.find_all("atag", class_="attribute")
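If you would rather not add the bs4 dependency, the standard library's `html.parser` can do a simple version of the same tag-and-class lookup. This is a swapped-in technique, not what the answer above uses, and the names `ClassCollector`, `find_tags` and the sample markup are hypothetical:

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Record the attributes of every <tag> whose class list contains css_class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag              # HTMLParser reports tag names in lowercase
        self.css_class = css_class
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # class="a b" holds several classes; a bare attribute gives None
        classes = (attr_map.get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.matches.append(attr_map)


def find_tags(page_html, tag, css_class):
    """Return one attribute dict per matching start tag, in document order."""
    parser = ClassCollector(tag, css_class)
    parser.feed(page_html)
    parser.close()
    return parser.matches


# In a live session the string would come from:
#   driver.execute_script("return document.documentElement.outerHTML")
sample = '<div class="item a" id="x"></div><div class="b"></div><span class="item"></span>'
print(find_tags(sample, "div", "item"))  # [{'class': 'item a', 'id': 'x'}]
```

Unlike BeautifulSoup, this does not build a tree, so it suits simple "find all tags with this class" lookups rather than navigating parents and children.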
挽手叙旧 2024-12-09 09:25:50

PHP Selenium WebDriver 中,您可以像这样获取页面源:

$html = $driver->getPageSource();

或者获取元素的 HTML像这样:

// innerHTML if you need HTML of the element content
$html = $element->getDomProperty('outerHTML');

In PHP Selenium WebDriver you can get page source like this:

$html = $driver->getPageSource();

Or get HTML of the element like this:

// innerHTML if you need HTML of the element content
$html = $element->getDomProperty('outerHTML');
旧人 2024-12-09 09:25:50

In current versions of php-webdriver (1.12.0+) you have to use

$element->getDomProperty('innerHTML');

as pointed out in this issue: https://github.com/php-webdriver/php-webdriver/issues/929

情域 2024-12-09 09:25:50
WebElement element = driver.findElement(By.id("foo"));
String contents = (String)((JavascriptExecutor)driver).executeScript("return arguments[0].innerHTML;", element); 

This code really works to get JavaScript from source as well!

与之呼应 2024-12-09 09:25:50

PHPUnit Selenium 测试中,它是这样的:

$text = $this->byCssSelector('.some-class-nmae')->attribute('innerHTML');

In a PHPUnit Selenium test, it looks like this:

$text = $this->byCssSelector('.some-class-name')->attribute('innerHTML');
无悔心 2024-12-09 09:25:49

You can read the innerHTML attribute to get the source of the element's content, or outerHTML to get the source including the element itself.

Python:

element.get_attribute('innerHTML')

Java:

elem.getAttribute("innerHTML");

C#:

element.GetAttribute("innerHTML");

Ruby:

element.attribute("innerHTML")

JavaScript:

element.getAttribute('innerHTML');

PHP:

$element->getAttribute('innerHTML');

It has been tested and works with ChromeDriver.

三岁铭 2024-12-09 09:25:49

Here's how to get the HTML source code using Selenium Python:

elem = driver.find_element("xpath", "//*")
source_code = elem.get_attribute("outerHTML")

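Here's how to save that HTML to a file (in Python 3, open the file with an explicit encoding and write the string directly; writing encoded bytes to a text-mode file raises a TypeError):

with open('c:/html_source_code.html', 'w', encoding='utf-8') as f:
    f.write(source_code)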
Oo萌小芽oO 2024-12-09 09:25:49

One way to get the HTML source code of a WebElement is to use JavaScript. In Java you can easily do it like this; in Python, the counterpart of the JavascriptExecutor class is the driver's execute_script() method.

 WebElement element = driver.findElement(By.id("foo"));
 String contents = (String)((JavascriptExecutor)driver).executeScript("return arguments[0].innerHTML;", element);
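A minimal Python sketch of the Java snippet above. WebElements passed to `execute_script()` as extra arguments are available inside the script as `arguments[0]`, `arguments[1]`, and so on. The function name `inner_html_via_js` is hypothetical:

```python
def inner_html_via_js(driver, element):
    """Python equivalent of Java's JavascriptExecutor.executeScript(...) call."""
    # The element is passed as an extra argument and is visible
    # inside the script as arguments[0].
    return driver.execute_script("return arguments[0].innerHTML;", element)
```

Usage, assuming an element with id `foo` as in the Java version: `print(inner_html_via_js(driver, driver.find_element("id", "foo")))`.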
半世蒼涼 2024-12-09 09:25:49

In Ruby, using selenium-webdriver (2.32.1), there is a page_source method that contains the entire page source.

凉城 2024-12-09 09:25:49

The other answers provide a lot of detail about retrieving the markup of a WebElement. However, an important point is that modern websites increasingly use JavaScript, ReactJS, jQuery, Ajax, Vue.js, Ember.js, GWT, etc. to render dynamic elements within the DOM tree. Hence you need to wait for the element and its children to render completely before retrieving the markup.


Python

Hence, ideally you should use WebDriverWait with visibility_of_element_located(), using either of the following Locator Strategies:

  • Using get_attribute("outerHTML"):

    element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#my-id")))
    print(element.get_attribute("outerHTML"))
    
  • Using execute_script():

    element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#my-id")))
    print(driver.execute_script("return arguments[0].outerHTML;", element))
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
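Visibility does not always mean the children have finished rendering. As a hedged extension of the same idea: `WebDriverWait.until()` accepts any callable that takes the driver and returns a falsy value until it is ready, so you can wait for the markup itself to become non-empty. `html_loaded` is a hypothetical helper, and the locator is assumed to be a tuple such as `(By.CSS_SELECTOR, "#my-id")`:

```python
def html_loaded(locator, prop="innerHTML"):
    """Build a wait condition that returns the element's HTML once it is non-empty."""

    def _condition(driver):
        element = driver.find_element(*locator)
        markup = element.get_attribute(prop)
        # A falsy return value tells WebDriverWait to keep polling;
        # the first truthy value is what until() returns.
        return markup if markup and markup.strip() else False

    return _condition
```

With the imports listed above: `html = WebDriverWait(driver, 20).until(html_loaded((By.CSS_SELECTOR, "#my-id")))`. WebDriverWait ignores NoSuchElementException by default, so the condition is simply retried while the element does not exist yet.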
    
甜扑 2024-12-09 09:25:49

It may look outdated, but I'll leave it here anyway. The correct way to do it in your case:

elem = wd.find_element_by_css_selector('#my-id')
html = wd.execute_script("return arguments[0].innerHTML;", elem)

or

html = elem.get_attribute('innerHTML')

Both are working for me (selenium-server-standalone-2.35.0).
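The `find_element_by_*` methods used above were removed in Selenium 4.3, so on current versions the lookup has to go through `find_element(by, value)`. A sketch assuming the asker's `#my-id` selector; it passes the string `"css selector"` (the value of `By.CSS_SELECTOR`) so no extra import is needed, and the function name `html_for_selector` is hypothetical:

```python
def html_for_selector(driver, selector="#my-id"):
    """Selenium 4 style: locate by CSS selector, then read innerHTML."""
    # "css selector" is the value of selenium.webdriver.common.by.By.CSS_SELECTOR
    elem = driver.find_element("css selector", selector)
    return elem.get_attribute("innerHTML")
```

The `execute_script("return arguments[0].innerHTML;", elem)` variant works unchanged with an element found this way.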

请持续率性 2024-12-09 09:25:49

Using the attribute method is, in fact, easier and more straightforward.

Using Ruby with the Selenium and PageObject gems, to get the class associated with a certain element, the line would be element.attribute('class').

The same concept applies if you want other attributes tied to the element. For example, to get an element's HTML content, use element.attribute('innerHTML').

自由如风 2024-12-09 09:25:49

Java with Selenium 2.53.0

driver.getPageSource();