Get the HTML source of a WebElement in Selenium WebDriver using Python
I'm using the Python bindings to run Selenium WebDriver:
from selenium import webdriver
wd = webdriver.Firefox()
I know I can grab a webelement like so:
elem = wd.find_element_by_css_selector('#my-id')
And I know I can get the full page source with...
wd.page_source
But is there a way to get the "element source"?
elem.source # <-- returns the HTML as a string
The Selenium WebDriver documentation for Python is basically non-existent, and I don't see anything in the code that seems to enable that functionality.
What is the best way to access the HTML of an element (and its children)?
innerHTML returns the markup inside the selected element, and outerHTML returns that inner markup together with the selected element itself.
Example:
Now suppose your Element is as below
innerHTML element output
outerHTML element output
Live Example:
http://www.java2s.com/Tutorials/JavascriptDemo/f/find_out_the_difference_between_innerhtml_and_outerhtml_in_javascript_example.htm
Below is the syntax required by each binding. Change innerHTML to outerHTML as required.
Python:
Java:
If you want the whole page HTML, use the below code:
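For the Python binding, the attribute read this answer describes might be sketched as follows. The helper names element_html and page_html are made up for illustration; elem is assumed to be a WebElement like the one located in the question and driver the WebDriver instance. get_attribute and page_source are the only Selenium members used.

```python
def element_html(elem, outer=False):
    """Return the markup of an element as a string.

    `elem` is assumed to be a Selenium WebElement (anything exposing
    get_attribute works). With outer=False only the element's children
    are returned; with outer=True the element's own tag is included.
    """
    return elem.get_attribute("outerHTML" if outer else "innerHTML")


def page_html(driver):
    """Return the HTML of the whole page via the driver's page_source."""
    return driver.page_source
```

With the question's objects, element_html(elem, outer=True) would return the element together with its children, and page_html(wd) the full page.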
Updated 2022: Retrieving HTML with Selenium
To start with, download the Python bindings for Selenium WebDriver.
Method 1
Read the innerHTML attribute to get the source of the element's content. innerHTML is a property of a DOM element whose value is the HTML between the opening tag and the closing tag. For example, the innerHTML property in the code below carries the value "text".
Method 2
Read outerHTML to get the source including the current element. outerHTML is an element property whose value is the HTML between the opening and closing tags plus the HTML of the selected element itself. For example, the code's outerHTML property carries a value that contains the div and the span inside it.
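To make the difference concrete without a browser, the sketch below rebuilds both values for a small fragment using only Python's standard library. It illustrates what the two properties contain and is not Selenium code: the markup and the helper names inner_html and outer_html are assumptions chosen to match the div, the span and the "text" value mentioned in this answer.

```python
import xml.etree.ElementTree as ET
from html import escape


def outer_html(elem):
    """Serialize the element itself together with everything inside it."""
    return ET.tostring(elem, encoding="unicode", method="html")


def inner_html(elem):
    """Serialize only what sits between the element's opening and closing tags."""
    parts = [escape(elem.text or "", quote=False)]
    # Each child is serialized with its own tags and any text that follows it.
    parts.extend(ET.tostring(child, encoding="unicode", method="html") for child in elem)
    return "".join(parts)


# Hypothetical markup, parsed as well-formed XML for the sake of the demo.
div = ET.fromstring("<div><span>text</span></div>")
span = div.find("span")

print(inner_html(span))  # text
print(outer_html(div))   # <div><span>text</span></div>
```

In a real session the browser computes these strings, and Selenium only reads them back through get_attribute.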
I hope this could help:
http://selenium.googlecode.com/svn/trunk/docs/api/java/org/openqa/selenium/WebElement.html
The Java method is described here:
But unfortunately it's not available in Python. So you can translate the method names from Java to Python and try another approach using the existing methods, without getting the whole page source...
E.g.
This works seamlessly for me.
The method I prefer for getting the rendered HTML is the following:
However, the above method removes all the tags (yes, the nested tags as well) and returns only the text content. If you are interested in getting the HTML markup as well, then use the method below.
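The contrast between the two reads can be sketched as follows. The function name text_and_markup is hypothetical, and body is assumed to be a WebElement that the driver has already located; .text and get_attribute("innerHTML") are the Selenium members being compared.

```python
def text_and_markup(body):
    """Return (visible text, inner markup) for a Selenium WebElement.

    `body.text` drops every tag, nested ones included, and keeps only the
    rendered text; `get_attribute("innerHTML")` keeps the tags.
    """
    rendered_text = body.text
    markup = body.get_attribute("innerHTML")
    return rendered_text, markup
```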
If you are interested in a solution for Selenium Remote Control in Python, here is how to get innerHTML:
Use execute_script to get the HTML.
bs4 (BeautifulSoup) can also access HTML tags quickly.
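A minimal sketch of the execute_script route, assuming driver is a Selenium WebDriver and elem a WebElement it has located. The function name outer_html_via_js is invented for this example; execute_script and the arguments[0] convention are part of Selenium.

```python
def outer_html_via_js(driver, elem):
    """Ask the browser for the element's outerHTML through JavaScript.

    The element is passed as an extra argument to the script, where it is
    available as arguments[0].
    """
    return driver.execute_script("return arguments[0].outerHTML;", elem)
```

The returned string can then be handed to a parser such as BeautifulSoup if individual tags are needed.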
In PHP Selenium WebDriver you can get page source like this:
Or get HTML of the element like this:
In current versions of php-webdriver (1.12.0+) you have to use
as pointed out in this issue: https://github.com/php-webdriver/php-webdriver/issues/929
This code really works to get JavaScript from source as well!
And in PHPUnit Selenium test it's like this:
You can read the innerHTML attribute to get the source of the element's content, or outerHTML for the source including the current element.
Python:
Java:
C#:
Ruby:
JavaScript:
PHP:
It was tested and worked with ChromeDriver.
Here's how to get the HTML source code using Selenium Python:
Here's how to save that HTML to a file:
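One way to write the retrieved markup to disk is sketched below. The helper name save_html and the file name in the usage note are placeholders; the HTML string is assumed to come from something like wd.page_source or elem.get_attribute("outerHTML").

```python
from pathlib import Path


def save_html(html, path, encoding="utf-8"):
    """Write an HTML string to `path` and return the number of characters written."""
    # An explicit encoding avoids surprises with non-ASCII page content.
    return Path(path).write_text(html, encoding=encoding)
```

For example, save_html(wd.page_source, "page.html") would store the full page next to the script.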
There is not really a straightforward way of getting the HTML source code of a webelement. You will have to use JavaScript. I am not too sure about the Python bindings, but you can easily do it like this in Java. I am sure there must be something similar to the JavascriptExecutor class in Python.
In Ruby, using selenium-webdriver (2.32.1), there is a page_source method that contains the entire page source.
The other answers provide a lot of detail about retrieving the markup of a WebElement. However, an important aspect is that modern websites increasingly use JavaScript, ReactJS, jQuery, Ajax, Vue.js, Ember.js, GWT, etc. to render dynamic elements within the DOM tree. Hence it is necessary to wait for the element and its children to render completely before retrieving the markup.
Python
Hence, ideally you need to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies:
Using get_attribute("outerHTML"):
Using execute_script():
Note: You have to add the following imports:
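What such a wait does can be sketched without Selenium as a plain polling loop. This is a stand-in to show the idea, not the WebDriverWait implementation: wait_for_outer_html, its parameters and the find_element callable are assumptions made for illustration. In real code, the WebDriverWait and visibility_of_element_located() named in this answer would take the loop's place.

```python
import time


def wait_for_outer_html(find_element, timeout=10.0, poll=0.5, retry_on=(Exception,)):
    """Poll until find_element() yields a displayed element, then return its outerHTML.

    find_element: zero-argument callable returning a WebElement-like object.
    retry_on: exceptions that mean "not there yet" (with Selenium this would
    typically be NoSuchElementException).
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            elem = find_element()
            if elem.is_displayed():
                return elem.get_attribute("outerHTML")
        except retry_on:
            pass  # element not in the DOM yet; try again
        if time.monotonic() >= deadline:
            raise TimeoutError("element did not become visible in time")
        time.sleep(poll)
```

With a live driver, the callable could be something like lambda: wd.find_element(By.CSS_SELECTOR, "#my-id"), with By imported from selenium.webdriver.common.by.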
It looks outdated, but let it be here anyway. The correct way to do it in your case:
or
Both work for me (selenium-server-standalone-2.35.0).
Using the attribute method is, in fact, easier and more straightforward.
Using Ruby with the Selenium and PageObject gems, to get the class associated with a certain element, the line would be element.attribute(Class).
The same concept applies if you want to get other attributes tied to the element. For example, if I wanted the string of an element, element.attribute(String).
Java with Selenium 2.53.0