Python Selenium: accessing the HTML source

Posted on 2024-12-11 03:54:43


How can I get the HTML source in a variable using the Selenium module with Python?

I wanted to do something like this:

from selenium import webdriver

browser = webdriver.Firefox()
browser.get("http://example.com")
if "whatever" in html_source:
    pass  # Do something
else:
    pass  # Do something else

How can I do this? I don't know how to access the HTML source.


Answers (9)

篱下浅笙歌 2024-12-18 03:54:43


Use the page_source property, which returns the HTML of the current page as a string:

from selenium import webdriver

browser = webdriver.Firefox()
browser.get("http://example.com")

html_source = browser.page_source
if "whatever" in html_source:
    pass  # do something
else:
    pass  # do something else
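Because page_source is an ordinary Python string, `in` also matches text inside tag names, attributes, and `<script>` blocks. If you only want to match text a reader would see, one option is to strip the markup first. The sketch below uses the standard library's `html.parser`, not Selenium; `VisibleTextExtractor` and `visible_text` are hypothetical names, and skipping only `<script>` and `<style>` is a simplification (elements hidden with CSS are still counted).

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when we are not inside <script>/<style>
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html_source):
    """Return the text of an HTML string, chunks joined by single spaces."""
    parser = VisibleTextExtractor()
    parser.feed(html_source)
    parser.close()
    return " ".join(parser.chunks)


# With Selenium this would be: visible_text(browser.page_source)
sample = '<div class="whatever"><script>var x = "whatever";</script><p>Hello</p></div>'
print("whatever" in sample)                # True: matches the attribute and the script
print("whatever" in visible_text(sample))  # False: the only visible text is "Hello"
```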
梦幻之岛 2024-12-18 03:54:43

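from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("http://example.com")  # without driver.get() the browser is still on a blank page
# document.body.innerHTML returns only the contents of <body>;
# "return document.documentElement.outerHTML;" returns the whole <html> element
html_source_code = driver.execute_script("return document.body.innerHTML;")
html_soup: BeautifulSoup = BeautifulSoup(html_source_code, 'html.parser')

Now you can use BeautifulSoup methods such as find() and find_all() on html_soup to extract data.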
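If installing BeautifulSoup isn't an option, the standard library's `html.parser` can pull simple data out of the same string. This swaps in a different, lower-level technique rather than BeautifulSoup's API; `LinkCollector` and `collect_links` are hypothetical names, and the sketch only gathers `href` values from `<a>` tags.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: records the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; value can be None
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def collect_links(html_source):
    """Return all href values of <a> tags, in document order."""
    collector = LinkCollector()
    collector.feed(html_source)
    collector.close()
    return collector.links


# With Selenium this would be: collect_links(driver.page_source)
print(collect_links('<p><a href="/a">A</a> <a name="x">no href</a> <a href="https://example.com/b">B</a></p>'))
# ['/a', 'https://example.com/b']
```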

情魔剑神 2024-12-18 03:54:43


driver.page_source returns the page source as a string, so you can check whether a piece of text is present in it.

from selenium import webdriver

driver = webdriver.Firefox()
driver.get("https://example.com")
if "your text here" in driver.page_source:
    print('Found it!')
else:
    print('Did not find it.')

If you want to store the page source in a variable, add the following line after driver.get:

var_pgsource = driver.page_source

and change the if condition to:

if "your text here" in var_pgsource:
贱贱哒 2024-12-18 03:54:43


With Selenium2Library (a Robot Framework library), you can use get_source():

import Selenium2Library
s = Selenium2Library.Selenium2Library()
s.open_browser("http://localhost:7080", "firefox")
source = s.get_source()
止于盛夏 2024-12-18 03:54:43


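Using the page source gives you the whole HTML document. Often it is more practical to first decide which block of code or tag holds the data you want to retrieve or the element you want to click, and then locate that element directly:

from selenium.webdriver.common.by import By

# Selenium 4 replaced find_elements_by_name() with find_elements(By.NAME, ...)
options = driver.find_elements(By.NAME, "XXX")
for option in options:
    if option.text == "XXXXXX":
        print(option.text)
        option.click()

You can locate elements by name, XPath, ID, link text, or CSS selector (By.NAME, By.XPATH, By.ID, By.LINK_TEXT, By.CSS_SELECTOR).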
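The loop above can be wrapped in a small function that stops at the first match, which also avoids touching further elements after a click may have changed the page. `click_first_with_text` is a hypothetical helper; it only relies on each element having a `.text` attribute and a `.click()` method, which Selenium's WebElement provides, so the example exercises it with a stand-in class instead of a real browser.

```python
def click_first_with_text(elements, wanted_text):
    """Click the first element whose text equals wanted_text.

    Returns the clicked element, or None if nothing matched.
    """
    for element in elements:
        if element.text == wanted_text:
            element.click()
            return element  # stop here: the page may have changed after the click
    return None


class FakeElement:
    """Stand-in for a Selenium WebElement so the sketch runs without a browser."""

    def __init__(self, text):
        self.text = text
        self.clicked = False

    def click(self):
        self.clicked = True


options = [FakeElement("Red"), FakeElement("Green"), FakeElement("Green")]
chosen = click_first_with_text(options, "Green")
print(chosen is options[1], [o.clicked for o in options])  # True [False, True, False]
```

With a real driver the call would look like `click_first_with_text(driver.find_elements(By.NAME, "XXX"), "XXXXXX")`.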

听风念你 2024-12-18 03:54:43


To answer your question about getting the URL to use with urllib, execute this JavaScript:

url = browser.execute_script("return window.location.href;")

The same string is also available directly as browser.current_url.
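Once you have the URL as a string (for example from `browser.current_url`), the standard library's `urllib.parse` can split it into its parts before you hand it to urllib. The URL below is made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical value, e.g. what browser.current_url might return
url = "https://example.com/search?q=selenium&page=2#results"

parts = urlsplit(url)
print(parts.scheme)           # https
print(parts.netloc)           # example.com
print(parts.path)             # /search
print(parse_qs(parts.query))  # {'q': ['selenium'], 'page': ['2']}
print(parts.fragment)         # results
```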
月寒剑心 2024-12-18 03:54:43


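You can simply use the WebDriver object and read the page source through its page_source property...

Try this code snippet :-)

from selenium import webdriver
from selenium.webdriver.firefox.service import Service

# Pass the driver executable path through a Service object,
# not as the first positional argument of webdriver.Firefox()
driver = webdriver.Firefox(service=Service('path/to/executable'))
driver.get('https://some-domain.com')
source = driver.page_source
if 'stuff' in source:
    print('found...')
else:
    print('not in source...')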
风筝在阴天搁浅。 2024-12-18 03:54:43


Complete code:

from selenium import webdriver

# Initialize the WebDriver
driver = webdriver.Chrome()  # Use the appropriate WebDriver for your browser

# Navigate to the desired URL
driver.get("https://www.example.com/")

# Access the page's HTML source
html_source = driver.page_source

if "whatever" in html_source:
    pass  # do something
else:
    pass  # do something else

# if you want to display complete source code.
print(html_source)

# Close the WebDriver
driver.quit()
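If anything between driver.get() and driver.quit() raises an exception, the script above never reaches quit() and the browser process can be left running. One way to guarantee cleanup is a try/finally, wrapped here in a small context manager. `quitting` is a hypothetical helper, not part of Selenium, and `FakeDriver` is only a stand-in so the sketch runs without a browser.

```python
from contextlib import contextmanager


@contextmanager
def quitting(driver):
    """Yield the driver and always call driver.quit(), even if the body raises."""
    try:
        yield driver
    finally:
        driver.quit()


class FakeDriver:
    """Stand-in for a WebDriver; records whether quit() was called."""

    def __init__(self):
        self.page_source = "<html><body>whatever</body></html>"
        self.quit_called = False

    def quit(self):
        self.quit_called = True


fake = FakeDriver()
try:
    with quitting(fake) as driver:
        found = "whatever" in driver.page_source
        raise RuntimeError("simulated failure after reading the page")
except RuntimeError:
    pass
print(found, fake.quit_called)  # True True
```

With a real browser the same pattern would read `with quitting(webdriver.Chrome()) as driver:`.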
凶凌 2024-12-18 03:54:43


I'd recommend getting the source with urllib and, if you're going to parse it, using something like Beautiful Soup.

from urllib.request import urlopen

with urlopen("http://example.com") as response:  # Open the URL.
    content = response.read()  # Read the source (as bytes) and save it to a variable.
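In Python 3, reading a urlopen() response yields bytes, so a substring test like `"whatever" in content` needs the bytes decoded first. A minimal sketch, assuming the server declares its charset in the Content-Type header and falling back to UTF-8 otherwise; `fetch_html` is a hypothetical name. Keep in mind that urllib only sees the HTML the server sends and does not run JavaScript, so anything a page builds in the browser will be missing, which is where Selenium's page_source differs.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10.0):
    """Download url and return its body as str, decoded with the declared charset."""
    with urlopen(url, timeout=timeout) as response:
        # get_content_charset() returns None when the header names no charset
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# A data: URL keeps the example offline; for a real page use
# fetch_html("http://example.com")
html = fetch_html("data:text/html;charset=utf-8,%3Cp%3Ewhatever%3C%2Fp%3E")
print(html)                # <p>whatever</p>
print("whatever" in html)  # True
```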