How to read a JS-generated page in Python

Published 2025-02-05 07:47:03 · 841 characters · 2 views · 0 comments

Please note: this problem can be solved easily with the Selenium library, but I don't want to use Selenium, since the host doesn't have a browser installed and I'm not willing to install one.

Important: I know that render() will download Chromium the first time it runs, and I'm OK with that.

Q: How can I get the page source when it's generated by JS code? For example, this HP printer:

220.116.57.59

Someone posted online and suggested using:

from requests_html import HTMLSession

session = HTMLSession()
r = session.get('https://220.116.57.59', timeout=3, verify=False)
base_url = r.url
r.html.render()

But printing r.text doesn't print the full page source, and instead indicates that JS is disabled:

<div id="pgm-no-js-text">
<p>JavaScript is required to access this website.</p>

<p>Please enable JavaScript or use a browser that supports JavaScript.</p>
</div>
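Before reaching for a JS renderer, it can help to confirm programmatically that the host really served only the static no-JS shell. A minimal sketch using just the standard library; the sample body below is taken from the snippet above, and the marker div id is the one this printer actually sends:

```python
from html.parser import HTMLParser

# Sketch: scan a raw response body for the "pgm-no-js-text" placeholder div,
# which this printer serves when the client did not execute any JavaScript.
class NoJSDetector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.needs_js = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples for the opening tag.
        if tag == "div" and ("id", "pgm-no-js-text") in attrs:
            self.needs_js = True

raw = '<div id="pgm-no-js-text"><p>JavaScript is required to access this website.</p></div>'
detector = NoJSDetector()
detector.feed(raw)
print(detector.needs_js)  # True: only the no-JS shell came back
```

When the flag is set, the interesting content was built client-side and the raw response alone will never contain it.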

Original answer: https://stackoverflow.com/a/50612469/19278887 (last part)



Comments (1)

情何以堪。 2025-02-12 07:47:04

Grab the config endpoints and then parse the XML for the data you want.

For example:

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:101.0) Gecko/20100101 Firefox/101.0"
}

with requests.Session() as s:
    # The printer exposes its network configuration as XML on this endpoint.
    response = s.get("http://220.116.57.59/IoMgmt/Adapters", headers=headers)
    # Parse the response as XML and collect every hardware-config element.
    configs = BeautifulSoup(response.text, features="xml").find_all("io:HardwareConfig")

# Print the MAC address of each adapter that reports one.
print("\n".join(c.find("MacAddress").getText() for c in configs if c.find("MacAddress") is not None))

Output:

E4E749735068
E4E74973506B
E6E74973D06B
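If installing bs4/lxml on the host is also off the table, the same extraction can be sketched with the standard library's xml.etree.ElementTree. The payload below is fabricated to mirror the output above; the element layout and the "io" namespace URI are assumptions, not the printer's real schema:

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for the /IoMgmt/Adapters response; the structure and the
# namespace URI are illustrative assumptions.
IO_NS = "urn:example:io"
sample = f"""<Adapters xmlns:io="{IO_NS}">
  <Adapter><io:HardwareConfig><MacAddress>E4E749735068</MacAddress></io:HardwareConfig></Adapter>
  <Adapter><io:HardwareConfig><MacAddress>E4E74973506B</MacAddress></io:HardwareConfig></Adapter>
</Adapters>"""

root = ET.fromstring(sample)
# ElementTree addresses namespaced tags as "{uri}localname".
macs = [cfg.findtext("MacAddress") for cfg in root.iter(f"{{{IO_NS}}}HardwareConfig")]
print(macs)  # ['E4E749735068', 'E4E74973506B']
```

Same idea as the BeautifulSoup version: find every hardware-config element, then read the MacAddress child of each.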
