How to read a JS-generated page in Python

Published 2025-02-05 07:47:03 · 841 characters · 2 views · 0 comments

Please note: this problem can be solved easily with the Selenium library, but I don't want to use Selenium, since the host doesn't have a browser installed and I'm not willing to install one.

Important: I know that render() will download Chromium the first time it runs, and I'm OK with that.

Q: How can I get the page source when it's generated by JS code? For example, this HP printer:

220.116.57.59

Someone posted online and suggested using:

from requests_html import HTMLSession

session = HTMLSession()
r = session.get('https://220.116.57.59', timeout=3, verify=False)
base_url = r.url
r.html.render()

But printing r.text doesn't print the full page source, and instead indicates that JS is disabled:

<div id="pgm-no-js-text">
<p>JavaScript is required to access this website.</p>

<p>Please enable JavaScript or use a browser that supports JavaScript.</p>
</div>
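Before reaching for a JS renderer, it can help to confirm programmatically that the host really served only the static no-JS shell. A minimal sketch using just the standard library; the sample body below is taken from the snippet above, and the marker div id is the one this printer actually sends:

```python
from html.parser import HTMLParser

# Sketch: scan a raw response body for the "pgm-no-js-text" placeholder div,
# which this printer serves when the client did not execute any JavaScript.
class NoJSDetector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.needs_js = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples for the opening tag.
        if tag == "div" and ("id", "pgm-no-js-text") in attrs:
            self.needs_js = True

raw = '<div id="pgm-no-js-text"><p>JavaScript is required to access this website.</p></div>'
detector = NoJSDetector()
detector.feed(raw)
print(detector.needs_js)  # True: only the no-JS shell came back
```

When the flag is set, the interesting content was built client-side and the raw response alone will never contain it.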

Original answer: https://stackoverflow.com/a/50612469/19278887 (last part)



Comments (1)

情何以堪。 2025-02-12 07:47:04

Grab the config endpoints and then parse the XML for the data you want.

For example:

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:101.0) Gecko/20100101 Firefox/101.0"
}

with requests.Session() as s:
    # The printer exposes its network configuration as XML on this endpoint.
    response = s.get("http://220.116.57.59/IoMgmt/Adapters", headers=headers)
    # Parse the response as XML and collect every hardware-config element.
    configs = BeautifulSoup(response.text, features="xml").find_all("io:HardwareConfig")

# Print the MAC address of each adapter that reports one.
print("\n".join(c.find("MacAddress").getText() for c in configs if c.find("MacAddress") is not None))

Output:

E4E749735068
E4E74973506B
E6E74973D06B
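If installing bs4/lxml on the host is also off the table, the same extraction can be sketched with the standard library's xml.etree.ElementTree. The payload below is fabricated to mirror the output above; the element layout and the "io" namespace URI are assumptions, not the printer's real schema:

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for the /IoMgmt/Adapters response; the structure and the
# namespace URI are illustrative assumptions.
IO_NS = "urn:example:io"
sample = f"""<Adapters xmlns:io="{IO_NS}">
  <Adapter><io:HardwareConfig><MacAddress>E4E749735068</MacAddress></io:HardwareConfig></Adapter>
  <Adapter><io:HardwareConfig><MacAddress>E4E74973506B</MacAddress></io:HardwareConfig></Adapter>
</Adapters>"""

root = ET.fromstring(sample)
# ElementTree addresses namespaced tags as "{uri}localname".
macs = [cfg.findtext("MacAddress") for cfg in root.iter(f"{{{IO_NS}}}HardwareConfig")]
print(macs)  # ['E4E749735068', 'E4E74973506B']
```

Same idea as the BeautifulSoup version: find every hardware-config element, then read the MacAddress child of each.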
