Please note: this problem could be solved easily with the Selenium library, but I don't want to use Selenium, since the host doesn't have a browser installed and I'm not willing to install one.
Important: I know that render() will download Chromium the first time it runs, and I'm OK with that.
Q: How can I get the page source when it's generated by JS code? For example, this HP printer:
220.116.57.59
Someone posted online and suggested using:
from requests_html import HTMLSession
session = HTMLSession()  # create the session before using it
r = session.get('https://220.116.57.59', timeout=3, verify=False)
base_url = r.url
r.html.render()  # downloads Chromium on first use
But printing r.text doesn't print the full page source and indicates that JS is disabled:
<div id="pgm-no-js-text">
<p>JavaScript is required to access this website.</p>
<p>Please enable JavaScript or use a browser that supports JavaScript.</p>
</div>
Original Answer: https://stackoverflow.com/a/50612469/19278887 (last part)
Comments (1)
Grab the config endpoints and then parse the XML for the data you want.
For example:
Output:
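A minimal sketch of the "grab the endpoint, parse the XML" approach, using only the standard library instead of requests_html. The endpoint path, the helper names (local_name, parse_config, fetch_config) and the flat tag-to-text output are assumptions for illustration, not taken from the answer; the real endpoints can be found by watching the printer page's network requests in a browser's developer tools.

```python
import ssl
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: a guess at a typical config endpoint path; replace it with
# whatever XML URLs the printer's own web page actually requests.
CONFIG_PATH = "/DevMgmt/ProductConfigDyn.xml"


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_config(xml_bytes):
    """Return {tag: text} for every leaf element that has non-empty text."""
    root = ET.fromstring(xml_bytes)
    data = {}
    for elem in root.iter():
        text = (elem.text or "").strip()
        if len(elem) == 0 and text:  # leaf element with content
            data[local_name(elem.tag)] = text
    return data


def fetch_config(host, path=CONFIG_PATH, timeout=3):
    """Download one XML endpoint and parse it.

    Certificate checks are disabled to mirror verify=False in the question;
    only do this for devices you trust.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    url = f"https://{host}{path}"
    with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
        return parse_config(resp.read())
```

Calling fetch_config('220.116.57.59') would then return a dictionary of whatever leaf values that endpoint exposes. If several elements share a tag name, later ones overwrite earlier ones in this sketch, so a real script may want to walk the tree by path instead. No JavaScript rendering or browser download is involved, since the XML is requested directly.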