State of the HTML after onload JavaScript
Many webpages use onload JavaScript to manipulate their DOM. Is there a way to automate access to the state of the HTML after these JavaScript operations have run?
A tool like wget is not useful here because it just downloads the original source.
Is there perhaps a way to use a web browser's rendering engine?
Ideally I am after a solution that I can interface with from Python.
Thanks!
Comments (2)
The only good way I know to do such things is to automate a browser, for example via Selenium RC. If you have no way to deduce that the page has finished running the relevant JavaScript, then, just like a real live user visiting that page, you'll have to wait a while, grab a snapshot, wait some more, grab another, and check that nothing changed between them to convince yourself that it has really finished.
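The wait-and-compare idea above can be sketched in Python roughly as follows. This is only an illustration under some assumptions: `wait_until_stable` and `rendered_html` are hypothetical helper names, the interval and timeout values are arbitrary, and the browser part uses Selenium WebDriver (the successor to Selenium RC) rather than RC itself, so it needs the `selenium` package and a Firefox driver installed.

```python
import time


def wait_until_stable(get_snapshot, interval=1.0, max_wait=30.0, sleep=time.sleep):
    """Poll get_snapshot() until two consecutive snapshots are identical.

    Returns the stable snapshot, or raises TimeoutError if the snapshots
    keep changing for roughly max_wait seconds of accumulated waiting.
    """
    previous = get_snapshot()
    waited = 0.0
    while waited < max_wait:
        sleep(interval)
        waited += interval
        current = get_snapshot()
        if current == previous:
            # Nothing changed between two snapshots: assume scripts are done.
            return current
        previous = current
    raise TimeoutError("page did not settle within %.1f seconds" % max_wait)


def rendered_html(url, interval=1.0, max_wait=30.0):
    """Hypothetical helper: return the post-JavaScript HTML of url."""
    # Imported lazily so wait_until_stable can be used without Selenium.
    from selenium import webdriver

    driver = webdriver.Firefox()  # assumption: Firefox and its driver are installed
    try:
        driver.get(url)
        # page_source reflects the current DOM, not the originally downloaded source.
        return wait_until_stable(lambda: driver.page_source, interval, max_wait)
    finally:
        driver.quit()
```

Calling `rendered_html("http://example.com/")` would then return the serialized DOM once two snapshots taken one second apart match. Two identical snapshots are only a heuristic: a page that updates on a slow timer can still change later, so the interval may need tuning per site.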
Please see the related info on Stack Overflow: