Getting the final/timed rendering of a URL
I am looking for a way to get, given a URL, the source of a web page after its JavaScript has run. For example:
I have a web page with a div.
On loading the page, some JavaScript populates the div.
Viewing the page source in a browser does not show the content inside the div.
As far as I know, for the browser to render the page, the div must have been filled with (X|D)HTML. The source of the page after rendering is therefore still just nested markup, so in theory there should be a "final" version of the page source.
I have considered taking a rendering engine such as WebKit or Gecko and adapting it to do this, but that is a fairly large task and I don't want to duplicate something that has already been done. Does anyone know of a way to do this?
Regards.
Update: I am aiming to use Selenium (as mentioned in the comments to the accepted answer) to do this automatically for several pages. My project is a web spider that, by design, needs to target a number of pages whose content is not available until JavaScript has populated everything.
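The Selenium approach from the update could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the helper names (`rendered_source`, `document_complete`, `make_headless_firefox`), the polling loop, and the choice of headless Firefox are all assumptions made for illustration. The only Selenium features relied on are `driver.get`, `driver.execute_script`, and `driver.page_source`, the last of which returns a serialization of the current DOM rather than the HTML originally sent by the server.

```python
import time


def rendered_source(driver, url, is_ready, timeout=10.0, poll=0.25):
    """Load `url` and return the page source once `is_ready(driver)` is true.

    `driver` only needs a `get(url)` method and a `page_source` attribute,
    which Selenium WebDriver objects provide.
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    # Poll until the caller-supplied readiness check passes or time runs out.
    while not is_ready(driver):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{url} was not ready after {timeout} s")
        time.sleep(poll)
    return driver.page_source


def document_complete(driver):
    # Hypothetical readiness check. A real spider would more likely test for
    # the specific element that the target page's JavaScript fills in.
    return driver.execute_script("return document.readyState") == "complete"


def make_headless_firefox():
    # Imported lazily so the helpers above can be used without Selenium.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    return webdriver.Firefox(options=options)
```

A spider would create one driver with `make_headless_firefox()`, call `rendered_source(driver, url, document_complete)` for each URL, and call `driver.quit()` when finished. Note that `document.readyState` becoming "complete" does not guarantee that scripts have finished populating the div, so a page-specific check is usually the safer choice.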
Comments (2)
Firefox add-ons such as the WebDev toolbar or Firebug have options like 'View generated source'.
As far as timing goes, just about the only option you have is a snippet of JavaScript. You could record a start time as early as possible during the page load and check again when the page has finished (either at DOM-ready or once the page has completely downloaded). The result will be highly variable, however, and if you are timing it in order to improve speed (which is good to know, and to do), just getting Firebug + YSlow would be far more useful.
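For a spider that already drives a browser, one variation on the timing idea is to read the timestamps the browser records itself instead of setting a start time by hand. The sketch below assumes a Selenium-style `driver` with `execute_script` and uses the browser's Navigation Timing entry. The names `TIMING_SCRIPT`, `load_timings`, and `median_timings` are made up for illustration. Because the numbers vary a lot between loads, it reports the median of several samples.

```python
import statistics

# JavaScript run inside the page. Both values are milliseconds measured from
# the start of the navigation; loadEventEnd stays 0 until loading finishes.
TIMING_SCRIPT = """
const nav = performance.getEntriesByType('navigation')[0];
return nav ? {domReady: nav.domContentLoadedEventEnd,
              loaded: nav.loadEventEnd} : null;
"""


def load_timings(driver):
    """Return (dom_ready_ms, loaded_ms), or None if the load has not finished."""
    entry = driver.execute_script(TIMING_SCRIPT)
    if not entry or not entry["loaded"]:
        return None
    return entry["domReady"], entry["loaded"]


def median_timings(samples):
    """Reduce several (dom_ready_ms, loaded_ms) samples to their medians."""
    dom_ready = statistics.median(s[0] for s in samples)
    loaded = statistics.median(s[1] for s in samples)
    return dom_ready, loaded
```

Calling `load_timings(driver)` after each page load and passing the collected tuples to `median_timings` gives a steadier figure than any single run.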
Within Firefox you can get the final rendered DIV by waiting for the browser to finish rendering, then pressing Ctrl-A to select all content on the page, and finally selecting "Show selection source" from the right-click menu.
This shows you the manipulated/populated DOM code of the page.