How to save a web page with Selenium RC
I use Selenium RC to open a URL. How can I save the resulting web page, the way urllib.urlretrieve does? urllib cannot run the JavaScript in the page. One more question: will the saved page match everything I see in the browser that Selenium RC opens?
It sounds like you are confusing two very different libraries.
urllib:
You can use Python's urllib library to retrieve the raw markup from a valid URL. The library doesn't run any JavaScript embedded in the page, because it never attempts to parse or render anything.
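To make that concrete, here is a minimal sketch of saving the raw markup with Python 3's urllib.request (the modern home of the urllib.urlretrieve function mentioned in the question). The helper name save_raw_markup, the timeout default, and the example URL and file name are assumptions for illustration only. The bytes are written exactly as served, so any script tags are stored as text and never executed.

```python
import shutil
import urllib.request


def save_raw_markup(url, dest_path, timeout=10.0):
    """Write the bytes served at `url` to `dest_path` unchanged.

    Hypothetical helper: nothing is parsed or rendered, so JavaScript in
    the page is saved as plain text and never run.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(dest_path, "wb") as out:
        # Stream the response body straight to disk.
        shutil.copyfileobj(response, out)
    return dest_path
```

For example, save_raw_markup("http://example.com/", "page.html") would store the markup the server returns, not the DOM as it looks after scripts have modified it.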
Selenium RC:
Selenium RC is used to automate testing. Your tests execute in a web browser via JavaScript, but it is a testing suite: what you get back is information about the status of your tests. Selenium RC does not provide any functionality to save an image of the rendered page.
Unless I've misinterpreted your question, you seem to be looking for a library that lets you retrieve an image of a rendered HTML page (including any JavaScript DOM manipulation). If so, I would suggest looking into PyWebShot, which seems to provide exactly that functionality. You can view screenshots of it in action here (along with some additional info about it).
If it doesn't necessarily need to be a Python library, there are a number of web services that provide screenshots: