How do I programmatically take snapshots of crawled webpages (in Ruby)?
What is the best solution to programmatically take a snapshot of a webpage?
The situation is this: I would like to crawl a bunch of webpages and take thumbnail snapshots of them periodically, say once every few months, without having to visit each one manually. I would also like to be able to take jpg/png snapshots of websites that might be built entirely in Flash/Flex, so I'd somehow have to wait until the page has loaded before taking the snapshot.
It would be nice if there were no limit on the number of thumbnails I could generate (within reason, say 1000 per day).
Any ideas how to do this in Ruby? Seems pretty tough.
Browsers to do this in: Safari or Firefox, preferably Safari.
Thanks so much.
Comments (5)
This really depends on your operating system. What you need is a way to hook into a web browser and save what it renders to an image.
If you are on a Mac, I would imagine your best bet is to use MacRuby (or RubyCocoa, although I believe that is going to be deprecated in the near future) and then use the WebKit framework to load the page and render it as an image.
This is definitely possible; for inspiration, you may wish to look at the Paparazzi! and webkit2png projects.
Another option, which isn't dependent on the OS, might be to use the BrowserShots API.
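As a rough sketch of the webkit2png route, the tool can simply be driven from Ruby as an external command. The helper names below (`webkit2png_command`, `take_thumbnail`) and the `snapshots` directory are made up for illustration, and the flags (`-T`, `-D`, `-o`, `--delay`) are as I remember them from the tool's help output, so check `webkit2png --help` on your machine before relying on them. The delay is there to give plugin-heavy pages some time to finish loading.

```ruby
# Build the argument list for one webkit2png run (Mac only).
# Assumed flags: -T = thumbnail only, -D = output directory,
# -o = base filename, --delay = seconds to wait after the page loads.
def webkit2png_command(url, name:, dir: "snapshots", delay: 5)
  ["webkit2png", "-T", "-D", dir, "-o", name, "--delay=#{delay}", url]
end

# Run the command without going through a shell.
# Returns true when webkit2png exits successfully.
def take_thumbnail(url, **opts)
  system(*webkit2png_command(url, **opts))
end
```

Something like `take_thumbnail("http://example.com/", name: "example")` could then be called for each crawled URL from a cron job or a rake task.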
There is no built-in library in Ruby for rendering a web page.
Using Selenium with Ruby is one possibility. You can run Firefox as a headless browser (i.e. on a server).
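A minimal sketch of the Selenium idea, assuming the `selenium-webdriver` gem together with Firefox and geckodriver are installed (this is the current WebDriver API, not necessarily what the answer had in mind). The helper names `snapshot_path` and `capture`, the `snapshots` directory and the fixed `wait` pause are all illustrative choices of mine. It saves a full-size PNG; scaling it down to a thumbnail would be a separate step.

```ruby
require "uri"
require "fileutils"

# Derive a filesystem-safe PNG path from a URL (hypothetical helper).
def snapshot_path(url, dir = "snapshots")
  uri = URI.parse(url)
  slug = "#{uri.host}#{uri.path}".gsub(/[^A-Za-z0-9.-]+/, "_").sub(/_+\z/, "")
  File.join(dir, "#{slug}.png")
end

# Load a page in headless Firefox and save a PNG screenshot.
# `wait` is a crude fixed pause so slow content can finish loading.
def capture(url, dir: "snapshots", wait: 5)
  # Loaded lazily: needs the selenium-webdriver gem, Firefox and geckodriver.
  require "selenium-webdriver"

  options = Selenium::WebDriver::Firefox::Options.new
  options.add_argument("-headless")
  driver = Selenium::WebDriver.for(:firefox, options: options)
  begin
    driver.navigate.to(url)
    sleep wait
    path = snapshot_path(url, dir)
    FileUtils.mkdir_p(dir)
    driver.save_screenshot(path)
    path
  ensure
    driver.quit # always close the browser, even on errors
  end
end
```

Calling `capture("http://example.com/")` for each URL in the crawl list would write one PNG per page into `snapshots/`.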
Here is the source code for BrowserShots: http://sourceforge.net/projects/browsershots/files/
If you are using Linux you could use http://khtml2png.sourceforge.net/ and script it via Ruby.
Some paid services to try and automate
As viewed by... IE? Firefox? Opera? One of the myriad WebKit engines?
If only it were possible to automate http://browsershots.org :)
Use selenium-rc; it comes with snapshot capabilities.
With JRuby you can use SWT's browser library.