How do I programmatically take snapshots of crawled web pages (in Ruby)?

Posted on 2024-08-09 23:13:46

What is the best solution to programmatically take a snapshot of a webpage?

The situation is this: I would like to crawl a bunch of webpages and take thumbnail snapshots of them periodically, say once every few months, without having to manually go to each one. I would also like to be able to take jpg/png snapshots of websites that might be completely Flash/Flex, so I'd have to wait until it loaded to take the snapshot somehow.

It would be nice if there were no limit to the number of thumbnails I could generate (within reason, say 1000 per day).

Any ideas how to do this in Ruby? Seems pretty tough.

Browsers to do this in: Safari or Firefox, preferably Safari.

Thanks so much.

Comments (5)

话少情深 2024-08-16 23:13:46

This really depends on your operating system. What you need is a way to hook into a web browser and save that to an image.

If you are on a Mac - I would imagine your best bet would be to use MacRuby (or RubyCocoa - although I believe this is going to be deprecated in the near future) and then to use the WebKit framework to load the page and render it as an image.

This is definitely possible; for inspiration, you may wish to look at the Paparazzi! and webkit2png projects.

Another option, which isn't dependent on the OS, might be to use the BrowserShots API.
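
As a rough illustration of the webkit2png route, here is a minimal sketch that shells out to it from Ruby. It assumes webkit2png is installed and on your PATH (macOS only). The flags (`--thumb`, `--delay`, `--dir`, `--filename`) are as I recall them from its `--help` output, so check them against your installed version. The helper names `webkit2png_command` and `take_thumbnail` are hypothetical.

```ruby
# Build the argument list for a webkit2png thumbnail run.
# Assumption: the flag names below match your installed webkit2png;
# confirm with `webkit2png --help` before relying on them.
def webkit2png_command(url, out_dir:, name:, delay: 5)
  [
    "webkit2png",
    "--thumb",              # thumbnail only
    "--delay=#{delay}",     # seconds to wait after load, e.g. for Flash content
    "--dir=#{out_dir}",     # output directory
    "--filename=#{name}",   # base name for the output image
    url
  ]
end

# Run webkit2png for one URL. Passing an argument array to `system`
# bypasses the shell, so odd characters in the URL are not interpreted.
# Returns true on success, false on a non-zero exit, nil if the
# command could not be started.
def take_thumbnail(url, **opts)
  system(*webkit2png_command(url, **opts))
end
```

Calling `take_thumbnail("http://example.com/", out_dir: "shots", name: "example")` for each crawled URL, from a scheduled job, would cover the periodic case.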

属性 2024-08-16 23:13:46

There is no built-in library in Ruby for rendering a web page.

倒带 2024-08-16 23:13:46

As viewed by... IE? Firefox? Opera? One of the myriad WebKit engines?

If only it were possible to automate http://browsershots.org :)

南街九尾狐 2024-08-16 23:13:46

Use selenium-rc; it comes with snapshot capabilities.
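
A minimal sketch of this idea follows. Note that it does not use selenium-rc itself: it assumes the `selenium-webdriver` gem (the successor to Selenium RC) and a locally installed browser driver. The helper names `snapshot_filename`, `capture_all`, and `with_browser` are hypothetical, and the fixed `settle_seconds` pause is only a crude stand-in for "wait until the page has finished loading".

```ruby
require "digest"
require "fileutils"

# Turn a URL into a stable, filesystem-safe PNG filename.
def snapshot_filename(url)
  slug = url.sub(%r{\Ahttps?://}, "")
            .gsub(/[^A-Za-z0-9]+/, "-")
            .gsub(/\A-|-\z/, "")
  "#{slug[0, 60]}-#{Digest::MD5.hexdigest(url)[0, 8]}.png"
end

# Visit each URL, pause so late-loading content can render, then save
# a PNG. `driver` is any object responding to `get` and
# `save_screenshot`, such as a Selenium::WebDriver driver.
# Returns the list of written file paths.
def capture_all(driver, urls, out_dir, settle_seconds: 5)
  FileUtils.mkdir_p(out_dir)
  urls.map do |url|
    driver.get(url)
    sleep(settle_seconds)
    path = File.join(out_dir, snapshot_filename(url))
    driver.save_screenshot(path)
    path
  end
end

# Start a real browser, yield the driver, and always quit afterwards.
# The gem is required lazily so the helpers above load without it.
def with_browser(browser = :firefox)
  require "selenium-webdriver"
  driver = Selenium::WebDriver.for(browser)
  yield driver
ensure
  driver&.quit
end

# Example use (needs a browser and its driver installed):
#   with_browser(:firefox) do |driver|
#     capture_all(driver, ["http://example.com/"], "snapshots")
#   end
```

This saves screenshots of the browser viewport; producing thumbnails would need a separate resize step with an image library of your choice.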

烟火散人牵绊 2024-08-16 23:13:46

With JRuby you can use SWT's browser library.
