Retrieve an image's dimensions without downloading the entire image
I'm using open-uri to download remote images and then the imagesize gem to get their dimensions. The problem is that this gets painfully slow when more than a handful of images need to be processed.
How can I download just enough data to determine the dimensions for various image formats?
Are there any other ways to optimize this?
I believe if you go raw socket (issue a bare-bones HTTP request), there's no need to download more than a few bytes (and then abort the connection) to determine the dimensions of an image.
E.g. if I push the first 33 bytes of a PNG file (13 bytes for a GIF) into exiftool, it will give me the image size.
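As a sketch of that idea in Ruby (the helper names are mine, not from any gem): once you have the first couple of dozen bytes, the dimensions can be unpacked directly from the PNG IHDR chunk or the GIF logical screen descriptor, no external tool needed.

```ruby
# Sketch: read dimensions straight from the first bytes of a file.
# Helper names are illustrative, not part of any gem's API.

PNG_SIGNATURE = "\x89PNG\r\n\x1a\n".b

# PNG layout: 8-byte signature, 4-byte chunk length, 4-byte "IHDR"
# type, then width and height as 4-byte big-endian integers, so the
# dimensions sit at byte offset 16 and the first 24 bytes suffice.
def png_dimensions(data)
  data = data.b
  return nil unless data.bytesize >= 24 && data[0, 8] == PNG_SIGNATURE
  data[16, 8].unpack('N2')            # => [width, height]
end

# GIF layout: 6-byte signature ("GIF87a" or "GIF89a"), then width
# and height as 2-byte little-endian integers at offsets 6 and 8,
# so the first 10 bytes suffice.
def gif_dimensions(data)
  data = data.b
  return nil unless data.bytesize >= 10 &&
                    %w[GIF87a GIF89a].include?(data[0, 6])
  data[6, 4].unpack('v2')             # => [width, height]
end
```

Each parser returns `nil` when the signature doesn't match, so you can try them in turn against whatever header bytes you fetched.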
I'm not aware of any way to specify how many bytes to download with a normal HTTP request. It's an all-or-nothing situation.
Some file types do allow requesting sections of the file, but you would have to have control of the server in order to enable that.
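For what it's worth, the HTTP `Range` request header can ask a server for just a byte span, where the server chooses to honor it. A hedged sketch (the URL would be whatever image you're fetching; the helper names are mine):

```ruby
require 'net/http'
require 'uri'

# Build the value for an HTTP Range header. Byte ranges are
# inclusive, so the first `limit` bytes are 0..(limit - 1).
def range_header(limit)
  "bytes=0-#{limit - 1}"
end

# Sketch: request only the first `limit` bytes of a URL. A server
# that honors the range replies 206 Partial Content; a server that
# ignores it replies 200 with the full body, so we slice either way.
def fetch_first_bytes(url, limit = 33)
  uri = URI(url)
  request = Net::HTTP::Get.new(uri)
  request['Range'] = range_header(limit)
  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
  response.body.to_s.b[0, limit]
end
```

Whether this helps depends entirely on the remote server: static-file servers commonly support ranges, but there's no guarantee.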
It's been a long time since I've played at this level, but theoretically you could use a block with Net::HTTP or Open-URI and count bytes until you've received enough to reach the image-size block, then close the connection. Your TCP stack would probably not be too happy with you, especially if you did that a lot. If I remember right, it wouldn't dispose of the memory until the connection had timed out, and it would eat up available connections, either on your side or the server's. And if I ran a site and found my server's performance being compromised by your app prematurely closing connections, I'd ban you.
Ultimately, your best solution is to talk to whoever owns the site you are pillaging and see if they have an API to tell you what the image dimensions are. Their side of the connection can find that out a lot faster than yours, since you have to retrieve the entire file. If nothing else, offer to write them something that can accomplish that. Maybe they'll understand that, by enabling it, you won't be consuming all their bandwidth retrieving images.
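The block approach described above might look like this in Ruby (a sketch; the chunk-counting helper and method names are mine, and all the caveats about dropped connections apply):

```ruby
require 'net/http'
require 'uri'

# Accumulate streamed chunks until `limit` bytes are buffered, then
# stop consuming. Separated from the HTTP plumbing so the counting
# logic stands on its own.
def take_bytes(chunks, limit)
  buffer = ''.b
  chunks.each do |chunk|
    buffer << chunk.b
    break if buffer.bytesize >= limit
  end
  buffer[0, limit]
end

# Sketch of the block approach: stream the body with Net::HTTP and
# throw out of read_body once enough bytes have arrived, which drops
# the connection early -- with the downsides discussed above.
def header_bytes(url, limit = 33)
  uri = URI(url)
  buffer = ''.b
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    catch(:enough) do
      http.request_get(uri.request_uri) do |response|
        response.read_body do |chunk|
          buffer << chunk.b
          throw :enough if buffer.bytesize >= limit
        end
      end
    end
  end
  buffer[0, limit]
end
```

The `throw`/`catch` pair unwinds out of `read_body` cleanly, and `Net::HTTP.start`'s ensure block still closes the connection on the way out.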