Retrieve an image's dimensions without downloading the entire image
I'm using open-uri to download remote images and then the imagesize gem to get their dimensions. The problem is that this gets painfully slow when more than a handful of images need to be processed.
How can I download just enough data to determine the dimensions for various image formats?
Are there any other ways to optimize this?
I believe if you go raw socket (issue a bare-bones HTTP request), there's no need to download more than a few bytes (and then abort the connection) to determine the dimensions of an image.
E.g. if I push the first 33 bytes of a PNG file (13 bytes for a GIF) into exiftool, it will give me the image size.
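As a sketch of that idea in Ruby (the helper names are mine, not from any gem): once you have the first couple of dozen bytes, the dimensions can be unpacked directly from the PNG IHDR chunk or the GIF logical screen descriptor, no external tool needed.

```ruby
# Sketch: read dimensions straight from the first bytes of a file.
# Helper names are illustrative, not part of any gem's API.

PNG_SIGNATURE = "\x89PNG\r\n\x1a\n".b

# PNG layout: 8-byte signature, 4-byte chunk length, 4-byte "IHDR"
# type, then width and height as 4-byte big-endian integers, so the
# dimensions sit at byte offset 16 and the first 24 bytes suffice.
def png_dimensions(data)
  data = data.b
  return nil unless data.bytesize >= 24 && data[0, 8] == PNG_SIGNATURE
  data[16, 8].unpack('N2')            # => [width, height]
end

# GIF layout: 6-byte signature ("GIF87a" or "GIF89a"), then width
# and height as 2-byte little-endian integers at offsets 6 and 8,
# so the first 10 bytes suffice.
def gif_dimensions(data)
  data = data.b
  return nil unless data.bytesize >= 10 &&
                    %w[GIF87a GIF89a].include?(data[0, 6])
  data[6, 4].unpack('v2')             # => [width, height]
end
```

Each parser returns `nil` when the signature doesn't match, so you can try them in turn against whatever header bytes you fetched.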
I'm not aware of any way to specify how many bytes to download with a normal HTTP request. It's an all-or-nothing situation.
Some file types do allow requesting sections of the file, but you would have to have control of the server in order to enable that.
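For what it's worth, the HTTP `Range` request header can ask a server for just a byte span, where the server chooses to honor it. A hedged sketch (the URL would be whatever image you're fetching; the helper names are mine):

```ruby
require 'net/http'
require 'uri'

# Build the value for an HTTP Range header. Byte ranges are
# inclusive, so the first `limit` bytes are 0..(limit - 1).
def range_header(limit)
  "bytes=0-#{limit - 1}"
end

# Sketch: request only the first `limit` bytes of a URL. A server
# that honors the range replies 206 Partial Content; a server that
# ignores it replies 200 with the full body, so we slice either way.
def fetch_first_bytes(url, limit = 33)
  uri = URI(url)
  request = Net::HTTP::Get.new(uri)
  request['Range'] = range_header(limit)
  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
  response.body.to_s.b[0, limit]
end
```

Whether this helps depends entirely on the remote server: static-file servers commonly support ranges, but there's no guarantee.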
It's been a long time since I've played at this level, but theoretically you could use a block with Net::HTTP or Open-URI and count bytes until you've received enough to reach the image-size block, then close the connection. Your TCP stack would probably not be too happy with you, especially if you did that a lot. If I remember right, it wouldn't dispose of the memory until the connection had timed out, and it would eat up available connections, either on your side or the server's. And if I ran a site and found my server's performance being compromised by your app prematurely closing connections, I'd ban you.
Ultimately, your best solution is to talk to whoever owns the site you are pillaging and see if they have an API to tell you what the image dimensions are. Their side of the connection can find that out a lot faster than yours, since you have to retrieve the entire file. If nothing else, offer to write them something that can accomplish that. Maybe they'll understand that, by enabling it, you won't be consuming all their bandwidth retrieving images.
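The block approach described above might look like this in Ruby (a sketch; the chunk-counting helper and method names are mine, and all the caveats about dropped connections apply):

```ruby
require 'net/http'
require 'uri'

# Accumulate streamed chunks until `limit` bytes are buffered, then
# stop consuming. Separated from the HTTP plumbing so the counting
# logic stands on its own.
def take_bytes(chunks, limit)
  buffer = ''.b
  chunks.each do |chunk|
    buffer << chunk.b
    break if buffer.bytesize >= limit
  end
  buffer[0, limit]
end

# Sketch of the block approach: stream the body with Net::HTTP and
# throw out of read_body once enough bytes have arrived, which drops
# the connection early -- with the downsides discussed above.
def header_bytes(url, limit = 33)
  uri = URI(url)
  buffer = ''.b
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    catch(:enough) do
      http.request_get(uri.request_uri) do |response|
        response.read_body do |chunk|
          buffer << chunk.b
          throw :enough if buffer.bytesize >= limit
        end
      end
    end
  end
  buffer[0, limit]
end
```

The `throw`/`catch` pair unwinds out of `read_body` cleanly, and `Net::HTTP.start`'s ensure block still closes the connection on the way out.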