Fast PHP technique for detecting a remote image 404
What PHP script technique runs the fastest in detecting if a remote image does not exist before I include the image? I mean, I don't want to download all the bytes of the remote image -- just enough to detect if it exists.
And while on the subject but with just a slight deviation, I'd like to download just enough bytes to determine a JPEG's width and height information.
Speed is very important for the system design I'm working on.
Comments (6)
I've modified @Volomike's code to get the width too. Here you go...
So, using it we have...
Make a cURL call that sends a HEAD request instead of a full GET.
I didn't test this, but hopefully you'll get the idea:
See the cURL documentation for more information about cURL.
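The code that originally accompanied this answer is not shown here, so the following is only a minimal sketch of the HEAD-request idea, not the answerer's own code. The function names `isSuccessStatus()` and `remoteImageExists()`, the timeouts, and the redirect limit are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: treat any 2xx status as "the image exists".
function isSuccessStatus(int $status): bool
{
    return $status >= 200 && $status < 300;
}

// Hypothetical helper: send a HEAD request and report whether it succeeded.
function remoteImageExists(string $url, int $timeoutSeconds = 5): bool
{
    $ch = curl_init($url);
    if ($ch === false) {
        return false;
    }
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // HEAD instead of GET: no body is transferred
        CURLOPT_RETURNTRANSFER => true,  // do not echo the response
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects to the final resource
        CURLOPT_MAXREDIRS      => 5,     // assumed limit
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    $ok     = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);

    return $ok !== false && isSuccessStatus($status);
}
```

Some servers answer HEAD differently from GET (for example by rejecting the method), so a failed HEAD may be worth retrying with a small ranged GET before the image is discarded.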
You should be able to determine a JPEG's dimensions without loading up its entire contents. For baseline JPEGs, that is, non-progressive-scan JPEGs, scan in bytes until you come across 0xFFC0. Skip the next three bytes. The next two bytes indicate the height. They are followed by two more bytes that indicate the width.
For example, in "FF C0 00 11 08 01 DE 02 D0", 01DE represents a height of 478 and 02D0 represents a width of 720.
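The byte layout described above can be sketched as a small parser that works on whatever leading bytes have been downloaded. The function name `jpegDimensionsFromBytes()` is hypothetical. Accepting FFC2 (the progressive marker mentioned in another answer here) in addition to FFC0 is an addition to what this answer describes.

```php
<?php
// Hypothetical helper: find the first FFC0 or FFC2 marker in a (possibly
// partial) JPEG byte string and return [width, height], or null if the
// marker and its size fields are not fully contained in $bytes.
function jpegDimensionsFromBytes(string $bytes): ?array
{
    $length = strlen($bytes);
    // Layout from the marker: FF Cx | 2 length bytes | 1 precision byte |
    // 2 height bytes | 2 width bytes, so index $i + 8 must exist.
    for ($i = 0; $i + 8 < $length; $i++) {
        if ($bytes[$i] !== "\xFF") {
            continue;
        }
        $marker = ord($bytes[$i + 1]);
        if ($marker !== 0xC0 && $marker !== 0xC2) {
            continue;
        }
        // Skip the three bytes after the marker, then read two big-endian
        // 16-bit values: height first, width second.
        $dims = unpack('nheight/nwidth', substr($bytes, $i + 5, 4));
        return [$dims['width'], $dims['height']];
    }
    return null;
}

// The example bytes from the text above.
var_dump(jpegDimensionsFromBytes(hex2bin('ffc000110801de02d0')));
```

This is a plain byte scan, as the answer describes. A file with an embedded thumbnail can contain an earlier FFC0 inside another segment, so a stricter parser would walk the segment lengths instead of scanning.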
I'd send a GET request that contains a RANGE header to limit the actual data transfer where possible (the remote server might not honour the RANGE request but it's still worth a try). It probably doesn't make much difference whether you use sockets (directly) or curl to make the requests. But... you never know without benchmarks. For curl take a look at the "CURLOPT_RANGE" option at http://docs.php.net/function.curl-setopt
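As a rough illustration of the CURLOPT_RANGE suggestion, here is a minimal sketch. The helper names `rangeValue()` and `fetchLeadingBytes()`, the 1024-byte default, and the timeouts are assumptions, not part of the answer.

```php
<?php
// Hypothetical helper: build the value for CURLOPT_RANGE ("first-last").
function rangeValue(int $maxBytes): string
{
    return '0-' . ($maxBytes - 1);
}

// Hypothetical helper: request only the first $maxBytes bytes of a URL.
// Returns the bytes received, or null on a transport error or non-2xx status.
function fetchLeadingBytes(string $url, int $maxBytes = 1024): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,                  // return the body as a string
        CURLOPT_RANGE          => rangeValue($maxBytes), // sends a Range request header
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 5,                     // assumed timeouts
        CURLOPT_TIMEOUT        => 10,
    ]);
    $body   = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);

    if ($body === false || $status < 200 || $status >= 300) {
        return null;
    }
    // A server that ignores the range replies 200 with the full body,
    // so trim whatever came back to the requested size.
    return substr($body, 0, $maxBytes);
}
```

A server that honours the range normally replies with status 206. If it ignores the range, this sketch still downloads the whole file before trimming, which is the case where benchmarking matters.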
It probably doesn't fit your profile ("several an hour, on a server with only slim CPU power available.") but you might want to try handling multiple URLs at a time, i.e. keeping multiple connections active and handling only those that won't block on a read operation. If the limiting factor is mostly or only CPU power, forget this part.
sockets: Take a look at stream_select
curl: see curl_multi_exec()
If the curl module is unavailable you can also use the http url wrapper in combination with stream_context_create() to send a request containing a RANGE header.
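The wrapper-based variant might look like the sketch below. It assumes `allow_url_fopen` is enabled. The helper names `rangeContext()` and `fetchLeadingBytesViaWrapper()`, the default sizes, and the timeout are hypothetical.

```php
<?php
// Hypothetical helper: build an HTTP stream context that asks for a byte range.
function rangeContext(int $maxBytes, float $timeoutSeconds = 5.0)
{
    return stream_context_create([
        'http' => [
            'method'  => 'GET',
            'header'  => 'Range: bytes=0-' . ($maxBytes - 1) . "\r\n",
            'timeout' => $timeoutSeconds,
        ],
    ]);
}

// Hypothetical helper: read at most $maxBytes from a URL via the http wrapper.
// Returns null when the stream cannot be opened; by default the wrapper
// fails to open on 4xx/5xx responses such as a 404.
function fetchLeadingBytesViaWrapper(string $url, int $maxBytes = 1024): ?string
{
    $handle = @fopen($url, 'rb', false, rangeContext($maxBytes));
    if ($handle === false) {
        return null;
    }
    // Stop reading after $maxBytes even if the server ignored the Range header.
    $bytes = stream_get_contents($handle, $maxBytes);
    fclose($handle);

    return $bytes === false ? null : $bytes;
}
```

Because the read is capped and the stream is closed right away, a server that ignores the range does not force a full download on this path.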
Looks like you've already figured out what to do with the data once you've received it.
I think the following routine will retrieve just the image height for JPG, GIF, and PNG files, or return FALSE (test it with ===) on a 404 or any other image type. It also uses the least server resources, because the file_get_contents() route appears to download the whole file even with a byte restriction added, as does getimagesize(). You can see the performance hit by comparison.
The routine downloads just 300 bytes of the file. Unfortunately, unlike GIF or PNG, JPEG pushes its height value fairly far into the file, so I had to read that many bytes. It then scans those bytes for JFIF, PNG, or GIF to find out which file type it is. Once we have that, we use a separate routine for each type to parse the header. Note that JPEG must first use unpack() with H* and then scan for ffc2 or ffc0 and process. GIF, however, must first unpack() with h* (big difference there).
I created this function by trial and error, and it could be wrong. I ran it on several images and it appears to work well. If you find a fault in it, please let me know.
Anyway, this system lets me determine an image's height, discard the image if it is too tall, and find another. Whatever random image I find, I set its width in the HTML IMG tag and the height scales automatically, but that only looks good if the image is under a certain height. It also does a 404 check, in case the image URL that another server gave me points to an image that no longer exists or that prohibits cross-site linking. Since I set the images to a fixed width manually, I don't need to read the image width. You can adapt this function to find the width as well; it is usually only a few bytes away from the height.
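The routine itself is not shown in this copy of the answer. The sketch below is not that routine, only an illustration of the header-parsing step it describes, applied to bytes that have already been downloaded. The function name `imageHeightFromHeader()` is hypothetical. It detects JPEG by the FF D8 start marker rather than the JFIF string, and reads the fields with unpack() formats N, v, and n instead of H* and h*. Both are substitutions for what the answer describes.

```php
<?php
// Hypothetical helper: given the leading bytes of a file, return
// ['type' => 'png'|'gif'|'jpeg', 'height' => int], or false when the type is
// not recognised or the height field is not within the supplied bytes.
function imageHeightFromHeader(string $bytes)
{
    $length = strlen($bytes);

    // PNG: 8-byte signature, then the IHDR chunk; width is at offset 16 and
    // height at offset 20, both 4-byte big-endian.
    if ($length >= 24 && strncmp($bytes, "\x89PNG\r\n\x1a\n", 8) === 0) {
        return ['type' => 'png', 'height' => unpack('N', substr($bytes, 20, 4))[1]];
    }

    // GIF: 6-byte signature, then width at offset 6 and height at offset 8,
    // both 2-byte little-endian.
    $isGif = strncmp($bytes, 'GIF87a', 6) === 0 || strncmp($bytes, 'GIF89a', 6) === 0;
    if ($length >= 10 && $isGif) {
        return ['type' => 'gif', 'height' => unpack('v', substr($bytes, 8, 2))[1]];
    }

    // JPEG: starts with FF D8; height is the 2-byte big-endian value found
    // 5 bytes after the first FFC0 or FFC2 marker.
    if (strncmp($bytes, "\xFF\xD8", 2) === 0) {
        foreach (["\xFF\xC0", "\xFF\xC2"] as $marker) {
            $pos = strpos($bytes, $marker);
            if ($pos !== false && $pos + 7 <= $length) {
                return ['type' => 'jpeg', 'height' => unpack('n', substr($bytes, $pos + 5, 2))[1]];
            }
        }
    }

    return false;
}
```

With a fixed 300-byte read, a JPEG whose size marker sits after a large metadata block would return false here, so the byte budget may need to grow for such files.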
Store images locally. That's a very simple and guaranteed solution.