How can I programmatically check whether an image (PNG, JPEG, or GIF) is corrupted?
Okay. So I have about 250,000 high-resolution images. What I want to do is go through all of them and find the ones that are corrupted. If you know what 4scrape is, then you know the nature of the images.
To me, an image is corrupted if Firefox, when loading it, says:
The image “such and such image” cannot be displayed, because it contains errors.
Now, I could select all of my 250,000 images (~150 GB) and drag and drop them into Firefox. That would be a bad idea, though, because I don't think Mozilla designed Firefox to open 250,000 tabs. No, I need a way to programmatically check whether an image is corrupted.
Does anyone know a PHP or Python library which can do something along these lines? Or an existing piece of software for Windows?
I have already removed the obviously corrupted images (such as ones that are 0 bytes), but I'm about 99.9% sure that there are more diseased images floating around in my throng of a collection.
An easy way would be to try loading and verifying the files with PIL (Python Imaging Library).
Catch the exceptions...
From the documentation:
im.verify()
Attempts to determine if the file is broken, without actually decoding the image data. If this method finds any problems, it raises suitable exceptions. This method only works on a newly opened image; if the image has already been loaded, the result is undefined. Also, if you need to load the image after using this method, you must reopen the image file.
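A minimal sketch of that approach, written against Pillow (the maintained fork of PIL, still imported as PIL). The helper names is_image_ok and find_corrupted are hypothetical. It follows the documentation quoted above: verify() runs on a freshly opened image, and the file is then reopened so that load() can do a full decode, since verify() by itself does not decode the pixel data.

```python
from pathlib import Path

from PIL import Image  # Pillow, the maintained fork of PIL


def is_image_ok(path):
    """Return True if Pillow can both verify and fully decode the image."""
    try:
        # verify() checks the file structure without decoding pixel data.
        with Image.open(path) as im:
            im.verify()
        # Per the docs, the image must be reopened after verify() before it
        # can be loaded; load() decodes every pixel, so it also catches
        # truncated image data that verify() may not notice.
        with Image.open(path) as im:
            im.load()
        return True
    except Exception:
        # Broken files raise a variety of exception types (OSError,
        # SyntaxError and others), so catch broadly.
        return False


def find_corrupted(root, exts=(".png", ".jpg", ".jpeg", ".gif")):
    """Yield image files under root (recursively) that fail is_image_ok()."""
    for p in Path(root).rglob("*"):
        if p.is_file() and p.suffix.lower() in exts and not is_image_ok(p):
            yield p
```

Usage would be something like `for bad in find_corrupted(r"D:\images"): print(bad)`, where the folder is hypothetical. One caveat for high-resolution collections: Pillow refuses images above roughly twice Image.MAX_IMAGE_PIXELS by raising Image.DecompressionBombError, which the broad except above would report as corrupt; setting Image.MAX_IMAGE_PIXELS = None disables that limit if you trust the files.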
I suggest you check out ImageMagick for this: http://www.imagemagick.org/
There you'll find a tool called identify, which you can either call from a script and read its output, or use through the programming interface provided.
In PHP, with exif_imagetype():
EDIT: Or you can try to fully load the image with ImageCreateFromString():
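exif_imagetype() works by reading the first bytes of the file and checking its signature, so it flags files that are not images at all (an HTML error page saved as .jpg, say) but not damage later in the file; ImageCreateFromString() goes further and decodes the whole image. Since the question also asks about Python, here is a rough Python analog of the signature check. It is a sketch only: the signature table covers just PNG, JPEG and GIF, and the names sniff_type and sniff_file are hypothetical, not part of any library.

```python
# Leading "magic number" bytes of the three formats from the question.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_type(head):
    """Return 'jpeg', 'png' or 'gif' if head starts with a known signature, else None."""
    for magic, kind in SIGNATURES.items():
        if head.startswith(magic):
            return kind
    return None


def sniff_file(path):
    """Read only the first 8 bytes of the file and identify its format."""
    with open(path, "rb") as f:
        return sniff_type(f.read(8))
```

A file for which sniff_file() returns None, or whose detected type disagrees with its extension, is worth a closer look; a matching signature says nothing about the rest of the file.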
If your exact requirement is that the image displays correctly in Firefox, you may have a difficult time: the only way to be sure would be to link against the same image-loading source code that Firefox uses.
Basic image corruption (file is incomplete) can be detected simply by trying to open the file using any number of image libraries.
However, many images can fail to display simply because they stretch a part of the file format that your particular viewer can't handle (GIF in particular has a lot of these edge cases, but you can also find JPEG and, rarely, PNG files that only display in specific viewers). There are also some ugly JPEG edge cases where the file appears uncorrupted in viewer X, but in reality it has been cut short and only displays correctly because very little information was lost. Firefox can show some cut-off JPEGs correctly (you get a grey bottom), but with others Firefox seems to load them halfway and then displays the error message instead of the partial image.
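The cut-short case can be screened for cheaply, without decoding anything: each of the three formats ends with a fixed trailer (the JPEG EOI marker, the PNG IEND chunk, the GIF trailer byte), so a file that is missing it was probably truncated. This is a minimal heuristic sketch with hypothetical function names. It will miss damage in the middle of a file, and it will flag JPEGs that carry extra data after the EOI marker, which some software writes, so treat False as "decode this one to be sure", not as proof.

```python
import os

# Expected final bytes, keyed by each format's leading signature bytes:
# JPEG ends with the EOI marker FF D9, PNG with the IEND chunk type and its
# fixed CRC, and GIF with the trailer byte 0x3B (';').
TRAILERS = {
    b"\xff\xd8": b"\xff\xd9",
    b"\x89PNG": b"IEND\xaeB`\x82",
    b"GIF8": b";",
}


def looks_complete(head, tail):
    """True/False for a recognised format, None if the signature is unknown."""
    for magic, trailer in TRAILERS.items():
        if head.startswith(magic):
            # Tolerate NUL padding that some writers leave after the trailer.
            return tail.rstrip(b"\x00").endswith(trailer)
    return None


def file_looks_complete(path, tail_size=1024):
    """Read only the head and tail of the file, so large files stay cheap."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(8)
        f.seek(max(0, size - tail_size))
        tail = f.read()
    return looks_complete(head, tail)
```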
You could use ImageMagick, if it is available:
If you want to check a whole folder:
If you just want to check a single file:
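A minimal Python sketch of the same idea, assuming ImageMagick's identify is on your PATH (some ImageMagick 7 installs only provide it as magick identify, in which case adjust the command list). The helper names identify_ok and check_folder are hypothetical. identify exits with a non-zero status when it cannot read a file; the flags are chosen so that warnings and pixel-level damage count as failures too.

```python
import subprocess
from pathlib import Path


def identify_ok(path):
    """Run ImageMagick's identify on one file; return (ok, error_text).

    Raises FileNotFoundError if the identify executable is not on PATH.
    """
    result = subprocess.run(
        # -regard-warnings turns warnings (e.g. a premature end of JPEG data)
        # into a non-zero exit status; -verbose makes identify compute pixel
        # statistics, which requires decoding the image.
        ["identify", "-regard-warnings", "-verbose", str(path)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
        errors="replace",
    )
    return result.returncode == 0, result.stderr.strip()


def check_folder(folder):
    """Return (path, error_text) for every file in folder that identify rejects."""
    bad = []
    for p in sorted(Path(folder).iterdir()):
        if p.is_file():
            ok, err = identify_ok(p)
            if not ok:
                bad.append((p, err))
    return bad
```

For example, `for path, err in check_folder(r"D:\images"): print(path, err)` with a hypothetical folder. Starting one process per file is slow over 250,000 images, but it keeps each error message attached to the right file.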