How can I check whether a file is an image without loading the whole file? Is there an image header reading library?

Posted 2024-08-16 13:02:39

edit:

Sorry, I guess my question was vague. I'd like to have a way to check if a file is not an image without wasting time loading the whole image, because then I can do the rest of the loading later. I don't want to just check the file extension.

The application just views the images. By 'checking the validity', I meant 'detecting and skipping the non-image files' also in the directory. If the pixel data is corrupt, I'd like to still treat it as an image.

I assign page numbers and pair up these images. Some images are the single left or right page. Some images are wide and are the "spread" of the left and right pages. For example, pagesAt(3) and pagesAt(4) could return the same std::pair of images or a std::pair of the same wide image.

Sometimes, there is an odd number of 'thin' images, and the first image is to be displayed on its own, similar to a wide image. An example would be a single cover page.

Not knowing which files in the directory are non-images means I can't confidently assign those page numbers and pair up the files for displaying. Also, the user may decide to jump to page X, and when I later discover and remove a non-image file and reassign page numbers accordingly, page X could appear to be a different image.

original:

In case it matters, I'm using C++ and QImage from the Qt library.

I'm iterating through a directory and using the QImage constructor on the paths to the images. This is, of course, pretty slow and makes the application feel unresponsive. However, it does allow me to detect invalid image files and ignore them early on.

I could just save only the paths to the images while going through the directory and actually load them only when they're needed, but then I wouldn't know if the image is invalid or not.

I'm considering a combination of the two: while iterating through the directory, read only the headers of the images to check validity, and then load the image data when it is needed.

So,

Will just loading the image headers be much faster than loading the whole image? Or does doing a bit of I/O to read the header mean I might as well finish loading the image in full? Later on, I'll be uncompressing images from archives as well, so this also applies to uncompressing just the header vs. uncompressing the whole file.

Also, I don't know how to load/read just the image headers. Is there a library that can read just the headers of images? Otherwise, I'd have to open each file as a stream and code image header readers for all the filetypes on my own.

Comments (5)

深居我梦 2024-08-23 13:02:39

The Unix file tool (which has been around since almost forever) does exactly this. It is a simple tool that uses a database of known file headers and binary signatures to identify the type of the file (and potentially extract some simple information).

The database is a simple text file (which gets compiled for efficiency) that describes a plethora of binary file formats, using a simple structured format (documented in man magic). The source is in /usr/share/file/magic (in Ubuntu). For example, the entry for the PNG file format looks like this:

0       string          \x89PNG\x0d\x0a\x1a\x0a         PNG image
!:mime  image/png
>16     belong          x               \b, %ld x
>20     belong          x               %ld,
>24     byte            x               %d-bit
>25     byte            0               grayscale,
>25     byte            2               \b/color RGB,
>25     byte            3               colormap,
>25     byte            4               gray+alpha,
>25     byte            6               \b/color RGBA,
>28     byte            0               non-interlaced
>28     byte            1               interlaced
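The offsets in this entry can be read directly from the first 24 bytes of a file. The following is a minimal sketch, not code from the file tool: `PngSize`, `readBigEndian32` and `readPngSize` are hypothetical names introduced here. It checks the 8-byte signature, checks that the first chunk is IHDR, and reads the big-endian width and height at offsets 16 and 20, which is enough to tell a "thin" page from a wide spread without decoding any pixel data.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical result type for this sketch.
struct PngSize {
    std::uint32_t width;
    std::uint32_t height;
};

// Reads a 32-bit big-endian integer ("belong" in the magic entry).
static std::uint32_t readBigEndian32(const unsigned char* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Hypothetical helper: returns true and fills `out` if `data` starts with a
// PNG signature followed by an IHDR chunk. Only the first 24 bytes are used.
bool readPngSize(const unsigned char* data, std::size_t size, PngSize& out) {
    static const unsigned char signature[8] = {0x89, 'P', 'N', 'G',
                                               0x0d, 0x0a, 0x1a, 0x0a};
    if (size < 24) return false;                               // header too short
    if (std::memcmp(data, signature, 8) != 0) return false;    // not a PNG
    if (std::memcmp(data + 12, "IHDR", 4) != 0) return false;  // first chunk must be IHDR
    out.width = readBigEndian32(data + 16);   // offset 16 in the magic entry
    out.height = readBigEndian32(data + 20);  // offset 20 in the magic entry
    return true;
}
```

A caller would read the first 24 bytes of each file into a buffer and pass them in. Other formats keep their dimensions elsewhere, so each format would need its own reader.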

You could extract the signatures for just the image file types, and build your own "sniffer", or even use the parser from the file tool (which seems to be BSD-licensed).
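A home-made sniffer along those lines might look like the sketch below. It is only an illustration under stated assumptions: `sniffImageFormat` and `sniffImageFile` are hypothetical names, and the signature table covers just PNG, JPEG and GIF rather than everything in the magic database. It reads at most 8 bytes per file and compares them against the known signatures.

```cpp
#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical helper: returns a format name, or an empty string when the
// bytes match none of the signatures in this (deliberately small) table.
std::string sniffImageFormat(const unsigned char* data, std::size_t size) {
    struct Signature {
        const char* name;
        const char* bytes;
        std::size_t length;
    };
    static const Signature signatures[] = {
        {"png", "\x89PNG\x0d\x0a\x1a\x0a", 8},
        {"jpeg", "\xff\xd8\xff", 3},
        {"gif", "GIF87a", 6},
        {"gif", "GIF89a", 6},
    };
    for (const Signature& s : signatures) {
        if (size >= s.length && std::memcmp(data, s.bytes, s.length) == 0) {
            return s.name;
        }
    }
    return std::string();
}

// Hypothetical helper: reads only the first 8 bytes of the file at `path`.
// Returns an empty string if the file cannot be opened or is not recognised.
std::string sniffImageFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return std::string();
    unsigned char header[8] = {};
    in.read(reinterpret_cast<char*>(header), sizeof header);
    const std::size_t bytesRead = static_cast<std::size_t>(in.gcount());
    return sniffImageFormat(header, bytesRead);
}
```

A matching signature only says that the file claims to be an image, which fits the stated goal of skipping non-image files while still treating files with corrupt pixel data as images.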

如果没有你 2024-08-23 13:02:39

Just to add my 2 cents: you can use QImageReader to get information about image files without actually loading the files.

For example with the .format method you can check a file's image format.

From the official Qt doc ( http://qt-project.org/doc/qt-4.8/qimagereader.html#format ):

Returns the format QImageReader uses for reading images. You can call this function after assigning a device to the reader to determine the format of the device. For example:

    QImageReader reader("image.png");
    // reader.format() == "png"

If the reader cannot read any image from the device (e.g., there is no image there, or the image has already been read), or if the format is unsupported, this function returns an empty QByteArray().

萌无敌 2024-08-23 13:02:39

I don't know the answer about just loading the header, and it likely depends on the image type that you are trying to load. If possible, you might consider using QtConcurrent to go through the images while allowing the rest of the program to continue. In that case, you would probably represent all of the entries as being in an unknown state at first, and then change each one to image or not-an-image when its verification is done.
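The unknown / image / not-an-image idea can be sketched as follows. Note that this uses std::async from the C++ standard library as a stand-in, not QtConcurrent, so that the example has no Qt dependency. `EntryState`, `classifyInBackground` and the `looksLikeImage` callback are hypothetical names introduced for this illustration.

```cpp
#include <functional>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Every entry starts as Unknown and is resolved once verification finishes.
enum class EntryState { Unknown, Image, NotImage };

// Hypothetical helper: classifies all paths on a background thread.
// `looksLikeImage` is whatever check the application chooses (a header
// sniffer, for instance). The returned states are in the same order as `paths`.
std::future<std::vector<EntryState>> classifyInBackground(
        std::vector<std::string> paths,
        std::function<bool(const std::string&)> looksLikeImage) {
    return std::async(
        std::launch::async,
        [paths = std::move(paths), looksLikeImage = std::move(looksLikeImage)]() {
            std::vector<EntryState> states;
            states.reserve(paths.size());
            for (const std::string& path : paths) {
                states.push_back(looksLikeImage(path) ? EntryState::Image
                                                      : EntryState::NotImage);
            }
            return states;
        });
}
```

The caller would show every entry as EntryState::Unknown while the future is pending and update the list once get() returns. In a Qt application the same shape would presumably be built on QtConcurrent and a future watcher instead.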

铃予 2024-08-23 13:02:39

If you're talking about image files in general, and not just a specific format, I'd be willing to bet there are cases where the image header is valid but the image data isn't. You haven't said anything about your application. Is there no way you could add a background thread that keeps a few images in RAM and swaps them in and out depending on what the user may load next? For example, a slide show app would load 1 or 2 images ahead of and behind the current one. Or maybe display a question mark next to the image name until the background thread can verify the validity of the data.

酒儿 2024-08-23 13:02:39

While opening and reading the header of a file on a local filesystem should not be too expensive, it can be expensive if the file is on a remote (networked) file system. Even worse, if you are accessing files saved with hierarchical storage management, reading the file can be very expensive.

If this app is just for you, then you can decide not to worry about those issues. But if you are distributing your app to the public, reading the file before you absolutely have to will cause problems for some users.

Raymond Chen wrote an article about this for his blog The Old New Thing.
