There has been a lot of research on image searching and similarity measures. It's not an easy problem. In general, a single int won't be enough to determine if images are very similar. You'll have a high false-positive rate.
However, since there has been a lot of research done, you might take a look at some of it. For example, this paper (PDF) gives a compact image fingerprinting algorithm that is suitable for finding duplicate images quickly and without storing much data. It seems like this is the right approach if you want something robust.
If you're looking for something simpler, but definitely more ad-hoc, this SO question has a few decent ideas.
I would recommend considering moving away from just using an RGB histogram.
A better digest of your image can be obtained if you take a 2d Haar wavelet of the image (its a lot easier than it sounds, its just a lot of averaging and some square roots used to weight your coefficients) and just retain the k largest weighted coefficients in the wavelet as a sparse vector, normalize it, and save that to reduce its size. You should rescale R G and B using perceptual weights beforehand at least or I'd recommend switching to YIQ (or YCoCg, to avoid quantization noise) so you can sample chrominance information with reduced importance.
You can now use the dot product of two of these sparse normalized vectors as a measure of similarity. The image pairs with the largest dot products are going to be very similar in structure. This has the benefit of being slightly resistant to resizing, hue shifting and watermarking, and being really easy to implement and compact.
You can trade off storage and accuracy by increasing or decreasing k.
Sorting by a single numeric score is going to be intractable for this sort of classification problem. If you think about it it would require images to only be able to 'change' along one axis, but they don't. This is why you need a vector of features. In the Haar wavelet case its approximately where the sharpest discontinuities in the image occur. You can compute a distance between images pairwise, but since all you have is a distance metric a linear ordering has no way to express a 'triangle' of 3 images that are all equally distant. (i.e. think of an image that is all green, an image that is all red and an image that is all blue.)
That means that any real solution to your problem will need O(n^2) operations in the number of images you have. Whereas if it had been possible to linearize the measure, you could require just O(n log n), or O(n) if the measure was suitable for, say, a radix sort. That said, you don't need to spend O(n^2) since in practice you don't need to sift through the whole set, you just need to find the stuff thats nearer than some threshold. So by applying one of several techniques to partition your sparse vector space you can obtain much faster asymptotics for the 'finding me k of the images that are more similar than a given threshold' problem than naively comparing every image against every image, giving you what you likely need... if not precisely what you asked for.
In any event, I used this a few years ago to good effect personally when trying to minimize the number of different textures I was storing, but there has also been a lot of research noise in this space showing its efficacy (and in this case comparing it to a more sophisticated form of histogram classification):
If you need better accuracy in detection, the minHash and tf-idf algorithms can be used with the Haar wavelet (or the histogram) to deal with edits more robustly:
Finally, Stanford has an image search based on a more exotic variant of this kind of approach, based on doing more feature extraction from the wavelets to find rotated or scaled sections of images, etc, but that probably goes way beyond the amount of work you'd want to do.
What Fast Multiresolution Image Querying does is split the image into 3 pieces based on the YIQ colorspace (better for matching differences than RGB). Then the image is essentially compressed using a wavelet algorithm until only the most prominent features from each colorspace are available. These points are stored in a data structure. Query images go through the same process, and the prominent features in the query image are matched against those in the stored database. The more matches, the more likely the images are similar.
The algorithm is often used for "query by sketch" functionality. My software only allowed entering query images via URL, so there was no user interface. However, I found it worked exceptionally well for matching thumbnails to the large version of that image.
还有一种称为主成分分析的技术,可让您将 n 维数据减少到任意较小的维数。 因此一张具有 n 个特征的图片可以简化为一个特征。 然而,这仍然不是比较图像的最佳方法。
A picture has many features, so unless you narrow yourself to one, like average brightness, you are dealing with an n-dimensional problem space.
If I asked you to assign a single integer to the cities of the world, so I could tell which ones are close, the results wouldn't be great. You might, for example, choose time zone as your single integer and get good results with certain cities. However, a city near the north pole and another city near the south pole can also be in the same time zone, even though they are at opposite ends of the planet. If I let you use two integers, you could get very good results with latitude and longitude. The problem is the same for image similarity.
All that said, there are algorithms that try to cluster similar images together, which is effectively what you're asking for. This is what happens when you do face detection with Picasa. Even before you identify any faces, it clusters similar ones together so that it's easy to go through a set of similar faces and give most of them the same name.
There is also a technique called Principle Component Analysis, which lets you reduce n-dimensional data down to any smaller number of dimensions. So a picture with n features could be reduced to one feature. However, this is still not the best approach for comparing images.
有一个 C 库(“libphash” - http://phash.org/)将计算“感知哈希”图像的,并允许您通过比较哈希值来检测相似的图像(因此您不必将每个图像直接与其他图像进行比较),但不幸的是,当我尝试时,它似乎不太准确。
There's a C library ("libphash" - http://phash.org/) that will calculate a "perceptual hash" of an image and allow you to detect similar images by comparing hashes (so you don't have to compare each image directly against every other image) but unfortunately it didn't seem to be very accurate when I tried it.
还有一件事。 如果您对 R、G 和 B 进行权重。权重绿色最高,然后是红色,然后是蓝色,以匹配人类视觉敏感度。
You have to decide what is "similar." Contrast? Hue?
Is a picture "similar" to the same picture upside-down?
I bet you can find a lot of "close calls" by breaking images up into 4x4 pieces and getting an average color for each grid cell. You'd have sixteen scores per image. To judge similarity, you would just do a sum of squares of differences between images.
I don't think a single hash makes sense, unless it's against a single concept like hue, or brightness, or contrast.
Here's your idea:
0299393
0599483
0499994 <- possible dupe
0499999 <- possible dupe
1002039
4995994
6004994
First of all, I'm going to assume these are decimal numbers that are R*(2^16)+G*(2^8)+B, or something like that. Obviously that's no good because red is weighted inordinately.
One solution is to perform a RMS/RSS comparison on every pair of pictures required to perform a bubble sort. Second, you could perform an FFT on each image and do some axis averaging to retrieve a single integer for each image which you would use as an index to sort by. You may consider doing whatever comparison on a resized (25%, 10%) version of the original depending on how small a difference you choose to ignore and how much speedup you require. Let me know if these solutions are interesting, and we can discuss or I can provide sample code.
Most modern approaches to detect Near duplicate image detection use interesting points detection and descriptors describing area around such points. Often SIFT is used. Then you can quatize descriptors and use clusters as visual word vocabulary.
So if we see on ratio of common visual words of two images to all visual words of these images you estimate similarity between images. There are a lot of interesting articles. One of them is Near Duplicate Image Detection: minHash and tf-idf Weighting
发布评论
评论(12)
关于图像搜索和相似性度量已经有很多研究。 这不是一个容易的问题。 一般来说,单个
int
不足以确定图像是否非常相似。 你的假阳性率会很高。但是,由于已经进行了大量研究,您可以看一下其中的一些。 例如,本文(PDF )给出了一种紧凑的图像指纹算法,适合快速查找重复图像且无需存储大量数据。 如果您想要强大的东西,这似乎是正确的方法。
如果您正在寻找更简单但绝对更临时的东西,这个问题有一些不错的想法。
There has been a lot of research on image searching and similarity measures. It's not an easy problem. In general, a single
int
won't be enough to determine if images are very similar. You'll have a high false-positive rate.However, since there has been a lot of research done, you might take a look at some of it. For example, this paper (PDF) gives a compact image fingerprinting algorithm that is suitable for finding duplicate images quickly and without storing much data. It seems like this is the right approach if you want something robust.
If you're looking for something simpler, but definitely more ad-hoc, this SO question has a few decent ideas.
我建议考虑不再仅使用 RGB 直方图。
如果您采用图像的 2d Haar 小波(它比听起来容易得多,它只是大量平均和一些用于对系数进行加权的平方根)并且仅保留 k 个最大的值,则可以获得更好的图像摘要将小波中的加权系数作为稀疏向量,对其进行归一化,然后保存以减小其大小。 您应该至少事先使用感知权重重新调整 RG 和 B,或者我建议切换到 YIQ(或 YCoCg,以避免量化噪声),以便您可以降低重要性对色度信息进行采样。
现在,您可以使用其中两个稀疏归一化向量的点积作为相似性度量。 具有最大点积的图像对在结构上将非常相似。 这样做的好处是稍微抵抗调整大小、色调变化和水印,并且非常容易实现和紧凑。
您可以通过增加或减少 k 来权衡存储和准确性。
对于此类分类问题,按单个数字分数进行排序将很棘手。 如果您考虑一下,它会要求图像只能沿一个轴“变化”,但事实并非如此。 这就是为什么您需要特征向量的原因。 在哈尔小波情况下,它大约是图像中最尖锐的不连续性发生的地方。 您可以成对地计算图像之间的距离,但由于您拥有的只是一个距离度量,因此线性排序无法表达距离相等的 3 个图像的“三角形”。 (即想象一张全绿色的图像、一张全红色的图像和一张全蓝色的图像。)
这意味着解决问题的任何真正解决方案都需要 O(n^2) 次操作,其中图像数量为有。 然而,如果可以对度量进行线性化,则可能只需要 O(n log n),或者如果度量适合基数排序,则可能需要 O(n)。 也就是说,您不需要花费 O(n^2),因为在实践中您不需要筛选整个集合,您只需要找到比某个阈值更接近的东西。 因此,通过应用多种技术之一来划分稀疏向量空间,您可以获得更快的渐进性,以解决“找到比给定阈值更相似的 k 个图像”问题,而不是天真地将每个图像与每个图像进行比较,从而获得以下结果:您可能需要...即使不是您所要求的。
无论如何,几年前,当我试图最大程度地减少我存储的不同纹理的数量时,我个人就使用了这个方法,取得了良好的效果,但是在这个领域也有很多研究噪音显示了它的功效(在这种情况下比较它转换为更复杂的直方图分类形式):
http://www.cs。 Princeton.edu/cass/papers/spam_ceas07.pdf
如果您需要更高的检测精度,可以将 minHash 和 tf-idf 算法与 Haar 小波(或直方图)结合使用,以更稳健地处理编辑:
< a href="http://cmp.felk.cvut.cz/~chum/papers/chum_bmvc08.pdf" rel="noreferrer">http://cmp.felk.cvut.cz/~chum/papers/chum_bmvc08。最后
,斯坦福大学有一个基于这种方法的更奇特变体的图像搜索,基于从小波中进行更多特征提取以查找图像的旋转或缩放部分等,但这可能远远超出了您想要完成的工作量。
http://wang14.ist.psu.edu/cgi-bin/zwang /regionsearch_show.cgi
I would recommend considering moving away from just using an RGB histogram.
A better digest of your image can be obtained if you take a 2d Haar wavelet of the image (its a lot easier than it sounds, its just a lot of averaging and some square roots used to weight your coefficients) and just retain the k largest weighted coefficients in the wavelet as a sparse vector, normalize it, and save that to reduce its size. You should rescale R G and B using perceptual weights beforehand at least or I'd recommend switching to YIQ (or YCoCg, to avoid quantization noise) so you can sample chrominance information with reduced importance.
You can now use the dot product of two of these sparse normalized vectors as a measure of similarity. The image pairs with the largest dot products are going to be very similar in structure. This has the benefit of being slightly resistant to resizing, hue shifting and watermarking, and being really easy to implement and compact.
You can trade off storage and accuracy by increasing or decreasing k.
Sorting by a single numeric score is going to be intractable for this sort of classification problem. If you think about it it would require images to only be able to 'change' along one axis, but they don't. This is why you need a vector of features. In the Haar wavelet case its approximately where the sharpest discontinuities in the image occur. You can compute a distance between images pairwise, but since all you have is a distance metric a linear ordering has no way to express a 'triangle' of 3 images that are all equally distant. (i.e. think of an image that is all green, an image that is all red and an image that is all blue.)
That means that any real solution to your problem will need O(n^2) operations in the number of images you have. Whereas if it had been possible to linearize the measure, you could require just O(n log n), or O(n) if the measure was suitable for, say, a radix sort. That said, you don't need to spend O(n^2) since in practice you don't need to sift through the whole set, you just need to find the stuff thats nearer than some threshold. So by applying one of several techniques to partition your sparse vector space you can obtain much faster asymptotics for the 'finding me k of the images that are more similar than a given threshold' problem than naively comparing every image against every image, giving you what you likely need... if not precisely what you asked for.
In any event, I used this a few years ago to good effect personally when trying to minimize the number of different textures I was storing, but there has also been a lot of research noise in this space showing its efficacy (and in this case comparing it to a more sophisticated form of histogram classification):
http://www.cs.princeton.edu/cass/papers/spam_ceas07.pdf
If you need better accuracy in detection, the minHash and tf-idf algorithms can be used with the Haar wavelet (or the histogram) to deal with edits more robustly:
http://cmp.felk.cvut.cz/~chum/papers/chum_bmvc08.pdf
Finally, Stanford has an image search based on a more exotic variant of this kind of approach, based on doing more feature extraction from the wavelets to find rotated or scaled sections of images, etc, but that probably goes way beyond the amount of work you'd want to do.
http://wang14.ist.psu.edu/cgi-bin/zwang/regionsearch_show.cgi
我为此实现了一种非常可靠的算法,称为快速多分辨率图像查询。 我的(古老的、未维护的)代码是这里。
快速多分辨率图像查询的作用是根据 YIQ 色彩空间(比 RGB 更适合匹配差异)将图像分成 3 部分。 然后使用小波算法对图像进行本质上的压缩,直到只有每个色彩空间中最突出的特征可用。 这些点存储在数据结构中。 查询图像经历相同的过程,并且查询图像中的突出特征与存储的数据库中的突出特征进行匹配。 匹配越多,图像就越有可能相似。
该算法通常用于“通过草图查询”功能。 我的软件只允许通过 URL 输入查询图像,因此没有用户界面。 然而,我发现它对于将缩略图与该图像的大版本进行匹配非常有效。
I implemented a very reliable algorithm for this called Fast Multiresolution Image Querying. My (ancient, unmaintained) code for that is here.
What Fast Multiresolution Image Querying does is split the image into 3 pieces based on the YIQ colorspace (better for matching differences than RGB). Then the image is essentially compressed using a wavelet algorithm until only the most prominent features from each colorspace are available. These points are stored in a data structure. Query images go through the same process, and the prominent features in the query image are matched against those in the stored database. The more matches, the more likely the images are similar.
The algorithm is often used for "query by sketch" functionality. My software only allowed entering query images via URL, so there was no user interface. However, I found it worked exceptionally well for matching thumbnails to the large version of that image.
一张图片有很多特征,因此除非您将范围缩小到一个特征(例如平均亮度),否则您正在处理一个 n 维问题空间。
如果我要求你为世界上的城市分配一个整数,这样我就可以知道哪些城市是接近的,结果不会很好。 例如,您可以选择时区作为单个整数,并在某些城市获得良好的结果。 然而,北极附近的一个城市和南极附近的另一个城市也可以位于同一时区,即使它们位于地球的两端。 如果我让您使用两个整数,您可以获得非常好的纬度和经度结果。 图像相似度的问题是相同的。
话虽如此,有些算法会尝试将相似的图像聚集在一起,这实际上就是您所要求的。 这就是您使用 Picasa 进行人脸检测时会发生的情况。 即使在您识别任何面孔之前,它也会将相似的面孔聚集在一起,以便您可以轻松地浏览一组相似的面孔并为其中大多数提供相同的名称。
还有一种称为主成分分析的技术,可让您将 n 维数据减少到任意较小的维数。 因此一张具有 n 个特征的图片可以简化为一个特征。 然而,这仍然不是比较图像的最佳方法。
A picture has many features, so unless you narrow yourself to one, like average brightness, you are dealing with an n-dimensional problem space.
If I asked you to assign a single integer to the cities of the world, so I could tell which ones are close, the results wouldn't be great. You might, for example, choose time zone as your single integer and get good results with certain cities. However, a city near the north pole and another city near the south pole can also be in the same time zone, even though they are at opposite ends of the planet. If I let you use two integers, you could get very good results with latitude and longitude. The problem is the same for image similarity.
All that said, there are algorithms that try to cluster similar images together, which is effectively what you're asking for. This is what happens when you do face detection with Picasa. Even before you identify any faces, it clusters similar ones together so that it's easy to go through a set of similar faces and give most of them the same name.
There is also a technique called Principle Component Analysis, which lets you reduce n-dimensional data down to any smaller number of dimensions. So a picture with n features could be reduced to one feature. However, this is still not the best approach for comparing images.
有一个 C 库(“libphash” - http://phash.org/)将计算“感知哈希”图像的,并允许您通过比较哈希值来检测相似的图像(因此您不必将每个图像直接与其他图像进行比较),但不幸的是,当我尝试时,它似乎不太准确。
There's a C library ("libphash" - http://phash.org/) that will calculate a "perceptual hash" of an image and allow you to detect similar images by comparing hashes (so you don't have to compare each image directly against every other image) but unfortunately it didn't seem to be very accurate when I tried it.
你必须决定什么是“相似”。 对比? 色调?
一张图片是否与同一张颠倒的图片“相似”?
我敢打赌,通过将图像分解为 4x4 的块并获取每个网格单元的平均颜色,您可以找到很多“千钧一发”的结果。 每张图像有十六个分数。 要判断相似性,您只需计算图像之间差异的平方和即可。
我认为单个散列没有意义,除非它反对色调、亮度或对比度等单一概念。
这是你的想法:
首先,我假设这些是十进制数,即 R*(2^16)+G*(2^8)+B 或类似的数字。 显然这不好,因为红色的权重过大。
进入 HSV 空间会更好。 您可以传播以下位HSV 放入哈希中,或者您可以单独解决 H 或 S 或 V,或者每个图像可以有三个哈希。
还有一件事。 如果您对 R、G 和 B 进行权重。权重绿色最高,然后是红色,然后是蓝色,以匹配人类视觉敏感度。
You have to decide what is "similar." Contrast? Hue?
Is a picture "similar" to the same picture upside-down?
I bet you can find a lot of "close calls" by breaking images up into 4x4 pieces and getting an average color for each grid cell. You'd have sixteen scores per image. To judge similarity, you would just do a sum of squares of differences between images.
I don't think a single hash makes sense, unless it's against a single concept like hue, or brightness, or contrast.
Here's your idea:
First of all, I'm going to assume these are decimal numbers that are R*(2^16)+G*(2^8)+B, or something like that. Obviously that's no good because red is weighted inordinately.
Moving into HSV space would be better. You could spread the bits of HSV out into the hash, or you could just settle H or S or V individually, or you could have three hashes per image.
One more thing. If you do weight R, G, and B. Weight green highest, then red, then blue to match human visual sensitivity.
在网络服务时代,您可以尝试 http://tineye.com
In the age of web services you could try http://tineye.com
问题识别相似图像的好方法?似乎为您的问题提供了解决方案。
The question Good way to identify similar images? seems to provide a solution for your question.
我假设其他重复图像搜索软件对图像执行 FFT,并将不同频率的值存储为向量:
然后您可以通过计算权重之间的距离来比较两个图像的相等性两个图像的向量:
i assumed that other duplicate image search software performs an FFT on the images, and stores the values of the different frequencies as a vectors:
and then you can compare two images for equalness by computing the distance between the weight vectors of two images:
一种解决方案是对所需的每对图片执行 RMS/RSS 比较执行冒泡排序。 其次,您可以对每个图像执行 FFT 并进行一些轴平均以检索单个整数对于您将用作排序依据的每个图像。 您可以考虑对调整大小(25%、10%)的原始版本进行任何比较,具体取决于您选择忽略的差异有多小以及您需要多少加速。 让我知道这些解决方案是否有趣,我们可以讨论或者我可以提供示例代码。
One solution is to perform a RMS/RSS comparison on every pair of pictures required to perform a bubble sort. Second, you could perform an FFT on each image and do some axis averaging to retrieve a single integer for each image which you would use as an index to sort by. You may consider doing whatever comparison on a resized (25%, 10%) version of the original depending on how small a difference you choose to ignore and how much speedup you require. Let me know if these solutions are interesting, and we can discuss or I can provide sample code.
大多数现代检测接近重复图像检测的方法都使用有趣的点检测和描述这些点周围区域的描述符。 通常使用 SIFT。 然后,您可以量化描述符并使用集群作为视觉词汇表。
因此,如果我们看到两个图像的常见视觉单词与这些图像的所有视觉单词的比率,您就可以估计图像之间的相似性。 有很多有趣的文章。 其中之一是近重复图像检测:minHash 和 tf-idf 权重
Most modern approaches to detect Near duplicate image detection use interesting points detection and descriptors describing area around such points. Often SIFT is used. Then you can quatize descriptors and use clusters as visual word vocabulary.
So if we see on ratio of common visual words of two images to all visual words of these images you estimate similarity between images. There are a lot of interesting articles. One of them is Near Duplicate Image Detection: minHash and tf-idf Weighting
例如,使用 IMMI 扩展和 IMMI,您可以检查许多不同的方法来衡量图像之间的相似性:
http:// /spl.utko.feec.vutbr.cz/en/component/content/article/46-image-processing-extension-for-rapidminer-5
通过定义一些阈值并选择一些方法,您可以测量相似性。
For example using IMMI extension and IMMI you can examine many different ways how to measure similarity between images:
http://spl.utko.feec.vutbr.cz/en/component/content/article/46-image-processing-extension-for-rapidminer-5
By defining some threshold and selecting some method you can measure similarity.