OpenCV / SURF: how to generate an image hash / fingerprint / signature from the descriptors?

Posted on 08-19 04:34

There are some topics here that are very helpful on how to find similar pictures.

What I want to do is get a fingerprint of a picture and find the same picture in different photos taken by a digital camera. The SURF algorithm seems to be the best way to be independent of scaling, angle and other distortions.

I'm using OpenCV with the SURF algorithm to extract features on the sample image. Now I'm wondering how to convert all this feature data (position, laplacian, size, orientation, hessian) into a fingerprint or hash.

This fingerprint will be stored in a database and a search query must be able to compare that fingerprint with a fingerprint of a photo with almost the same features.

Update:

It seems that there is no way to convert all the descriptor vectors into a simple hash. So what would be the best way to store the image descriptors in the database for fast querying?

Would Vocabulary Trees be an option?

I would be very thankful for any help.

Comments (4)

┼── 2024-08-26 04:34:11

The feature data you mention (position, laplacian, size, orientation, hessian) is insufficient for your purpose; these are actually the less relevant parts of the feature data if you want to do matching. The data you want to look at are the "descriptors" (the 4th argument):

void cvExtractSURF(const CvArr* image, const CvArr* mask, CvSeq** keypoints, CvSeq** descriptors, CvMemStorage* storage, CvSURFParams params)

These are vectors of 128 or 64 elements (depending on params) which contain the "fingerprint" of each specific feature. Each image will yield a variable number of such vectors.
If you get the latest version of OpenCV, it includes a sample named find_obj.cpp which shows how the descriptors are used for matching.
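To make the matching idea concrete, here is a minimal sketch of nearest-neighbour descriptor matching with a ratio test, written in plain Python rather than against the OpenCV API. The function names, the toy 4-element descriptors and the 0.6 threshold on squared distances are all assumptions for illustration; real SURF descriptors have 64 or 128 elements.

```python
import math

def squared_distance(a, b):
    """Sum of squared differences between two descriptors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def match_descriptors(query, train, ratio=0.6):
    """Return (query_index, train_index) pairs that pass the ratio test.

    A query descriptor is matched to its nearest train descriptor only if
    that nearest distance is clearly smaller than the second-nearest one.
    The ratio value is an assumed, tunable threshold.
    """
    matches = []
    for qi, q in enumerate(query):
        best, second, best_idx = math.inf, math.inf, -1
        for ti, t in enumerate(train):
            d = squared_distance(q, t)
            if d < best:
                best, second, best_idx = d, best, ti
            elif d < second:
                second = d
        if best_idx >= 0 and best < ratio * second:
            matches.append((qi, best_idx))
    return matches

# Toy descriptors (hypothetical data, 4 elements each).
query = [[0.0, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
train = [[1.0, 0.1, 0.0, 0.0], [0.0, 0.0, 0.9, 0.1], [0.5, 0.5, 0.5, 0.5]]
matches = match_descriptors(query, train)
```

The number of surviving matches between two images can then serve as a crude similarity score, although comparing every image pair this way does not scale to a large database.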

Update:

You might find this discussion helpful.

疏忽 2024-08-26 04:34:11

A trivial way to compute a hash would be the following. Get all the descriptors from the image (say, N of them). Each descriptor is a vector of 128 numbers (you can convert them to integers between 0 and 255). So you have a set of N*128 integers. Just write them one after another into a string and use that as the hash value. If you want the hash values to be small, you can apply a standard string hash function: convert the descriptors to a string and then use the hash value of that string.

That might work if you want to find exact duplicates. But it seems (since you talk about scale, rotation, etc.) that you just want to find "similar" images. In that case, using a hash is probably not a good way to go. You probably use some interest point detector to find the points at which to compute SURF descriptors. Imagine that it returns the same set of points, but in a different order. Suddenly your hash value will be very different, even if the images and descriptors are the same.
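The trivial hash and its order sensitivity can be sketched as follows. This is only an illustration: the `quantize` and `naive_hash` helpers, the choice of SHA-1 and the assumption that descriptor components lie in [-1, 1] are mine, not part of the answer.

```python
import hashlib

def quantize(descriptor):
    """Map floats assumed to lie in [-1, 1] to bytes in the range 0..255."""
    return bytes(min(255, max(0, round((v + 1.0) * 127.5))) for v in descriptor)

def naive_hash(descriptors):
    """Concatenate the quantized descriptors in order and hash the result."""
    h = hashlib.sha1()
    for d in descriptors:
        h.update(quantize(d))
    return h.hexdigest()

# Two toy descriptors (hypothetical data, 4 elements each).
descs = [[0.1, -0.2, 0.3, 0.0], [0.5, 0.5, -0.5, -0.5]]

same_order = naive_hash(descs)
reordered = naive_hash(list(reversed(descs)))

# Same descriptors, different order: the digests no longer agree.
print(same_order == reordered)
```

Sorting the descriptors before hashing would remove the order dependence, but any small change in a single descriptor value would still produce a completely different digest, which is why this only suits exact duplicates.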

So, if I had to find similar images reliably, I'd use a different approach. For example, I could vector-quantize the SURF descriptors, build histograms of the vector-quantized values, and use histogram intersection for matching. Do you really, absolutely have to use hash functions (maybe for efficiency), or do you just want to use whatever works to find similar images?
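The vector-quantization approach can be sketched in a few lines. The codebook here is a hand-written toy with three 2-element codewords; in practice it would typically be learned (for example with k-means) from many real descriptors. All names and data below are assumptions for illustration.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to the descriptor (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((x - y) ** 2 for x, y in zip(descriptor, codebook[i])),
    )

def word_histogram(descriptors, codebook):
    """Normalized histogram of codeword assignments for one image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_codeword(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalized histograms: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Toy codebook and descriptors (hypothetical data).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
image_a = [[0.1, 0.0], [0.9, 0.1], [0.0, 0.8], [1.0, 0.0]]
image_b = [[1.0, 0.1], [0.0, 0.1], [0.1, 0.9], [0.9, 0.0]]  # similar features, other order
image_c = [[0.0, 0.1], [0.1, 0.1], [0.0, 0.0], [0.1, 0.0]]  # different content

hist_a = word_histogram(image_a, codebook)
hist_b = word_histogram(image_b, codebook)
hist_c = word_histogram(image_c, codebook)

print(histogram_intersection(hist_a, hist_b))  # high: same visual words
print(histogram_intersection(hist_a, hist_c))  # lower: few words in common
```

Unlike the concatenation hash, the histogram ignores the order in which the detector returns the points and tolerates small perturbations of the descriptor values, as long as each descriptor still falls into the same codeword.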

软糯酥胸 2024-08-26 04:34:11

It seems like GIST might be a more appropriate thing to use.

http://people.csail.mit.edu/torralba/code/spatialenvelope/ has MATLAB code.

要走就滚别墨迹 2024-08-26 04:34:11

Min-Hash, or min-Hashing, is a technique that might help you. It encodes the whole image in a representation of adjustable size that is then stored in hash tables. Several variants exist, such as Geometric min-Hashing, Partition min-Hash and Bundle min-Hashing. The resulting memory footprint is not among the smallest, but these techniques work for a variety of scenarios such as near-duplicate retrieval and even small object retrieval, a scenario where other short signatures often do not perform very well.

There are several papers on this topic. A good entry point would be:
Near Duplicate Image Detection: min-Hash and tf-idf Weighting
Ondrej Chum, James Philbin, Andrew Zisserman, BMVC 2008
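
As a rough illustration of the basic min-Hash idea (not the specific scheme from the paper above, which additionally groups hash values into sketches stored in hash tables), the following assumes each image has already been reduced to a non-empty set of integer visual-word IDs. The linear hash family, the prime modulus and all names are assumptions.

```python
import random

def make_hash_functions(num_hashes, seed=0, prime=2_147_483_647):
    """Random linear hash functions h(w) = (a * w + b) mod prime."""
    rng = random.Random(seed)
    return [
        (rng.randrange(1, prime), rng.randrange(0, prime), prime)
        for _ in range(num_hashes)
    ]

def minhash_signature(visual_words, hash_functions):
    """For each hash function, keep the minimum hash over the (non-empty) word set."""
    words = set(visual_words)
    return [min((a * w + b) % p for w in words) for a, b, p in hash_functions]

def estimated_similarity(sig1, sig2):
    """Fraction of agreeing positions, an estimate of the Jaccard set overlap."""
    equal = sum(1 for x, y in zip(sig1, sig2) if x == y)
    return equal / len(sig1)

hash_functions = make_hash_functions(num_hashes=64)

# Hypothetical visual-word IDs for three images.
words_a = [3, 17, 42, 99, 256]
words_b = [256, 99, 42, 17, 3]   # same words, different order
words_c = [1000, 2000, 3000]     # no words in common with image a

sig_a = minhash_signature(words_a, hash_functions)
sig_b = minhash_signature(words_b, hash_functions)
sig_c = minhash_signature(words_c, hash_functions)

print(estimated_similarity(sig_a, sig_b))
print(estimated_similarity(sig_a, sig_c))
```

The signature length (64 here) is the adjustable size mentioned above: longer signatures give a better similarity estimate at the cost of more storage per image.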
