How to match SURF interest points against an image database

I am using the SURF algorithm in C# (OpenSurf) to get a list of interest points from an image. Each of these interest points contains a descriptor vector, an x coordinate (int), a y coordinate (int), the scale (float) and the orientation (float).

Now, I want to compare the interest points from one image against a list of images in a database, each of which also has a list of interest points, to find the most similar image. That is: [Image(I.P.)] COMPARETO [List of Images(I.P.)] => best match. Comparing the images on an individual, one-by-one basis yields unsatisfactory results.

When searching Stack Overflow and other sites, the best solution I have found is to build a FLANN index while keeping track of which image each interest point comes from. But before implementing it, I have some questions that puzzle me:

1) When matching images based on their SURF interest points, an algorithm I found does the matching by comparing the spatial distances (x1,y1 -> x2,y2) of the points with each other and picking the image with the lowest total distance. Are the descriptors or the orientation never used when comparing interest points?

2) If the descriptors are used, then how do I compare them? I can't figure out how to compare X vectors of 64 values (one image) against Y vectors of 64 values (several images) using an indexed tree.

I would really appreciate some help. Every place I have searched and every API I have found only supports matching one picture against another, not matching one picture efficiently against a list of pictures.

Answers (2)

回眸一遍 2024-12-24 22:35:32

There are multiple things here.

In order to know that two images are (almost) equal, you have to find the homographic projection between the two such that the projection results in a minimal error between the projected feature locations. Brute-forcing that is possible but not efficient, so a trick is to assume that similar images tend to have their feature locations in the same spots as well (give or take a bit). For example, when stitching images, the images to stitch are usually taken from only a slightly different angle and/or location; even if not, the distances will likely grow ("proportionally") with the difference in orientation.

This means that you can - as a broad phase - select candidate images by finding the k pairs of points with minimum spatial distance (the k nearest neighbors) between all pairs of images, and perform homography only on those points. Only then do you compare the projected pairwise point distances and sort the images by that distance; the lowest distance implies the best possible match (given the circumstances).
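Below is a brute-force sketch of that broad phase in plain C#. The Point2 record and the Score helper are hypothetical names of mine (not part of OpenSurf), and the quadratic all-pairs loop is only meant to illustrate the idea, not to be efficient.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal point type: just the image coordinates of an interest point.
record Point2(float X, float Y);

static class SpatialBroadPhase
{
    // Score a candidate image by summing the k smallest spatial distances between
    // any point of the query image and any point of the candidate image.
    // Lower scores mean better candidates; run homography estimation only on those.
    public static double Score(IList<Point2> query, IList<Point2> candidate, int k)
    {
        var distances = new List<double>(query.Count * candidate.Count);
        foreach (var q in query)
            foreach (var c in candidate)
                distances.Add(Math.Sqrt((q.X - c.X) * (q.X - c.X) + (q.Y - c.Y) * (q.Y - c.Y)));

        return distances.OrderBy(d => d).Take(k).Sum();
    }
}
```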

If I'm not mistaken, the descriptors are oriented by the strongest angle in the angle histogram. That means you may also decide to take the Euclidean (L2) distance of the 64- or 128-dimensional feature descriptors directly to obtain the actual feature-space similarity of two given features, and perform homography on the best k candidates. (You would not compare the scale at which the descriptors were found, though, because that would defeat the purpose of scale invariance.)
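A minimal sketch of that direct L2 comparison in C# (representing each descriptor as a float[] of 64 values is my assumption; OpenSurf's own point type may differ):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DescriptorMatching
{
    // Squared Euclidean (L2) distance between two SURF descriptors.
    // The squared distance avoids the sqrt and preserves the ordering.
    public static float SquaredL2(float[] a, float[] b)
    {
        float sum = 0f;
        for (int i = 0; i < a.Length; i++)
        {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Brute-force: for one query descriptor, return the indices of the k closest
    // descriptors in the database (each database entry is one interest point).
    public static int[] BestK(float[] query, IList<float[]> database, int k)
    {
        return Enumerable.Range(0, database.Count)
                         .OrderBy(i => SquaredL2(query, database[i]))
                         .Take(k)
                         .ToArray();
    }
}
```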

Both options are time consuming and depend directly on the number of images and features; in other words: a stupid idea.

Approximate Nearest Neighbors

A neat trick is to not use actual distances at all, but approximate distances instead. In other words, you want an approximate nearest neighbor algorithm, and FLANN (although not for .NET) would be one of them.

One key point here is the projection search algorithm. It works like this:
Assume you want to compare descriptors in 64-dimensional feature space. You generate a random 64-dimensional vector and normalize it, resulting in an arbitrary unit vector in feature space; let's call it A. Now (during indexing) you form the dot product of each descriptor with this vector. This projects each 64-d vector onto A, resulting in a single real number a_n. (This value a_n represents the distance of the descriptor along A in relation to A's origin.)

This image, borrowed from this answer on Cross Validated regarding PCA, demonstrates it visually; think of the rotation as the result of different random choices of A, where the red dots correspond to the projections (and thus the scalars a_n). The red lines show the error you make by using that approach; this is what makes the search approximate.

[Figure: projection onto a vector]

You will need A again for the search, so you store it. You also keep track of each projected value a_n and the descriptor it came from; furthermore, you put each a_n (with a link to its descriptor) into a list sorted by a_n.
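A sketch of building one such projection index, under the same assumptions as above (plain C#, float[] descriptors; ProjEntry and ProjectionIndex are names of mine, not FLANN's or OpenSurf's):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One entry of the index: the scalar a_n plus a link back to the descriptor
// it came from (image id + descriptor index within that image).
record ProjEntry(float Projection, int ImageId, int DescriptorIndex);

class ProjectionIndex
{
    public float[] A { get; }                  // the random unit vector, kept for querying
    public List<ProjEntry> Entries { get; } = new();

    public ProjectionIndex(int dim, Random rng)
    {
        // Random vector, normalized to unit length -> arbitrary direction in feature space.
        A = Enumerable.Range(0, dim).Select(_ => (float)(rng.NextDouble() * 2 - 1)).ToArray();
        float norm = (float)Math.Sqrt(A.Sum(x => x * x));
        for (int i = 0; i < dim; i++) A[i] /= norm;
    }

    public static float Dot(float[] a, float[] b)
    {
        float s = 0f;
        for (int i = 0; i < a.Length; i++) s += a[i] * b[i];
        return s;
    }

    // Project every descriptor of every database image onto A and keep the list sorted by a_n.
    public void Build(IList<(int ImageId, IList<float[]> Descriptors)> database)
    {
        foreach (var (imageId, descriptors) in database)
            for (int i = 0; i < descriptors.Count; i++)
                Entries.Add(new ProjEntry(Dot(descriptors[i], A), imageId, i));

        Entries.Sort((x, y) => x.Projection.CompareTo(y.Projection));
    }
}
```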

To clarify using another image from here, we're interested in the location of the projected points along the axis A:

[Figure: projection search - https://i.sstatic.net/IJhhl.jpg]

The values a_0 .. a_3 of the 4 projected points in the image are approximately sqrt(0.5²+2²)=1.58, sqrt(0.4²+1.1²)=1.17, -0.84 and -0.95, corresponding to their distance to A's origin.

If you now want to find similar images, you do the same: Project each descriptor onto A, resulting in a scalar q (query). Now you go to the position of q in the list and take the k surrounding entries. These are your approximate nearest neighbors. Now take the feature-space distance of these k values and sort by lowest distance - the top ones are your best candidates.

Coming back to the last picture, assume the topmost point is our query. Its projection is 1.58 and its approximate nearest neighbor (of the four projected points) is the one at 1.17. They're not really close in feature space, but given that we just compared two 64-dimensional vectors using only two values, it's not that bad either.
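Continuing the sketch above, a query against that index could look like this (it reuses ProjectionIndex and ProjEntry from the indexing sketch, and assumes ImageId indexes into the descriptorsByImage list):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ProjectionQuery
{
    // Project the query descriptor onto A, binary-search the sorted projection list,
    // take the k entries around that position (the approximate nearest neighbors),
    // then re-rank only those few candidates by their true feature-space L2 distance.
    public static List<(ProjEntry Entry, float Distance)> QueryOne(
        ProjectionIndex index, float[] queryDescriptor, IList<float[][]> descriptorsByImage, int k)
    {
        float q = ProjectionIndex.Dot(queryDescriptor, index.A);

        int pos = index.Entries.BinarySearch(
            new ProjEntry(q, -1, -1),
            Comparer<ProjEntry>.Create((x, y) => x.Projection.CompareTo(y.Projection)));
        if (pos < 0) pos = ~pos;                   // insertion point if q is not hit exactly

        int from = Math.Max(0, pos - k);
        int to = Math.Min(index.Entries.Count, pos + k);

        return index.Entries.Skip(from).Take(to - from)
            .Select(e => (Entry: e,
                          Distance: L2(queryDescriptor, descriptorsByImage[e.ImageId][e.DescriptorIndex])))
            .OrderBy(t => t.Distance)
            .Take(k)
            .ToList();
    }

    static float L2(float[] a, float[] b)
    {
        float s = 0f;
        for (int i = 0; i < a.Length; i++) { float d = a[i] - b[i]; s += d * d; }
        return (float)Math.Sqrt(s);
    }
}
```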

You can see the limits there: similar projections do not at all require the original values to be close, which will of course result in rather creative matches. To accommodate for this, you simply generate more base vectors B, C, etc. - say n of them - and keep a separate sorted list for each. Take the k best matches from all of them, sort that list of k*n 64-dimensional vectors according to their Euclidean distance to the query vector, perform homography on the best ones, and select the one with the lowest projection error.

The neat part about this is that if you have n (random, normalized) projection axes and want to search in 64-dimensional space, you simply multiply each descriptor by an n x 64 matrix, resulting in n scalars.
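A tiny sketch of that last observation, keeping the float[] representation from the sketches above (the row-per-axis layout is my choice):

```csharp
static class MultiAxisProjection
{
    // axes holds n random unit vectors of length 64, one per row; the result is the
    // matrix-vector product, i.e. one projection scalar per axis for this descriptor.
    public static float[] ProjectAll(float[][] axes, float[] descriptor)
    {
        var result = new float[axes.Length];
        for (int i = 0; i < axes.Length; i++)
        {
            float s = 0f;
            for (int j = 0; j < descriptor.Length; j++)
                s += axes[i][j] * descriptor[j];
            result[i] = s;
        }
        return result;
    }
}
```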

天涯离梦残月幽梦 2024-12-24 22:35:32

I am pretty sure that the distance is calculated between the descriptors and not their coordinates (x,y). You can only compare one descriptor directly against another. I propose the following possible solution (surely not the optimal one):

For each descriptor in the query image, find its top-k nearest neighbors in your dataset; then take all of the top-k lists and find the most common image among them.
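A minimal sketch of that voting step in C# (the input shape - one list of matched image ids per query descriptor - is my assumption about how the top-k results are collected):

```csharp
using System.Collections.Generic;
using System.Linq;

static class VotingMatcher
{
    // Each inner sequence holds the image ids of the top-k matches for one query
    // descriptor; the image that collects the most votes overall is the best candidate.
    public static int MostCommonImage(IEnumerable<IEnumerable<int>> topKImageIdsPerDescriptor)
    {
        return topKImageIdsPerDescriptor
            .SelectMany(ids => ids)
            .GroupBy(id => id)
            .OrderByDescending(g => g.Count())
            .First().Key;                // assumes there is at least one vote
    }
}
```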
