How to use the DoG pyramid in SIFT

Posted on 2024-08-30 03:06:55

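I'm very new to image processing and pattern recognition. I'm trying to implement the SIFT algorithm, and I've managed to build the DoG pyramid and find the local maxima and minima in each octave. What I don't understand is how to use these local maxima/minima in each octave. How do I combine these points?

My question may sound trivial. I've read Lowe's paper, but I couldn't really understand what he does after building the DoG pyramid. Any help is appreciated.

Thanks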

记忆で 2024-09-06 03:06:55

Basically, after building the DoG pyramid he detects local extrema in those images. Afterwards, he discards some of the detected extrema because they are likely to be unstable. Identifying these unstable keypoints is done in two steps:

rejecting points that have low contrast
rejecting points that are poorly localized along an edge (they have a large principal curvature across the edge but a small one along it)
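A minimal sketch of the two rejection tests, assuming each DoG image is a plain Python list of rows with pixel values scaled to [0, 1]. The thresholds (0.03 on the interpolated contrast, r = 10 for the curvature ratio) are the values given in Lowe's paper; the function names are hypothetical. The edge test uses the 2x2 spatial Hessian H of the DoG image: an edge has one large and one small principal curvature, so Tr(H)²/Det(H) becomes large.

```python
def hessian_2d(dog, y, x):
    """Finite-difference 2x2 spatial Hessian of one DoG image at (y, x)."""
    c = dog[y][x]
    dxx = dog[y][x + 1] + dog[y][x - 1] - 2.0 * c
    dyy = dog[y + 1][x] + dog[y - 1][x] - 2.0 * c
    dxy = (dog[y + 1][x + 1] - dog[y + 1][x - 1]
           - dog[y - 1][x + 1] + dog[y - 1][x - 1]) / 4.0
    return dxx, dyy, dxy


def is_low_contrast(interpolated_value, threshold=0.03):
    """Contrast test on the Taylor-interpolated value |D(x_hat)|.

    Assumes pixel values in [0, 1].
    """
    return abs(interpolated_value) < threshold


def is_edge_like(dog, y, x, r=10.0):
    """Edge test: reject if the principal-curvature ratio is r or more."""
    dxx, dyy, dxy = hessian_2d(dog, y, x)
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures of opposite sign (or a degenerate Hessian): not a stable peak.
        return True
    return trace * trace / det >= (r + 1) ** 2 / r


blob = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
ridge = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
print(is_edge_like(blob, 1, 1), is_edge_like(ridge, 1, 1))  # False True
```

An isolated peak curves equally in both directions and survives, while a vertical ridge has zero curvature along its length and is rejected.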

To do these steps, you first need the true (sub-pixel) location of each extremum, which you get from a Taylor series expansion of the DoG function around the sample point. That expansion gives you the information needed for both steps.
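Here is a sketch of that Taylor-series step, assuming the DoG images of one octave are stored as nested lists `dog[s][y][x]`; all names are hypothetical. It estimates the gradient g and Hessian H of D with central differences, solves H·offset = -g for the sub-pixel offset in (x, y, s), and evaluates D(x̂) = D + ½ gᵀx̂, the value the contrast test above uses. In Lowe's description, if any offset component exceeds 0.5 you move to the neighbouring sample and repeat; this sketch leaves that loop to the caller.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(a, b):
    """Solve a·v = b with Cramer's rule; None if a is (nearly) singular."""
    d = _det3(a)
    if abs(d) < 1e-12:
        return None
    result = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        result.append(_det3(m) / d)
    return result


def localize(dog, s, y, x):
    """Return ([dx, dy, ds] offset, interpolated D value) or None."""
    def d(ds, dy, dx):
        return dog[s + ds][y + dy][x + dx]

    c = d(0, 0, 0)
    g = [(d(0, 0, 1) - d(0, 0, -1)) / 2.0,
         (d(0, 1, 0) - d(0, -1, 0)) / 2.0,
         (d(1, 0, 0) - d(-1, 0, 0)) / 2.0]
    dxx = d(0, 0, 1) + d(0, 0, -1) - 2.0 * c
    dyy = d(0, 1, 0) + d(0, -1, 0) - 2.0 * c
    dss = d(1, 0, 0) + d(-1, 0, 0) - 2.0 * c
    dxy = (d(0, 1, 1) - d(0, 1, -1) - d(0, -1, 1) + d(0, -1, -1)) / 4.0
    dxs = (d(1, 0, 1) - d(1, 0, -1) - d(-1, 0, 1) + d(-1, 0, -1)) / 4.0
    dys = (d(1, 1, 0) - d(1, -1, 0) - d(-1, 1, 0) + d(-1, -1, 0)) / 4.0
    h = [[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]]

    offset = _solve3(h, [-v for v in g])  # x_hat = -H^-1 g
    if offset is None:
        return None
    value = c + 0.5 * sum(gi * oi for gi, oi in zip(g, offset))
    return offset, value


def f(s, y, x):
    # Synthetic DoG with its true peak (value 1.0) at x=2.3, y=1.8, s=1.2.
    return 1.0 - ((x - 2.3) ** 2 + (y - 1.8) ** 2 + (s - 1.2) ** 2)


dog = [[[f(s, y, x) for x in range(5)] for y in range(5)] for s in range(3)]
offset, value = localize(dog, 1, 2, 2)
print([round(v, 6) for v in offset], round(value, 6))
```

Because the synthetic DoG is exactly quadratic, the finite differences are exact and the sketch recovers the offset (0.3, -0.2, 0.2) from the sample at x=2, y=2, s=1.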

The final step is to build descriptors ...
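As a rough illustration of what "building descriptors" involves, here is a simplified sketch of the 128-element SIFT vector: a 4x4 grid of cells over a 16x16 sample region, an 8-bin orientation histogram per cell, normalization to unit length, clipping at 0.2, and renormalization (the numbers Lowe's paper gives). It assumes gradient magnitudes and orientations have already been sampled in the keypoint's rotated frame; it omits Lowe's Gaussian weighting and trilinear interpolation, and the function names are hypothetical.

```python
import math


def _normalize(v):
    """Scale a vector to unit Euclidean length (leave zero vectors alone)."""
    n = math.sqrt(sum(t * t for t in v))
    return [t / n for t in v] if n > 0 else v


def sift_like_descriptor(mag, ori, clip=0.2):
    """mag, ori: 16x16 lists of gradient magnitude and orientation (radians,
    already relative to the keypoint orientation). Returns 128 floats."""
    hist = [0.0] * (4 * 4 * 8)
    for y in range(16):
        for x in range(16):
            cell = (y // 4) * 4 + (x // 4)  # which of the 4x4 spatial cells
            angle = ori[y][x] % (2 * math.pi)
            b = int(angle / (2 * math.pi) * 8) % 8  # which of 8 orientation bins
            hist[cell * 8 + b] += mag[y][x]
    vec = _normalize(hist)
    vec = [min(v, clip) for v in vec]  # damp large gradient magnitudes
    return _normalize(vec)


mag = [[1.0] * 16 for _ in range(16)]
ori = [[0.0] * 16 for _ in range(16)]
desc = sift_like_descriptor(mag, ori)
print(len(desc))  # 128
```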

I'm in the process of studying this algorithm as well, and I don't find it trivial to understand. Some details are not included in Lowe's paper, which makes it harder. I haven't found many extra resources that explain the algorithm in more depth, but there are some open-source implementations you could also make use of.

EDIT: more information :)

The paper you linked is his early work; you should get the newest version of the paper, because there are some modifications. While searching for more resources I also read his patent, and it contains old information too, so you shouldn't look there either.

So, my understanding of the scale-space extrema step is as follows. First, we need to build a Gaussian pyramid. The paper says that, for extrema detection to cover a complete octave, we need to build s+3 Gaussian images in each octave. After some experiments, Lowe concluded that s = 3 gives the best results. That means we have 6 Gaussian images in each octave, from which we get 5 DoG images. Note that all DoG images within an octave have the same resolution; resampling is done only when moving to the next octave.
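To make the s+3 count concrete, here is a small sketch of the blur levels in one octave. It assumes Lowe's base value σ = 1.6 and the constant factor k = 2^(1/s) between levels, and that the first image of the octave already has blur σ; the function names are hypothetical. Level s ends up with exactly 2σ, which is why the next octave can start from a downsampled copy of it.

```python
import math


def octave_sigmas(sigma0=1.6, s=3):
    """Blur levels of the s + 3 Gaussian images in one octave.

    Assumes the first image already has blur sigma0 and consecutive levels
    differ by the factor k = 2 ** (1 / s).
    """
    k = 2.0 ** (1.0 / s)
    return [sigma0 * k ** i for i in range(s + 3)]


def incremental_sigmas(sigmas):
    """Extra blur to apply to level i - 1 to reach level i.

    Blurring with sigma_a and then sigma_b equals one blur with
    sqrt(sigma_a**2 + sigma_b**2), so the increment is the square-root
    difference.
    """
    return [math.sqrt(sigmas[i] ** 2 - sigmas[i - 1] ** 2)
            for i in range(1, len(sigmas))]


sigmas = octave_sigmas()
print(len(sigmas), len(sigmas) - 1)  # 6 5  (6 Gaussian images -> 5 DoG images)
```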

The next step is finding local extrema. Lowe proposes searching within a 26-neighborhood (the 8 neighbors in the same DoG image plus the 9 in each of the images above and below), which means we should start our search from the second DoG image, because that's the first image for which a full 26-neighborhood exists. Similarly, we stop our search at the fourth image. This process is repeated for each octave individually. For each extremum found, you should save at least its location and its scale. Once the extrema are found, the next step is more accurate localization, which is done with a Taylor series.
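The search just described can be sketched as follows, assuming the 5 DoG images of one octave are nested lists `dog[s][y][x]` (0-based, so s = 1..3 are the second to fourth images). The names are hypothetical, and the strict comparison means plateaus of equal values are not reported.

```python
def is_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly above or below all 26 neighbors."""
    v = dog[s][y][x]
    neighbors = [dog[s + ds][y + dy][x + dx]
                 for ds in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                 if (ds, dy, dx) != (0, 0, 0)]
    return all(v > n for n in neighbors) or all(v < n for n in neighbors)


def find_extrema(dog):
    """Return (scale index, y, x) for every extremum in one octave."""
    found = []
    for s in range(1, len(dog) - 1):  # skip first and last DoG image
        h, w = len(dog[s]), len(dog[s][0])
        for y in range(1, h - 1):  # skip the border, too
            for x in range(1, w - 1):
                if is_extremum(dog, s, y, x):
                    found.append((s, y, x))
    return found


dog = [[[0.0] * 5 for _ in range(5)] for _ in range(5)]
dog[2][2][2] = 1.0   # a maximum in the third DoG image
dog[3][1][3] = -1.0  # a minimum in the fourth DoG image
print(find_extrema(dog))  # [(2, 2, 2), (3, 1, 3)]
```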

This is my understanding of how this step works, and I hope I'm not too far from the truth :)

Hope this helped a little bit more.

原来分手还会想你 2024-09-06 03:06:55

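vlfeat is an open-source library that implements several computer vision algorithms, including SIFT. You should be able to look through its source code to get a better understanding of what is being done.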

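If you've correctly found the extrema in each octave, you can then:

refine the scale and location of the extrema
reject low-contrast points and edge responses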

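For each feature that remains at this point:

compute the dominant orientation within a window whose size is relative to the detected feature's scale
build the SIFT descriptor (by accumulating gradients into a spatial 4x4 grid of orientation histograms). This is described in section 6.1 of the paper.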
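The orientation step can be sketched like this. It follows the outline in Lowe's paper (a 36-bin histogram over 360°, weighted by gradient magnitude and a Gaussian window, keeping local peaks within 80% of the highest), but it assumes the patch around the keypoint has already been cut from the Gaussian image at the keypoint's scale, uses the bin centre instead of Lowe's parabolic peak interpolation, and the function name is hypothetical. Lowe sets the window's σ to 1.5 times the keypoint scale.

```python
import math


def dominant_orientations(patch, sigma, num_bins=36, peak_ratio=0.8):
    """Return orientations (radians) of the histogram peaks for one keypoint.

    patch: square list of rows centred on the keypoint.
    sigma: width of the Gaussian weighting window, in pixels.
    """
    n = len(patch)
    c = (n - 1) / 2.0
    hist = [0.0] * num_bins
    for y in range(1, n - 1):
        for x in range(1, n - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            theta = math.atan2(gy, gx) % (2 * math.pi)
            weight = math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * sigma ** 2))
            b = int(theta / (2 * math.pi) * num_bins) % num_bins
            hist[b] += weight * mag
    top = max(hist)
    if top == 0:
        return []
    width = 2 * math.pi / num_bins
    peaks = []
    for b, h in enumerate(hist):
        left, right = hist[b - 1], hist[(b + 1) % num_bins]  # circular neighbors
        if h >= peak_ratio * top and h > left and h > right:
            peaks.append((b + 0.5) * width)  # centre of the winning bin
    return peaks


ramp = [[float(x) for x in range(9)] for _ in range(9)]  # intensity grows along x
print([round(math.degrees(a), 1) for a in dominant_orientations(ramp, 3.0)])
```

Each returned orientation becomes its own keypoint, so one location can yield several descriptors.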

I'm not sure how much this helps, since I don't know where you're getting stuck.

被你宠の有点坏 2024-09-06 03:06:55

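We have two pyramids: the Gaussian pyramid and the DoG pyramid. The Gaussian pyramid has 6 blurred images. The DoG images are the differences between consecutive Gaussian images, so there are 5 images in the DoG pyramid.
For finding extrema you don't work with the Gaussian pyramid directly. Note that all of this happens in the first octave! Once the first pyramids are built, resize the image and start building new pyramids for the second octave.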
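Building the DoG pyramid from the Gaussian one is a per-pixel subtraction of neighboring levels. A minimal sketch, assuming each image is a list of rows of equal size; the function name is hypothetical.

```python
def difference_of_gaussians(gaussians):
    """Subtract each Gaussian image from the next, more blurred one.

    gaussians: equally sized 2-D images (lists of rows) ordered by increasing
    blur, so n Gaussian images give n - 1 DoG images.
    """
    return [
        [[b - a for a, b in zip(row_a, row_b)] for row_a, row_b in zip(g0, g1)]
        for g0, g1 in zip(gaussians, gaussians[1:])
    ]


# Six constant 3x3 "Gaussian images" with values 0, 1, 4, 9, 16, 25.
gaussians = [[[float(i * i)] * 3 for _ in range(3)] for i in range(6)]
dog = difference_of_gaussians(gaussians)
print(len(dog))  # 5
```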

Suppose your original image is 512x512. In the first octave all images are 512x512. In the second octave you again have 6 Gaussian images and 5 DoG images, but they are all 256x256. The third octave works the same way.
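Halving the resolution between octaves can be sketched as keeping every second pixel in each row and column, which is how Lowe's paper describes the resampling. In the paper the image that gets resampled is the Gaussian image with twice the initial σ (index s in a 0-based stack of s + 3 images); the function names here are hypothetical.

```python
def downsample(image):
    """Keep every second pixel in each row and column."""
    return [row[::2] for row in image[::2]]


def octave_sizes(width, height, num_octaves):
    """(width, height) of the images in each octave when halving each time."""
    sizes = []
    for _ in range(num_octaves):
        sizes.append((width, height))
        width, height = (width + 1) // 2, (height + 1) // 2
    return sizes


print(octave_sizes(512, 512, 3))  # [(512, 512), (256, 256), (128, 128)]
```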

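Now for finding the minima and maxima (you are in the first octave):
Suppose you are looking for maxima in the first octave. You use the DoG pyramid and start from the second image. Take a pixel and check whether it is a maximum; for this check you use the first, second, and third DoG images. When that's done, find the maxima in the third image by considering the second, third, and fourth images. Finally, find the maxima in the fourth image by considering the third, fourth, and fifth images.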

Once the maxima in the first octave have been found, move to the next octave and repeat these steps.
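The order of work described in this answer can be written out as a small schedule. This sketch only enumerates which DoG images are compared and at what image size; it assumes 5 DoG images per octave and halving between octaves, numbers octaves and images from 1 as the answer does, and uses a hypothetical function name.

```python
def search_schedule(num_octaves, base_size, num_dog=5):
    """List (octave, image size, (below, middle, above)) in search order."""
    schedule = []
    size = base_size
    for octave in range(1, num_octaves + 1):
        for middle in range(2, num_dog):  # second to fourth DoG image
            schedule.append((octave, size, (middle - 1, middle, middle + 1)))
        size //= 2  # the next octave works on a half-size image
    return schedule


for step in search_schedule(2, 512):
    print(step)
```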
