Detecting the location of an image within a larger image

Published 2024-08-21 06:40:22 · 194 words · 8 views · 0 comments

How do you detect the location of an image within a larger image? I have an unmodified copy of the image. This image is then changed to an arbitrary resolution and placed randomly within a much larger image which is of an arbitrary size. No other transformations are conducted on the resulting image. Python code would be ideal, and it would probably require libgd. If you know of a good approach to this problem you'll get a +1.


Comments (4)

慵挽 2024-08-28 06:40:22

There is a quick and dirty solution: simply slide a window over the target image, compute some measure of similarity at each location, and pick the location with the highest similarity. Then compare that similarity to a threshold: if the score is above the threshold, you conclude the image is there and that's its location; if the score is below the threshold, the image isn't there.

As a similarity measure, you can use normalized correlation or sum of squared differences (aka L2 norm). As people mentioned, this will not deal with scale changes. So you also rescale your original image multiple times and repeat the process above with each scaled version. Depending on the size of your input image and the range of possible scales, this may be good enough, and it's easy to implement.
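As a minimal illustration of the sliding-window idea, here is a brute-force pure-NumPy sketch using the SSD measure (no rescaling loop shown; a real implementation would likely use a library routine such as OpenCV's `matchTemplate` instead of Python loops):

```python
import numpy as np

def find_template(image, template):
    """Slide the template over every offset in the image, score each
    window by sum of squared differences (SSD), and return the offset
    with the lowest score along with that score."""
    ih, iw = image.shape
    th, tw = template.shape
    best_loc, best_ssd = None, np.inf
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            window = image[y:y + th, x:x + tw]
            ssd = np.sum((window - template) ** 2)
            if ssd < best_ssd:
                best_loc, best_ssd = (y, x), ssd
    return best_loc, best_ssd

# Toy demo: plant a 3x3 patch inside a 10x10 image and recover it.
rng = np.random.default_rng(0)
big = rng.random((10, 10))
patch = big[4:7, 2:5].copy()
loc, score = find_template(big, patch)  # exact copy, so SSD is 0 at (4, 2)
```

To handle the unknown scale, you would wrap this in a loop that resizes the template over a range of factors and keeps the best score across all of them.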

A proper solution is to use affine invariants. Try looking up "wide-baseline stereo matching"; people have studied this problem in that context. The methods used generally look something like this:

Preprocessing of the original image

  • Run an "interest point detector". This will find a few points in the image which are easily localizable, e.g. corners. There are many detectors, a detector called "harris-affine" works well and is pretty popular (so implementations probably exist). Another option is to use the Difference-of-Gaussians (DoG) detector, it was developed for SIFT and works well too.
  • At each interest point, extract a small sub-image (e.g. 30x30 pixels)
  • For each sub-image, compute a "descriptor", some representation of the image content in that window. Again, many descriptors exist. Things to look at are how well the descriptor describes the image content (you want two descriptors to match only if they are similar) and how invariant it is (you want it to be the same even after scaling). In your case, I'd recommend using SIFT. It is not as invariant as some other descriptors, but can cope with scale well, and in your case scale is the only thing that changes.

At the end of this stage, you will have a set of descriptors.
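For the first bullet, a corner detector can be sketched in a few lines. This is a minimal pure-NumPy Harris response (only a 3x3 box sum for smoothing and no non-maximum suppression, so it is an illustration of the idea rather than the "harris-affine" detector mentioned above):

```python
import numpy as np

def box3(a):
    """Sum each 3x3 neighbourhood (zero-padded at the borders)."""
    p = np.pad(a, 1)
    h, w = a.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3))

def harris_response(img, k=0.05):
    """Harris corner response R = det(M) - k * trace(M)^2, where M is the
    structure tensor built from image gradients: R is strongly positive
    at corners, negative along edges, and near zero in flat regions."""
    iy, ix = np.gradient(img.astype(float))
    sxx, syy, sxy = box3(ix * ix), box3(iy * iy), box3(ix * iy)
    return sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2

# Demo: a bright square on a dark background; the response is positive
# at the square's corners, negative along its edges, zero inside it.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
R = harris_response(img)
```

Interest points would then be the local maxima of `R` above some threshold.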

Testing (with the new test image).

  • First, you run the same interest point detector as in step 1 and get a set of interest points. You compute the same descriptor for each point, as above. Now you have a set of descriptors for the target image as well.
  • Next, you look for matches. Ideally, to each descriptor from your original image, there will be some pretty similar descriptor in the target image. (Since the target image is larger, there will also be "leftover" descriptors, i.e. points that don't correspond to anything in the original image.) So if enough of the original descriptors match with enough similarity, then you know the target is there. Moreover, since the descriptors are location-specific, you will also know where in the target image the original image is.
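The matching step can be sketched as a nearest-neighbour search with Lowe's ratio test. The descriptor arrays below are made-up stand-ins for real SIFT output (in practice you would get 128-dimensional descriptors from a library):

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """For each row of desc_a, find its nearest and second-nearest rows
    in desc_b (Euclidean distance) and accept the match only if the
    nearest is clearly better (Lowe's ratio test). Returns (i, j) pairs."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        nearest, second = np.argsort(dists)[:2]
        if dists[nearest] < ratio * dists[second]:
            matches.append((i, nearest))
    return matches

# Toy demo: desc_b contains each row of desc_a (shuffled) plus two
# "leftover" descriptors that should not be matched.
desc_a = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
desc_b = np.array([[50.0, 50.0], [0.0, 10.0], [10.0, 0.0],
                   [0.0, 0.0], [80.0, 80.0]])
matches = match_descriptors(desc_a, desc_b)
```

If enough matches survive the ratio test, the matched keypoint coordinates can then be fed to a least-squares or RANSAC fit to recover the scale and offset of the original image inside the target.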

吝吻 2024-08-28 06:40:22

You probably want cross-correlation. (Autocorrelation is correlating a signal with itself; cross correlating is correlating two different signals.)

What correlation does for you, over simply checking for exact matches, is that it will tell you where the best matches are, and how good they are. Flip side is that, for a 2-D picture, it's something like O(N^3), and it's not that simple an algorithm. But it's magic once you get it to work.
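A sketch of the FFT route for 2-D cross-correlation in pure NumPy (the template is made zero-mean so flat bright regions don't dominate the response; as noted in the edit below, this still assumes the template has not been resized):

```python
import numpy as np

def cross_correlate(image, template):
    """'Valid'-mode cross-correlation of a zero-mean template with an
    image, computed via the FFT: correlation is convolution with the
    flipped template, so we flip it, zero-pad to the image size,
    multiply the spectra, and keep the offsets with full overlap."""
    t = template - template.mean()
    ih, iw = image.shape
    th, tw = t.shape
    spec = np.fft.rfft2(image) * np.fft.rfft2(t[::-1, ::-1], s=(ih, iw))
    full = np.fft.irfft2(spec, s=(ih, iw))
    return full[th - 1:, tw - 1:]  # corr[y, x] for top-left offset (y, x)

# Demo: plant a 4x4 patch and take the strongest response as its location.
rng = np.random.default_rng(1)
big = rng.random((16, 16))
tpl = big[5:9, 2:6].copy()
corr = cross_correlate(big, tpl)
y, x = np.unravel_index(np.argmax(corr), corr.shape)
```

The FFT brings the cost down from the naive O(N^2 M^2) sliding products to O(N^2 log N), which is what makes correlation practical on full-size images.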

EDIT: Aargh, you specified an arbitrary resize. That's going to break any correlation-based algorithm. Sorry, you're outside my experience now and SO won't let me delete this answer.

探春 2024-08-28 06:40:22

Take a look at Scale-Invariant Feature Transforms; there are many different flavors that may be more or less tailored to the type of images you happen to be working with.
