Find subimage in Numpy image
I have two Numpy arrays (3-dimensional uint8) converted from PIL images.
I want to find if the first image contains the second image, and if so, find out the coordinates of the top-left pixel inside the first image where the match is.
Is there a way to do that purely in Numpy, in a fast enough way, rather than using (4! very slow) pure Python loops?
2D example:
a = numpy.array([
[0, 1, 2, 3],
[4, 5, 6, 7],
[8, 9, 10, 11]
])
b = numpy.array([
[2, 3],
[6, 7]
])
How to do something like this?
position = a.find(b)
position would then be (0, 2).
I'm doing this with OpenCV's matchTemplate function. There is an excellent Python binding to OpenCV which uses numpy internally, so images are just numpy arrays. For example, let's assume you have a 100x100 pixel BGR file testimage.bmp. We take a 10x10 sub-image at position (30, 30) and find it in the original.
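A minimal sketch of that workflow, assuming OpenCV's Python bindings (cv2) are installed; testimage.bmp and the patch position are the hypothetical values from the example:

import cv2

image = cv2.imread('testimage.bmp')        # the hypothetical 100x100 BGR file
template = image[30:40, 30:40]             # 10x10 sub-image at position (30, 30)
result = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
print(max_loc)  # TM_CCOEFF_NORMED marks the match as the maximum: (30, 30)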
You can choose between several algorithms to match the template to the original; cv2.TM_CCOEFF_NORMED is just one of them. See the documentation for more details: some algorithms indicate matches as minima, others as maxima in the result array. A word of warning: OpenCV uses BGR channel order by default, so be careful, e.g. when you compare an image you loaded with cv2.imread to an image you converted from PIL to numpy. You can always use cv2.cvtColor to convert between formats.
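For instance, a PIL-derived array (RGB channel order) can be brought into OpenCV's BGR order before comparing; this is only a sketch, and needle.png is a hypothetical file:

import cv2
import numpy as np
from PIL import Image

pil_image = Image.open('needle.png')       # hypothetical RGB image from PIL
needle = cv2.cvtColor(np.array(pil_image), cv2.COLOR_RGB2BGR)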
To find all matches above a given threshold confidence, I use something along the lines of the sketch below to extract the matching coordinates from my result array. This gives a tuple of two arrays (row and column indices), which pair up into the matching coordinates.
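A sketch of that extraction, reusing result from the matchTemplate call above; the threshold value is illustrative:

import numpy as np

confidence = 0.95                          # illustrative threshold
rows, cols = np.where(result > confidence) # tuple of two index arrays
matches = list(zip(rows, cols))            # one (row, col) pair per match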
This can be done using scipy's correlate2d and then using argmax to find the peak in the cross-correlation.
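Not the answer's original code, but a sketch of that approach; the randomly generated test image with an embedded patch is illustrative, and subtracting the template mean is one way to keep bright regions from dominating the peak:

import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
haystack = rng.random((100, 100))
needle = haystack[30:40, 30:40]            # embed a known 10x10 patch

# Correlate against the zero-mean template, then take the peak.
corr = correlate2d(haystack, needle - needle.mean(), mode='valid')
y, x = np.unravel_index(np.argmax(corr), corr.shape)

# The peak is only a candidate; confirm the match is exact.
assert np.array_equal(haystack[y:y+10, x:x+10], needle)
print(y, x)                                # should print the embedded offset: 30 30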
Here's a more complete explanation of the math and ideas, and some examples.
If you want to stay in pure Numpy without even using scipy, or if the images are large, you'd probably be best off using an FFT-based approach to the cross-correlation.
Edit: the question specifically asked for a pure Numpy solution, but if you can use OpenCV or other image processing tools, it's obviously easier to use one of those. An example of such is given by PiQuer above, which I'd recommend if you can use it.
I just finished writing a standalone implementation of normalized cross-correlation for N-dimensional arrays. You can get it from here.
Cross-correlation is calculated either directly, using scipy.ndimage.correlate, or in the frequency domain, using scipy.fftpack.fftn/ifftn, depending on whichever will be quickest for the given input sizes.
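Not the package's own code, but a sketch of the frequency-domain route (circular cross-correlation via the convolution theorem), again with an illustrative embedded patch:

import numpy as np
from scipy import fftpack

rng = np.random.default_rng(1)
haystack = rng.random((100, 100))
needle = haystack[30:40, 30:40]
template = needle - needle.mean()          # zero-mean, as in the spatial version

# Correlation in the spatial domain is multiplication by the complex
# conjugate in the frequency domain; the template is zero-padded to full size.
fa = fftpack.fftn(haystack)
fb = fftpack.fftn(template, shape=haystack.shape)
corr = np.real(fftpack.ifftn(fa * np.conj(fb)))

print(np.unravel_index(np.argmax(corr), corr.shape))  # should be (30, 30)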
You can actually reduce this problem to a simple string search using a regex, like the implementation sketched below; it accepts two PIL.Image objects and finds the coordinates of the needle within the haystack. This is about 127x faster than using a pixel-by-pixel search.
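The original implementation was not preserved in this copy; the following is a sketch of the idea under stated assumptions (RGB uint8 images; find_subimage and the file names are illustrative). Each needle row becomes an escaped byte literal, and the rows are joined by a wildcard gap that skips to the same column of the next haystack row:

import re
import numpy as np
from PIL import Image

def find_subimage(haystack_img, needle_img):
    # Return (row, col) of the needle's top-left pixel, or None.
    haystack = np.asarray(haystack_img.convert('RGB'), dtype=np.uint8)
    needle = np.asarray(needle_img.convert('RGB'), dtype=np.uint8)
    _, w, c = haystack.shape
    nh, nw, _ = needle.shape

    # re.DOTALL lets '.' match any byte value, including newlines.
    gap = b'.{%d}' % ((w - nw) * c)
    pattern = gap.join(re.escape(needle[i].tobytes()) for i in range(nh))

    m = re.search(pattern, haystack.tobytes(), flags=re.DOTALL)
    if m is None:
        return None
    # A robust version should also verify the match is pixel-aligned
    # and does not wrap across a row boundary.
    return divmod(m.start() // c, w)       # byte offset -> (row, col)

haystack = Image.open('haystack.png')      # hypothetical files
needle = Image.open('needle.png')
print(find_subimage(haystack, needle))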