Finding a subimage inside a Numpy image

Posted on 2024-12-08 19:19:30

I have two Numpy arrays (3-dimensional uint8) converted from PIL images.

I want to find if the first image contains the second image, and if so, find out the coordinates of the top-left pixel inside the first image where the match is.

Is there a way to do that purely in Numpy, in a fast enough way, rather than using (4! very slow) pure Python loops?

2D example:

a = numpy.array([
    [0, 1,  2,  3],
    [4, 5,  6,  7],
    [8, 9, 10, 11]
])
b = numpy.array([
    [2, 3],
    [6, 7]
])

How to do something like this?

position = a.find(b)

position would then be (0, 2).

Comments (5)

还不是爱你 2024-12-15 19:19:30

I'm doing this with OpenCV's matchTemplate function. There is an excellent python binding to OpenCV which uses numpy internally, so images are just numpy arrays. For example, let's assume you have a 100x100 pixel BGR file testimage.bmp. We take a 10x10 sub-image at position (30,30) and find it in the original.

import cv2
import numpy as np

image = cv2.imread("testimage.bmp")
template = image[30:40,30:40,:]

result = cv2.matchTemplate(image,template,cv2.TM_CCOEFF_NORMED)
print(np.unravel_index(result.argmax(), result.shape))

Output:

(30, 30)

You can choose between several algorithms to match the template to the original; cv2.TM_CCOEFF_NORMED is just one of them. See the documentation for more details: some algorithms indicate matches as minima in the result array, others as maxima. A word of warning: OpenCV uses BGR channel order by default, so be careful when, for example, you compare an image loaded with cv2.imread to an image converted from PIL to numpy. You can always use cv2.cvtColor to convert between formats.
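As a rough illustration of the channel-order warning, the sketch below reverses the channel axis of an RGB array with plain NumPy. The tiny `rgb` array is a made-up stand-in for what `np.asarray(pil_image.convert("RGB"))` would give; `cv2.cvtColor` with `cv2.COLOR_RGB2BGR` should produce the same result.

```python
import numpy as np

# Hypothetical 2x2 RGB image, standing in for an array converted from PIL.
rgb = np.array([
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [10, 20, 30]],
], dtype=np.uint8)

# Reverse the last (channel) axis: RGB -> BGR.
# The slice is a negatively strided view, so copy it into a contiguous
# array before handing it to code that expects contiguous memory.
bgr = np.ascontiguousarray(rgb[..., ::-1])

print(bgr[0, 0])  # the pure-red pixel, now stored in B, G, R order
```

Doing this once for either the template or the searched image keeps both arrays in the same channel order before matching.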

To find all matches above a given threshold confidence, I use something along the lines of this to extract the matching coordinates from my result array:

match_indices = np.arange(result.size)[(result>confidence).flatten()]
np.unravel_index(match_indices,result.shape)

This gives a tuple of two arrays, one of row indices and one of column indices; pairing them element by element gives the coordinates of each match.
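A small sketch of that extraction step, using a made-up score map in place of the array returned by cv2.matchTemplate (the values and the `confidence` threshold are assumptions for illustration only). `np.nonzero` on the boolean mask returns the same row and column arrays in one call.

```python
import numpy as np

# Hypothetical score map standing in for the matchTemplate result.
result = np.array([
    [0.10, 0.95, 0.20],
    [0.30, 0.40, 0.99],
])
confidence = 0.9  # assumed threshold

# Row and column indices of every score above the threshold.
rows, cols = np.nonzero(result > confidence)

# Pair them up into (row, col) coordinates.
matches = list(zip(rows.tolist(), cols.tolist()))
print(matches)  # [(0, 1), (1, 2)]
```

For methods that mark matches as minima, the comparison would be flipped to `result < threshold`.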

只是偏爱你 2024-12-15 19:19:30

This can be done using scipy's correlate2d and then using argmax to find the peak in the cross-correlation.

Here's a more complete explanation of the math and ideas, and some examples.

If you want to stay in pure Numpy and not even use scipy, or if the images are large, you'd probably be best using an FFT based approach to the cross-correlations.

Edit: The question specifically asked for a pure Numpy solution. But if you can use OpenCV, or other image processing tools, it's obviously easier to use one of these. An example of such is given by PiQuer below, which I'd recommend if you can use it.
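One possible pure-NumPy, FFT-based sketch for the 2D case follows. It is not the method from the linked explanation: instead of taking the peak of the raw cross-correlation, it uses FFT cross-correlations to build the sum of squared differences (SSD) at every offset and reports the first offset where the SSD is zero, i.e. an exact match. The function name `find_subimage_fft` and the 0.5 rounding tolerance are assumptions, and the tolerance relies on integer-valued inputs such as uint8 pixels.

```python
import numpy as np

def find_subimage_fft(a, b):
    """Return (row, col) of the first exact occurrence of 2-D b in a, or None."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    ah, aw = a.shape
    bh, bw = b.shape
    if bh > ah or bw > aw:
        return None

    def correlate_valid(x, kernel):
        # Circular cross-correlation via FFT: ifft(fft(x) * conj(fft(kernel))).
        # The kernel is zero-padded to x.shape; offsets where the kernel
        # fits entirely inside x are free of wrap-around.
        spectrum = np.fft.rfft2(x) * np.conj(np.fft.rfft2(kernel, s=x.shape))
        full = np.fft.irfft2(spectrum, s=x.shape)
        return full[:ah - bh + 1, :aw - bw + 1]

    # SSD(i, j) = sum(window^2) - 2 * sum(window * b) + sum(b^2)
    ssd = (correlate_valid(a * a, np.ones_like(b))
           - 2.0 * correlate_valid(a, b)
           + np.sum(b * b))

    # For integer-valued inputs the true SSD is an integer, so anything
    # below 0.5 is treated as zero (assumed tolerance for rounding error).
    hits = np.argwhere(ssd < 0.5)
    if len(hits) == 0:
        return None
    row, col = hits[0]
    return (int(row), int(col))

a = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])
b = np.array([[2, 3], [6, 7]])
print(find_subimage_fft(a, b))  # (0, 2)
```

For 3-channel images the same SSD could be computed per channel and summed before thresholding; for very large images the floating-point error grows, so the tolerance would need checking.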

梨涡 2024-12-15 19:19:30

I just finished writing a standalone implementation of normalized cross-correlation for N-dimensional arrays. You can get it from here.

Cross-correlation is calculated either directly, using scipy.ndimage.correlate, or in the frequency domain, using scipy.fftpack.fftn/ifftn depending on whichever will be quickest for the given input sizes.
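To make the "direct" variant concrete, here is a minimal NumPy-only sketch of normalized cross-correlation. It is not the linked implementation: it uses `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20 or later) rather than scipy.ndimage.correlate, and the function name `normalized_xcorr_direct` is made up for this example. It materializes every window, so it only suits small inputs.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def normalized_xcorr_direct(image, template):
    """Normalized cross-correlation of template against every window of image."""
    image = np.asarray(image, dtype=np.float64)
    template = np.asarray(template, dtype=np.float64)

    # windows has shape (output dims..., template dims...)
    windows = sliding_window_view(image, template.shape)
    axes = tuple(range(-template.ndim, 0))

    # Subtract the mean of each window and of the template.
    w_centered = windows - windows.mean(axis=axes, keepdims=True)
    t_centered = template - template.mean()

    numerator = (w_centered * t_centered).sum(axis=axes)
    denominator = np.sqrt((w_centered ** 2).sum(axis=axes) * (t_centered ** 2).sum())

    # Flat windows (or a flat template) have zero variance; score them 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denominator > 0, numerator / denominator, 0.0)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(20, 30))
template = image[5:9, 7:12]

ncc = normalized_xcorr_direct(image, template)
print(np.unravel_index(ncc.argmax(), ncc.shape))
```

Because the score is invariant to brightness offset and contrast scaling, a score of 1.0 means "same pattern", not necessarily "identical pixels"; an exact-match search needs an additional equality check at the peak.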

沉默的熊 2024-12-15 19:19:30

You can actually reduce this problem to a simple string search using a regex, as in the following implementation, which accepts two PIL.Image objects and finds the coordinates of the needle within the haystack. This is about 127x faster than using a pixel-by-pixel search.

import re

def subimg_location(haystack, needle):
    haystack = haystack.convert('RGB')
    needle   = needle.convert('RGB')

    # Raw pixel data, 3 bytes per pixel (tobytes() replaces the old tostring())
    haystack_bytes = haystack.tobytes()
    needle_bytes   = needle.tobytes()

    # Bytes between the end of one needle row and the start of the next
    gap_size = (haystack.size[0] - needle.size[0]) * 3
    gap_regex = b'.{' + str(gap_size).encode() + b'}'

    # Split the needle into rows of needle.size[0] pixels
    chunk_size = needle.size[0] * 3
    split = [needle_bytes[i:i+chunk_size] for i in range(0, len(needle_bytes), chunk_size)]

    # Build regex
    regex = re.escape(split[0])
    for i in range(1, len(split)):
        regex += gap_regex + re.escape(split[i])

    # DOTALL so that '.' also matches newline bytes in the pixel data
    p = re.compile(regex, re.DOTALL)
    m = p.search(haystack_bytes)

    if not m:
        return None

    x, _ = m.span()

    # Convert the byte offset back to pixel coordinates
    left = x % (haystack.size[0] * 3) // 3
    top  = x // haystack.size[0] // 3

    return (left, top)
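The same idea can be sketched on the 2D uint8 arrays from the question, with one byte per pixel, which makes the row-gap regex easy to follow. The helper name `find_rows_regex` is made up for this example. Unlike the function above, it returns (row, col) and rejects matches that would wrap around the right edge of the haystack.

```python
import re
import numpy as np

def find_rows_regex(a, b):
    """Return (row, col) of the first occurrence of 2-D uint8 b in a, or None."""
    a = np.ascontiguousarray(a, dtype=np.uint8)
    b = np.ascontiguousarray(b, dtype=np.uint8)
    (ah, aw), (bh, bw) = a.shape, b.shape
    if bh > ah or bw > aw:
        return None

    # Between two consecutive needle rows the haystack holds aw - bw other bytes.
    gap = b'.{%d}' % (aw - bw)
    # DOTALL so that '.' also matches the byte 0x0a (newline).
    pattern = re.compile(gap.join(re.escape(row.tobytes()) for row in b), re.DOTALL)

    haystack = a.tobytes()
    pos = 0
    while True:
        m = pattern.search(haystack, pos)
        if m is None:
            return None
        row, col = divmod(m.start(), aw)
        # Skip matches that run past the right edge into the next row.
        if col + bw <= aw:
            return (row, col)
        pos = m.start() + 1

a = np.arange(12).reshape(3, 4)
b = np.array([[2, 3], [6, 7]])
print(find_rows_regex(a, b))  # (0, 2)
```

With 3 bytes per pixel, as in the RGB version, a match could also begin in the middle of a pixel, so a byte offset that is not a multiple of 3 would need the same kind of rejection.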
终难愈 2024-12-15 19:19:30
import cv2
import numpy as np

img = cv2.imread("brows.PNG")              #main image
gray_img = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)

template = cv2.imread("websearch.PNG", cv2.IMREAD_GRAYSCALE)      #subimage
w,h = template.shape[::-1]

result = cv2.matchTemplate(gray_img,template, cv2.TM_CCOEFF_NORMED)
loc = np.where(result >= 0.9)

for pt in zip(*loc[::-1]):
    cv2.rectangle(img, pt,(pt[0] + w,pt[1] +h), (0,255,0),3)

cv2.imshow("img",img)
cv2.waitKey(0)
cv2.destroyAllWindows()