Finding a subimage inside a Numpy image

Posted 2024-12-08 19:19:30

I have two Numpy arrays (3-dimensional uint8) converted from PIL images.

I want to find if the first image contains the second image, and if so, find out the coordinates of the top-left pixel inside the first image where the match is.

Is there a way to do that purely in Numpy, in a fast enough way, rather than using (4! very slow) pure Python loops?

2D example:

a = numpy.array([
    [0, 1,  2,  3],
    [4, 5,  6,  7],
    [8, 9, 10, 11]
])
b = numpy.array([
    [2, 3],
    [6, 7]
])

How can I do something like this?

position = a.find(b)

position would then be (0, 2).

还不是爱你 2024-12-15 19:19:30

I'm doing this with OpenCV's matchTemplate function. There is an excellent python binding to OpenCV which uses numpy internally, so images are just numpy arrays. For example, let's assume you have a 100x100 pixel BGR file testimage.bmp. We take a 10x10 sub-image at position (30,30) and find it in the original.

import cv2
import numpy as np

image = cv2.imread("testimage.bmp")
template = image[30:40,30:40,:]

result = cv2.matchTemplate(image,template,cv2.TM_CCOEFF_NORMED)
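# argmax gives a flat index; unravel_index turns it into (row, column)
row, col = np.unravel_index(result.argmax(), result.shape)
print((int(row), int(col)))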

Output:

(30, 30)

You can choose between several algorithms to match the template to the original; cv2.TM_CCOEFF_NORMED is just one of them. See the documentation for more details: some algorithms indicate matches as minima in the result array, others as maxima. A word of warning: OpenCV uses BGR channel order by default, so be careful, e.g. when you compare an image you loaded with cv2.imread to an image you converted from PIL to numpy. You can always use cv2.cvtColor to convert between formats.
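
As a small illustration of the channel-order warning: for a 3-channel array, reversing the last axis swaps RGB and BGR, which gives the same pixel values as cv2.cvtColor with cv2.COLOR_RGB2BGR. The sketch below uses plain NumPy; the helper name rgb_to_bgr and the tiny test image are made up for the example.

```python
import numpy as np

def rgb_to_bgr(rgb):
    """Reverse the channel axis of an H x W x 3 array (RGB <-> BGR)."""
    # The reversed slice is a view with a negative stride; copying it into a
    # contiguous array is a precaution, since OpenCV generally expects that.
    return np.ascontiguousarray(rgb[:, :, ::-1])

# Hypothetical 1x2 RGB image: one pure-red pixel and one pure-blue pixel.
rgb = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)
bgr = rgb_to_bgr(rgb)
print(bgr.tolist())  # [[[0, 0, 255], [255, 0, 0]]]
```

Applying the same function twice gets you back to the original order, so it works in both directions.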

To find all matches above a given threshold confidence, I use something along the lines of this to extract the matching coordinates from my result array:

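confidence = 0.9  # assumed example threshold; tune it for your images
match_indices = np.arange(result.size)[(result > confidence).flatten()]
np.unravel_index(match_indices, result.shape)

This gives a tuple of two arrays, the row indices and the column indices; pairing them element-wise yields the matching coordinates.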
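
For what it's worth, the same extraction can be tried without OpenCV on a made-up score map. In this sketch the result array and the 0.9 threshold are invented stand-ins for real cv2.matchTemplate output, and np.nonzero is used as a shorter route to the same row and column arrays.

```python
import numpy as np

# Hypothetical 3x4 score map standing in for the output of cv2.matchTemplate.
result = np.array([
    [0.10, 0.95, 0.20, 0.30],
    [0.40, 0.50, 0.97, 0.10],
    [0.91, 0.20, 0.30, 0.40],
])
confidence = 0.9  # assumed threshold

# Row and column indices of every score above the threshold, in row-major order.
rows, cols = np.nonzero(result > confidence)

# Pair them up as (row, column) tuples of plain Python ints.
matches = list(zip(rows.tolist(), cols.tolist()))
print(matches)  # [(0, 1), (1, 2), (2, 0)]
```

Note that the pairs are (row, column), i.e. (y, x); swap them if you need (x, y) for drawing.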

只是偏爱你 2024-12-15 19:19:30

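This can be done using scipy's correlate2d and then using argmax to find the peak in the cross-correlation.

Here is a more complete explanation of the math and ideas, and some examples.

If you want to stay in pure Numpy and not even use scipy, or if the images are large, you'd probably be best off using an FFT-based approach to the cross-correlation.

Edit: The question specifically asked for a pure Numpy solution. But if you can use OpenCV or other image processing tools, it's obviously easier to use one of those. PiQuer gives such an example in another answer here, which I'd recommend if you can use it.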
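
A rough pure-NumPy sketch of the FFT-based route, under some assumptions of mine: the arrays are 2D (single channel), and an exact match is wanted. One caveat is that the peak of a raw cross-correlation can land on the brightest region rather than on the match, so this sketch turns the correlation into a sum of squared differences (zero at an exact match) and then confirms the candidate by direct comparison. The function names are hypothetical.

```python
import numpy as np

def fft_correlate_valid(image, kernel):
    """Cross-correlate two 2D arrays via FFT, keeping only full-overlap positions."""
    H, W = image.shape
    h, w = kernel.shape
    shape = (H + h - 1, W + w - 1)  # size of the full linear convolution
    # Correlation equals convolution with the kernel flipped along both axes.
    spec = np.fft.rfft2(image, shape) * np.fft.rfft2(kernel[::-1, ::-1], shape)
    full = np.fft.irfft2(spec, shape)
    return full[h - 1:H, w - 1:W]

def find_subimage_fft(a, b):
    """Return (row, col) of an exact occurrence of b in a, or None."""
    a = np.asarray(a)
    b = np.asarray(b)
    if b.shape[0] > a.shape[0] or b.shape[1] > a.shape[1]:
        return None
    af = a.astype(np.float64)
    bf = b.astype(np.float64)
    cross = fft_correlate_valid(af, bf)
    energy = fft_correlate_valid(af * af, np.ones_like(bf))
    # Sum of squared differences for every window: |win|^2 - 2 win.b + |b|^2
    ssd = energy - 2.0 * cross + np.sum(bf * bf)
    row, col = np.unravel_index(np.argmin(ssd), ssd.shape)
    # FFT results carry rounding error, so confirm the candidate exactly.
    window = a[row:row + b.shape[0], col:col + b.shape[1]]
    if np.array_equal(window, b):
        return (int(row), int(col))
    return None

a = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])
b = np.array([[2, 3], [6, 7]])
print(find_subimage_fft(a, b))  # (0, 2)
```

For RGB images one option would be to add up the per-channel sums of squared differences before taking the minimum; that part is left out here to keep the sketch short.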

梨涡 2024-12-15 19:19:30

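I just finished writing a standalone implementation of normalized cross-correlation for N-dimensional arrays. You can get it from here.

The cross-correlation is computed either directly, using scipy.ndimage.correlate, or in the frequency domain, using scipy.fftpack.fftn/ifftn, depending on whichever will be quickest for the given input sizes.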
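
To give an idea of what normalized cross-correlation computes, here is a minimal direct sketch in plain NumPy. It is not the implementation linked above: it assumes NumPy 1.20 or newer (for sliding_window_view), it builds every window explicitly and so is only practical for small inputs, and the function name is made up.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def normalized_cross_correlation(image, template):
    """Score in [-1, 1] for every position where the template fits entirely."""
    image = np.asarray(image, dtype=np.float64)
    template = np.asarray(template, dtype=np.float64)
    # One window per valid position; the trailing axes hold the window contents.
    windows = sliding_window_view(image, template.shape)
    axes = tuple(range(-template.ndim, 0))
    # Subtract the mean of each window and of the template.
    win_centered = windows - windows.mean(axis=axes, keepdims=True)
    tmpl_centered = template - template.mean()
    numerator = (win_centered * tmpl_centered).sum(axis=axes)
    denominator = np.sqrt((win_centered ** 2).sum(axis=axes)
                          * (tmpl_centered ** 2).sum())
    # Flat windows (or a flat template) have zero variance; score them 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denominator > 0, numerator / denominator, 0.0)

# Hypothetical test data: a random image and a patch cut out of it.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(20, 30))
template = image[5:9, 7:12]

ncc = normalized_cross_correlation(image, template)
best = np.unravel_index(np.argmax(ncc), ncc.shape)
print(tuple(int(i) for i in best))
```

Because the score ignores brightness offset and contrast scaling, a window that is merely a brighter copy of the template also scores 1. On a smooth gradient such as the 3x4 example in the question every window scores 1, so an exact comparison is the safer check there.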

沉默的熊 2024-12-15 19:19:30

You can actually reduce this problem to a simple string search using a regex. The following implementation accepts two PIL.Image objects and finds the coordinates of the needle within the haystack. This is about 127x faster than using a pixel-by-pixel search.

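import re

def subimg_location(haystack, needle):
    haystack = haystack.convert('RGB')
    needle   = needle.convert('RGB')

    # Raw RGB bytes in row-major order (tobytes() replaces the old tostring())
    haystack_bytes = haystack.tobytes()
    needle_bytes   = needle.tobytes()

    # Number of haystack bytes between the end of one needle row and the next
    gap_size = (haystack.size[0] - needle.size[0]) * 3
    gap_regex = b'.{' + str(gap_size).encode() + b'}'

    # Split the needle into rows of needle.size[0] pixels (3 bytes each)
    chunk_size = needle.size[0] * 3
    split = [needle_bytes[i:i+chunk_size] for i in range(0, len(needle_bytes), chunk_size)]

    # Build regex: needle rows separated by the fixed-size gap
    regex = re.escape(split[0])
    for i in range(1, len(split)):
        regex += gap_regex + re.escape(split[i])

    # DOTALL so that '.' also matches the newline byte 0x0A
    p = re.compile(regex, re.DOTALL)
    m = p.search(haystack_bytes)

    if not m:
        return None

    x, _ = m.span()

    # Convert the byte offset into pixel coordinates (integer division).
    # Note: this assumes the match starts on a pixel boundary.
    left = x % (haystack.size[0] * 3) // 3
    top  = x // haystack.size[0] // 3

    return (left, top)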
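
The same idea can be tried directly on NumPy arrays without PIL. This is only a sketch under my own assumptions: 2D uint8 arrays (one byte per pixel), a hypothetical function name, and a lookahead so that overlapping candidates are visited and matches that wrap around the end of a row can be rejected.

```python
import re
import numpy as np

def find_subarray_regex(haystack, needle):
    """Return (row, col) of the first exact match of needle in haystack, or None."""
    haystack = np.ascontiguousarray(haystack, dtype=np.uint8)
    needle = np.ascontiguousarray(needle, dtype=np.uint8)
    H, W = haystack.shape
    h, w = needle.shape
    if h > H or w > W:
        return None
    # Bytes to skip between the end of one needle row and the start of the next.
    gap = b".{%d}" % (W - w)
    pattern = gap.join(re.escape(needle[r].tobytes()) for r in range(h))
    # Zero-width lookahead: finditer then tests every byte offset in turn.
    # DOTALL lets '.' match the newline byte 0x0A as well.
    regex = re.compile(b"(?=" + pattern + b")", re.DOTALL)
    for m in regex.finditer(haystack.tobytes()):
        row, col = divmod(m.start(), W)
        if col + w <= W:  # skip matches that wrap past the end of a row
            return (row, col)
    return None

a = np.arange(12, dtype=np.uint8).reshape(3, 4)
b = np.array([[2, 3], [6, 7]], dtype=np.uint8)
print(find_subarray_regex(a, b))  # (0, 2)
```

For RGB data the gap and the offset arithmetic would need the factor of 3 bytes per pixel, plus a check that the match offset is a multiple of 3.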

终难愈 2024-12-15 19:19:30

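import cv2
import numpy as np

img = cv2.imread("brows.PNG")  # main image (BGR)
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

template = cv2.imread("websearch.PNG", cv2.IMREAD_GRAYSCALE)  # sub-image to look for
w, h = template.shape[::-1]  # shape is (rows, cols); reversed gives (width, height)

# Normalized correlation score for every candidate top-left position
result = cv2.matchTemplate(gray_img, template, cv2.TM_CCOEFF_NORMED)
loc = np.where(result >= 0.9)  # keep positions scoring at least 0.9

# loc is (rows, cols); reversing it makes each pt an (x, y) point for cv2.rectangle
for pt in zip(*loc[::-1]):
    cv2.rectangle(img, pt, (pt[0] + w, pt[1] + h), (0, 255, 0), 3)

cv2.imshow("img", img)
cv2.waitKey(0)
cv2.destroyAllWindows()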
