Nearest neighbor search in Python without a k-d tree

Posted 2024-12-21 09:15:46

I'm beginning to learn Python, coming from a C++ background. I am looking for a quick and easy way to find the nearest neighbor of a multidimensional query point in a 2D (numpy) array of multidimensional points (also numpy arrays). I know that scipy has a k-d tree, but I don't think this is what I want. First of all, I will be changing the values of the multidimensional points in the 2D array. Secondly, the position (coordinates) of each point in the 2D array matters, as I will also be changing their neighbors.

I could write a function that goes through the 2D array and measures the distance between the query point and each point in the array while keeping track of the smallest one (using a scipy spatial distance function to measure distance). Is there a built-in function that does this? I am trying to avoid iterating over arrays in Python as much as possible. I will also have numerous query points, so there would be at least two "for loops": one to iterate through the query points and, for each query, one to iterate through the 2D array and find the minimum distance.

Thanks for any advice.

逆蝶 2024-12-28 09:15:46

If conciseness is your goal, you can do this with a one-liner:

In [13]: import numpy as np

In [14]: X = np.random.randn(10, 2)

In [15]: X
Out[15]: 
array([[ 0.85831163,  1.45039761],
       [ 0.91590236, -0.64937523],
       [-1.19610431, -1.07731673],
       [-0.48454195,  1.64276509],
       [ 0.90944798, -0.42998205],
       [-1.17765553,  0.20858178],
       [-0.29433563, -0.8737285 ],
       [ 0.5115424 , -0.50863231],
       [-0.73882547, -0.52016481],
       [-0.14366935, -0.96248649]])

In [16]: q = np.array([0.91, -0.43])

In [17]: np.argmin([np.inner(q-x, q-x) for x in X])
Out[17]: 4

If you have several query points:

In [18]: Q = np.array([[0.91, -0.43], [-0.14, -0.96]])

In [19]: [np.argmin([np.inner(q-x, q-x) for x in X]) for q in Q]
Out[19]: [4, 9]
赠意 2024-12-28 09:15:46

Broadcasting is very useful for this kind of thing. I'm not sure if this is what you need, but here I use broadcasting to find the displacement between p (one point in 3-space) and X (a set of 10 points in 3-space).

import numpy as np

def closest(X, p):
    # Broadcasting subtracts p from every row of X.
    disp = X - p
    # Squared Euclidean distance per row; argmin returns the nearest row's index.
    return np.argmin((disp*disp).sum(1))

X = np.random.random((10, 3))
p = np.random.random(3)

print(X)
#[[0.68395953 0.97882991 0.68826511]
# [0.57938059 0.24713904 0.32822283]
# [0.06070267 0.06561339 0.62241713]
# [0.93734468 0.73026772 0.33755815]
# [0.29370809 0.76298588 0.68728743]
# [0.66248449 0.6023311  0.76704199]
# [0.53490144 0.96555923 0.43994738]
# [0.23780428 0.75525843 0.46067472]
# [0.84240565 0.82573202 0.56029917]
# [0.66751884 0.31561133 0.19244683]]
print(p)
#[0.587416  0.4181857 0.2539029]
print(closest(X, p))
#9
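
The same broadcasting idea can be extended to several query points at once, which removes both Python-level loops. The sketch below is only an illustration and not part of the answer above: the name closest_many and the small sample arrays are assumptions made for the example.

```python
import numpy as np

def closest_many(X, Q):
    """For each row of Q, return the index of the nearest row of X."""
    # Q[:, None, :] has shape (m, 1, d) and X[None, :, :] has shape (1, n, d),
    # so broadcasting produces all pairwise displacements, shape (m, n, d).
    disp = Q[:, None, :] - X[None, :, :]
    sq_dist = (disp * disp).sum(axis=2)  # (m, n) squared distances
    return sq_dist.argmin(axis=1)        # one index per query point

X = np.array([[0.0, 0.0, 0.0],
              [1.0, 1.0, 1.0],
              [5.0, 5.0, 5.0]])
Q = np.array([[0.9, 1.2, 1.0],
              [4.0, 4.5, 5.5]])
nearest = closest_many(X, Q)
print(nearest)
```

Note that the intermediate array holds m * n * d values, so with many queries and many points it may be worth processing the queries in chunks.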
当梦初醒 2024-12-28 09:15:46

You can compute all pairwise distances with scipy.spatial.distance.cdist(X, Y),
or use Rtree for dynamic data: http://gispython.org/rtree/docs/class.html.
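
As a rough sketch of the cdist route for the layout described in the question (a 2D grid whose cells each hold a multidimensional point), the grid can be flattened for the distance computation and the winning index mapped back to grid coordinates with np.unravel_index. The function name nearest_on_grid, the (rows, cols, dim) shape, and the sample data are assumptions made for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist

def nearest_on_grid(grid, queries):
    """Return (row, col, distance) of the nearest grid cell for each query."""
    rows, cols, dim = grid.shape
    flat = grid.reshape(-1, dim)              # (rows*cols, dim)
    d = cdist(queries, flat)                  # (n_queries, rows*cols) Euclidean distances
    flat_idx = d.argmin(axis=1)               # best flat index per query
    r, c = np.unravel_index(flat_idx, (rows, cols))
    return r, c, d[np.arange(len(queries)), flat_idx]

# Hypothetical 4x5 grid of 3-dimensional points with well-separated values.
grid = np.arange(4 * 5 * 3, dtype=float).reshape(4, 5, 3)
queries = np.array([grid[2, 3] + 0.1, grid[0, 1] - 0.1])

r, c, dists = nearest_on_grid(grid, queries)
print(r, c, dists)
```

Since no index structure is built, changing values in the grid requires no rebuild; the distances are simply recomputed on the next call, and the returned (row, col) pair gives direct access to a cell's neighbors.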

帝王念 2024-12-28 09:15:46

For faster search and support for dynamic item insertion, you could use a binary tree for 2D items in which the greater-than and less-than operators are defined by distance to a reference point (0,0).

import numpy as np

def dist(x1, x2):
    # Euclidean distance between two 2D points.
    return np.sqrt((float(x1[0]) - float(x2[0]))**2 + (float(x1[1]) - float(x2[1]))**2)

class Node(object):

    def __init__(self, item=None):
        self.item = item
        self.left = None
        self.right = None

    def __repr__(self):
        return '{}'.format(self.item)

    def _add(self, value, center):
        new_node = Node(value)
        if self.item is None:
            self.item = value
        else:
            # Nodes are ordered by their distance to the reference point.
            vdist = dist(value, center)
            idist = dist(self.item, center)
            if vdist > idist:
                self.right = self.right and self.right._add(value, center) or new_node
            elif vdist < idist:
                self.left = self.left and self.left._add(value, center) or new_node
            else:
                print("BSTs do not support repeated items.")

        return self  # needed so the and/or expressions above keep the existing subtree

    def _isLeaf(self):
        return not self.right and not self.left

class BSTC(object):

    def __init__(self, center=(0.0, 0.0)):
        self.root = None
        self.count = 0
        self.center = center

    def add(self, value):
        if not self.root:
            self.root = Node(value)
        else:
            self.root._add(value, self.center)
        self.count += 1

    def __len__(self): return self.count

    def closest(self, target):
        # Walks a single root-to-leaf path ordered by distance to the center,
        # so the result is the best node seen on that path and is not
        # guaranteed to be the true nearest neighbor.
        gap = float("inf")
        closest = None
        curr = self.root
        while curr:
            if dist(curr.item, target) < gap:
                gap = dist(curr.item, target)
                closest = curr
            if target == curr.item:
                break
            elif dist(target, self.center) < dist(curr.item, self.center):
                curr = curr.left
            else:
                curr = curr.right
        return (closest.item if closest else None), gap


# Usage, assuming the code above is saved as util.py:
import util

bst = util.BSTC()
print(len(bst))

arr = [(23.2323,34.34535),(23.23,36.34535),(53.23,34.34535),(66.6666,11.11111)]
for point in arr: bst.add(point)

f = (11.111,22.2222)
print(bst.closest(f))
print([util.dist(f, x) for x in arr])