Nearest neighbor search in Python without a k-d tree

Posted on 2024-12-21 09:15:46

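I'm coming to Python from a C++ background. I'm looking for a quick and easy way to find the nearest neighbor of some multidimensional query points (numpy arrays) in a 2D numpy array of multidimensional points. I know that scipy has a k-d tree, but I don't think it is what I want. First, I will be changing the values of the multidimensional points in the 2D array. Second, the position (coordinates) of each point in the 2D array matters, because I will also be changing its neighbors.

I could write a function that goes through the 2D array and measures the distance between the query point and each point in the array while keeping track of the smallest one (using a scipy spatial distance function to measure distance). Is there a built-in function that does this? I am trying to avoid iterating over arrays in Python as much as possible. I will also have numerous query points, so there would be at least two "for loops": one to iterate through the query points and, for each query, one to iterate through the 2D array and find the minimum distance.

Thanks for any advice.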


逆蝶 2024-12-28 09:15:46

If conciseness is your goal, you can use this one-liner:

In [13]: import numpy as np

In [14]: X = np.random.randn(10,2)

In [15]: X
Out[15]: 
array([[ 0.85831163,  1.45039761],
       [ 0.91590236, -0.64937523],
       [-1.19610431, -1.07731673],
       [-0.48454195,  1.64276509],
       [ 0.90944798, -0.42998205],
       [-1.17765553,  0.20858178],
       [-0.29433563, -0.8737285 ],
       [ 0.5115424 , -0.50863231],
       [-0.73882547, -0.52016481],
       [-0.14366935, -0.96248649]])

In [16]: q = np.array([0.91, -0.43])

In [17]: np.argmin([np.inner(q-x,q-x) for x in X])
Out[17]: 4

If you have several query points:

In [18]: Q = np.array([[0.91, -0.43], [-0.14, -0.96]])

In [19]: [np.argmin([np.inner(q-x,q-x) for x in X]) for q in Q]
Out[19]: [4, 9]
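
Since the question asks to avoid Python-level loops, the list comprehensions above could also be replaced by a single broadcast expression. The following is only a sketch of that idea: it reuses the X and Q values printed in the session, and the intermediate names diff and sq_dist are illustrative, not part of the original answer.

```python
import numpy as np

# The same 10 points and 2 query points shown in the session above.
X = np.array([[ 0.85831163,  1.45039761],
              [ 0.91590236, -0.64937523],
              [-1.19610431, -1.07731673],
              [-0.48454195,  1.64276509],
              [ 0.90944798, -0.42998205],
              [-1.17765553,  0.20858178],
              [-0.29433563, -0.8737285 ],
              [ 0.5115424 , -0.50863231],
              [-0.73882547, -0.52016481],
              [-0.14366935, -0.96248649]])
Q = np.array([[0.91, -0.43], [-0.14, -0.96]])

# Q[:, None, :] has shape (2, 1, 2) and X[None, :, :] has shape (1, 10, 2),
# so the difference broadcasts to shape (2, 10, 2): every query vs. every point.
diff = Q[:, None, :] - X[None, :, :]

# Squared Euclidean distances, shape (2, 10).
sq_dist = (diff ** 2).sum(axis=-1)

# Index of the nearest point for each query.
nearest = sq_dist.argmin(axis=1)
print(nearest.tolist())  # [4, 9]
```

The intermediate array holds one entry per (query, point, dimension) triple, so for very large inputs it may be worth processing the queries in chunks.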


赠意 2024-12-28 09:15:46

Broadcasting is very useful for this kind of thing. I'm not sure if this is what you need, but here I use broadcasting to find the displacement between p (one point in 3-space) and X (a set of 10 points in 3-space).

import numpy as np

def closest(X, p):
    # X - p broadcasts p across every row of X
    disp = X - p
    # index of the row with the smallest squared distance to p
    return np.argmin((disp*disp).sum(1))

X = np.random.random((10, 3))
p = np.random.random(3)

print(X)
#array([[ 0.68395953,  0.97882991,  0.68826511],
#       [ 0.57938059,  0.24713904,  0.32822283],
#       [ 0.06070267,  0.06561339,  0.62241713],
#       [ 0.93734468,  0.73026772,  0.33755815],
#       [ 0.29370809,  0.76298588,  0.68728743],
#       [ 0.66248449,  0.6023311 ,  0.76704199],
#       [ 0.53490144,  0.96555923,  0.43994738],
#       [ 0.23780428,  0.75525843,  0.46067472],
#       [ 0.84240565,  0.82573202,  0.56029917],
#       [ 0.66751884,  0.31561133,  0.19244683]])
print(p)
#array([ 0.587416 ,  0.4181857,  0.2539029])
print(closest(X, p))
#9
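
The question describes a 2D array whose cells hold multidimensional points and says that the cell position matters. The same broadcasting idea might be adapted to that layout as sketched below. This assumes the data is stored as an array of shape (rows, cols, dim); the function name closest_cell and the toy grid are made up for illustration.

```python
import numpy as np

def closest_cell(grid, p):
    """Return the (row, col) of the cell in `grid` whose vector is nearest to p.

    Assumes grid has shape (rows, cols, dim) and p has shape (dim,).
    """
    disp = grid - p                        # p broadcasts over all cells
    sq_dist = (disp * disp).sum(axis=-1)   # shape (rows, cols)
    # argmin works on the flattened array; unravel_index maps the flat
    # index back to a (row, col) pair.
    return np.unravel_index(np.argmin(sq_dist), sq_dist.shape)

# Toy 4x5 grid of 3D points: cell (i, j) holds the vector [i, j, i*j].
rows, cols = np.indices((4, 5))
grid = np.stack([rows, cols, rows * cols], axis=-1).astype(float)

p = np.array([2.1, 2.9, 6.2])
row, col = closest_cell(grid, p)
print(int(row), int(col))  # 2 3

# Because the position is known, the cell (or its neighbors) can be
# updated in place and the next query simply sees the new values.
grid[row, col] = p
```

Nothing is cached between calls, so changing values in the grid does not require rebuilding any index.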


当梦初醒 2024-12-28 09:15:46

You can compute all the distances with scipy.spatial.distance.cdist( X, Y ),
or use RTree for dynamic data: http://gispython.org/rtree/docs/class.html
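
As a rough illustration of the cdist suggestion, the sketch below computes every query-to-point distance in one call and then takes the row-wise minimum. The sample arrays and the variable names D, nearest, and nearest_dist are invented for the example.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Four sample points and two query points (made-up values).
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5], [-1.0, 3.0]])
Q = np.array([[0.9, 1.2], [1.8, 0.4]])

# D[i, j] is the Euclidean distance between Q[i] and X[j]; shape (2, 4).
D = cdist(Q, X)

# Index of the nearest point in X for each query, and the distance to it.
nearest = D.argmin(axis=1)
nearest_dist = D[np.arange(len(Q)), nearest]

print(nearest.tolist())  # [1, 2]
```

If the points in X change, the distance matrix has to be recomputed, but there is no tree structure to rebuild.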


帝王念 2024-12-28 09:15:46

For faster search and support for dynamic item insertion, you could use a binary tree for 2D items, where the greater-than and less-than operators are defined by the distance to a reference point (0,0).

import numpy as np

def dist(x1,x2):
    # Euclidean distance between two 2D points
    return np.sqrt( (float(x1[0])-float(x2[0]))**2 +(float(x1[1])-float(x2[1]))**2 )

class Node(object):

    def __init__(self, item=None):
        self.item = item
        self.left = None
        self.right = None

    def __repr__(self):
        return '{}'.format(self.item)

    def _add(self, value, center):
        new_node = Node(value)
        if self.item is None:
            self.item = value
        else:
            # nodes are ordered by their distance to the reference point
            vdist = dist(value,center)
            idist = dist(self.item,center)
            if vdist > idist:
                self.right = self.right._add(value, center) if self.right else new_node
            elif vdist < idist:
                self.left = self.left._add(value, center) if self.left else new_node
            else:
                print("BSTs do not support repeated items.")

        return self # this is necessary: the caller assigns the result back

    def _isLeaf(self):
        return not self.right and not self.left

class BSTC(object):

    def __init__(self, center=(0.0,0.0)):
        self.root = None
        self.count = 0
        self.center = center

    def add(self, value):
        if not self.root:
            self.root = Node(value)
        else:
            self.root._add(value,self.center)
        self.count += 1

    def __len__(self): return self.count

    def closest(self, target):
        # Note: this follows a single root-to-leaf path, so it returns the
        # nearest item on that path, which is not guaranteed to be the
        # nearest item in the whole tree.
        gap = float("inf")
        closest = None
        curr = self.root
        while curr:
            d = dist(curr.item, target)
            if d < gap:
                gap = d
                closest = curr
            if target == curr.item:
                break
            elif dist(target,self.center) < dist(curr.item,self.center):
                curr = curr.left
            else:
                curr = curr.right
        if closest is None:
            return None, gap  # empty tree
        return closest.item, gap


# Usage, assuming the code above is saved as util.py
import util

bst = util.BSTC()
print(len(bst))

arr = [(23.2323,34.34535),(23.23,36.34535),(53.23,34.34535),(66.6666,11.11111)]
for point in arr: bst.add(point)

f = (11.111,22.2222)
print(bst.closest(f))
print([util.dist(f,x) for x in arr])
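
Ordering points by their distance to a reference point does not by itself guarantee that a single descent finds the true nearest neighbor. One possible way to keep the same ordering and still get an exact answer is to use the triangle inequality: a point whose distance to the reference differs from the query's by more than the best gap found so far cannot be closer. The sketch below swaps the tree for a sorted list searched with bisect to keep it short; the class name RadialIndex and its methods are hypothetical and not part of the original answer.

```python
import bisect
import math

def dist(a, b):
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

class RadialIndex:
    """Hypothetical helper: points kept sorted by distance to a reference point."""

    def __init__(self, center=(0.0, 0.0)):
        self.center = center
        self.keys = []    # sorted distances to the reference point
        self.points = []  # points, in the same order as self.keys

    def add(self, point):
        key = dist(point, self.center)
        i = bisect.bisect_left(self.keys, key)
        self.keys.insert(i, key)
        self.points.insert(i, point)

    def closest(self, target):
        r = dist(target, self.center)
        best, gap = None, float("inf")
        # Walk outward from where the target's own key would be inserted.
        hi = bisect.bisect_left(self.keys, r)
        lo = hi - 1
        while lo >= 0 or hi < len(self.keys):
            # Triangle inequality: dist(target, x) >= |r - key(x)|.
            lo_bound = r - self.keys[lo] if lo >= 0 else float("inf")
            hi_bound = self.keys[hi] - r if hi < len(self.keys) else float("inf")
            if min(lo_bound, hi_bound) >= gap:
                break  # nothing left on either side can beat the current best
            if lo_bound <= hi_bound:
                i, lo = lo, lo - 1
            else:
                i, hi = hi, hi + 1
            d = dist(self.points[i], target)
            if d < gap:
                best, gap = self.points[i], d
        return best, gap

index = RadialIndex()
arr = [(23.2323, 34.34535), (23.23, 36.34535), (53.23, 34.34535), (66.6666, 11.11111)]
for point in arr:
    index.add(point)

f = (11.111, 22.2222)
print(index.closest(f))
```

Insertion into a Python list is linear in the number of points, so this trades the tree's insertion cost for a simpler structure. The pruning only helps when few points lie at a similar distance from the reference point as the query does.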
