
Finding the average distance between a set of points with NumPy

Posted on 2024-08-23 11:41:26

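I have an array of points in a space of unknown dimension, such as:

data = numpy.array(
    [[115, 241, 314],
     [153, 413, 144],
     [535, 2986, 41445]])

and I would like to find the average Euclidean distance between all pairs of points.

Please note that I have over 20,000 points, so I would like to do this as efficiently as possible.

Thanks.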



伴我心暖 2024-08-30 11:41:26

If you have access to SciPy, you could try computing the full n x n matrix of pairwise distances:

scipy.spatial.distance.cdist(data, data)
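Note that `cdist(data, data)` returns the full n x n matrix, which counts every pair twice and includes the zero diagonal, so its plain `.mean()` comes out a factor of (n-1)/n too small. A minimal sketch, assuming SciPy is installed, uses `scipy.spatial.distance.pdist` instead, which returns only the n(n-1)/2 distances with i < j. The helper name `mean_pairwise_distance_pdist` is hypothetical. For 20,000 points the condensed vector still holds about 2 x 10^8 float64 values (roughly 1.6 GB), so memory may still be the limiting factor.

```python
import numpy as np
from scipy.spatial.distance import pdist


def mean_pairwise_distance_pdist(data):
    """Average Euclidean distance over all unordered pairs (hypothetical helper)."""
    # pdist returns the condensed vector of the n*(n-1)/2 distances with i < j,
    # so the diagonal zeros and the duplicated (j, i) entries of cdist are excluded.
    return pdist(np.asarray(data, dtype=float), metric="euclidean").mean()


data = np.array([[115, 241, 314],
                 [153, 413, 144],
                 [535, 2986, 41445]])
print(mean_pairwise_distance_pdist(data))
```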


奈何桥上唱咆哮 2024-08-30 11:41:26

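Well, I don't think there is a super fast way to do this, but this should work:

import numpy as np

data = np.asarray(data, dtype=float)  # avoid integer overflow when squaring
n = data.shape[0]
tot = 0.
# For each point i, add its distances to all later points j > i,
# so every unordered pair is counted exactly once.
for i in range(n - 1):
    tot += np.sqrt(((data[i+1:] - data[i]) ** 2).sum(axis=1)).sum()

# Divide by the number of pairs, n*(n-1)/2.
avg = tot / (n * (n - 1) / 2.)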


泪之魂 2024-08-30 11:41:26

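Now that you've said your goal is to find outliers, you are probably better off computing the sample mean and the sample variance, since each of those is an O(nd) operation. With them you should be able to find outliers (e.g. exclude points that lie farther from the mean than some multiple of the standard deviation), and that filtering pass can also be done in O(nd) time, for O(nd) in total.

You might be interested in reviewing Chebyshev's inequality.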
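A minimal sketch of that O(nd) approach, under the assumption that "distance from the mean" means each point's Euclidean distance to the centroid and that the spread is measured by the root-mean-square of those distances (the square root of the total variance). The function name `flag_outliers` and the threshold `k` are illustrative choices, not part of the answer. By the multivariate form of Chebyshev's inequality, at most a fraction 1/k^2 of the points can lie k or more RMS distances from the mean, which bounds how many points can be flagged.

```python
import numpy as np


def flag_outliers(data, k=3.0):
    """Return a boolean mask of points farther than k * RMS distance from the mean.

    Runs in O(n*d): one pass for the mean, one for the distances.
    `k` is an assumed tuning parameter.
    """
    data = np.asarray(data, dtype=float)
    centroid = data.mean(axis=0)                     # sample mean, O(n*d)
    dist = np.linalg.norm(data - centroid, axis=1)   # distance of each point to the mean
    rms = np.sqrt(np.mean(dist ** 2))                # sqrt of total (population) variance
    # Chebyshev: at most 1/k**2 of the points can satisfy dist >= k * rms.
    return dist > k * rms


rng = np.random.default_rng(0)
points = np.vstack([rng.normal(size=(1000, 3)), [[1000.0, 0.0, 0.0]]])
mask = flag_outliers(points, k=3.0)
print(mask.sum(), mask[-1])
```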


方觉久 2024-08-30 11:41:26

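Is it ever worthwhile to optimize without a working solution? Also, computing a distance matrix over the entire data set rarely needs to be fast, because you only do it once: when you need the distance between two points, you just look it up, since it has already been calculated.

So if you don't have a place to start, here's one. If you want to do this in NumPy without writing any inline Fortran or C, that should be no problem, though you may want to include the small vector-based virtual machine called "numexpr" (available on PyPI, trivial to install), which in this case gave a 5x performance boost over NumPy alone.

Below I've calculated a distance matrix for 10,000 points in 2D space (a 10k x 10k matrix giving the distance between every pair of the 10k points). This took 59 seconds on my MBP.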

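import numpy as np
import numexpr as ne

# Data are points in 2D space (x, y); the same idea extends to any dimension
# by adding one difference matrix per coordinate.
x = np.random.randint(0, 10, 10000)
y = np.random.randint(0, 10, 10000)
# fnx(q)[i, j] = q[j] - q[i]: broadcasting a row against a column gives an n x n matrix.
fnx = lambda q: q - np.reshape(q, (len(q), 1))
delX = fnx(x)
delY = fnx(y)
# numexpr evaluates the expression using the local arrays delX and delY.
dist_mat = ne.evaluate("(delX**2 + delY**2)**0.5")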
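For the original question (the average rather than the matrix itself), a full 20,000 x 20,000 float64 matrix needs about 3.2 GB per array, before counting the difference matrices. A minimal pure-NumPy sketch, assuming memory is the main constraint, processes the rows in blocks so only a `block x n` slice of distances exists at a time; the function name and the `block` parameter are hypothetical. Because every unordered pair appears twice in the full matrix and the diagonal is zero, the total is divided by n(n-1).

```python
import numpy as np


def mean_pairwise_distance_blocked(data, block=256):
    """Average Euclidean distance over all pairs, computed block by block.

    `block` (rows per chunk) is an assumed tuning knob: peak memory is roughly
    block * n * d float64 values for the difference array.
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    total = 0.0
    for start in range(0, n, block):
        chunk = data[start:start + block]              # (b, d)
        diff = chunk[:, None, :] - data[None, :, :]    # (b, n, d) via broadcasting
        total += np.sqrt((diff ** 2).sum(axis=-1)).sum()
    # Each unordered pair was summed twice, as (i, j) and (j, i); self-distances are 0.
    return total / (n * (n - 1))


rng = np.random.default_rng(1)
pts = rng.random((500, 4))
print(mean_pairwise_distance_blocked(pts, block=64))
```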


超可爱的懒熊 2024-08-30 11:41:26

There is no getting around the number of evaluations:

$\sum_{i=1}^{n} (n - i) = \frac{n(n-1)}{2}$
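For the 20,000 points in the question, that count works out to roughly 2 x 10^8 distance evaluations:

```latex
\frac{20000 \cdot 19999}{2} = 199{,}990{,}000 \approx 2 \times 10^{8}
```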

However, you can get away with fewer evaluations if an approximate result is good enough. It depends on what you need.

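If you are computing an average, I'd suggest not trying to put all the values into an array before computing it. Just accumulate the sum (and the sum of squares, if you also need the standard deviation) and discard each value once it has been added.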
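A minimal sketch of that running-sum idea, assuming the population standard deviation is wanted; the function name `pairwise_distance_stats` is hypothetical. Only one row of distances exists at a time. The `sum of squares minus squared mean` formula can lose precision when the spread is tiny compared with the mean; a Welford-style update would be more robust in that case.

```python
import numpy as np


def pairwise_distance_stats(data):
    """Mean and population standard deviation of all pairwise distances,
    accumulated as running sums instead of storing the n*(n-1)/2 distances."""
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    count = n * (n - 1) // 2
    s = 0.0    # running sum of distances
    s2 = 0.0   # running sum of squared distances
    for i in range(n - 1):
        d = np.sqrt(((data[i + 1:] - data[i]) ** 2).sum(axis=1))  # distances i -> j > i
        s += d.sum()
        s2 += (d ** 2).sum()
        # d is overwritten on the next iteration, so no full distance array is kept
    mean = s / count
    var = max(s2 / count - mean ** 2, 0.0)  # clamp tiny negative rounding error
    return mean, np.sqrt(var)


rng = np.random.default_rng(0)
mean, std = pairwise_distance_stats(rng.random((1000, 3)))
print(mean, std)
```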



月棠 2024-08-30 11:41:26

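If you want a fast but inexact solution, you could adapt the fast multipole method (FMM) algorithm.

Pairs of points that are a small distance apart contribute less to the final average distance, so it makes sense to group the points into clusters and compare distances between clusters.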
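The fast multipole method itself is considerably more involved than fits here. As a much simpler illustration of the clustering idea only (not FMM), the sketch below bins points into grid cells of an assumed size `cell`, uses the centroid-to-centroid distance for every pair of points in different cells, and computes pairs within the same cell exactly. The function name and `cell` are hypothetical. Because the Euclidean norm is convex, the distance between two centroids never exceeds the average distance between the two groups, so this estimate can only undershoot the exact value, and the error shrinks as the cells get small relative to the distances between them.

```python
import numpy as np


def approx_mean_distance_grid(data, cell=1.0):
    """Approximate mean pairwise distance by grid clustering (illustrative sketch)."""
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    keys = np.floor(data / cell).astype(np.int64)          # integer grid cell of each point
    _, labels, counts = np.unique(keys, axis=0, return_inverse=True, return_counts=True)
    labels = labels.reshape(-1)                             # 1-D across NumPy versions
    m = counts.size
    centroids = np.zeros((m, d))
    np.add.at(centroids, labels, data)                      # per-cell coordinate sums
    centroids /= counts[:, None]
    # Cross-cell pairs: count_a * count_b * |mu_a - mu_b|, halved for a < b.
    cdiff = centroids[:, None, :] - centroids[None, :, :]
    cdist = np.sqrt((cdiff ** 2).sum(axis=-1))
    cross = (np.outer(counts, counts) * cdist).sum() / 2.0
    # Same-cell pairs are computed exactly (O(n * m) scan, fine for a sketch).
    within = 0.0
    for a in range(m):
        pts = data[labels == a]
        if len(pts) > 1:
            diff = pts[:, None, :] - pts[None, :, :]
            within += np.sqrt((diff ** 2).sum(axis=-1)).sum() / 2.0
    return (cross + within) / (n * (n - 1) / 2.0)


rng = np.random.default_rng(3)
sample = rng.random((300, 2))
print(approx_mean_distance_grid(sample, cell=0.25))
```

Very small cells reproduce the exact pairwise computation, and one huge cell does too; the savings come from intermediate sizes.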


云仙小弟 2024-08-30 11:41:26

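For a set of points on a horizontal axis (1D), the "Euclidean distance" is simply the absolute difference between points. Note that np.diff only gives the differences between consecutive elements in array order, so the following computes the mean distance between neighbouring entries, not between all pairs of points:

import numpy as np

arr = np.array([10, 80, 50, 5, 25, 4])
# np.diff gives arr[k+1] - arr[k]; np.abs turns these into distances.
avg = np.mean(np.abs(np.diff(arr)))
print(avg)

which prints: 37.2

You can drop the abs if you want negative differences to count toward the resulting mean. Good luck.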
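If the goal is the question's average over all pairs rather than over consecutive neighbours, 1D data still allows a shortcut. A sketch, assuming the points are plain scalars: after sorting, the value at sorted position j (0-based) is larger than the j values before it and smaller than the n - 1 - j values after it, so the sum of all pairwise distances is the sum of x_j * (2j - (n - 1)), which gives an O(n log n) algorithm. The function name is hypothetical.

```python
import numpy as np


def mean_pairwise_distance_1d(values):
    """Average |x_i - x_j| over all pairs i < j, via sorting (O(n log n))."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    j = np.arange(n)
    # sorted x[j] is added j times (against smaller values) and subtracted
    # n - 1 - j times (against larger values)
    total = np.dot(x, 2 * j - (n - 1))
    return total / (n * (n - 1) / 2)


arr = np.array([10, 80, 50, 5, 25, 4])
print(mean_pairwise_distance_1d(arr))  # 530 / 15 ≈ 35.33
```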



About the author: 溺孤伤于心
