Why is only the first column of the distances used?

Posted on 2025-02-07 21:00:39


I have a clustering (KMeans was applied and clusters were obtained). The radius of each cluster is calculated as the maximum distance between the center and the observations.

I don't understand the [:, 0] here. I know we're taking all observations from the first column, but why not take the second column as well? What does [:, 0] represent?

X_distances = euclidean_distances(X, [center])[:, 0]
radius = np.max(X_distances)
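
For context, here is a minimal sketch of how a per-cluster radius like this might be computed end to end. The toy data, the `radii` dictionary, and the choice to measure only the points assigned to each cluster are assumptions for illustration; they are not taken from the original code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import euclidean_distances

# Hypothetical toy data: two well-separated groups of 2D points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

radii = {}
for label, center in enumerate(kmeans.cluster_centers_):
    # Assumption: only the points assigned to this cluster are measured.
    members = X[kmeans.labels_ == label]
    # Shape (n_members, 1) before indexing, (n_members,) after [:, 0].
    distances = euclidean_distances(members, [center])[:, 0]
    radii[label] = np.max(distances)

print(radii)
```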


郁金香雨 2025-02-14 21:00:39


Whenever you come across things like this, unpack the code a bit and look at what each piece is giving you.

In this case, the docs for sklearn.metrics.pairwise.euclidean_distances (which I assume you are using; please include this sort of information in your questions!) also tell you what you need to know:

Returns: distances: ndarray of shape (n_samples_X, n_samples_Y)

So the function euclidean_distances(X, Y) returns a 2D array of all the distances between the points in X and the points in Y. Your X is all your data, and your Y is just one point: the centroid of the cluster. Because Y is only one point, your resulting distance matrix has only one column. Like this:

from sklearn.metrics import euclidean_distances
import numpy as np

X = np.array([[1, 3], [2, 5], [0, 4]])
euclidean_distances(X, [[0, 0]])

This gives:

array([[3.16227766],
       [5.38516481],
       [4.        ]])

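So the index [:, 0] selects this single column and returns it as a 1D array. There is no second column to take, because Y contains only one point. In fact, you could skip the indexing because np.max() doesn't care: called without an axis argument, it gives you the max of the entire array. So you could reduce your code to: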

radius = euclidean_distances(X, [center]).max()
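
To see the shapes involved, here is a small sketch that reuses the example data above. The second center `[2, 5]` is a hypothetical addition, included only to show when a second column would actually exist.

```python
import numpy as np
from sklearn.metrics import euclidean_distances

X = np.array([[1, 3], [2, 5], [0, 4]])

# One center: one row per point in X, and a single column.
one_center = euclidean_distances(X, [[0, 0]])
print(one_center.shape)   # (3, 1)

# [:, 0] turns that single column into a 1D array of distances.
flat = one_center[:, 0]
print(flat.shape)         # (3,)

# Two centers (the second one is hypothetical): now a second column exists,
# holding the distances from every point in X to the second center.
two_centers = euclidean_distances(X, [[0, 0], [2, 5]])
print(two_centers.shape)  # (3, 2)

# With a single center, the max is the same with or without the indexing.
radius_indexed = np.max(flat)
radius_direct = one_center.max()
print(radius_indexed == radius_direct)  # True
```

With more than one center, calling .max() on the whole array would mix distances to different centers, so in that case you would index a specific column or pass axis=0.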


About the author: 雾里花