Why use only the first column of the distances?

Posted on 2025-02-07 21:00:39

I have a clustering (k-means was applied and clusters were obtained). The radius of each cluster is calculated from the distances between the cluster center and the observations.

I don't understand the [:, 0] here. I know it takes all observations (rows) from the first column, but why not take the second column as well? What does [:, 0] represent?

X_distances = euclidean_distances(X, [center])[:, 0]
radius = np.max(X_distances)

郁金香雨 2025-02-14 21:00:39

Whenever you come across things like this, unpack the code a bit and look at what each piece is giving you.

In this case, the docs for sklearn.metrics.pairwise.euclidean_distances — which I assume you are using (please include this sort of information in your questions!) — also tell you what you need to know:

Returns: distances: ndarray of shape (n_samples_X, n_samples_Y)

So the function euclidean_distances(X, Y) returns a 2D array of all the distances between the points in X and the points in Y. Your X is all your data, and your Y is just one point: the centroid of the cluster. Because Y is only one point, your resulting distance matrix has only one column. Like this:

from sklearn.metrics import euclidean_distances
import numpy as np

X = np.array([[1, 3], [2, 5], [0, 4]])
euclidean_distances(X, [[0, 0]])

This gives:

array([[3.16227766],
       [5.38516481],
       [4.        ]])

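So the index [:, 0] selects this single column: every row (:) of column 0, returned as a 1D array with one distance per observation. There is no second column to take, because you passed only one center. In fact, you could skip the indexing, because np.max() doesn't care: without an axis argument, it just gives you the max of the entire array. So you could reduce your code to: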

radius = euclidean_distances(X, [center]).max()
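To connect this back to the k-means setting in the question, here is a minimal sketch, assuming scikit-learn's KMeans and a small made-up dataset (the data, n_clusters=2 and random_state=0 are illustrative choices, not from the question). It shows both views: passing all centers at once gives one column per center, so column k holds every point's distance to center k; passing a single center gives one column, which [:, 0] unwraps exactly as in the question.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import euclidean_distances

# Hypothetical toy data: two well-separated groups of 2D points.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
centers = kmeans.cluster_centers_   # shape (n_clusters, n_features)
labels = kmeans.labels_             # cluster index for each row of X

# One column per center: all_dists[i, k] is the distance from X[i] to centers[k].
all_dists = euclidean_distances(X, centers)
print(all_dists.shape)              # (6, 2)

radii = []
for k, center in enumerate(centers):
    members = X[labels == k]
    # Same pattern as the question: one center gives an (n_members, 1) matrix,
    # and [:, 0] turns it into a 1D array of n_members distances.
    d = euclidean_distances(members, [center])[:, 0]
    radii.append(d.max())
    # Cross-check: column k of the all-centers matrix, restricted to this cluster.
    assert np.isclose(d.max(), all_dists[labels == k, k].max())
    # NumPy-only alternative: a norm over the feature axis is already 1D.
    assert np.allclose(d, np.linalg.norm(members - center, axis=1))

print(radii)
```

Restricting to the points labelled k matters here: the question's snippet takes the max over all of X, which is only the cluster radius if X already contains just that cluster's observations.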