Why is only the first column of the distances used?
I have a clustering (KMeans was applied and clusters were obtained). The radius of each cluster is calculated as the largest distance between the cluster center and its observations.
I don't understand the [:, 0] here. I know we're taking all observations from the first column, but why not take the second column as well? What does [:, 0] represent?
X_distances = euclidean_distances(X, [center])[:, 0]
radius = np.max(X_distances)
Comments (1)
Whenever you come across things like this, unpack the code a bit and look at what each piece is giving you.
In this case, the docs for sklearn.metrics.pairwise.euclidean_distances, which I assume you are using (please include this sort of information in your questions!), also tell you what you need to know: the result is an array of shape (n_samples_X, n_samples_Y).
So the function euclidean_distances(X, Y) returns a 2D array of all the distances between the points in X and the points in Y. Your X is all your data, and your Y is just one point: the centroid of the cluster. Because Y is only one point, your resulting distance matrix has only one column.
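So the index [:, 0] is getting this column, turning the 2D array into a 1D array of distances. There is no second column to take. In fact, you could skip the indexing because np.max() doesn't care: it's just going to give you the max of the entire array. So you could reduce your code to radius = np.max(euclidean_distances(X, [center])).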
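As a quick check of that claim, the sketch below compares the indexed and un-indexed versions on hypothetical toy data (X, center and other are made up for illustration). It also shows when a second column would appear: only if Y holds a second point.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

# Hypothetical toy data, not from the question.
X = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]])
center = np.array([0.0, 0.0])

# Original version: take the single column, then the max.
radius_indexed = np.max(euclidean_distances(X, [center])[:, 0])

# Reduced version: np.max without an axis reduces over the whole (3, 1) array.
radius_direct = np.max(euclidean_distances(X, [center]))

print(radius_indexed == radius_direct)  # True

# With two points in Y there are two columns, one per point.
other = np.array([3.0, 4.0])
both = euclidean_distances(X, [center, other])
print(both.shape)  # (3, 2)
```

The shortcut relies on Y being a single center. If Y ever held more than one point, np.max without an axis argument would take the maximum over all columns at once, so you would need np.max(both, axis=0) or explicit column indexing instead.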