Getting the centroids of a scatter plot
I generated this scatter plot by plotting the first two PCA components from a feature extraction: PCA1 and PCA2.
The plot shown above covers 3 classes, with PCA1 on the x-axis and PCA2 on the y-axis. I generated the plot as follows:
target_names = ['class_1', 'class_2', 'class_3']
plt.figure(figsize=(11, 8))
Xt = pca.fit_transform(X)
plot = plt.scatter(Xt[:, 0], Xt[:, 1], c=y, cmap=plt.cm.jet,
                   s=30, linewidths=0, alpha=0.7)
# centers = kmeans.cluster_centers_
# plt.scatter(centers[:, 0], centers[:, 1], c=['black', 'green', 'red'], marker='^', s=100, alpha=0.5)
plt.legend(handles=plot.legend_elements()[0], labels=list(target_names))
plt.show()
I want to know how to correctly get the centroid of each class in the plot.
Here are the first few rows of the data:
Xt1 Xt2 y
-107.988187 -23.70121 1
-128.578852 -20.222378 1
-124.522967 -25.298283 1
-96.222918 -25.028239 1
-95.152954 -23.94496 1
-113.275804 -26.563129 1
-101.803 -24.22359 1
-94.662469 -22.94211 1
-104.118882 -24.037226 1
439.765098 -101.532469 2
50.100362 -34.278841 2
-69.229603 62.178599 2
-60.915475 53.296491 2
64.797364 91.991527 2
-112.815192 0.263505 0
-91.287067 -25.207217 0
-74.181941 -2.457892 0
-83.273718 -0.608004 0
-100.881393 -22.387571 0
-107.861711 -15.848869 0
-85.866992 -18.79126 0
-53.96314 -28.885316 0
-59.195432 -3.373361 0
Any help will be greatly appreciated.
Comments (1)
Assuming that y is an array of labels corresponding to the rows of X (and therefore Xt), we can create a data frame with Xt[:, :2] and y, and then use groupby('y') to aggregate the mean values of Xt[:, 0] and Xt[:, 1] for each value of y.
This will produce the means of Xt[:, 0] and Xt[:, 1] for each label in y, which are the centroid coordinates of each label in the first two principal components of the data.
With the snippet of data that the OP provided, the following script computes the centroids and overlays them on the plot as 'X's of the same color as the data:
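What follows is a minimal sketch of the approach described above, reconstructed from the description and not necessarily the answerer's exact code. It assumes pandas and matplotlib are installed, takes the rows posted in the question as its only input, and uses column names (Xt1, Xt2, y) from the posted table. The shared colormap normalization and the output file name centroids.png are illustrative choices.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Rows copied from the question: Xt1 and Xt2 are PCA1 and PCA2, y is the class label.
RAW = """Xt1 Xt2 y
-107.988187 -23.70121 1
-128.578852 -20.222378 1
-124.522967 -25.298283 1
-96.222918 -25.028239 1
-95.152954 -23.94496 1
-113.275804 -26.563129 1
-101.803 -24.22359 1
-94.662469 -22.94211 1
-104.118882 -24.037226 1
439.765098 -101.532469 2
50.100362 -34.278841 2
-69.229603 62.178599 2
-60.915475 53.296491 2
64.797364 91.991527 2
-112.815192 0.263505 0
-91.287067 -25.207217 0
-74.181941 -2.457892 0
-83.273718 -0.608004 0
-100.881393 -22.387571 0
-107.861711 -15.848869 0
-85.866992 -18.79126 0
-53.96314 -28.885316 0
-59.195432 -3.373361 0
"""

df = pd.read_csv(io.StringIO(RAW), sep=r"\s+")

# The centroid of a class is the mean of its points along each component.
centroids = df.groupby("y")[["Xt1", "Xt2"]].mean()
print(centroids)

# Share one colormap and normalization so each centroid gets its class color.
cmap = plt.cm.jet
norm = plt.Normalize(df["y"].min(), df["y"].max())

fig, ax = plt.subplots(figsize=(11, 8))
ax.scatter(df["Xt1"], df["Xt2"], c=df["y"], cmap=cmap, norm=norm,
           s=30, linewidths=0, alpha=0.7)
ax.scatter(centroids["Xt1"], centroids["Xt2"], c=centroids.index.to_numpy(),
           cmap=cmap, norm=norm, marker="X", s=200, edgecolors="black")
ax.set_xlabel("PCA1")
ax.set_ylabel("PCA2")
fig.savefig("centroids.png")  # illustrative file name; use plt.show() interactively
```

With the full arrays from the question, the same data frame could be built with pd.DataFrame({'Xt1': Xt[:, 0], 'Xt2': Xt[:, 1], 'y': y}) instead of parsing text. Note that these centroids are per-label means, which differ from kmeans.cluster_centers_ unless the clusters happen to coincide with the labels.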