Get the centroid of a scatter plot

Posted on 2025-02-06 10:16:39

I generated this scatter plot by plotting the first two PCA components from a feature extraction step: PCA1 and PCA2.

The plot shows 3 classes, with PCA1 (x-axis) vs. PCA2 (y-axis). I generated it as follows:

import matplotlib.pyplot as plt

# X (features), y (class labels) and pca (a PCA object) are assumed to be defined earlier
target_names = ['class_1', 'class_2', 'class_3']
plt.figure(figsize=(11, 8))
Xt = pca.fit_transform(X)  # project the features onto the principal components
plot = plt.scatter(Xt[:, 0], Xt[:, 1], c=y, cmap=plt.cm.jet,
                   s=30, linewidths=0, alpha=0.7)
#centers = kmeans.cluster_centers_
#plt.scatter(centers[:, 0], centers[:, 1], c=['black', 'green', 'red'], marker='^', s=100, alpha=0.5)
plt.legend(handles=plot.legend_elements()[0], labels=list(target_names))
plt.show()

I would like to know how to correctly get the centroid of each class in the plot.

Here are the first few rows of the data:

Xt1 Xt2 y
-107.988187 -23.70121   1
-128.578852 -20.222378  1
-124.522967 -25.298283  1
-96.222918  -25.028239  1
-95.152954  -23.94496   1
-113.275804 -26.563129  1
-101.803    -24.22359   1
-94.662469  -22.94211   1
-104.118882 -24.037226  1
439.765098  -101.532469 2
50.100362   -34.278841  2
-69.229603  62.178599   2
-60.915475  53.296491   2
64.797364   91.991527   2
-112.815192 0.263505    0
-91.287067  -25.207217  0
-74.181941  -2.457892   0
-83.273718  -0.608004   0
-100.881393 -22.387571  0
-107.861711 -15.848869  0
-85.866992  -18.79126   0
-53.96314   -28.885316  0
-59.195432  -3.373361   0

Any help will be greatly appreciated.


说谎友 2025-02-13 10:16:39

Assuming that y is an array of labels corresponding to the rows of X (and therefore Xt), we can create a data frame with Xt[:, :2] and y and then use groupby('y') to aggregate the mean values for Xt[:, 0] and Xt[:, 1] for each value of y:

import pandas as pd

df = pd.DataFrame(Xt[:, :2], columns=['Xt1', 'Xt2'])
df['y'] = y
df.groupby('y').mean()

This produces the mean of Xt[:, 0] and Xt[:, 1] for each label in y, which is the centroid of each class in the plane of the first two principal components.
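For readers who prefer not to build a data frame, the same per-class means can be sketched with NumPy alone. This is only an illustration, not part of the original answer: it uses a handful of rows copied from the question's sample data, and the names `labels` and `centroids` are chosen here for the example.

```python
import numpy as np

# A few rows from the question's sample (columns: Xt1, Xt2) and their labels.
# In practice Xt would be the full output of pca.fit_transform(X).
Xt = np.array([
    [-107.988187, -23.70121],
    [-128.578852, -20.222378],
    [50.100362, -34.278841],
    [-69.229603, 62.178599],
    [-112.815192, 0.263505],
    [-91.287067, -25.207217],
])
y = np.array([1, 1, 2, 2, 0, 0])

# Sorted unique labels; row i of `centroids` belongs to labels[i].
labels = np.unique(y)

# Boolean mask per label, then the column-wise mean of the first two components.
centroids = np.array([Xt[y == label, :2].mean(axis=0) for label in labels])

for label, (cx, cy) in zip(labels, centroids):
    print(f"class {label}: PCA1={cx:.3f}, PCA2={cy:.3f}")
```

The resulting `centroids` array has one row per class, so it can be passed to `plt.scatter(centroids[:, 0], centroids[:, 1], ...)` in the same way as the data frame columns.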

With the snippet of data that the OP provided, the following script computes the centroids and overlays them on the plot as 'x' markers in the same color as the corresponding class:

import matplotlib.pyplot as plt
import pandas as pd

# Per-class centroids: mean of each principal component, grouped by label
df = pd.DataFrame(Xt[:, :2], columns=['Xt1', 'Xt2'])
df['y'] = y
df_centroid = df.groupby('y').mean().reset_index()

target_names = ['class_1', 'class_2', 'class_3']
plt.figure(figsize=(11, 8))
plot = plt.scatter(Xt[:, 0], Xt[:, 1], c=y, cmap=plt.cm.jet,
                   s=30, linewidths=0, alpha=0.5)
# Overlay the centroids as 'x' markers, colored by class like the data points
plt.scatter(df_centroid.Xt1, df_centroid.Xt2, marker='x', s=60,
            c=df_centroid.y, cmap=plt.cm.jet)
plt.legend(handles=plot.legend_elements()[0], labels=list(target_names))
plt.show()