Sklearn clustering: extracting the IDs for each label in the clusters
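Hello, I am learning how to use the scikit-learn clustering modules. I have a working script that reads a CSV file into a pandas dataframe.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

df=pd.read_csv("test.csv",index_col="identifier")
I converted the dataframe to a numpy array: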
array=df.to_numpy()
Then I ran the clustering and plotted the result like so:
km = KMeans(n_clusters=25, init="random", n_init=100, max_iter=1000, tol=1e-04, random_state=0)
# get cluster labels (one per row of array)
y_km = km.fit_predict(array)
# project to 3 dimensions with PCA for plotting
pca = PCA(n_components=3)
pca_t = pca.fit_transform(array)
# draw one scatter series per cluster label
u_labels = np.unique(y_km)
fig = plt.figure(figsize=(14, 10))
ax = plt.axes(projection='3d')
for i in u_labels:
    ax.scatter3D(pca_t[y_km == i, 0], pca_t[y_km == i, 1], pca_t[y_km == i, 2], label=i)
ax.legend()
This all outputs a plot that looks like this:
I want a final output, such as a dictionary or a text file, that tells me which cluster each identifier is in, based on the row ids of the original array. I am having trouble figuring out how to keep that information, though. I tried the pandas DataFrame.to_records() function, which preserves the dtypes, but couldn't figure out how to turn that into what I want.
y_km contains your labels in the same order as the rows in your pandas dataframe, because df.to_numpy() preserves row order. So y_km[i] is the cluster of the row whose identifier is df.index[i].
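A minimal sketch of how that row-order guarantee can be used to build the dictionary asked for. The small dataframe below is a hypothetical stand-in for test.csv (its column names, identifiers, and n_clusters=2 are assumptions chosen only to keep the example self-contained):

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for test.csv: six rows indexed by "identifier".
df = pd.DataFrame(
    {
        "x": [0.0, 0.1, 0.2, 10.0, 10.1, 10.2],
        "y": [0.0, 0.2, 0.1, 10.0, 10.2, 10.1],
    },
    index=pd.Index(["a", "b", "c", "d", "e", "f"], name="identifier"),
)

km = KMeans(n_clusters=2, init="random", n_init=10, random_state=0)
y_km = km.fit_predict(df.to_numpy())

# Row order is preserved, so y_km[i] belongs to df.index[i].
clusters = pd.Series(y_km, index=df.index, name="cluster")

# identifier -> cluster label
id_to_cluster = clusters.to_dict()

# cluster label -> list of identifiers in that cluster
cluster_to_ids = {}
for identifier, label in clusters.items():
    cluster_to_ids.setdefault(int(label), []).append(identifier)
```

With the real data, the same pd.Series(y_km, index=df.index) line should work unchanged, since df was read with index_col="identifier".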
You should try:
This should give you a list with one label for each point.
See the documentation for KMeans.
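A minimal sketch of what this answer most likely points to, assuming it means the labels_ attribute of the fitted KMeans estimator (the answer's own snippet is not shown, so this is a guess). It also writes the result to a text file. The data, the n_clusters=2 setting, and the clusters.csv file name are hypothetical:

```python
import os
import tempfile

import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for test.csv: six rows indexed by "identifier".
df = pd.DataFrame(
    {
        "x": [0.0, 0.1, 0.2, 10.0, 10.1, 10.2],
        "y": [0.0, 0.2, 0.1, 10.0, 10.2, 10.1],
    },
    index=pd.Index(["a", "b", "c", "d", "e", "f"], name="identifier"),
)

km = KMeans(n_clusters=2, init="random", n_init=10, random_state=0)
km.fit(df.to_numpy())

# labels_ holds one label per training row, in the same order as df's rows.
result = df.assign(cluster=km.labels_)

# Write identifier,cluster pairs to a CSV file (hypothetical file name).
out_path = os.path.join(tempfile.gettempdir(), "clusters.csv")
result[["cluster"]].to_csv(out_path)

# Reading it back recovers the identifier -> cluster mapping.
reloaded = pd.read_csv(out_path, index_col="identifier")
```

km.labels_ is the same array that fit_predict returns, so either can be attached to the dataframe.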