How can I improve the silhouette score of k-means clustering?
I have a dataset of 18,000 rows describing customers, and I am trying to cluster it with the k-means algorithm.
Since I have both categorical and continuous variables, I created dummy variables for the categorical ones and then standardized the result:
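```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Dummy (one-hot) encoding of the categorical columns
dataML = pd.get_dummies(dataML)
# print(dataML.head())
X = dataML
# Standardize every column to zero mean and unit variance
mms = StandardScaler()
Xnorm = mms.fit_transform(X)
```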
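One side effect of this preprocessing is worth seeing concretely: `StandardScaler` rescales every column, including each 0/1 dummy, to unit variance, so every dummy column ends up weighing as much in Euclidean distance as each continuous column. A minimal sketch on a hypothetical toy frame (the column names `income` and `segment` are made up; `dtype=float` is passed so the dummies are floats rather than the boolean columns recent pandas versions produce):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy customer data: one continuous and one categorical column
df = pd.DataFrame({
    "income": [20000.0, 35000.0, 50000.0, 120000.0],
    "segment": ["A", "B", "A", "C"],
})

# One-hot encode the categorical column; "income" passes through unchanged
encoded = pd.get_dummies(df, dtype=float)
print(list(encoded.columns))

# Standardize every column (continuous and dummy alike)
X_scaled = StandardScaler().fit_transform(encoded)

# Every column now has mean ~0 and (population) standard deviation ~1,
# so each dummy contributes to distances on the same footing as "income"
print(X_scaled.mean(axis=0).round(6))
print(X_scaled.std(axis=0).round(6))
```

With many dummy columns, this means the categorical part of the data can outweigh the continuous part in the distances k-means uses.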
I then run the clustering:
```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Fit k-means with k=4:
# 4 clusters, 10 initializations (k-means is run 10 times from different
# k-means++ starting centers and the best run is kept),
# at most 30 iterations per run, fixed seed
km = KMeans(n_clusters=4, n_init=10, max_iter=30, random_state=42)
y_kmeans = km.fit_predict(Xnorm)
# Cluster labels assigned to each sample
print("Labels assigned: ")
print(y_kmeans)
# The lowest SSE (inertia) over the 10 runs
print("The lowest SSE value: ", km.inertia_)
# The number of iterations the best run needed
print("Num iterations to converge: ", km.n_iter_)
print("Final centers")
print(km.cluster_centers_)
# Clustering evaluation: silhouette score
# (ranges from -1 to 1; the closer to 1, the better)
silSc = silhouette_score(X, y_kmeans, metric="euclidean")
print("Silhouette score: ", round(silSc, 3))
```
But I get a negative silhouette score. Is there something wrong with my code?
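One detail in the code above stands out: k-means is fit on `Xnorm`, but `silhouette_score` is called with the unscaled `X`. The silhouette score measures how well the labels match the geometry of whatever matrix it is given, so the labels and the matrix should come from the same feature space; evaluating labels found in the standardized space against raw distances can give low or even negative scores. A minimal sketch on synthetic data (`make_blobs` is a stand-in for the customer data, and multiplying one column by 1000 is an assumption meant to mimic a large-scale feature such as income):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer data: 3 blobs in 4 features
X, _ = make_blobs(n_samples=600, n_features=4, centers=3, random_state=0)
# Put one column on a much larger scale so raw and standardized spaces differ
X[:, 0] *= 1000

Xnorm = StandardScaler().fit_transform(X)
km = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = km.fit_predict(Xnorm)

# Consistent: distances computed in the same space k-means used
score_scaled = silhouette_score(Xnorm, labels, metric="euclidean")
# Inconsistent: labels from Xnorm, distances from the raw X
score_raw = silhouette_score(X, labels, metric="euclidean")

print("silhouette on Xnorm:", round(score_scaled, 3))
print("silhouette on X:    ", round(score_raw, 3))
```

In the question's setup, the consistent call would pass `Xnorm` (the matrix given to `fit_predict`) as the first argument.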
I tried removing the StandardScaler, and the silhouette score went up to 0.6. Why does this happen?
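A plausible explanation for the jump (not verified against the original data): once the scaler is removed, the model is presumably fit on `X` and evaluated on `X`, so the two spaces match again; in addition, without standardization a large-scale continuous column dominates Euclidean distance, and k-means then mostly splits the data along that one axis. The sketch below uses hypothetical synthetic data (an `income`-like column plus five random 0/1 dummies) to measure how much of the total variance, and therefore of the expected squared distance between two random rows, each version gives to the large column, then fits and evaluates k-means in each space:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 1000
# Hypothetical data: one large-scale continuous column plus five 0/1 dummies
income = rng.normal(50_000, 15_000, n)
dummies = rng.integers(0, 2, size=(n, 5))
X = np.column_stack([income, dummies]).astype(float)
Xnorm = StandardScaler().fit_transform(X)


def variance_share(M):
    """Fraction of total variance per column; for independent rows x, y,
    E||x - y||^2 = 2 * sum of column variances, so this is also each
    column's share of the expected squared Euclidean distance."""
    v = M.var(axis=0)
    return v / v.sum()


share_raw = variance_share(X)
share_scaled = variance_share(Xnorm)
print("share of income column, raw:   ", share_raw[0])
print("share of income column, scaled:", share_scaled[0])

# Fit and evaluate in the same space each time
scores = {}
for name, M in [("raw", X), ("scaled", Xnorm)]:
    labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(M)
    scores[name] = silhouette_score(M, labels)
    print(name, "silhouette:", round(scores[name], 3))
```

In the raw space almost all of the distance comes from the one large column, so a high silhouette there mainly says the groups are well separated along that column, not that the dummy variables are.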