How can I improve the silhouette score of k-means clustering?

Posted on 2025-02-08 15:13:47

I have a dataset with 18,000 rows about some customers, and I am trying to cluster it with the k-means algorithm. Since I have both categorical and continuous variables, I created dummy variables for the categorical ones:

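import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Dummy-encode the categorical columns
dataML=pd.get_dummies(dataML)

#print(dataML.head())
X=dataML
# Standardize every column to zero mean and unit variance
scaler=StandardScaler()
Xnorm=scaler.fit_transform(X)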

I then proceed with the clustering:

# Fit k-means with k=4
# 4 clusters, 10 random initializations (the run with the lowest SSE is kept), at most 30 iterations, fixed seed
km=KMeans(n_clusters=4,n_init=10,max_iter=30,random_state=42)
y_kmeans=km.fit_predict(Xnorm)

#K-labels assigned
print("Labels assigned: ")
print(y_kmeans)

#The lowest SSE value
print("The lowest SSE value: " ,km.inertia_)

#The number of iterations required to converge
print("Num iterations to converge: ",km.n_iter_)

print("Final centers")
print(km.cluster_centers_)

#Clustering evaluation
#Silhouette score

# the closer to 1, the better
silSc=silhouette_score(X,y_kmeans,metric="euclidean")
print("Silhouette score: " , round(silSc,3))

But I get a negative silhouette score. Is there something wrong with my code?

I tried removing the StandardScaler and the silhouette score went up to 0.6. Why does this happen?
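
One possible cause, offered as a sketch rather than a definitive diagnosis: the labels are fitted on `Xnorm`, but `silhouette_score` is called with the unscaled `X`, so cluster membership and distances are measured in two different feature spaces. The toy example below uses only the Python standard library to show how that mismatch can produce a negative score. The six data points, the hand-assigned labels, and the helper names `standardize` and `silhouette` are all hypothetical; the labels merely stand in for what a clustering on the scaled data might return.

```python
import math


def standardize(rows):
    """Scale each column to zero mean and unit variance (population std)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def silhouette(rows, labels):
    """Mean silhouette coefficient with Euclidean distance.

    Assumes at least two clusters and at least two points per cluster.
    """
    scores = []
    for i, (p, lab) in enumerate(zip(rows, labels)):
        # Collect distances from point i to every other point, per cluster.
        dists = {}
        for j, (q, other) in enumerate(zip(rows, labels)):
            if i != j:
                dists.setdefault(other, []).append(math.dist(p, q))
        a = sum(dists[lab]) / len(dists[lab])  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(d) / len(d) for k, d in dists.items() if k != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical customers: [dummy flag (0/1), income]. Income has a far larger scale.
X_raw = [[0, 1000], [0, 5000], [0, 9000], [1, 1100], [1, 5100], [1, 9100]]
# Hypothetical labels that split on the dummy flag, as a clustering in the
# standardized space might do once both columns carry equal weight.
labels = [0, 0, 0, 1, 1, 1]

X_scaled = standardize(X_raw)
score_raw = silhouette(X_raw, labels)        # labels scored in the wrong space
score_scaled = silhouette(X_scaled, labels)  # labels scored in the space they fit

print("Silhouette on raw data:   ", round(score_raw, 3))
print("Silhouette on scaled data:", round(score_scaled, 3))
```

On this toy data the first score is negative and the second is positive, even though the labels are identical. In the code from the question, the corresponding change would be to pass `Xnorm` instead of `X` to `silhouette_score`. The 0.6 obtained without the scaler is computed in the unscaled space, where columns with large values dominate the Euclidean distance, so it may not be directly comparable to a score computed on standardized data.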
