t-SNE for better data visualization

Posted on 2025-02-09 09:50:05

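My dataset has shape (248857, 11). This is what it looked like before StandardScaler. I am doing a clustering analysis, and because clustering algorithms such as K-means need scaled features, I standardized the data before feeding it to the algorithm.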
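For reference, StandardScaler (with its default settings) standardizes each column independently as z = (x - mean) / std, using the population standard deviation (ddof=0). The minimal NumPy sketch below reproduces that formula on a small made-up matrix; the numbers are illustrative only, not the poster's data, and the helper name `standardize` is hypothetical.

```python
import numpy as np

def standardize(x: np.ndarray) -> np.ndarray:
    """Column-wise z-scores, the formula StandardScaler applies by default."""
    mean = x.mean(axis=0)
    std = x.std(axis=0)      # population std (ddof=0), as StandardScaler uses
    std[std == 0] = 1.0      # constant columns: StandardScaler also uses a scale of 1
    return (x - mean) / std

# Illustrative data: the second feature is ~1000x larger than the first,
# so unscaled Euclidean distances (what K-Means uses) would be dominated by it.
x = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 2000.0]])
z = standardize(x)
print(z.mean(axis=0).round(6), z.std(axis=0).round(6))
```

After scaling, both columns have mean 0 and standard deviation 1, so each feature contributes on a comparable scale to the distances K-Means minimizes.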

After:

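I ran K-Means with three clusters and am trying to find a way to visualize these clusters. t-SNE looked like a solution, but I am stuck. This is how I implemented it: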
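A minimal sketch of the clustering step described above, assuming scikit-learn's `KMeans` and a DataFrame shaped like `df_scale` (feature columns plus a "clusters" column). The synthetic blobs and the column names `f0` to `f10` are hypothetical placeholders, not the real data.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the real data: 3,000 rows, 11 features, 3 blobs.
X, _ = make_blobs(n_samples=3000, n_features=11, centers=3, random_state=0)
df = pd.DataFrame(X, columns=[f"f{i}" for i in range(11)])

# Scale first, because K-Means works on Euclidean distances.
df_scale = pd.DataFrame(StandardScaler().fit_transform(df), columns=df.columns)

# Fit K-Means with three clusters on the features only (the right-hand side is
# evaluated before the "clusters" column exists), then store the labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
df_scale["clusters"] = kmeans.fit_predict(df_scale)
print(df_scale["clusters"].value_counts())
```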

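```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import TSNE

# df_scale: the feature DataFrame with the K-Means labels in a "clusters" column
# save the cluster labels into a variable l and drop them from the features
l = df_scale["clusters"]
d = df_scale.drop("clusters", axis=1)
standardized_data = StandardScaler().fit_transform(d)

# t-SNE is slow on large inputs, so only the first 100,000 rows are used
data_points = standardized_data[:100000, :]
labels_80 = l.iloc[:100000].to_numpy()

model = TSNE(n_components=2, random_state=0)
tsne_data = model.fit_transform(data_points)

# build a DataFrame holding the 2-D embedding and each point's cluster label
tsne_df = pd.DataFrame(tsne_data, columns=["Dimension1", "Dimension2"])
tsne_df["Clusters"] = labels_80

# plot the t-SNE result, one colour per cluster
# (seaborn renamed FacetGrid's `size` argument to `height`)
sns.FacetGrid(tsne_df, hue="Clusters", height=6).map(
    plt.scatter, "Dimension1", "Dimension2").add_legend()

plt.show()
```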
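One way to get a more readable t-SNE plot, offered as a sketch rather than a definitive fix: draw a random sample balanced across clusters instead of the first 100,000 rows (t-SNE is slow on large inputs, and the first rows of a file may not be representative), then plot with small, semi-transparent markers so dense regions do not merge into solid blobs. The helper name `tsne_sample_plot`, the per-cluster sample size and the perplexity value are hypothetical choices to tune; `features` is assumed to be already standardized.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def tsne_sample_plot(features, labels, per_cluster: int = 2000,
                     perplexity: float = 30.0, seed: int = 0):
    """Hypothetical helper: embed a per-cluster random sample with t-SNE and plot it."""
    rng = np.random.default_rng(seed)
    x = np.asarray(features, dtype=float)
    labels = np.asarray(labels)

    # Draw up to `per_cluster` random rows from each cluster so small clusters stay visible.
    picks = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        picks.append(rng.choice(members, size=min(per_cluster, members.size), replace=False))
    idx = np.concatenate(picks)
    sample_labels = labels[idx]

    emb = TSNE(n_components=2, perplexity=perplexity, init="pca",
               random_state=seed).fit_transform(x[idx])

    fig, ax = plt.subplots(figsize=(7, 6))
    for c in np.unique(sample_labels):
        m = sample_labels == c
        # small, semi-transparent markers keep dense regions readable
        ax.scatter(emb[m, 0], emb[m, 1], s=3, alpha=0.5, label=f"cluster {c}")
    ax.set_xlabel("Dimension1")
    ax.set_ylabel("Dimension2")
    ax.legend(markerscale=4)
    return emb, sample_labels, fig
```

With the variables from the question this would be called as `emb, lab, fig = tsne_sample_plot(standardized_data, l)` followed by `plt.show()`. Trying a few perplexity values (for example 30 and 50) is worthwhile, since t-SNE layouts can change noticeably with it.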

As you can see, the result is not very good. How can I visualize this better?
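A cheaper complement to t-SNE, named plainly as a different technique (PCA, not t-SNE): project all standardized rows onto the first two principal components. PCA is linear and fast enough for the full 248,857 rows, and its explained-variance ratio shows how much of the data the 2-D view actually captures, which makes it a useful sanity check on whether the three K-Means clusters separate at all. The helper name `pca_cluster_plot` and the marker settings are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def pca_cluster_plot(features, labels, seed: int = 0):
    """Hypothetical helper: 2-D PCA projection of all rows, coloured by cluster."""
    x = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    pca = PCA(n_components=2, random_state=seed)
    proj = pca.fit_transform(x)

    fig, ax = plt.subplots(figsize=(7, 6))
    for c in np.unique(labels):
        m = labels == c
        # tiny, faint markers so hundreds of thousands of points stay legible
        ax.scatter(proj[m, 0], proj[m, 1], s=1, alpha=0.3, label=f"cluster {c}")
    # share of total variance each axis preserves
    r = pca.explained_variance_ratio_
    ax.set_xlabel(f"PC1 ({r[0]:.0%} of variance)")
    ax.set_ylabel(f"PC2 ({r[1]:.0%} of variance)")
    ax.legend(markerscale=8)
    return proj, r, fig
```

If the two components explain only a small share of the variance, overlapping clusters in this plot say little about the clustering itself; in that case the t-SNE view on a sample is the more informative of the two.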
