t-SNE for better data visualization

Posted on 2025-02-09 09:50:05

My dataset has shape (248857, 11).
This is what it looks like before StandardScaler. I scaled the features for my clustering analysis, because clustering algorithms such as K-Means need feature scaling before the data is fed to them.
[image: data before scaling]

After scaling:
[image: data after scaling]
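
For context, a minimal sketch of the scaling and clustering step described here, assuming scikit-learn's `StandardScaler` and `KMeans` with three clusters. The synthetic data, the feature names `f0` to `f10`, and the `n_init` setting are assumptions for illustration only; they are not taken from the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real (248857, 11) dataset (assumption):
# 11 features on very different scales.
rng = np.random.default_rng(0)
raw = rng.normal(size=(1000, 11)) * rng.uniform(1, 100, size=11)
columns = [f"f{i}" for i in range(11)]  # hypothetical feature names
df = pd.DataFrame(raw, columns=columns)

# Scale each feature to zero mean and unit variance so that no single
# feature dominates the Euclidean distances used by K-Means.
scaled = StandardScaler().fit_transform(df)

# Cluster the scaled data into three groups and store the labels
# next to the scaled features.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
df_scale = pd.DataFrame(scaled, columns=columns)
df_scale["clusters"] = kmeans.fit_predict(scaled)
```

The resulting `df_scale` has the same layout the t-SNE code expects: the scaled features plus a `clusters` column.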

I ran K-Means with three clusters and I am trying to find a way to display these clusters.
I found t-SNE as a possible solution, but I am stuck.
This is how I implemented it:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Save the cluster labels into a variable l.
l = df_scale['clusters']
d = df_scale.drop("clusters", axis=1)
standardized_data = StandardScaler().fit_transform(d)

# t-SNE: pick the first 100000 points.
data_points = standardized_data[0:100000, :]
labels_80 = l[0:100000]

model = TSNE(n_components=2, random_state=0)
tsne_data = model.fit_transform(data_points)

# Create a new data frame that helps with plotting the result.
tsne_data = np.vstack((tsne_data.T, labels_80)).T
tsne_df = pd.DataFrame(data=tsne_data,
                       columns=("Dimension1", "Dimension2", "Clusters"))

# Plot the t-SNE result, colored by cluster.
# `height` replaces the `size` argument, which newer seaborn versions no longer accept.
sns.FacetGrid(tsne_df, hue="Clusters", height=6).map(
    plt.scatter, 'Dimension1', 'Dimension2').add_legend()

plt.show()

[image: resulting t-SNE scatter plot]

As you can see, the result is not very good. How can I visualize this better?

Comments (1)

灰色世界里的红玫瑰 2025-02-16 09:50:05

It seems you need to tune the perplexity hyper-parameter which is:

a tunable parameter that says (loosely) how to balance attention between local and global aspects of your data. The parameter is, in a sense, a guess about the number of close neighbors each point has. The perplexity value has a complex effect on the resulting pictures.

Read more about it in this post and more specifically, here.
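
As a rough illustration of tuning perplexity, the sketch below fits t-SNE several times with different `perplexity` values so the embeddings can be compared side by side. The synthetic `make_blobs` data, the sample size of 300, and the candidate values 5, 30, and 50 are assumptions chosen for illustration; suitable values for the real dataset would have to be found by experiment.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 300 points, 11 features, 3 groups.
X, labels = make_blobs(n_samples=300, n_features=11, centers=3,
                       random_state=0)
X = StandardScaler().fit_transform(X)

# Fit one 2-D embedding per candidate perplexity value.
# Perplexity must be smaller than the number of samples.
embeddings = {}
for perplexity in (5, 30, 50):
    model = TSNE(n_components=2, perplexity=perplexity, random_state=0)
    embeddings[perplexity] = model.fit_transform(X)
```

Each array in `embeddings` can then be plotted with the same scatter plot used in the question, colored by cluster label, to see which perplexity separates the clusters most clearly.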
