Implementation details of K-means++ without sklearn
I am running K-means on the MNIST dataset, but I am having difficulty implementing the initialization and some of the later steps.
For the initialization, I first have to pick one random data point as the first centroid. Then, for the remaining centroids, we also pick data points randomly, but from a weighted probability distribution, until all the centroids are chosen.
I am stuck at this step: how can I apply this distribution to choose the points? I mean, how do I implement it? For D_{k-1}(x), can I just use np.linalg.norm to compute it and then square it?
So far, my implementation only initializes the first centroid:
self.centroids = np.zeros((self.num_clusters, input_x.shape[1]))
ran_num = np.random.choice(input_x.shape[0])
self.centroids[0] = input_x[ran_num]
for k in range(1, self.num_clusters):
For the next step, do I need to find the next centroid by taking the largest distance between the previous centroid and all sample points?
Comments (1)
You need to build a distribution in which the probability of selecting an observation is the (normalized) distance between that observation and its closest cluster center. Thus, when selecting a new cluster center, observations that are far from all existing cluster centers have a high probability of being chosen, while observations that are close to an existing cluster center have a low probability.
This would look like this:
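A minimal sketch of the sampling described above (not the answerer's original code) might look like the following. It assumes `input_x` is a 2-D NumPy array of shape `(n_samples, n_features)`; the function name `kmeans_pp_init` and the `rng` parameter are hypothetical, introduced only for illustration. It weights each point by the squared distance to its closest centroid, which is the usual k-means++ weighting and matches squaring `np.linalg.norm` as the question suggests.

```python
import numpy as np


def kmeans_pp_init(input_x, num_clusters, rng=None):
    """Illustrative k-means++ style initialization (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    n_samples, n_features = input_x.shape
    centroids = np.zeros((num_clusters, n_features))

    # First centroid: one data point chosen uniformly at random.
    centroids[0] = input_x[rng.integers(n_samples)]

    # Squared distance from every point to its closest centroid chosen so far.
    closest_sq = np.linalg.norm(input_x - centroids[0], axis=1) ** 2

    for k in range(1, num_clusters):
        total = closest_sq.sum()
        # Normalize into a probability distribution; if every point coincides
        # with a centroid (total == 0), fall back to uniform sampling.
        probs = closest_sq / total if total > 0 else None
        idx = rng.choice(n_samples, p=probs)
        centroids[k] = input_x[idx]

        # A point's closest centroid may now be the new one, so keep the minimum.
        new_sq = np.linalg.norm(input_x - centroids[k], axis=1) ** 2
        closest_sq = np.minimum(closest_sq, new_sq)

    return centroids
```

Inside the class from the question, this could be called as `self.centroids = kmeans_pp_init(input_x, self.num_clusters)`. Note that the next centroid is sampled from the distribution rather than taken as the point with the largest distance, and the distance is measured to the nearest of all centroids chosen so far, not only to the previous one.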