二分k均值聚类算法解释
I was required to write a bisecting k-means algorithm, but I didnt understand the algorithm.
I know k-means algorithm.
Can you explain the algorithm, but not in academic language
Thanks.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
![扫码二维码加入Web技术交流群](/public/img/jiaqun_03.jpg)
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
这个想法是迭代地将点云分成两部分。换句话说,您构建一个随机二叉树,其中每次分裂(具有两个子节点的节点)对应于将云中的点分裂为 2。
您从点云开始。
计算其质心(重心)w
在云的点中随机选择一个cL点
将点 cR 构造为 cL 与 w 的对称点(线段 cL->w 与 w->cR 相同)
将云的点分成两部分,最接近 cR 的点属于子云 R,最接近的点属于子云 R cL 属于子云 L
重申子云 R和 L
注意:
一旦使用了随机点,您就可以丢弃它们。但是,保留所有子曲线的质心。
当您的子云恰好包含一个点时停止。
如果你想要 k 个簇,只需取 k 个质心,使它们包含初始云的所有点。如果你愿意,你可以做更复杂的事情(最小化云的方差等...)假设你想要 4 个簇(为了方便起见,为 2 的幂)那么你只需要将云切成两半,然后将每个簇切成两半亚云一分为二。如果您想要 8 个簇,则再次将这些子云一分为二。再次针对 16 个簇。
如果您想要 K 个簇,其中 K 不是 2 的幂(假设是 24),那么请查看最接近的 2 的次幂。是 16。你还缺少 8 个簇。每个“16级簇”是“16级子云”的质心。您要做的就是取 8 个“16 级簇”(例如随机),并将它们分别替换为两个“子”“32 级簇”。 (这两个子“level-32-clusters”对应于两个“level-32-subcloud”,加起来等于父“level-16-subcloud”)
The idea is iteratively splitting your cloud of points in 2 parts. In other words, you build a random binary tree where each splitting (a node with two children) corresponds to splitting the points of your cloud in 2.
You begin with a cloud of points.
Compute its centroid (barycenter) w
Select a point at random cL among the points of the cloud
Construct the point cR as the symmetric point of cL when compared to w (the segment cL->w is the same as w->cR)
Separate the points of your cloud in two, the ones closest to cR belong to a subcloud R, and the ones closest to cL belongs to the subcloud L
Reiterate for the subclouds R and L
Notes :
You can discard the random points once you've used them. However, keep the centroids of all the subcoulds.
Stop when your subclouds contain exactly one point.
If you want k clusters, just take k centroids such that they contain all the points of the initial cloud. You can do much more elaborate stuff if you want (minimizing variance of the clouds, etc...) Suppose you want 4 clusters (a power of two for convenience) Then you only need to cut you cloud in two, and then cut each subclouds in two. If you want 8 clusters, then cut again these subclouds once in two. And again for 16 clusters.
If you want K clusters with K not a power of 2 (let's say 24) then look at the closest inferior power of two. It's 16. You still lack 8 clusters. Each "level-16-cluster" is the centroid of a "level-16-subcloud". What you'll do is take 8 "level-16-clusters" (at random for example) and replace them each with the two "child" "level-32-clusters". (These two child "level-32-clusters" correspond to two "level-32-subclouds" that add up to the parent "level-16-subcloud")