Clustering algorithm with an upper bound on the size of each cluster

Posted 2024-11-16 21:32:22

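I need to partition roughly 50000 points into distinct clusters. There is one requirement: the size of each cluster must not exceed K. Is there any clustering algorithm that can do this?

Note that the upper bound K is the same for every cluster, say 100.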

独闯女儿国 2024-11-23 21:32:22

Most clustering algorithms can be used to build a tree whose lowest level consists of single elements. Some do this naturally, working "bottom up" by joining pairs of elements and then groups of joined elements; others, like K-Means, can be applied repeatedly to split groups into smaller groups.
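As a rough illustration of the "repeatedly split" route, here is a minimal pure-Python sketch that builds such a tree by recursive 2-way K-Means splitting. The names `Node`, `two_means` and `build_tree` are hypothetical and introduced only for this example; the fallback to an even split when one group comes out empty is likewise an assumption made here, not part of the original answer.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    points: list                                   # every point in this subtree
    children: list = field(default_factory=list)   # empty for a leaf


def two_means(points, rng, iters=10):
    """Split points (tuples of numbers) into two non-empty groups."""
    centers = rng.sample(points, 2)
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[0 if d[0] <= d[1] else 1].append(p)
        if not groups[0] or not groups[1]:
            # Degenerate case (e.g. duplicate points): assumed fallback, split evenly.
            half = len(points) // 2
            return points[:half], points[half:]
        # Move each center to the mean of its group.
        centers = [tuple(sum(x) / len(g) for x in zip(*g)) for g in groups]
    return groups


def build_tree(points, rng):
    """Recursively split until every leaf holds a single point."""
    node = Node(points)
    if len(points) > 1:
        left, right = two_means(points, rng)
        node.children = [build_tree(left, rng), build_tree(right, rng)]
    return node


rng = random.Random(0)
pts = [(rng.random(), rng.random()) for _ in range(40)]
root = build_tree(pts, rng)
```

This only shows the shape of the tree. For 50000 points a vectorised or library implementation would be much faster, and very unbalanced splits could run into Python's recursion limit.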

Once you have a tree, you can decide where to split off subtrees to form your clusters of size <= 100 (taking K = 100 as in the question). Pruning an existing tree is often quite easy. Suppose that you want to divide an existing tree so as to minimise the sum, over the clusters you create, of some per-cluster cost. You might have:

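// Returns the minimum total cost of cutting the subtree rooted at tree_node
// into clusters of at most 100 elements, and appends those clusters to
// list_of_clusters.
f(tree_node, list_of_clusters)
{
  // Option 1: keep the whole subtree as one cluster, if it is small enough.
  cost = infinity;
  if (size of tree below tree_node <= 100)
  {
    cost = cost_function(stuff below tree_node);
  }
  // Option 2: split here and solve each child independently.
  // A leaf has no children and cannot be split, so its split cost is infinite;
  // otherwise an empty sum of 0 would win and the leaf would be dropped.
  temp_list = new List();
  cost_children = (tree_node has no children) ? infinity : 0;
  for (child in children of tree_node)
  {
    cost_children += f(child, temp_list);
  }
  if (cost_children < cost)
  {
    list_of_clusters.add_all(temp_list);
    return cost_children;
  }
  list_of_clusters.add(tree_node);
  return cost;
}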
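For anyone who wants to try the idea, the following is a small runnable Python sketch of the same pruning recursion on a hand-built tree of 1-D points. Everything named here is an assumption for illustration: the `Node` class, the `leaf` and `join` helpers, and in particular `cost_function`, which is taken to be the sum of squared distances to the centroid plus a fixed penalty of 2.0 per cluster. Without some per-cluster penalty, a pure squared-error cost would always prefer cutting down to single points.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    points: list                                   # every point in this subtree
    children: list = field(default_factory=list)   # empty for a leaf


def cost_function(points, penalty=2.0):
    """Hypothetical cost: squared error around the centroid plus a per-cluster penalty."""
    centroid = [sum(x) / len(points) for x in zip(*points)]
    sse = sum((a - c) ** 2 for p in points for a, c in zip(p, centroid))
    return sse + penalty


def prune(node, max_size):
    """Return (total cost, clusters) for the cheapest cut of this subtree."""
    # Option 1: the whole subtree as one cluster, if it fits.
    whole = cost_function(node.points) if len(node.points) <= max_size else math.inf
    if not node.children:
        return whole, [node.points]    # a leaf cannot be split further
    # Option 2: split here and take the best cut of each child.
    split_cost, split_clusters = 0.0, []
    for child in node.children:
        child_cost, child_clusters = prune(child, max_size)
        split_cost += child_cost
        split_clusters += child_clusters
    if split_cost < whole:
        return split_cost, split_clusters
    return whole, [node.points]


def leaf(x):
    return Node([(x,)])


def join(*kids):
    return Node([p for k in kids for p in k.points], list(kids))


# Tree over the points 0, 1, 10, 11, 12.
root = join(join(leaf(0), leaf(1)), join(leaf(10), join(leaf(11), leaf(12))))
total, clusters = prune(root, max_size=3)
print(total, clusters)   # 6.5 [[(0,), (1,)], [(10,), (11,), (12,)]]
```

With `max_size=2` the right-hand subtree no longer fits in one cluster, so it is cut into `[(10,)]` and `[(11,), (12,)]`. Note that the clusters returned are always subtrees of the given tree, so the result is only as good as the tree itself.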
