How to calculate the centroid in k-means++ by using distances?

Posted on 2025-01-02 02:17:40


I am using the k-means++ clusterer from Apache Commons Math in an interactive genetic algorithm to reduce the number of individuals that are evaluated by the user.

Commons Math makes it very easy to use. The user only needs to implement the
Clusterable interface. It has two methods:

double distanceFrom(T p), which is quite clear, and T centroidOf(Collection<T> p), which lets the user pick the centroid of a cluster.

If used on Euclidean points, the centroid is very easy to calculate. But on chromosomes it is quite difficult, because their meaning is not always clear.

My question: Is there an efficient generic way to pick the centroid that does not depend on the problem domain? (E.g. by using the distance)
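For reference, the easy Euclidean case can be sketched as follows. This is a self-contained illustration: the Clusterable interface is reproduced locally with the same two methods described above (in the real project it comes from Commons Math), and the Point type is a hypothetical example class.

```java
import java.util.Collection;

public class EuclideanExample {
    // Local stand-in for the Clusterable interface (same two methods as described above).
    interface Clusterable<T> {
        double distanceFrom(T p);
        T centroidOf(Collection<T> points);
    }

    static class Point implements Clusterable<Point> {
        final double[] coords;

        Point(double... coords) {
            this.coords = coords;
        }

        @Override
        public double distanceFrom(Point p) {
            // Standard Euclidean distance.
            double sum = 0d;
            for (int i = 0; i < coords.length; i++) {
                double d = coords[i] - p.coords[i];
                sum += d * d;
            }
            return Math.sqrt(sum);
        }

        @Override
        public Point centroidOf(Collection<Point> points) {
            // For Euclidean points the centroid is simply the component-wise mean.
            double[] mean = new double[coords.length];
            for (Point p : points) {
                for (int i = 0; i < mean.length; i++) {
                    mean[i] += p.coords[i] / points.size();
                }
            }
            return new Point(mean);
        }
    }
}
```

For chromosomes there is usually no such component-wise mean, which is exactly the problem the question asks about.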


EDIT

OK, here is my code for the centroid calculation.
The idea: the point with the lowest total distance to all other points is the closest to the centroid.

public T centroidOf(Collection<T> c) {
  double minDist = Double.MAX_VALUE;
  T minP = null;

  // iterate through c
  final Iterator<T> it = c.iterator();
  while (it.hasNext()) {
    // test every point p1
    final T p1 = it.next();
    double totalDist = 0d;
    for (final T p2 : c) {
      // sum up the distance to all points p2 | p2!=p1
      if (p2 != p1) {
        totalDist += p1.distanceFrom(p2);
      }
    }

    // if the current total distance is lower than the min, take it as the new min
    if (totalDist < minDist) {
      minDist = totalDist;
      minP = p1;
    }
  }
  return minP;
}


若言繁花未落 2025-01-09 02:17:40


k-means requires an averaging metric (e.g., Euclidean). Without defining such a metric and space, you don't even know whether the average of points is actually a point inside the space.

You could, however, use k-medoids, which considers only the original points as candidates for medoids (while k-means finds means/centroids which are not necessarily on the original points). The algorithm looks for points which minimize pairwise dissimilarities (i.e., distanceFrom).
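The medoid selection the answer describes is essentially what the questioner's centroidOf already computes: a pairwise scan that keeps the point with the smallest summed distance. A minimal generic sketch, decoupled from any particular point type (the medoidOf helper and the one-dimensional example are illustrative, not part of Commons Math):

```java
import java.util.Collection;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

public class MedoidExample {

    /**
     * Returns the medoid of the collection: the element whose summed
     * distance to all other elements is minimal. Runs in O(n^2) distance
     * evaluations, like the centroidOf implementation in the question.
     */
    static <T> T medoidOf(Collection<T> points, ToDoubleBiFunction<T, T> distance) {
        T best = null;
        double bestTotal = Double.POSITIVE_INFINITY;
        for (T p : points) {
            double total = 0d;
            for (T q : points) {
                if (q != p) { // skip the point itself (identity check, as in the question)
                    total += distance.applyAsDouble(p, q);
                }
            }
            if (total < bestTotal) {
                bestTotal = total;
                best = p;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Three points on a line at 0, 1 and 10: the medoid is 1,
        // since its summed distance (1 + 9 = 10) is the smallest.
        List<Double> xs = List.of(0.0, 1.0, 10.0);
        Double medoid = medoidOf(xs, (a, b) -> Math.abs(a - b));
        System.out.println(medoid); // prints 1.0
    }
}
```

Note that, unlike a true mean, the medoid is always one of the original points, so it never leaves the problem domain, which is what makes it usable for chromosomes.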
