How can I calculate the centroid in k-means++ by using distances?
I am using the k-means++ clusterer from Apache Commons Math in an interactive genetic algorithm to reduce the number of individuals that are evaluated by the user.
Commons Math makes it very easy to use. The user only needs to implement the Clusterable
interface. It has two methods:
double distanceFrom(T p), which is quite clear, and T centroidOf(Collection<T> p), which lets the user pick the centroid of a cluster.
If used on Euclidean points, the centroid is very easy to calculate. But on chromosomes it is quite difficult, because their meaning is not always clear.
My question: Is there an efficient generic way to pick the centroid that does not depend on the problem domain? (E.g. by using the distance.)
EDIT
Ok, here is my code for the centroid calculation.
The idea: the point with the lowest total distance to all other points is the one nearest to the centroid.
public T centroidOf(Collection<T> c) {
    double minDist = Double.MAX_VALUE;
    T minP = null;
    // iterate through c
    final Iterator<T> it = c.iterator();
    while (it.hasNext()) {
        // test every point p1
        final T p1 = it.next();
        double totalDist = 0d;
        for (final T p2 : c) {
            // sum up the distance to all points p2 != p1
            if (p2 != p1) {
                totalDist += p1.distanceFrom(p2);
            }
        }
        // if the current total distance is lower than the min, take it as the new min
        if (totalDist < minDist) {
            minDist = totalDist;
            minP = p1;
        }
    }
    return minP;
}
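As a sanity check, the same medoid search can be run on hypothetical 1-D points without any Commons Math dependency. The `Point1D` class below is a minimal stand-in for the `Clusterable` contract (its `distanceFrom` is just the absolute difference); it is an assumption for illustration, not part of the library:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class MedoidCheck {
    // Minimal stand-in for the Clusterable contract used above.
    static class Point1D {
        final double x;
        Point1D(double x) { this.x = x; }
        double distanceFrom(Point1D p) { return Math.abs(x - p.x); }
    }

    // Same idea as the centroidOf above: pick the point whose
    // total distance to all other points is minimal.
    static Point1D centroidOf(Collection<Point1D> c) {
        double minDist = Double.MAX_VALUE;
        Point1D minP = null;
        for (Point1D p1 : c) {
            double totalDist = 0d;
            for (Point1D p2 : c) {
                if (p2 != p1) {
                    totalDist += p1.distanceFrom(p2);
                }
            }
            if (totalDist < minDist) {
                minDist = totalDist;
                minP = p1;
            }
        }
        return minP;
    }

    public static void main(String[] args) {
        List<Point1D> points = Arrays.asList(
                new Point1D(0), new Point1D(1), new Point1D(10));
        // total distances: 0 -> 11, 1 -> 10, 10 -> 19
        System.out.println(centroidOf(points).x); // prints 1.0
    }
}
```

Note the search is O(n²) per call, since every point's distance to every other point is summed.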
k-means requires an averaging metric (e.g., Euclidean). Without defining such a metric and space, you don't even know whether the average of the points is actually a point inside the space.
You could, however, use k-medoids, which considers only the original points as candidates for the medoids (while k-means finds means/centroids which are not necessarily located on the original points). The algorithm looks for the point that minimizes the pairwise dissimilarities (i.e.,
distanceFrom
).
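To see the difference on a hypothetical 1-D example: the mean can fall between the input points, while the medoid is always one of them. This is a sketch, not library code:

```java
import java.util.Arrays;
import java.util.List;

public class MeanVsMedoid {
    // Sum of distances from p to every point in the list.
    static double totalDist(double p, List<Double> points) {
        return points.stream().mapToDouble(q -> Math.abs(p - q)).sum();
    }

    public static void main(String[] args) {
        List<Double> points = Arrays.asList(0.0, 1.0, 10.0);

        // k-means centroid: the arithmetic mean, which need not be
        // one of the original points.
        double mean = points.stream()
                .mapToDouble(Double::doubleValue).average().getAsDouble();

        // k-medoids: the original point minimizing the summed distance.
        double medoid = points.stream()
                .min((a, b) -> Double.compare(totalDist(a, points),
                                              totalDist(b, points)))
                .get();

        System.out.println(mean);   // 3.666..., not in the input set
        System.out.println(medoid); // 1.0, an actual input point
    }
}
```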