K-means algorithm

Posted on 2024-07-25 20:37:56

I'm trying to program a k-means algorithm in Java. I have calculated a number of arrays, each of them containing a number of coefficients. I need to use a k-means algorithm in order to group all this data. Do you know of any implementation of this algorithm?

Comments (7)

别挽留 2024-08-01 20:37:56

Classification, clustering, and grouping are well-developed areas of IR. There is a very good open-source Java library/software package called WEKA, which includes several clustering algorithms. Although there is a learning curve, it might be useful when you encounter harder problems.

我一直都在从未离去 2024-08-01 20:37:56

OpenCV is one of the most horribly written libraries I've ever had to use.
On the other hand, Matlab does it very neatly.

If you have to code it yourself, the algorithm is incredibly simple for how efficient it is.

  1. Pick the number of clusters (k).
  2. Make k points (they are going to be the centroids).
  3. Randomize the locations of all these points.
  4. Calculate the Euclidean distance from each data point to all centroids.
  5. Assign 'membership' of each data point to the nearest centroid.
  6. Establish the new centroids by averaging the locations of all points belonging to a given cluster.
  7. Go to step 4 and repeat until convergence is achieved or the changes become negligible.
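
The steps above can be sketched in Java roughly as follows. This is a minimal sketch and not the poster's code: the class name KMeansSketch, the method cluster, and its parameters (maxIterations, seed) are hypothetical names chosen for illustration. It assumes every point is a double[] of the same length and that k is no larger than the number of points. For step 3 it picks k distinct data points at random as the starting centroids, which is a common substitute for placing centroids at arbitrary random positions.

```java
import java.util.Arrays;
import java.util.Random;

class KMeansSketch {

    /** Squared Euclidean distance; the square root is not needed to find the nearest centroid. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns, for each point, the index of the cluster it ends up in (assumes 1 <= k <= points.length). */
    static int[] cluster(double[][] points, int k, int maxIterations, long seed) {
        int n = points.length;
        int dim = points[0].length;
        Random random = new Random(seed);

        // Steps 2-3 (assumption): use k distinct, randomly chosen data points as initial centroids.
        int[] order = new int[n];
        for (int i = 0; i < n; i++) order[i] = i;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            int j = c + random.nextInt(n - c);   // partial Fisher-Yates shuffle
            int tmp = order[c]; order[c] = order[j]; order[j] = tmp;
            centroids[c] = points[order[c]].clone();
        }

        int[] assignment = new int[n];
        Arrays.fill(assignment, -1);             // -1 means "not assigned yet"
        for (int iter = 0; iter < maxIterations; iter++) {
            // Steps 4-5: assign each point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(points[i], centroids[c])
                            < squaredDistance(points[i], centroids[best])) {
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            // Step 7: stop once no point changes cluster.
            if (!changed) break;

            // Step 6: move each centroid to the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dim; d++) sums[assignment[i]][d] += points[i][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue;    // an empty cluster keeps its previous centroid
                for (int d = 0; d < dim; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Two visibly separated groups of 2-D points (made-up sample data).
        double[][] points = {
            {1.0, 1.0}, {1.5, 2.0}, {1.0, 0.5},
            {8.0, 8.0}, {9.0, 9.5}, {8.5, 9.0}
        };
        System.out.println(Arrays.toString(cluster(points, 2, 100, 42L)));
    }
}
```

Which cluster gets index 0 or 1 depends on the seed, so only the grouping is meaningful, not the labels. Stopping when no assignment changes is one way to read "convergence"; a threshold on how far the centroids move would be another.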
小巷里的女流氓 2024-08-01 20:37:56

There's a very nice Python implementation of K-means clustering in "Programming Collective Intelligence". I highly recommend it.

I realize that you'll have to translate to Java, but it doesn't look to be too difficult.

落花浅忆 2024-08-01 20:37:56

Really, KMeans is a very easy algorithm. Is there any good reason not to hand-code it yourself? I did it in Qt and then ported the code to plain old STL without too many problems.

I am starting to become a fan of Joel's idea: no external dependencies. So please feel free to tell me what's good about a large piece of software you don't control, especially when others on this question have already mentioned that it's not a good piece of software.

Talk is cheap; real men show their code to the world:
http://github.com/elcuco/data_mining_demo

I should clean up the code a little to make it more generic, and the current version is not ported to STL, but it's a start!

风流物 2024-08-01 20:37:56

Very old question, but I noticed there is no mention of the Java Machine Learning Library, which has an implementation of K-Means (http://java-ml.sourceforge.net/api/0.1.7/net/sf/javaml/clustering/KMeans.html) and includes some documentation about its usage (http://java-ml.sourceforge.net/src/tutorials/clustering/TutorialClusterEvaluation.java).

The project is not very active, but the last version is relatively recent (July 2012).

绝不放开 2024-08-01 20:37:56

It seems everyone who posted forgot to mention the de facto image processing library: OpenCV http://sourceforge.net/projects/opencvlibrary/. You would have to write a JNI wrapper around the C OpenCV code to get KMeans to work, but the added benefits would be:

  1. You would know that the KMeans algorithm is heavily optimized.
  2. OpenCV makes extensive use of your GPU, so it runs blazing fast.

The main drawback is that you would have to write a JNI wrapper. I once needed a template matching routine and was faced with many alternatives, but I found OpenCV to be by far the best, even though I was forced to write a JNI wrapper for it.

岁月打碎记忆 2024-08-01 20:37:56
//Aim:To implement Kmeans clustering algorithm.
//Program
import java.util.*;
class k_means
{
static int count1,count2,count3;
static int d[];
static int k[][];
static int tempk[][];
static double m[];
static double diff[];
static int n,p;

static int cal_diff(int a) // Returns the index of the cluster whose mean is closest to element a.
{
for(int i=0;i<p;++i)
diff[i]=Math.abs(a-m[i]); // absolute distance from a to each mean
int val=0;
double temp=diff[0];
for(int i=1;i<p;++i)
{
if(diff[i]<temp) // strict comparison: ties go to the lowest cluster index
{
temp=diff[i];
val=i;
}
}//end of for loop
return val;
}

static void cal_mean() // Recomputes the mean of every cluster from its current members.
{
for(int i=0;i<p;++i)
{
double sum=0;
int cnt=0;
for(int j=0;j<n;++j) // scan all n slots; -1 marks an unused slot
{
if(k[i][j]!=-1)
{
sum+=k[i][j];
++cnt;
}
}
if(cnt>0)
m[i]=sum/cnt; // an empty cluster keeps its previous mean instead of becoming NaN
}
}

static int check1() // This checks if previous k ie. tempk and current k are same.Used as terminating case.
{
for(int i=0;i<p;++i)
for(int j=0;j<n;++j)
if(tempk[i][j]!=k[i][j])
{
return 0;
}
return 1;
}

public static void main(String args[])
{
Scanner scr=new Scanner(System.in);
/* Accepting number of elements */
System.out.println("Enter the number of elements ");
n=scr.nextInt();
d=new int[n];
/* Accepting elements */
System.out.println("Enter "+n+" elements: ");
for(int i=0;i<n;++i)
d[i]=scr.nextInt();
/* Accepting num of clusters */
System.out.println("Enter the number of clusters: ");
p=scr.nextInt();
/* Initialising arrays */
k=new int[p][n];
tempk=new int[p][n];
m=new double[p];
diff=new double[p];
/* Initializing m */
for(int i=0;i<p;++i)
m[i]=d[i];

int temp=0;
int flag=0;
do
{
for(int i=0;i<p;++i)
for(int j=0;j<n;++j)
{
k[i][j]=-1;
}
for(int i=0;i<n;++i) // assign every element to the cluster with the nearest mean
{
temp=cal_diff(d[i]);
int slot=0;
while(k[temp][slot]!=-1) // find the next free slot; works for any number of clusters, not just 3
++slot;
k[temp][slot]=d[i];
}
cal_mean(); // call to method which will calculate mean at this step.
flag=check1(); // check if terminating condition is satisfied.
if(flag!=1)
/*Take backup of k in tempk so that you can check for equivalence in next step*/
for(int i=0;i<p;++i)
for(int j=0;j<n;++j)
tempk[i][j]=k[i][j];

System.out.println("\n\nAt this step");
System.out.println("\nValue of clusters");
for(int i=0;i<p;++i)
{
System.out.print("K"+(i+1)+"{ ");
for(int j=0;j<n && k[i][j]!=-1;++j)
System.out.print(k[i][j]+" ");
System.out.println("}");
}//end of for loop
System.out.println("\nValue of m ");
for(int i=0;i<p;++i)
System.out.print("m"+(i+1)+"="+m[i]+"  ");

count1=0;count2=0;count3=0;
}
while(flag==0);

System.out.println("\n\n\nThe Final Clusters By Kmeans are as follows: ");
for(int i=0;i<p;++i)
{
System.out.print("K"+(i+1)+"{ ");
for(int j=0;j<n && k[i][j]!=-1;++j)
System.out.print(k[i][j]+" ");
System.out.println("}");
}
}
}
/*
Enter the number of elements
8
Enter 8 elements:
2 3 6 8 12 15 18 22
Enter the number of clusters:
3

At this step
Value of clusters
K1{ 2 }
K2{ 3 }
K3{ 6 8 12 15 18 22 }
Value of m
m1=2.0  m2=3.0  m3=13.5

At this step
Value of clusters
K1{ 2 }
K2{ 3 6 8 }
K3{ 12 15 18 22 }
Value of m
m1=2.0  m2=5.666666666666667  m3=16.75

At this step
Value of clusters
K1{ 2 3 }
K2{ 6 8 }
K3{ 12 15 18 22 }
Value of m
m1=2.5  m2=7.0  m3=16.75

At this step
Value of clusters
K1{ 2 3 }
K2{ 6 8 }
K3{ 12 15 18 22 }
Value of m
m1=2.5  m2=7.0  m3=16.75

The Final Clusters By Kmeans are as follows:
K1{ 2 3 }
K2{ 6 8 }
K3{ 12 15 18 22 } */