K-means algorithm

Posted on 2024-07-25 20:37:56

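I'm trying to program a k-means algorithm in Java. I have calculated a number of arrays, each of them containing a number of coefficients. I need to use a k-means algorithm to group all this data. Do you know of any implementation of this algorithm?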


别挽留 2024-08-01 20:37:56

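Classification, clustering and grouping are well-developed areas of information retrieval. There is a very good open-source Java library/software package called WEKA, which offers several clustering algorithms. Although there is a learning curve, it may come in handy when you run into harder problems.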


我一直都在从未离去 2024-08-01 20:37:56

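OpenCV is one of the most horribly written libraries I've ever had to use.
Matlab, on the other hand, does it very neatly.

If you have to code it yourself, the algorithm is very simple and quite efficient.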

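1. Choose the number of clusters (k).
2. Make k points (they are going to be the centroids).
3. Randomize the positions of all these points.
4. Calculate the Euclidean distance from each point to all centroids.
5. Assign the "membership" of each point to the nearest centroid.
6. Establish the new centroids by averaging the positions of all points that belong to a given cluster.
7. Go to 4 until convergence is achieved, or the changes made are insignificant.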
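
The steps above can be sketched in Java roughly as follows. This is only a minimal sketch under a few assumptions, not a reference implementation: the class name KMeansSketch and its method names are made up for illustration, each data item is assumed to be a double[] of equal length, the starting centroids are copies of k randomly chosen input points (one common way to do the "make k points and randomize them" steps), and a cluster that ends up empty simply keeps its previous centroid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical class name; nothing here comes from an existing library.
final class KMeansSketch {

    /** Squared Euclidean distance; it orders centroids the same way the true distance does. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Assumed initialisation: copies of k distinct input points, picked at random. */
    static double[][] randomCentroids(double[][] points, int k, Random rng) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < points.length; i++) indices.add(i);
        Collections.shuffle(indices, rng);
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = points[indices.get(c)].clone();
        return centroids;
    }

    /**
     * Repeats "assign to nearest centroid, then average" until no point changes
     * cluster or maxIterations is reached. The centroids array is updated in place;
     * the return value holds the cluster index of each point.
     */
    static int[] cluster(double[][] points, double[][] centroids, int maxIterations) {
        int k = centroids.length;
        int dim = points[0].length;
        int[] labels = new int[points.length];
        Arrays.fill(labels, -1); // -1 means "not assigned yet"
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assign every point to its nearest centroid (ties go to the lower index).
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                double bestDist = squaredDistance(points[p], centroids[0]);
                for (int c = 1; c < k; c++) {
                    double d = squaredDistance(points[p], centroids[c]);
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                if (labels[p] != best) { labels[p] = best; changed = true; }
            }
            if (!changed) break; // converged: no membership changed
            // Move each centroid to the mean of the points assigned to it.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[labels[p]]++;
                for (int j = 0; j < dim; j++) sums[labels[p]][j] += points[p][j];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // empty cluster keeps its old centroid
                for (int j = 0; j < dim; j++) centroids[c][j] = sums[c][j] / counts[c];
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] points = {{2}, {3}, {6}, {8}, {12}, {15}, {18}, {22}};
        double[][] centroids = randomCentroids(points, 3, new Random(42));
        int[] labels = cluster(points, centroids, 100);
        System.out.println(Arrays.toString(labels));
        System.out.println(Arrays.deepToString(centroids));
    }
}
```

Because k-means only finds a local optimum, different starting centroids can give different clusters, so it is usual to run it several times with different seeds and keep the best result.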


小巷里的女流氓 2024-08-01 20:37:56

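"Programming Collective Intelligence". I highly recommend it.

I realize you would have to translate it to Java, but that doesn't look too difficult.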


落花浅忆 2024-08-01 20:37:56

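Really, KMeans is a very simple algorithm. Is there any good reason not to hand-code it yourself? I did it in Qt and then ported the code to plain old STL without too many problems.

I am starting to come around to Joel's idea: no external dependencies. So please feel free to tell me what is good about a large piece of software that you don't control, and that others on this question have already said is not a good piece of software.

Talk is cheap; real men show their code to the world:
http://github.com/elcuco/data_mining_demo

I should clean up the code a bit to make it more generic, and the current version is not yet ported to STL, but it's a start!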


风流物 2024-08-01 20:37:56

Very old question, but I noticed there is no mention of the Java Machine Learning Library, which has an implementation of K-Means (http://java-ml.sourceforge.net/api/0.1.7/net/sf/javaml/clustering/KMeans.html) and includes some documentation about its usage (http://java-ml.sourceforge.net/src/tutorials/clustering/TutorialClusterEvaluation.java).

The project is not very active, but the last release is relatively recent (July 2012).


绝不放开 2024-08-01 20:37:56

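It seems that everyone who posted forgot to mention the de facto image processing library: OpenCV http://sourceforge.net/projects/opencvlibrary/. You would have to write a JNI wrapper around the C OpenCV code to get KMeans to work, but the added benefits would be:

1. You would know that the KMeans algorithm is heavily optimized.
2. OpenCV makes extensive use of the GPU, so it is blazingly fast.

The main drawback is that you have to write the JNI wrapper. I once needed a template matching routine and was faced with many alternatives, but I found OpenCV to be by far the best, even though it forced me to write a JNI wrapper for it.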


岁月打碎记忆 2024-08-01 20:37:56

// Aim: to implement the K-means clustering algorithm for one-dimensional integer data.
// Program
import java.util.*;

class k_means
{
    static int count[];    // number of elements currently stored in each cluster
    static int d[];        // input elements
    static int k[][];      // k[i] holds the elements of cluster i; unused slots are -1
    static int tempk[][];  // clusters from the previous step
    static double m[];     // current mean of each cluster
    static double diff[];  // distance from one element to each mean
    static int n, p;       // number of elements, number of clusters

    // Returns the index of the cluster whose mean is closest to a.
    static int cal_diff(int a)
    {
        for (int i = 0; i < p; ++i)
            diff[i] = Math.abs(a - m[i]);
        int val = 0;
        double temp = diff[0];
        for (int i = 1; i < p; ++i)
        {
            if (diff[i] < temp)
            {
                temp = diff[i];
                val = i;
            }
        }
        return val;
    }

    // Recomputes the mean of every cluster. An empty cluster keeps its previous
    // mean, which avoids dividing by zero.
    static void cal_mean()
    {
        for (int i = 0; i < p; ++i)
        {
            if (count[i] == 0)
                continue;
            double sum = 0;
            for (int j = 0; j < count[i]; ++j)
                sum += k[i][j];
            m[i] = sum / count[i];
        }
    }

    // Returns 1 if the previous clusters (tempk) and the current clusters (k)
    // are the same, 0 otherwise. Used as the terminating condition.
    static int check1()
    {
        for (int i = 0; i < p; ++i)
            for (int j = 0; j < n; ++j)
                if (tempk[i][j] != k[i][j])
                    return 0;
        return 1;
    }

    // Prints the elements of every cluster.
    static void print_clusters()
    {
        for (int i = 0; i < p; ++i)
        {
            System.out.print("K" + (i + 1) + "{ ");
            for (int j = 0; j < count[i]; ++j)
                System.out.print(k[i][j] + " ");
            System.out.println("}");
        }
    }

    public static void main(String args[])
    {
        Scanner scr = new Scanner(System.in);
        /* Accepting number of elements */
        System.out.println("Enter the number of elements ");
        n = scr.nextInt();
        d = new int[n];
        /* Accepting elements */
        System.out.println("Enter " + n + " elements: ");
        for (int i = 0; i < n; ++i)
            d[i] = scr.nextInt();
        /* Accepting number of clusters */
        System.out.println("Enter the number of clusters: ");
        p = scr.nextInt();
        if (p < 1 || p > n)
        {
            System.out.println("The number of clusters must be between 1 and " + n);
            return;
        }
        /* Initialising arrays */
        k = new int[p][n];
        tempk = new int[p][n];
        m = new double[p];
        diff = new double[p];
        count = new int[p];
        /* Initializing m: the first p elements are the starting means */
        for (int i = 0; i < p; ++i)
            m[i] = d[i];

        int flag = 0;
        do
        {
            /* Empty every cluster; -1 marks an unused slot */
            for (int i = 0; i < p; ++i)
            {
                Arrays.fill(k[i], -1);
                count[i] = 0;
            }
            /* Put every element into the cluster with the nearest mean */
            for (int i = 0; i < n; ++i)
            {
                int temp = cal_diff(d[i]);
                k[temp][count[temp]++] = d[i];
            }
            cal_mean();       // calculate the means at this step
            flag = check1();  // check whether the terminating condition is satisfied
            if (flag != 1)
                /* Back up k in tempk so that the next step can check for equivalence */
                for (int i = 0; i < p; ++i)
                    System.arraycopy(k[i], 0, tempk[i], 0, n);

            System.out.println("\n\nAt this step");
            System.out.println("\nValue of clusters");
            print_clusters();
            System.out.println("\nValue of m ");
            for (int i = 0; i < p; ++i)
                System.out.print("m" + (i + 1) + "=" + m[i] + "  ");
        }
        while (flag == 0);

        System.out.println("\n\n\nThe Final Clusters By Kmeans are as follows: ");
        print_clusters();
    }
}
/*
Enter the number of elements
8
Enter 8 elements:
2 3 6 8 12 15 18 22
Enter the number of clusters:
3

At this step
Value of clusters
K1{ 2 }
K2{ 3 }
K3{ 6 8 12 15 18 22 }
Value of m
m1=2.0  m2=3.0  m3=13.5

At this step
Value of clusters
K1{ 2 }
K2{ 3 6 8 }
K3{ 12 15 18 22 }
Value of m
m1=2.0  m2=5.666666666666667  m3=16.75

At this step
Value of clusters
K1{ 2 3 }
K2{ 6 8 }
K3{ 12 15 18 22 }
Value of m
m1=2.5  m2=7.0  m3=16.75

At this step
Value of clusters
K1{ 2 3 }
K2{ 6 8 }
K3{ 12 15 18 22 }
Value of m
m1=2.5  m2=7.0  m3=16.75

The Final Clusters By Kmeans are as follows:
K1{ 2 3 }
K2{ 6 8 }
K3{ 12 15 18 22 } */
