K-means algorithm
I'm trying to program a k-means algorithm in Java. I have calculated a number of arrays, each of them containing a number of coefficients. I need to use a k-means algorithm in order to group all this data. Do you know of any implementation of this algorithm?
Classification, clustering and grouping are well-developed areas of IR (information retrieval). There is a very good open-source Java library/software package called WEKA, which includes several clustering algorithms. Although there is a learning curve, it might be useful when you encounter harder problems.
OpenCV is one of the most horribly written libraries I've ever had to use.
On the other hand, Matlab does it very neatly.
If you have to code it yourself, the algorithm is incredibly simple for how efficient it is.
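Since the algorithm is simple to hand-code, here is a minimal sketch of Lloyd's k-means in Java. It assumes the data is a `double[][]` (one row per coefficient array, all rows the same length, at least `k` rows) and uses squared Euclidean distance. The names `KMeans`, `cluster` and `squaredDistance` are hypothetical, and seeding with the first `k` rows is a simplification to keep the example deterministic; random or k-means++ seeding is more common in practice.

```java
import java.util.Arrays;

class KMeans {

    /** Squared Euclidean distance between two equal-length vectors. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Lloyd's k-means: returns the cluster index of each row of data.
     * Assumes data.length >= k. Initial centroids are the first k rows
     * (a simplifying assumption of this sketch).
     */
    static int[] cluster(double[][] data, int k, int maxIterations) {
        int n = data.length;
        int dim = data[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = data[c].clone();
        }
        int[] assignment = new int[n];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach every point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestDist = squaredDistance(data[i], centroids[0]);
                for (int c = 1; c < k; c++) {
                    double d = squaredDistance(data[i], centroids[c]);
                    if (d < bestDist) {
                        bestDist = d;
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point switched cluster
            }
            // Update step: move each centroid to the mean of its points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignment[i]]++;
                for (int j = 0; j < dim; j++) {
                    sums[assignment[i]][j] += data[i][j];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // empty cluster: keep its previous centroid
                }
                for (int j = 0; j < dim; j++) {
                    centroids[c][j] = sums[c][j] / counts[c];
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] data = {
            {1.0, 1.0}, {8.0, 8.0}, {1.2, 0.8}, {0.9, 1.1}, {8.2, 7.9}, {7.8, 8.1}
        };
        System.out.println(Arrays.toString(cluster(data, 2, 100)));
    }
}
```

Running `main` prints `[0, 1, 0, 0, 1, 1]`. Here an empty cluster simply keeps its old centroid; other implementations re-seed it instead.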
There's a very nice Python implementation of K-means clustering in "Programming Collective Intelligence". I highly recommend it.
I realize that you'll have to translate it to Java, but it doesn't look too difficult.
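If the Python code you port uses a correlation-based distance instead of Euclidean distance, a Java equivalent of `1 - r` (where `r` is the Pearson correlation) could look like the sketch below. The class name and the fallback for constant vectors are assumptions of this sketch, not taken from the book.

```java
class PearsonDistance {

    /**
     * Returns 1 - r, where r is the Pearson correlation of a and b:
     * 0 means perfectly correlated, 2 means perfectly anti-correlated.
     * If either vector is constant, r is undefined; this sketch then
     * returns 1.0 ("uncorrelated"), which is an assumption.
     */
    static double distance(double[] a, double[] b) {
        int n = a.length;
        double meanA = 0.0;
        double meanB = 0.0;
        for (int i = 0; i < n; i++) {
            meanA += a[i];
            meanB += b[i];
        }
        meanA /= n;
        meanB /= n;

        double cov = 0.0;
        double varA = 0.0;
        double varB = 0.0;
        for (int i = 0; i < n; i++) {
            double da = a[i] - meanA;
            double db = b[i] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        if (varA == 0.0 || varB == 0.0) {
            return 1.0; // constant vector: correlation undefined
        }
        return 1.0 - cov / Math.sqrt(varA * varB);
    }

    public static void main(String[] args) {
        System.out.println(distance(new double[]{1, 2, 3}, new double[]{2, 4, 6}));
        System.out.println(distance(new double[]{1, 2, 3}, new double[]{3, 2, 1}));
    }
}
```

Running `main` prints `0.0` for the perfectly correlated pair and `2.0` for the anti-correlated pair.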
Really, k-means is an easy algorithm. Is there any good reason not to hand-code it yourself? I did it in Qt and then ported the code to plain old STL without too many problems.
I'm starting to become a fan of Joel's idea of having no external dependencies. So please feel free to tell me what's good about a large piece of software you don't control; others on this question have already mentioned that it's not a good piece of software.
Talk is cheap; real men show their code to the world:
http://github.com/elcuco/data_mining_demo
I should clean up the code a little to make it more generic, and the current version is not yet ported to STL, but it's a start!
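One way to make a hand-written implementation more generic is to pass the distance function in rather than hard-coding it. The sketch below shows only the assignment step. The names `GenericAssignment`, `assign` and `manhattan` are hypothetical; the plug-in point is the standard `java.util.function.ToDoubleBiFunction` interface.

```java
import java.util.Arrays;
import java.util.function.ToDoubleBiFunction;

class GenericAssignment {

    /** Returns, for each point, the index of the nearest centroid under the given distance. */
    static int[] assign(double[][] points, double[][] centroids,
                        ToDoubleBiFunction<double[], double[]> distance) {
        int[] labels = new int[points.length];
        for (int i = 0; i < points.length; i++) {
            int best = 0;
            double bestDist = distance.applyAsDouble(points[i], centroids[0]);
            for (int c = 1; c < centroids.length; c++) {
                double d = distance.applyAsDouble(points[i], centroids[c]);
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            labels[i] = best;
        }
        return labels;
    }

    /** Manhattan (L1) distance, used here only as an example of a plug-in metric. */
    static double manhattan(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {9, 9}, {1, 0}};
        double[][] centroids = {{0, 0}, {10, 10}};
        int[] labels = assign(points, centroids, GenericAssignment::manhattan);
        System.out.println(Arrays.toString(labels));
    }
}
```

This prints `[0, 1, 0]`. Note that the usual update step (averaging each cluster's points) minimizes squared Euclidean distance only; with Manhattan distance the matching update is the per-coordinate median, which turns the algorithm into k-medians.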
Very old question, but I noticed there is no mention of the Java Machine Learning Library (Java-ML), which has an implementation of K-Means (http://java-ml.sourceforge.net/api/0.1.7/net/sf/javaml/clustering/KMeans.html) and includes some documentation about its usage (http://java-ml.sourceforge.net/src/tutorials/clustering/TutorialClusterEvaluation.java).
The project is not very active, but the last version is relatively recent (July 2012).
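Java-ML's documentation covers cluster evaluation; its own API is not reproduced here. As a library-free illustration, one common score is the within-cluster sum of squared errors (SSE), which is the quantity k-means tries to minimize. The names `ClusterEvaluation` and `sse` are hypothetical.

```java
class ClusterEvaluation {

    /** Within-cluster sum of squared Euclidean distances (SSE); lower means tighter clusters. */
    static double sse(double[][] points, int[] labels, double[][] centroids) {
        double total = 0.0;
        for (int i = 0; i < points.length; i++) {
            double[] c = centroids[labels[i]];
            for (int j = 0; j < points[i].length; j++) {
                double d = points[i][j] - c[j];
                total += d * d;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {2, 0}, {10, 10}};
        int[] labels = {0, 0, 1};
        double[][] centroids = {{1, 0}, {10, 10}};
        System.out.println(sse(points, labels, centroids)); // prints 2.0
    }
}
```

Because SSE generally shrinks as `k` grows, it is more useful for comparing runs with the same `k` (for example, different random seeds) than for picking `k` on its own.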
It seems everyone who posted forgot to mention the de facto image processing library: OpenCV (http://sourceforge.net/projects/opencvlibrary/). You would have to write a JNI wrapper around the C OpenCV code to get KMeans to work, but the added benefit would be
The main drawback is that you would have to write a JNI wrapper. I once needed a template-matching routine and was faced with many alternatives, but I found OpenCV to be by far the best, even though I was forced to write a JNI wrapper for it.