I am looking for a way to do some data cluster analysis. This is way out of my league, but I know it can be done. I want to cluster the data that I have and present it visually. One option that comes to mind is a dendrogram, but I am open to other suggestions as well.
Are there any scripts or classes already written that could help me with this task? I would prefer to stay within the LAMP stack.
Thanks.
Comments (1)
The most complete open source tool I know of is Carrot2, an open source framework for document clustering. It is primarily Java and .NET centric, but it can be used from Ruby and PHP5 through its REST interface. It should be relatively easy to integrate into whatever framework you choose to work with.
This is their homepage - http://project.carrot2.org/index.html
This is the online demo of their clustering engine and visualization. The circle visualization may interest you (once you enter a query there are 3 visual output tabs; it's the middle one) - http://search.carrot2.org/stable/search
This is their commercial product, lingo3g - http://search.carrotsearch.com/carrot2-webapp/search . It's 6-8x faster at clustering most queries, gives a different (better?) clustering of results, and provides hierarchical clustering with a corresponding visualization. If you want to use it, you can request a trial by sending them an e-mail; they'll give you access to all the corresponding materials (as with the Carrot2 open source download) along with a 2-month trial license.
If this isn't what you're looking for and you just want a raw collection of libraries, you can also check out Apache's Mahout project.
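In case it helps to see what sits behind a dendrogram, here is a rough sketch of hierarchical (agglomerative) clustering in plain Python. To be clear, this is not how Carrot2, lingo3g, or Mahout work internally; it is just single-linkage clustering on made-up 2D points, and the function names (`single_linkage`, `render`) and the sample data are my own inventions for illustration. It prints the merge tree as indented text, which is a crude stand-in for a drawn dendrogram.

```python
# Minimal single-linkage agglomerative clustering.
# Illustrative sketch only; not the algorithm used by Carrot2 or Mahout.
from itertools import combinations
from math import dist  # Python 3.8+


def single_linkage(points):
    """Repeatedly merge the two closest clusters and return the merge tree.

    A leaf is a label string; an internal node is (left, right, height),
    where height is the distance at which the two children were merged.
    """
    # Each cluster is (tree, list of member coordinates).
    clusters = [(label, [xy]) for label, xy in points.items()]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Single linkage: distance between the closest pair of members.
            d = min(dist(a, b) for a in clusters[i][1] for b in clusters[j][1])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        merged = ((clusters[i][0], clusters[j][0], d),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


def render(tree, depth=0):
    """Return indented text lines describing the merge tree."""
    pad = "  " * depth
    if isinstance(tree, str):
        return [pad + tree]
    left, right, height = tree
    lines = [f"{pad}+ merged at distance {height:.2f}"]
    return lines + render(left, depth + 1) + render(right, depth + 1)


# Hypothetical sample data: label -> (x, y)
sample = {
    "a": (0.0, 0.0),
    "b": (0.0, 1.0),
    "c": (5.0, 5.0),
    "d": (5.0, 6.0),
    "e": (10.0, 0.0),
}
tree = single_linkage(sample)
print("\n".join(render(tree)))
```

This brute-force version rescans every pair of clusters on each merge, so it is only reasonable for small data sets. For anything sizeable you would want one of the libraries mentioned above, or an equivalent, rather than this.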