Data mining engines and frameworks?

Posted on 2024-10-03 09:45:17

Which open-source or free data mining engines and frameworks do you know and use for textual data?

Thank you for any advice!


Comments (9)

夏了南城 2024-10-10 09:45:17

Not really sure what you're looking for. Perhaps something like Lucene?

So要识趣 2024-10-10 09:45:17

Apache Mahout is an open-source machine learning library that can be used with or without MapReduce (Apache Hadoop).

It provides Java implementations of the following algorithms:

  • Collaborative filtering
  • User- and item-based recommenders
  • K-Means and Fuzzy K-Means clustering
  • Mean Shift clustering
  • Dirichlet process clustering
  • Latent Dirichlet Allocation
  • Singular value decomposition
  • Parallel frequent pattern mining
  • Complementary Naive Bayes classifier
  • Random forest decision tree based classifier

You can read more here:
http://mahout.apache.org/

http://girlincomputerscience.blogspot.com.br/2010/11/apache-mahout.html

http://www.ibm.com/developerworks/java/library/j-mahout/

上课铃就是安魂曲 2024-10-10 09:45:17

RapidMiner is free and open source, runs on Windows, Mac, and Linux, and is a nice graphical, workflow-based program. It runs all Weka code and integrates with R.

昔梦 2024-10-10 09:45:17

Weka and RapidMiner aren't that strong on clustering. They mostly do classification and similar predictions, but very little clustering. Have a look at ELKI, which, like Weka, is a university project, but has tons of clustering and outlier detection methods.

能怎样 2024-10-10 09:45:17

I don't know about engines or frameworks, but I've used a tool called Weka, which has plenty of algorithms implemented in it.

追我者格杀勿论 2024-10-10 09:45:17

For text processing (rather than numeric data mining and clustering), the NLTK toolkit is worth a look. It is intended for teaching natural language processing techniques in Python, so it is ideal for tinkering with, and you are bound to find many of its component classes and implementations useful if you choose to use Python.
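As a rough illustration of the kind of first step one would tinker with in Python, here is a minimal sketch that tokenizes a text and counts its most frequent terms. It deliberately uses only the standard library rather than NLTK itself, and the names `tokenize`, `top_terms`, and the tiny stopword set are made up for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the example only.
STOPWORDS = frozenset({"the", "a", "is", "and", "of", "for"})


def tokenize(text):
    """Lowercase the text and return runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def top_terms(text, n=3, stopwords=STOPWORDS):
    """Return the n most frequent non-stopword tokens with their counts."""
    counts = Counter(t for t in tokenize(text) if t not in stopwords)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = "Text mining is fun. Mining text for patterns is the goal of text mining."
    print(top_terms(sample, n=2))
```

A toolkit such as NLTK offers more careful tokenizers, stemmers, and corpora for the same job; the regex here is only a stand-in.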

三岁铭 2024-10-10 09:45:17

RapidMiner is my preferred data mining solution:
http://www.RapidMiner.com/

Here is a survey of the most popular data mining tools among data mining experts:
http://www.kdnuggets.com/2011/05/tools-used-analytics-data-mining.html

KDnuggets Poll 2011: RapidMiner is the most widely used data mining solution among data mining experts worldwide.

別甾虛僞 2024-10-10 09:45:17

I'm the author of an open-source Java package for frequent pattern mining. It offers algorithms for mining sequential patterns, association rules, frequent itemsets, etc.

Although it is not specifically designed for text mining, some of the algorithms can be applied to mine frequent patterns in text. For example, if you want to find sequences of words that often appear together in several sentences, you could apply a sequential pattern mining algorithm. To do that, you would need to do some pre-processing before applying my software so that your text files are in the proper format.

You can check out the software here:
http://www.philippe-fournier-viger.com/spmf/
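The pre-processing step could look roughly like the sketch below, which turns each sentence into a sequence of integer word IDs. It assumes the sequence format described in the SPMF documentation as I understand it (positive integers as items, `-1` closing each itemset, `-2` closing each sequence); please check the documentation before relying on it. The function names and the naive sentence splitter are made up for this example and are not part of SPMF.

```python
import re


def split_sentences(text):
    """Naively split on '.', '!' and '?'; a real tokenizer would handle abbreviations."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def encode_sequences(text):
    """Map each distinct word to a positive integer and encode one sentence per line.

    Returns (vocab, lines), where vocab maps word -> id and every line has the
    assumed form 'id -1 id -1 ... -1 -2'.
    """
    vocab = {}
    lines = []
    for sentence in split_sentences(text):
        ids = []
        for word in re.findall(r"[a-z0-9']+", sentence.lower()):
            # Reuse the existing id, or assign the next unused one.
            ids.append(vocab.setdefault(word, len(vocab) + 1))
        if ids:
            lines.append(" -1 ".join(map(str, ids)) + " -1 -2")
    return vocab, lines


if __name__ == "__main__":
    vocab, lines = encode_sequences("The cat sat. The cat ran!")
    print(vocab)
    print("\n".join(lines))
```

Keeping the `vocab` mapping around matters, because the mined patterns come back as integers and have to be translated into words again.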

疯了 2024-10-10 09:45:17

Apache Mahout offers a range of popular algorithms that can also be applied to textual data, and it is quite scalable. Apache UIMA doesn't offer data mining algorithms, but it is a framework that is widely used in natural language processing.
