How to programmatically mine a Twitter SQL dump
I have a Twitter MySQL dump.
I want to build a classifier on this dump.
I want to know whether there are packages available that I can use, and what type of classifier I should use.
I want to build this classifier using Java.
2 Answers
I would suggest you use WEKA: http://www.cs.waikato.ac.nz/ml/weka/ -- WEKA contains a large number of data mining algorithms and utilities.
It has a GUI where you can experiment with various configurations and combinations of classifiers and filters on your data. Once you have built a good model, you can embed WEKA in your Java program (WEKA itself is written in Java) and use it with the pre-built model to predict classes, or use it to keep refining the model. Alternatively, after experimenting in WEKA, you can implement the resulting decision tree (or whatever model you end up with) in your own application so you don't have to ship WEKA at all.
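For the "embed WEKA and predict with a pre-built model" option, a minimal sketch (assuming WEKA 3.7+ API) might look like the following. The file name tweets.model, the attribute layout (one string attribute holding the tweet text plus a nominal class as the last attribute), and the class labels "positive"/"negative" are placeholders for illustration; they have to match whatever structure the model was actually trained on.

    import java.util.ArrayList;

    import weka.classifiers.Classifier;
    import weka.core.Attribute;
    import weka.core.DenseInstance;
    import weka.core.Instances;
    import weka.core.SerializationHelper;
    import weka.core.Utils;

    public class TweetPredictor {

        public static void main(String[] args) throws Exception {
            // Deserialize a previously trained and saved WEKA classifier
            // (file name is a placeholder).
            Classifier model = (Classifier) SerializationHelper.read("tweets.model");

            // Rebuild the *raw* structure the model was trained on:
            // a string attribute for the tweet text plus a nominal class.
            // The class labels here are assumptions for illustration.
            ArrayList<String> classValues = new ArrayList<String>();
            classValues.add("positive");
            classValues.add("negative");

            ArrayList<Attribute> attributes = new ArrayList<Attribute>();
            attributes.add(new Attribute("text", (ArrayList<String>) null)); // string attribute
            attributes.add(new Attribute("class", classValues));

            Instances unlabeled = new Instances("tweets", attributes, 1);
            unlabeled.setClassIndex(unlabeled.numAttributes() - 1);

            // Add one tweet to classify; the class value is left missing.
            double[] values = new double[unlabeled.numAttributes()];
            values[0] = unlabeled.attribute(0).addStringValue("just watched a great movie");
            values[1] = Utils.missingValue();
            unlabeled.add(new DenseInstance(1.0, values));

            // Ask the model for the predicted class label.
            double prediction = model.classifyInstance(unlabeled.firstInstance());
            System.out.println("Predicted class: "
                    + unlabeled.classAttribute().value((int) prediction));
        }
    }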
You probably want to use a 'bag-of-words' representation of the tweets, and a classifier such as a multilayer perceptron, naive Bayes, or J48 -- all available to experiment with in WEKA.
Check out this page: http://weka.wikispaces.com/Text+categorization+with+WEKA -- it has an example of text categorization at the bottom of the page.
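As a rough sketch of that bag-of-words setup in Java (again assuming WEKA 3.7+), the following trains a naive Bayes classifier behind a StringToWordVector filter and saves the model it produces. The file names tweets.arff and tweets.model are placeholders; the ARFF file is assumed to contain one string attribute with the tweet text and a nominal class attribute as the last attribute.

    import java.util.Random;

    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.classifiers.meta.FilteredClassifier;
    import weka.core.Instances;
    import weka.core.SerializationHelper;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.unsupervised.attribute.StringToWordVector;

    public class TweetClassifierTraining {

        public static void main(String[] args) throws Exception {
            // Load the labelled tweets (file name is a placeholder). The ARFF
            // file is assumed to have a string attribute for the tweet text
            // and a nominal class attribute as the last attribute.
            Instances data = DataSource.read("tweets.arff");
            data.setClassIndex(data.numAttributes() - 1);

            // Bag-of-words: StringToWordVector turns the raw tweet text
            // into word attributes.
            StringToWordVector bagOfWords = new StringToWordVector();

            // FilteredClassifier applies the same filter at training and
            // prediction time, so new tweets can be passed in raw.
            FilteredClassifier classifier = new FilteredClassifier();
            classifier.setFilter(bagOfWords);
            classifier.setClassifier(new NaiveBayes()); // or a J48 tree, etc.

            // 10-fold cross-validation to get a feel for the accuracy.
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(classifier, data, 10, new Random(1));
            System.out.println(eval.toSummaryString());

            // Train on all the data and save the model for later use.
            classifier.buildClassifier(data);
            SerializationHelper.write("tweets.model", classifier);
        }
    }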
Cheers,
http://mloss.org/software/downloads/
This link has some packages related to machine learning.
This is for anyone who might be interested in doing the same.
Hence I'm answering my own question.
Enjoy.