Which classification algorithm to choose?

Posted on 2024-10-17 11:20:44

I would like to classify text documents into four categories. I also have a lot of already-classified samples that can be used for training. I would like the algorithm to learn on the fly. Please suggest an optimal algorithm for this requirement.

Comments (4)

云归处 2024-10-24 11:20:44

If by "on the fly" you mean online learning (where training and classification can be interleaved), I suggest the k-nearest neighbor algorithm. It's available in Weka and in the package TiMBL.
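As a rough illustration of why k-nearest neighbor suits interleaved training and classification, here is a minimal sketch in plain Python. It is not the Weka or TiMBL implementation; the class name `OnlineKNN`, the whitespace tokenizer, the cosine similarity measure, and the sample documents are all assumptions made for this example.

```python
from collections import Counter
import math


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately simple tokenizer."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class OnlineKNN:
    """Hypothetical k-NN text classifier that learns by storing examples."""

    def __init__(self, k=3):
        self.k = k
        self.examples = []  # list of (Counter, label) pairs

    def learn(self, text, label):
        # "Training" is just appending, so it can happen at any time.
        self.examples.append((Counter(tokenize(text)), label))

    def classify(self, text):
        if not self.examples:
            return None
        query = Counter(tokenize(text))
        # Rank stored examples by similarity to the query, most similar first.
        ranked = sorted(
            self.examples, key=lambda ex: cosine(query, ex[0]), reverse=True
        )
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


knn = OnlineKNN(k=1)
knn.learn("the team won the football match", "sports")
knn.learn("the parliament passed a new law", "politics")
first = knn.classify("football match tonight")

# Learning can resume after classification, with no retraining step.
knn.learn("new phone with a faster processor", "tech")
second = knn.classify("a phone with a fast processor")
print(first, second)
```

The trade-off is that every stored example is compared against each query, so classification gets slower as the training set grows.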

A perceptron will also be able to do this.

"Optimal" isn't a well-defined term in this context.

≈。彩虹 2024-10-24 11:20:44

There are several algorithms that can learn on the fly. Examples: k-nearest neighbors, naive Bayes, neural networks. You can test how well each of these methods works on a sample corpus.
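To make the naive Bayes option concrete, the following is a minimal sketch of a multinomial naive Bayes classifier whose counts are updated one document at a time. The class name `OnlineNaiveBayes`, the whitespace tokenization, the Laplace smoothing constant, and the four category names and sample texts are assumptions for illustration, not something taken from the question.

```python
from collections import Counter, defaultdict
import math


class OnlineNaiveBayes:
    """Hypothetical multinomial naive Bayes updated one document at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.doc_counts = Counter()              # documents seen per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocabulary = set()

    def learn(self, text, label):
        words = text.lower().split()
        self.doc_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocabulary.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for word in words:
                score += math.log(
                    (counts[word] + self.alpha)
                    / (total_words + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


nb = OnlineNaiveBayes()
training = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("parliament passed a new tax law", "politics"),
    ("the minister announced an election", "politics"),
    ("new phone ships with a faster processor", "tech"),
    ("the laptop has a new graphics chip", "tech"),
    ("the bank raised interest rates", "finance"),
    ("stock markets fell as rates rose", "finance"),
]
for text, label in training:
    nb.learn(text, label)
prediction = nb.classify("the striker scored in the football match")

# A new labeled document only increments counts; nothing is retrained.
nb.learn("shares and bonds rallied", "finance")
later = nb.classify("bonds rallied")
print(prediction, later)
```

Because the model is only a set of counts, the same `learn` call serves both the initial training pass and later updates.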

转身泪倾城 2024-10-24 11:20:44

Since you have unlabeled data, you might want to use a model where this helps. The first thing that comes to my mind is nonlinear NCA: "Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure" (Salakhutdinov, Hinton).

卷耳 2024-10-24 11:20:44

Well... I have to say that document classification is somewhat different from what you are thinking.

Typically, in document classification, the test data after preprocessing is extremely large (for example, O(N^2)), so it might be too computationally expensive.

Another typical classifier that comes to mind is a discriminative classifier, which doesn't need a generative model of your dataset. After training, all you have to do is pass a single entry to the algorithm, and it will be classified.
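One simple discriminative classifier is a multiclass perceptron, which learns per-class word weights directly and never models how documents are generated. The sketch below is only an illustration of that idea, not a method this answer specifically names; `MulticlassPerceptron`, the label names, and the tiny training set are assumptions.

```python
from collections import defaultdict


class MulticlassPerceptron:
    """Hypothetical linear classifier over bag-of-words features."""

    def __init__(self, labels):
        self.labels = list(labels)
        # One sparse weight vector per class; missing words have weight 0.
        self.weights = {label: defaultdict(float) for label in self.labels}

    def _score(self, words, label):
        w = self.weights[label]
        return sum(w[word] for word in words if word in w)

    def classify(self, text):
        words = text.lower().split()
        # On ties, max() returns the first label in self.labels.
        return max(self.labels, key=lambda label: self._score(words, label))

    def learn(self, text, label):
        """Update weights only when the current prediction is wrong."""
        predicted = self.classify(text)
        if predicted != label:
            for word in text.lower().split():
                self.weights[label][word] += 1.0
                self.weights[predicted][word] -= 1.0


perceptron = MulticlassPerceptron(["sports", "politics", "tech", "finance"])
training = [
    ("football match goal", "sports"),
    ("election parliament vote", "politics"),
    ("laptop processor chip", "tech"),
    ("stock market rates", "finance"),
]
for _ in range(3):  # a few passes over the small training set
    for text, label in training:
        perceptron.learn(text, label)

# After training, a single entry is scored against each class and labeled.
result_a = perceptron.classify("parliament vote today")
result_b = perceptron.classify("new laptop chip")
print(result_a, result_b)
```

Since `learn` processes one document per call, the same update can also be applied to new labeled documents as they arrive.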

Good luck with this. For example, you can check E. Alpaydin's book, Introduction to Machine Learning.
