Implementing the Expectation-Maximization algorithm with NaiveBayes

Posted on 2025-01-04 23:25:08

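I have implemented naive Bayes document classification with good text filtering, and the statistics show good accuracy. Now I need to use the EM algorithm to improve my results.

But I don't know whether I can apply the EM algorithm on top of the naive Bayes results, or whether I should apply it to the data and start over so that I can compare the results.

Either way, I need to understand how the EM algorithm applies to this problem, because it really confuses me.

Any document that explains this clearly would be appreciated.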


何处潇湘 2025-01-11 23:25:08

EM generally helps when you have unlabeled data. If you do, you basically use it in a loop like this:

estimate some initial parameters, perhaps even randomly
while not converged:
  relabel data using estimates
  update estimates using new labels
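
As a rough illustration of the loop above, here is a minimal sketch on 1-D numbers rather than documents. It is an assumption-laden toy: it uses the hard-assignment variant (each point gets exactly one label, which makes it essentially 2-means), and the function name `hard_em_1d` and the sample data are made up for this example.

```python
import random

def hard_em_1d(points, n_iter=100, seed=0):
    """Two-cluster hard EM on 1-D data: relabel, then re-estimate the means."""
    rng = random.Random(seed)
    # Initial parameters: two data points picked at random (hypothetical choice).
    means = rng.sample(points, 2)
    labels = None
    for _ in range(n_iter):
        # Relabel step: assign each point to the closer of the two means.
        new_labels = [0 if abs(x - means[0]) <= abs(x - means[1]) else 1
                      for x in points]
        if new_labels == labels:  # converged: the labels stopped changing
            break
        labels = new_labels
        # Update step: recompute each mean from the points carrying its label.
        for k in (0, 1):
            members = [x for x, y in zip(points, labels) if y == k]
            if members:  # keep the old mean if a cluster ends up empty
                means[k] = sum(members) / len(members)
    return means, labels

data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.9, 5.1]
means, labels = hard_em_1d(data)
print(sorted(means), labels)
```

Full EM would keep a probability per label instead of a single hard label, but the relabel/update rhythm is the same.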

If you are doing purely supervised learning, the relabel step throws away your known labels and is likely to make your classification worse.

On the other hand, the tutorial linked in the original answer is a nice, detailed treatment of semi-supervised naive Bayes for text classification. If you have a small set of labeled documents and a large set of unlabeled documents, you can use the labeled ones to estimate the initial parameters, then run the iterative steps on the unlabeled data, and end up with a better classifier.
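
To make the semi-supervised recipe concrete, here is a small pure-Python sketch. It is not taken from the tutorial: the helper names (`train_nb`, `posterior`, `semi_supervised_nb`), the Laplace smoothing constant `alpha`, the fixed iteration count, and the toy documents are all assumptions for illustration. The labeled documents keep their labels throughout; only the unlabeled ones are relabeled, with soft (probabilistic) labels.

```python
import math
from collections import Counter

def train_nb(docs, weights, classes, vocab, alpha=1.0):
    """Fit multinomial NB from soft labels: weights[i][c] = P(class c | doc i)."""
    prior = {c: alpha for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, w in zip(docs, weights):
        for c in classes:
            prior[c] += w[c]
            for word in doc:
                counts[c][word] += w[c]  # fractional word counts
    total = sum(prior.values())
    log_prior = {c: math.log(prior[c] / total) for c in classes}
    log_like = {}
    for c in classes:
        denom = sum(counts[c].values()) + alpha * len(vocab)
        log_like[c] = {w: math.log((counts[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_like

def posterior(doc, model, classes):
    """Return P(class | doc) under the model; unknown words are ignored."""
    log_prior, log_like = model
    scores = {c: log_prior[c] + sum(log_like[c][w] for w in doc if w in log_like[c])
              for c in classes}
    top = max(scores.values())  # subtract the max for numerical stability
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}

def semi_supervised_nb(labeled, unlabeled, n_iter=10):
    classes = sorted({y for _, y in labeled})
    vocab = {w for d, _ in labeled for w in d} | {w for d in unlabeled for w in d}
    lab_docs = [d for d, _ in labeled]
    lab_w = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    # Initial parameters come from the labeled documents only.
    model = train_nb(lab_docs, lab_w, classes, vocab)
    for _ in range(n_iter):
        # E-step: soft-label the unlabeled documents; true labels stay fixed.
        unl_w = [posterior(d, model, classes) for d in unlabeled]
        # M-step: refit on labeled plus soft-labeled documents.
        model = train_nb(lab_docs + unlabeled, lab_w + unl_w, classes, vocab)
    return model, classes

# Toy data, invented for this example.
labeled = [("goal match team".split(), "sport"),
           ("vote election party".split(), "politics")]
unlabeled = ["team win goal league".split(), "league match win".split(),
             "party vote senate".split(), "senate election law".split()]

model, classes = semi_supervised_nb(labeled, unlabeled)
print(posterior("league win".split(), model, classes))
```

The words "league" and "win" never occur in the labeled documents, so a classifier trained on the labeled set alone has no evidence about them; the EM iterations pick up their association with "sport" from the unlabeled documents. A real implementation would normally stop on a convergence test (for example, change in log-likelihood) instead of a fixed `n_iter`.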
