Implementing the Naive Bayes algorithm in Java - need some guidance

Posted 2024-09-02 11:44:11

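As a school assignment I need to implement the Naive Bayes algorithm, and I intend to do it in Java.

While trying to understand how it is done, I read the book "Data Mining - Practical Machine Learning Tools and Techniques", which has a section on this topic, but I am still unsure about some key points that are blocking my progress.

Since I am looking for guidance here rather than a solution, I will describe what I think is the correct approach and ask for correction/guidance in return, which would be much appreciated. Please note that I am an absolute beginner with the Naive Bayes algorithm, data mining, and programming in general, so you may see silly comments/calculations below:

The training data set I was given has 4 attributes/features that are numeric and normalized (to the range [0 1]) using Weka (no missing values), plus one nominal class (yes/no).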

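1) The data from the csv file is numeric, therefore:

 + split the data set by class into two arrays (array class yes and array class no)
 + compute the mean of each attribute: sum of the values in row / number of values in that row
 + compute the PDF term: (n-mean)^2/(2*SD^2),
 + then to find P( yes | E) and P( no | E), multiply the PDF value of all 4 given attributes and compare which is larger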
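The mean, standard deviation, and density calculations from the steps above could be sketched in Java roughly as follows. This is only an illustration: the class and method names are hypothetical, and it assumes the sample standard deviation (dividing by n - 1). Note that (n-mean)^2/(2*SD^2) is only the exponent term; the full normal density negates that exponent and also carries the factor 1/(SD*sqrt(2*pi)).

```java
// Hypothetical helper; names are illustrative and not taken from the original post.
final class GaussianStats {
    // Arithmetic mean of one attribute's values within one class.
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Sample standard deviation (divides by n - 1); assumes at least two values.
    static double standardDeviation(double[] values) {
        double m = mean(values);
        double sumSq = 0.0;
        for (double v : values) {
            sumSq += (v - m) * (v - m);
        }
        return Math.sqrt(sumSq / (values.length - 1));
    }

    // Normal density: exp(-(x - mean)^2 / (2 * sd^2)) / (sd * sqrt(2 * pi)).
    static double pdf(double x, double mean, double sd) {
        double exponent = -((x - mean) * (x - mean)) / (2.0 * sd * sd);
        return Math.exp(exponent) / (sd * Math.sqrt(2.0 * Math.PI));
    }

    public static void main(String[] args) {
        // Made-up values for one attribute of one class.
        double[] attr = {0.2, 0.4, 0.6};
        double m = mean(attr);
        double sd = standardDeviation(attr);
        System.out.println("mean=" + m + " sd=" + sd + " pdf(0.4)=" + pdf(0.4, m, sd));
    }
}
```

In a classifier these three functions would be called once per attribute and per class, so with 4 attributes and 2 classes there are 8 mean/SD pairs to store.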

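In Java, I am using an ArrayList of ArrayLists of Double to store the attribute values.

Finally, I am not sure how to obtain new data. Should I ask for an input file (like a csv), or prompt on the command line and ask for the 4 values?

I will stop here for now (I do have more questions), but I worry that, given how long this is, it will not get any answers. I really appreciate anyone who takes the time to read my questions and comments.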

绾颜 2024-09-09 11:44:11

What you are doing is almost correct.

 + Then to find P( yes | E) and P( no | E) I multiply the PDF value of all 4 given attributes and compare which is larger, which indicates the class it belongs to

Here, you forgot to multiply by the prior, P(yes) or P(no). Remember the decision formula:

P(Yes | E) ~= P(Attr_1 | Yes) * P(Attr_2 | Yes) * P(Attr_3 | Yes) * P(Attr_4 | Yes) * P(Yes)
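As a rough sketch, the decision rule above could look like this in Java. All names here are hypothetical, and the per-class means, standard deviations, and priors are assumed to have been estimated from the training data already. One deliberate change from the formula as written: the sketch adds logarithms instead of multiplying raw densities, which yields the same comparison but is less prone to floating-point underflow.

```java
// Hypothetical sketch of the decision rule; names and parameters are illustrative.
final class NaiveBayesDecision {
    // Log of the normal density for a single attribute value.
    static double logGaussian(double x, double mean, double sd) {
        double z = (x - mean) / sd;
        return -0.5 * z * z - Math.log(sd) - 0.5 * Math.log(2.0 * Math.PI);
    }

    // log P(class) + sum over attributes of log P(attr_i | class).
    static double logScore(double[] x, double[] means, double[] sds, double prior) {
        double score = Math.log(prior);
        for (int i = 0; i < x.length; i++) {
            score += logGaussian(x[i], means[i], sds[i]);
        }
        return score;
    }

    // Picks the class with the larger score; ties go to "yes" in this sketch.
    static String classify(double[] x,
                           double[] meansYes, double[] sdsYes, double priorYes,
                           double[] meansNo, double[] sdsNo, double priorNo) {
        double yes = logScore(x, meansYes, sdsYes, priorYes);
        double no = logScore(x, meansNo, sdsNo, priorNo);
        return yes >= no ? "yes" : "no";
    }

    public static void main(String[] args) {
        // Made-up parameters, for illustration only.
        double[] meansYes = {0.8, 0.8, 0.8, 0.8};
        double[] meansNo = {0.2, 0.2, 0.2, 0.2};
        double[] sds = {0.1, 0.1, 0.1, 0.1};
        double[] instance = {0.7, 0.9, 0.8, 0.75};
        System.out.println(classify(instance, meansYes, sds, 0.5, meansNo, sds, 0.5));
    }
}
```

The prior for each class can be estimated as the fraction of training rows that carry that class label.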

For Naive Bayes (and any other supervised learning/classification algorithm), you need training data and testing data. You use the training data to train the model and then make predictions on the testing data. You could simply reuse the training data as testing data, although that tends to overstate how well the model performs. Alternatively, you can split the csv file into two pieces, one for training and one for testing, or do cross-validation on the csv file.
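A minimal sketch of the split-and-evaluate idea, assuming the csv rows have already been loaded into a list. The class name, the method names, and the 70/30 ratio used in the example are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper; names are illustrative.
final class TrainTestSplit {
    // Shuffles a copy of the rows and returns two lists:
    // index 0 is the training part, index 1 is the testing part.
    static <T> List<List<T>> split(List<T> rows, double trainFraction, long seed) {
        List<T> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(shuffled.size() * trainFraction);
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
        return parts;
    }

    // Fraction of predictions that match the true labels.
    static double accuracy(List<String> predicted, List<String> actual) {
        int correct = 0;
        for (int i = 0; i < predicted.size(); i++) {
            if (predicted.get(i).equals(actual.get(i))) {
                correct++;
            }
        }
        return (double) correct / predicted.size();
    }

    public static void main(String[] args) {
        // Stand-in row identifiers; real rows would hold 4 attribute values and a class label.
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(i);
        }
        List<List<Integer>> parts = split(rows, 0.7, 42L);
        System.out.println("train=" + parts.get(0).size() + " test=" + parts.get(1).size());
    }
}
```

The model parameters (means, standard deviations, priors) would be estimated from the training part only, and `accuracy` would then be computed on predictions for the testing part.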
