Classification with the J48 and IBk (KNN) algorithms
I have been given a dataset with many different types of mushrooms. These should be classified into edible and poisonous. The classification has to be performed with k-nearest-neighbors (k = 1) and J48.
Both algorithms show a precision of 99.88%. What matters for me is the false-positive rate: J48 has a rate of 0.3% and KNN a rate of 0%, so I would say KNN is better suited for the chosen problem.
However, I don't know why. Is there a general answer as to why KNN performs better than J48 on some datasets?
The second thing is that I should use 10-fold cross-validation. What is that exactly?
Thanks in advance
Comments (1)
No. It depends strongly on the dataset, the settings for both algorithms and the way you're doing the evaluation (you did use separate training and test sets, didn't you?).
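Since J48 and IBk are Weka classifier names, here is a minimal sketch of that kind of evaluation using Weka's Java API with disjoint training and test sets. The file name "mushrooms.arff" and the 80/20 split are assumptions for illustration, not part of the original question:

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class HoldoutEval {
    public static void main(String[] args) throws Exception {
        // Hypothetical file name; the mushroom data would be in ARFF format.
        Instances data = DataSource.read("mushrooms.arff");
        data.setClassIndex(data.numAttributes() - 1);
        data.randomize(new Random(1)); // shuffle before splitting

        // Assumed 80/20 split into disjoint training and test sets.
        int trainSize = (int) Math.round(data.numInstances() * 0.8);
        Instances train = new Instances(data, 0, trainSize);
        Instances test  = new Instances(data, trainSize, data.numInstances() - trainSize);

        J48 tree = new J48();
        tree.buildClassifier(train); // learn only from the training split

        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(tree, test); // measure only on unseen instances
        System.out.println(eval.toSummaryString());
        System.out.println("FP rate, class 0: " + eval.falsePositiveRate(0));
    }
}
```

If you instead evaluate on the same data you trained on, a 1-nearest-neighbor classifier will trivially score near 100%, which would explain a suspiciously perfect result.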
10-fold cross-validation means: you split your dataset into 10 equally-sized "folds", then for each fold i you train on the other nine folds, test on fold i, and take the average accuracy over the 10 runs. See Wikipedia or any book on machine learning.
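Weka can do this for you. A minimal sketch, again assuming the hypothetical "mushrooms.arff" and that the class is the last attribute, comparing both learners under 10-fold cross-validation:

```java
import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.lazy.IBk;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CrossValidate {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("mushrooms.arff"); // hypothetical file name
        data.setClassIndex(data.numAttributes() - 1);

        // IBk(1) is Weka's k-nearest-neighbors classifier with k = 1.
        Classifier[] learners = { new J48(), new IBk(1) };
        for (Classifier cls : learners) {
            Evaluation eval = new Evaluation(data);
            // Splits the data into 10 folds; for each fold i it trains a fresh
            // copy of the classifier on the other 9 folds and tests on fold i.
            eval.crossValidateModel(cls, data, 10, new Random(1));
            System.out.printf("%s: accuracy %.2f%%, FP rate (class 0) %.4f%n",
                    cls.getClass().getSimpleName(),
                    eval.pctCorrect(), eval.falsePositiveRate(0));
        }
    }
}
```

The fixed `Random(1)` seed just makes the fold assignment reproducible, so both classifiers are compared on exactly the same splits.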