Training data size for a Bayesian classifier
I am using Apache Mahout to perform sentiment analysis in the customer support domain. Since I could not get a proper training data set, I made my own. I now have 100 support mails with positive sentiment and 100 with negative sentiment.
The problem is that I cannot reach a usable accuracy. It stays at around 55%, which is pathetic. Something around 70% would be satisfactory. Note also that I am using Apache Mahout's complement naive Bayes classifier.
To put the question precisely: is the small data set size what is bringing down the accuracy? If not, what should I tweak?
For the benefit of anyone looking into this question in the future, I will share the ways in which I raised the accuracy of my classifier from 50% to around 78%.
This should dramatically raise your accuracy.
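As for whether the small data set is the culprit, one way to check is to train on growing slices of the training mails and watch the accuracy on a fixed held-out set. If accuracy is still climbing at the largest slice, more data will probably help. If it is flat, the features or preprocessing are the more likely problem. Below is a rough sketch in plain Python, not Mahout code. The names `tokenize`, `train_cnb`, `predict` and `learning_curve` are made up for illustration, and the classifier is a simplified complement naive Bayes (raw word counts, Laplace smoothing, no TF-IDF or weight normalization), so it will not match Mahout's numbers.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train_cnb(docs, alpha=1.0):
    """Fit simplified complement naive Bayes weights from (text, label) pairs."""
    labels = sorted({label for _, label in docs})
    per_class = {label: Counter() for label in labels}
    for text, label in docs:
        per_class[label].update(tokenize(text))
    total = Counter()
    for counts in per_class.values():
        total.update(counts)
    vocab = set(total)
    weights = {}
    for label in labels:
        # Complement counts: occurrences of each word in every *other* class.
        comp = {w: total[w] - per_class[label][w] for w in vocab}
        denom = sum(comp.values()) + alpha * len(vocab)
        weights[label] = {w: math.log((comp[w] + alpha) / denom) for w in vocab}
    return weights


def predict(weights, text):
    """Return the class whose complement fits the text worst."""
    tokens = tokenize(text)

    def score(label):
        w = weights[label]
        # Words never seen in training are skipped.
        return sum(w[t] for t in tokens if t in w)

    return min(weights, key=score)


def accuracy(weights, docs):
    """Fraction of (text, label) pairs classified correctly."""
    hits = sum(predict(weights, text) == label for text, label in docs)
    return hits / len(docs)


def learning_curve(train, test, sizes):
    """Held-out accuracy after training on the first n examples, for each n."""
    return [(n, accuracy(train_cnb(train[:n]), test)) for n in sizes]
```

To use it, shuffle the labeled mails, set aside something like 20% as the test set, and call `learning_curve(train, test, [20, 40, 80, 160])`. With only 200 mails the curve will be noisy, so repeating the run with a few different shuffles and averaging gives a more trustworthy picture.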