Sentiment analysis using Apache Mahout
I am planning to develop a system that predicts the mood of a given text (sentiment analysis, in short).
I would prefer Apache Mahout because the data set is very large and the system should be scalable in real time. Could you suggest algorithms that Apache Mahout provides that are suitable for sentiment analysis?
Comments (1)
If you have labeled training data, you could try a Naive Bayes classifier, which is one of the simplest supervised learning algorithms out there (and is supported by Mahout). If that is not sufficient for some reason, you could try more involved algorithms such as logistic regression.
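To make the Naive Bayes suggestion concrete, here is a minimal sketch of a multinomial Naive Bayes sentiment classifier in plain Python. It is not Mahout's API; it only illustrates the technique on a single machine. The function names (`tokenize`, `train`, `predict`), the whitespace tokenizer, the add-one smoothing, and the four-sentence training set are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train(examples):
    """Count documents per label and words per label.

    examples: iterable of (text, label) pairs, i.e. the labeled data.
    """
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        doc_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocab.add(token)
    return doc_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior score."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior: fraction of training documents with this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Tiny made-up training set, for illustration only.
TRAINING = [
    ("great movie loved it", "pos"),
    ("what a wonderful happy day", "pos"),
    ("terrible film hated it", "neg"),
    ("awful boring sad waste", "neg"),
]

model = train(TRAINING)
```

Calling `predict(model, "boring and sad")` scores each label and returns the more likely one. In a real system the training set would be the labeled corpus, and the tokenizer would usually be replaced by something more careful about punctuation and negation.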
If you don't have labeled data, you are out of luck: you will need to get some for this to work (e.g. by hiring someone to label your data for you via Amazon's Mechanical Turk).
By the way, what size of data are we talking about? If it is up to a few hundred gigabytes, you don't need Hadoop/Mahout to train this type of model, unless of course you already have that data sitting in Hadoop.