Simple sentiment analysis
It appears that the simplest, most naive way to do basic sentiment analysis is with a Bayesian classifier (which matches what I'm finding here on SO). Any counter-arguments or other suggestions?
Comments (3)
A Bayesian classifier with a bag-of-words representation is the simplest statistical method. You can get significantly better results by moving to more advanced classifiers and feature representations, at the cost of more complexity.
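To make the bag-of-words idea concrete, here is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing, using only the Python standard library. The class name `NaiveBayesSentiment`, the whitespace tokenizer, and the four toy training sentences are all assumptions made for illustration; a real system would need far more data and better tokenization.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial naive Bayes over bag-of-words counts (add-one smoothing)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)      # label -> number of documents
        for text, label in zip(texts, labels):
            # Bag of words: word order is discarded, only counts are kept.
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        words = [w for w in text.lower().split() if w in self.vocab]
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.label_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(label) + sum of log P(word | label)
            score = math.log(self.label_counts[label] / total_docs)
            score += sum(math.log((counts[w] + 1) / denom) for w in words)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this example.
train_texts = [
    "great movie loved it",
    "wonderful acting and a great story",
    "terrible plot hated it",
    "boring and awful acting",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("loved the great story"))
print(model.predict("awful boring plot"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the add-one smoothing keeps a word that is unseen for one class from forcing that class's probability to zero.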
Statistical methods aren't the only game in town. The other main option is rule-based methods, which have more understanding of the structure of the text. From what I have seen, these don't actually perform as well as statistical methods.
I recommend chapter 16, "Text Categorization", of Manning and Schütze's Foundations of Statistical Natural Language Processing.
I can't think of a simpler, more naive way to do sentiment analysis, but you might consider using a Support Vector Machine instead of Naive Bayes (in some machine learning toolkits, this can be a drop-in replacement). Have a look at "Thumbs up? Sentiment Classification using Machine Learning Techniques" by Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. It was one of the earliest papers on these techniques and gives a good table of accuracy results for a family of related techniques, none of which is any more complicated (from a client perspective) than any of the others.
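As a rough illustration of what "drop-in replacement" can mean, the sketch below is a small linear SVM trained by subgradient descent on the hinge loss over bag-of-words counts. It exposes hypothetical `fit(texts, labels)` and `predict(text)` methods, so calling code would not need to change if it were swapped in for another classifier with the same interface. The class name, hyperparameter defaults, and toy data are assumptions for illustration only, not a reproduction of the setup in the paper.

```python
from collections import Counter


class LinearSVMSentiment:
    """Two-class linear SVM on bag-of-words counts, trained with hinge loss."""

    def __init__(self, epochs=50, learning_rate=0.1, reg=0.01):
        self.epochs = epochs
        self.learning_rate = learning_rate
        self.reg = reg  # strength of the L2 penalty on the weights

    @staticmethod
    def _features(text):
        return Counter(text.lower().split())

    def _score(self, features):
        return self.bias + sum(
            self.weights.get(word, 0.0) * count for word, count in features.items()
        )

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        if len(self.classes) != 2:
            raise ValueError("this sketch handles exactly two classes")
        self.weights, self.bias = {}, 0.0
        # Map the two labels to -1 / +1 targets.
        targets = [1 if label == self.classes[1] else -1 for label in labels]
        for _ in range(self.epochs):
            for text, y in zip(texts, targets):
                x = self._features(text)
                margin = y * self._score(x)
                # L2 regularization: shrink every weight slightly.
                for word in list(self.weights):
                    self.weights[word] *= 1 - self.learning_rate * self.reg
                # Hinge loss: update only when the margin is violated.
                if margin < 1:
                    for word, count in x.items():
                        step = self.learning_rate * y * count
                        self.weights[word] = self.weights.get(word, 0.0) + step
                    self.bias += self.learning_rate * y
        return self

    def predict(self, text):
        score = self._score(self._features(text))
        return self.classes[1] if score >= 0 else self.classes[0]


# Toy training data, invented for this example.
train_texts = [
    "great movie loved it",
    "wonderful acting and a great story",
    "terrible plot hated it",
    "boring and awful acting",
]
train_labels = ["pos", "pos", "neg", "neg"]

svm = LinearSVMSentiment().fit(train_texts, train_labels)
print(svm.predict("loved the great story"))
print(svm.predict("awful boring plot"))
```

In a toolkit that gives all its classifiers a shared training and prediction interface, the swap is typically a one-line change where the classifier object is constructed.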
Building upon the answer provided by Ken above, there is another paper,
"Sentiment analysis using support vector machines with diverse information sources" by Tony Mullen and Nigel Collier,
which looks at using more features than just the bag of words used by Pang and Lee. They leverage WordNet to determine the semantic differentiation of adjectives, and the proximity of the sentiment to the topic in the text, as additional features for the SVM. They report better results than previous attempts to classify text based on sentiment.
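The general idea of adding features on top of a bag of words can be sketched as follows. This is not the feature set from the paper: `TOY_POLARITY` is a hypothetical hand-made lexicon standing in for scores that would be derived from a resource such as WordNet, and the inverse-distance proximity weighting is an assumption chosen only to show the shape of such a feature.

```python
from collections import Counter

# Hypothetical polarity scores, invented for this example. A real system
# would derive such values from a lexical resource rather than hard-code them.
TOY_POLARITY = {"great": 1.0, "wonderful": 1.0, "boring": -1.0, "awful": -1.0}


def extract_features(text, topic):
    """Return bag-of-words counts plus two extra sentiment features."""
    tokens = text.lower().split()

    # 1. Plain bag-of-words counts, as in the simpler approaches.
    features = dict(Counter(f"word={token}" for token in tokens))

    # 2. Overall polarity of the words found in the lexicon.
    scored = [
        (index, TOY_POLARITY[token])
        for index, token in enumerate(tokens)
        if token in TOY_POLARITY
    ]
    features["lexicon_polarity"] = sum(score for _, score in scored)

    # 3. Polarity weighted by closeness to the nearest mention of the topic,
    #    so sentiment words next to the topic count more than distant ones.
    topic_positions = [i for i, token in enumerate(tokens) if token == topic.lower()]
    proximity = 0.0
    if topic_positions:
        for index, score in scored:
            distance = min(abs(index - pos) for pos in topic_positions)
            proximity += score / (1 + distance)
    features["topic_proximity_polarity"] = proximity
    return features


sentence = "the acting was great but the plot was boring"
about_acting = extract_features(sentence, "acting")
about_plot = extract_features(sentence, "plot")
print(about_acting["topic_proximity_polarity"])
print(about_plot["topic_proximity_polarity"])
```

The same sentence yields a positive proximity score when the topic is "acting" and a negative one when the topic is "plot", which is information a bag of words alone cannot express. The resulting feature dictionary could then be fed to an SVM or any other classifier that accepts numeric features.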