Best algorithmic approach to sentiment analysis

Posted 2024-10-03 00:45:10

Comments (6)

屋顶上的小猫咪 2024-10-10 00:45:10

I don't think there's anything particularly wrong with your algorithm; it's a fairly straightforward and practical way to go, but there are a lot of situations where it will make mistakes.

  1. Ambiguous sentiment words - "This product works terribly" vs. "This product is terribly good"

  2. Missed negations - "I would never in a million years say that this product is worth buying"

  3. Quoted/Indirect text - "My dad says this product is terrible, but I disagree"

  4. Comparisons - "This product is about as useful as a hole in the head"

  5. Anything subtle - "This product is ugly, slow and uninspiring, but it's the only thing on the market that does the job"

I'm using product reviews for examples instead of news stories, but you get the idea. In fact, news articles are probably harder because they will often try to show both sides of an argument and tend to use a certain style to convey a point. Sentences like the final example are quite common in opinion pieces.

As far as NLP helping you with any of this, word sense disambiguation (or even just part-of-speech tagging) may help with (1), syntactic parsing might help with the long-range dependencies in (2), and some kind of chunking might help with (3). It's all research-level work, though; there's nothing I know of that you can use directly. Issues (4) and (5) are a lot harder; I throw up my hands and give up at this point.
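To make failure modes (1) and (2) concrete, here is a minimal Python sketch. It assumes the algorithm under discussion is a plain word-list scorer (the thread does not show it), and the lexicon, the negation list and both function names are invented for illustration. The "context" variant uses two crude stand-ins for real NLP: an adverb such as "terribly" is treated as an intensifier when it directly precedes another lexicon word (in place of part-of-speech tagging), and a negation flips everything after it in the sentence (in place of syntactic parsing).

```python
import re

# Toy lexicon, invented for this example: word -> polarity.
LEXICON = {"good": 1, "great": 1, "worth": 1, "useful": 1,
           "terrible": -1, "terribly": -1, "ugly": -1, "slow": -1}
NEGATIONS = {"not", "never", "no"}
# Negative-looking adverbs that can simply mean "very".
INTENSIFIERS = {"terribly", "awfully"}


def tokens(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z']+", text.lower())


def naive_score(text):
    """Sum lexicon polarities, ignoring context entirely."""
    return sum(LEXICON.get(tok, 0) for tok in tokens(text))


def context_score(text):
    """Score one sentence with crude intensifier and negation handling."""
    toks = tokens(text)
    total = 0
    negated = False
    for i, tok in enumerate(toks):
        if tok in NEGATIONS or tok.endswith("n't"):
            negated = True  # flip everything up to the end of the sentence
            continue
        nxt = toks[i + 1] if i + 1 < len(toks) else None
        if tok in INTENSIFIERS and nxt in LEXICON and nxt not in INTENSIFIERS:
            continue  # acts like "very": the next word carries the polarity
        polarity = LEXICON.get(tok, 0)
        total += -polarity if negated else polarity
    return total


for sentence in [
    "This product works terribly",
    "This product is terribly good",
    "I would never in a million years say that this product is worth buying",
    "This product is about as useful as a hole in the head",
]:
    print(naive_score(sentence), context_score(sentence), sentence)
```

With this toy lexicon, "terribly good" moves from 0 to +1 and the "never in a million years" sentence moves from +1 to -1, while the "hole in the head" comparison still scores +1 either way, which is the point about (4) and (5) being harder.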

I'd stick with the approach you have and look at the output carefully to see if it is doing what you want. Of course, that then raises the issue of what you understand the definition of "sentiment" to be in the first place...

凡间太子 2024-10-10 00:45:10

My favorite example is "just read the book". It contains no explicit sentiment word and it is highly dependent on context. If it appears in a movie review, it means that the-movie-sucks-it's-a-waste-of-your-time-but-the-book-is-good. However, if it is in a book review, it delivers a positive sentiment.

And what about "this is the smallest [mobile] phone in the market"? Back in the '90s, it was great praise. Today it may indicate that the phone is way too small.

I think this is the place to start in order to get a sense of the complexity of sentiment analysis: http://www.cs.cornell.edu/home/llee/opinion-mining-sentiment-analysis-survey.html (by Lillian Lee of Cornell).

信愁 2024-10-10 00:45:10

You may find the OpinionFinder system and the papers describing it useful.
It is available at http://www.cs.pitt.edu/mpqa/ with other resources for opinion analysis.

It goes beyond polarity classification at the document level and tries to find individual opinions at the sentence level.

小苏打饼 2024-10-10 00:45:10

I believe the best answer to all of the questions you mentioned is to read the book "Sentiment Analysis and Opinion Mining" by Professor Bing Liu. It is the best book of its kind in the field of sentiment analysis. It is amazing. Just take a look at it and you will find the answer to all your 'why' and 'how' questions!

眼泪都笑了 2024-10-10 00:45:10

Machine-learning techniques are probably better.

Whitelaw, Garg, and Argamon have a technique that achieves 92% accuracy, using an approach similar to yours for dealing with negation, and support vector machines for text classification.
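As a rough illustration of that combination (not the Whitelaw, Garg and Argamon system, whose features are not described in this thread), the sketch below marks words that follow a negation with a NOT_ prefix and trains a linear SVM on the resulting bag of words by subgradient descent on the L2-regularised hinge loss. The training sentences, hyperparameters and function names are all made up for the example; in practice you would likely use an existing SVM library and far more data.

```python
import re

NEGATIONS = {"not", "no", "never"}
CLAUSE_END = {".", ",", "!", "?", ";"}


def negation_features(text):
    """Bag-of-words counts; words after a negation get a NOT_ prefix
    until the next punctuation mark ends the clause."""
    feats = {}
    negated = False
    for tok in re.findall(r"[a-z']+|[.,!?;]", text.lower()):
        if tok in CLAUSE_END:
            negated = False
        elif tok in NEGATIONS or tok.endswith("n't"):
            negated = True
        else:
            key = "NOT_" + tok if negated else tok
            feats[key] = feats.get(key, 0.0) + 1.0
    return feats


def score(w, feats):
    """Dot product of the weight vector and a sparse feature dict."""
    return sum(w.get(f, 0.0) * v for f, v in feats.items())


def train_linear_svm(examples, epochs=30, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularised hinge loss.
    examples: list of (text, label) pairs with label in {-1, +1}."""
    data = [(negation_features(text), y) for text, y in examples]
    w = {}
    for _ in range(epochs):
        for feats, y in data:
            violated = y * score(w, feats) < 1.0
            for f in w:  # shrink step from the L2 penalty
                w[f] *= 1.0 - lr * lam
            if violated:  # hinge-loss subgradient step
                for f, v in feats.items():
                    w[f] = w.get(f, 0.0) + lr * y * v
    return w


def predict(w, text):
    """Return +1 (positive) or -1 (negative)."""
    return 1 if score(w, negation_features(text)) > 0 else -1


# Hypothetical toy training set.
TRAIN = [
    ("this phone is good", 1),
    ("this phone is not good", -1),
    ("i like the screen", 1),
    ("i do not like the screen", -1),
]
weights = train_linear_svm(TRAIN)
print(predict(weights, "It is good."), predict(weights, "It isn't good."))
```

Because "good" and "NOT_good" are separate features, the classifier can learn opposite weights for them, which a plain bag of words cannot do.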

感情旳空白 2024-10-10 00:45:10

Why don't you try something similar to how the SpamAssassin spam filter works? There's really not much difference between intention mining and opinion mining.
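One way to read this suggestion is to treat sentiment like spam filtering: learn word statistics from labelled examples and classify new text with a naive Bayes model. The sketch below covers only that statistical part and is not SpamAssassin's actual implementation; the class, the labels and the training sentences are invented for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()

    def train(self, text, label):
        words = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def log_scores(self, text):
        """Unnormalised log-probability of each label given the text."""
        total_docs = sum(self.doc_counts.values())
        words = [w for w in tokenize(text) if w in self.vocab]  # skip unseen words
        scores = {}
        for label, counts in self.word_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            logp = math.log(self.doc_counts[label] / total_docs)
            for word in words:
                logp += math.log((counts[word] + 1) / denom)
            scores[label] = logp
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy training set.
model = NaiveBayes()
model.train("great phone, love the screen", "pos")
model.train("works well and the battery is great", "pos")
model.train("terrible phone, hate the screen", "neg")
model.train("broke quickly and the battery is awful", "neg")

print(model.classify("great battery"))
print(model.classify("awful screen, hate it"))
```

Like a spam filter, this only looks at which words occur, so it inherits the negation and context problems discussed in the other answers unless the features are made richer.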
