Using sentiment analysis to detect contradicting arguments?
I don't have much background in sentiment analysis or natural language processing, but I have been reading a bit about it in my spare time. I would like to conduct an experiment to analyze threads and comments from forums such as reddit, digg, blogs, etc. I'm particularly interested in counting the number of for, against, and neutral comments in threads of heated religious and political debate. Here's what I am thinking:
1) Find a thread in which the original poster has defined a touchy political or religious topic.
2) Categorize each comment as supporting the original poster, contradicting them, or taking a neutral stance.
3) Compare the numbers of for and against arguments across various media to determine which platforms are good "debate platforms" (i.e. have balanced argument counts).
One big problem I'm anticipating is that heated topics will provoke strong reactions from both supporting and contradicting parties, so a simple happy/sad sentiment analysis won't cut it. I'm interested in this project purely out of curiosity, so if anyone knows of similar research or of utilities for conducting this experiment, I'd be interested to hear more.
Can someone recommend a good sentiment analysis tool, word dictionary, training set, etc. for this task?
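For step 3, one possible way to turn the per-thread counts into a single "balance" number is sketched below. The formula and the names `balance_score` and `threads` are assumptions made for illustration, not an established metric, and the labels are made-up data standing in for the output of step 2.

```python
from collections import Counter

def balance_score(labels):
    """Score in [0, 1]: 1.0 when 'for' and 'against' counts are equal,
    0.0 when every opinionated comment is on the same side.
    Neutral comments are ignored (an assumption of this sketch)."""
    counts = Counter(labels)
    pro, con = counts["for"], counts["against"]
    if pro + con == 0:
        return None  # no opinionated comments, so balance is undefined
    return 1.0 - abs(pro - con) / (pro + con)

# Hypothetical hand-labelled threads, one list of stance labels per platform.
threads = {
    "platform_a": ["for", "against", "for", "against", "neutral"],
    "platform_b": ["for", "for", "for", "against", "neutral", "neutral"],
}
scores = {name: balance_score(labels) for name, labels in threads.items()}
```

A real comparison would probably average such a score over many threads per platform before drawing any conclusion.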
Comments (2)
IMHO this is not possible without running into semantics. Consider the sentence:
Your AI may need to recognise idiomatic subphrases like "not against", or other "not ..." snippets. This is not impossible ;-)
An additional problem is that "not" is more or less a stopword: its frequency rank will probably be in the top 100, which gives it a low entropy (though it has a high "semantic" value in every sentence where it is used). Also note that omitting "the abolishment of" will flip the "polarity" of the sentence as well.
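One common workaround for the "not ..." problem, swapped in here as an illustration rather than something proposed above, is to mark every token that follows a negator, so that "against" and "not against" become different features even if "not" itself is dropped as a stopword. A minimal sketch, where `mark_negation` and the `NEGATORS` list are hypothetical and far from exhaustive:

```python
import re

NEGATORS = {"not", "no", "never"}  # illustrative only, not a complete list
PUNCTUATION = set(".,;:!?")

def mark_negation(text):
    """Prefix tokens that follow a negator with 'NOT_' until the next
    punctuation mark. Negators and punctuation are dropped from the output."""
    tokens = re.findall(r"[a-z']+|[.,;:!?]", text.lower())
    out, negated = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False  # negation scope ends at punctuation
            continue
        if tok in NEGATORS or tok.endswith("n't"):
            negated = True
            continue
        out.append("NOT_" + tok if negated else tok)
    return out

plain = mark_negation("I am against it.")
negated = mark_negation("I am not against it, but I am worried.")
```

This only handles the simplest scope rule. It does nothing for the second problem above, where dropping a phrase from the sentence flips its polarity.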
You can try a bag-of-words approach [or, even better, use n-grams as the tokens in the bag].
The approach is basically:
words from the classified examples.
k-nearest neighbors to decide if the new comment is pro/against/neutral.
Also, you might want to have a look at Apache Mahout.
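The bag-of-n-grams plus k-nearest-neighbors idea can be sketched in plain Python as follows. Everything here is an assumption made for illustration: the helper names (`ngram_bag`, `cosine`, `knn_classify`), the choice of cosine similarity, and the tiny hand-made training set. A real experiment would need many more labelled comments and would likely use a library instead.

```python
import math
import re
from collections import Counter

def ngram_bag(text, n_max=2):
    """Bag (multiset) of word n-grams for n = 1..n_max."""
    words = re.findall(r"[a-z']+", text.lower())
    bag = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            bag[" ".join(words[i:i + n])] += 1
    return bag

def cosine(a, b):
    """Cosine similarity between two bags; 0.0 if either bag is empty."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0

def knn_classify(comment, labelled, k=3):
    """Majority label among the k labelled comments most similar to `comment`."""
    query = ngram_bag(comment)
    ranked = sorted(labelled,
                    key=lambda ex: cosine(query, ngram_bag(ex[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Made-up, manually classified examples (hypothetical data).
labelled = [
    ("I agree with this proposal", "pro"),
    ("I fully support this proposal", "pro"),
    ("this is a great idea and I agree", "pro"),
    ("I am against this proposal", "against"),
    ("I disagree this is a terrible idea", "against"),
    ("I strongly oppose this idea", "against"),
    ("what time does the vote start", "neutral"),
    ("can someone link the source", "neutral"),
    ("where can I read the full text", "neutral"),
]

prediction = knn_classify("I am against this idea", labelled)
```

Bigrams are what let a phrase such as "am against" count as its own feature; with unigrams alone, the comments share mostly filler words like "I" and "this".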