The key challenges for sentiment analysis are:
1) Named Entity Recognition - What is the person actually talking about? For example, is "300 Spartans" a group of Greeks or a movie?
2) Anaphora Resolution - Resolving what a pronoun or a noun phrase refers to. "We watched the movie and went to dinner; it was awful." What does "it" refer to?
3) Parsing - What are the subject and object of the sentence, and which one does the verb and/or adjective actually refer to?
4) Sarcasm - If you don't know the author, you have no idea whether "bad" means bad or good.
5) Twitter - Abbreviations, lack of capitals, poor spelling, poor punctuation, poor grammar, ...
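
To make a couple of these points concrete, here is a minimal sketch of a naive word-counting scorer. The lexicon, the abbreviation map, and the function names are all made up for illustration; this is not any particular library's API, just an assumption-laden toy.

```python
import re

# Hypothetical toy lexicon; real systems use far larger resources.
LEXICON = {"awful": -1, "bad": -1, "great": 1, "good": 1, "love": 1}

# Hypothetical abbreviation map for tweet-style text (challenge 5).
ABBREVIATIONS = {"gr8": "great", "luv": "love"}


def normalize(text):
    """Lowercase, split into word tokens, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [ABBREVIATIONS.get(tok, tok) for tok in tokens]


def naive_score(text):
    """Sum the lexicon polarity of every token; unknown words count as 0."""
    return sum(LEXICON.get(tok, 0) for tok in normalize(text))


# The scorer sees "awful" and reports a negative total, but it cannot say
# whether the movie or the dinner was awful (challenge 2).
print(naive_score("We watched the movie and went to dinner; it was awful."))

# "bad" is always counted as negative, even if the author meant it as
# praise or sarcasm (challenge 4).
print(naive_score("That solo was bad, man"))

# The abbreviation map rescues some tweet-style spelling.
print(naive_score("Gr8 movie, luv it"))
```

The score only tells you that something negative was said, not about what, and it has no way to tell literal use of a word from sarcastic use.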
I agree with Hightechrider that those are areas where sentiment analysis accuracy can see improvement. I would also add that sentiment analysis tends to be done on closed-domain text for the most part. Attempts to do it on open-domain text usually wind up either with very bad accuracy/F1 measure/what have you, or as pseudo-open-domain because they only look at certain grammatical constructions. So I would say that topic-sensitive sentiment analysis, which can identify context and make decisions based on it, is an exciting area for research (and industry products).
I'd also expand his 5th point from Twitter to other social media sites (e.g. Facebook, YouTube), where short, ungrammatical utterances are commonplace.
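
As a rough sketch of why the domain matters, the same word can carry opposite polarity depending on the topic. The per-domain lexicons below are invented for illustration (the words, weights, and the `score` helper are all assumptions), and a real topic-sensitive system would have to detect the domain rather than be told it.

```python
# Hypothetical per-domain lexicons; words and weights are invented.
DOMAIN_LEXICONS = {
    "movies": {"unpredictable": 1, "predictable": -1},
    "cars": {"unpredictable": -1, "predictable": 1},
}


def score(text, domain):
    """Sum token polarities using the lexicon for the given domain."""
    lexicon = DOMAIN_LEXICONS[domain]
    tokens = (tok.strip(".,;!?") for tok in text.lower().split())
    return sum(lexicon.get(tok, 0) for tok in tokens)


# Same adjective, opposite sentiment depending on the domain.
print(score("The plot was unpredictable.", "movies"))
print(score("The steering was unpredictable.", "cars"))
```

A single closed-domain lexicon sidesteps this problem, which is one reason results on open-domain text tend to look so much worse.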
I think the answer is language complexity, along with mistakes in grammar and spelling. There is a vast number of ways people express their opinions; for example, sarcasm could be wrongly interpreted as extremely positive sentiment.
The question may be too generic, because there are several types of sentiment analysis (document level, sentence level, comparative sentiment analysis, etc.) and each type has some specific problems.
Generally speaking, I agree with the answer by @Ian Mercer, and I would add 3 other issues:
Although this is a rather old question, let me add a note specifically about Arabic sentiment analysis. The Arabic language has morphological complexities and dialectal varieties that require advanced preprocessing and lexicon-building processes beyond what is needed for English.
Please, refer to
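
To give a feel for the preprocessing point in this answer, here is a minimal sketch of some orthographic normalization steps that are commonly applied to Arabic text before lexicon lookup. The function name and the exact set of steps are my own assumptions, not taken from any specific paper or library, and the sketch does nothing about morphology or dialects, which need dedicated tools.

```python
import re

# Short-vowel and related marks, from fathatan (U+064B) to sukun (U+0652).
DIACRITICS = re.compile("[\u064B-\u0652]")
# Tatweel (kashida), used only to stretch words visually.
TATWEEL = "\u0640"
# Alef with madda, hamza above, and hamza below.
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")


def normalize_arabic(text):
    """Apply a few common, lossy orthographic normalizations."""
    text = DIACRITICS.sub("", text)           # drop diacritics
    text = text.replace(TATWEEL, "")          # drop elongation marks
    text = ALEF_VARIANTS.sub("\u0627", text)  # unify alef forms to bare alef
    text = text.replace("\u0649", "\u064A")   # alef maqsura -> yeh (assumed choice)
    return text


# A word written with diacritics and tatweel collapses to its bare form.
decorated = "\u062C\u064E\u0645\u0650\u064A\u0640\u0640\u0644"
bare = "\u062C\u0645\u064A\u0644"
print(normalize_arabic(decorated) == bare)
```

Without steps like these, differently spelled variants of the same word would each need their own lexicon entry, and that is before dealing with affixes or dialectal vocabulary.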