Can I use NLTK to determine whether a comment is positive or negative?
Can you show me a simple example using http://www.nltk.org/code to determine whether a string expresses a happy or upset mood?
NLTK cannot do this out of the box, but if you are looking for related research in that area, take a look at this paper on Offensive Language Detection. The same methods could be adapted to classify comments not as offensive/inoffensive but as happy/unhappy. The primary software package used in that project for text classification is WEKA, and the approach uses multiple classifiers, trained on previous examples, to determine whether language is offensive or not (with a tunable threshold).
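A minimal sketch of that idea retargeted at happy/unhappy, assuming NLTK is installed. It is not the paper's method: it swaps WEKA's multiple classifiers for a single NLTK `NaiveBayesClassifier`, and the toy training sentences, the `features` and `flag_unhappy` helpers, and the default `threshold` of 0.7 are hypothetical placeholders.

```python
from nltk.classify import NaiveBayesClassifier


def features(text):
    # Hypothetical bag-of-words helper: each lowercased token becomes a
    # boolean feature (plain str.split avoids needing NLTK tokenizer data).
    return {word: True for word in text.lower().split()}


# Toy labeled examples, made up for illustration (not a real corpus).
TRAIN = [
    ("i love this it made my day", "happy"),
    ("what a great and cheerful morning", "happy"),
    ("so glad and happy with the result", "happy"),
    ("i hate this it ruined my day", "unhappy"),
    ("what an awful and miserable morning", "unhappy"),
    ("so sad and upset with the result", "unhappy"),
]

classifier = NaiveBayesClassifier.train(
    [(features(text), label) for text, label in TRAIN]
)


def flag_unhappy(text, threshold=0.7):
    # Flag the text only if the model's probability for "unhappy" reaches
    # the tunable threshold; a higher threshold flags fewer comments.
    p_unhappy = classifier.prob_classify(features(text)).prob("unhappy")
    return p_unhappy >= threshold


print(flag_unhappy("i hate this awful upset day"))  # True
```

Raising `threshold` makes the flag more conservative: with this toy data, "i hate mornings" gets P(unhappy) = 0.75, so it is flagged at 0.7 but not at 0.8.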
Pattern is also worth a test drive: you can see two opinion-mining experiments right on the project homepage.
http://www.clips.ua.ac.be/pages/pattern-examples-100days
http://www.clips.ua.ac.be/pages/pattern-examples-elections
Nopey.
This is a task far beyond the capabilities of NLTK or any grammatical parser that is known or can be realistically imagined. Look at the NLTK Book to see what sorts of tasks it can accomplish; they are far, far from your stated purpose.
As a cheap example:
Parse that up with NLTK and you can get
The parse tree would tell me that 'enjoyed' is the central (past-tense) verb of the simple sentence. To enjoy something is good. To train something is generally a good thing. Gerunds, nouns, comparatives, and such are relatively neutral. So give this a Good score of 0.90.
Except what I really mean is that I either hit my dog with your paper or let it excrete on the paper, which you'd probably consider a not-Good thing.
Hire a person for this recognition task.
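To make the point concrete, here is a toy version of the word-polarity reasoning above in Python. It is a deliberately naive sketch, not anything NLTK provides: it skips the parse step entirely and just averages scores from a hypothetical hand-made `LEXICON`, and the sentence it is run on is an invented one.

```python
import string

# Hypothetical hand-made polarity lexicon; any word not listed is treated
# as neutral, mirroring the "gerunds, nouns, comparatives are neutral" idea.
LEXICON = {
    "enjoy": 0.9, "enjoyed": 0.9, "good": 0.8, "great": 0.9,
    "hate": -0.9, "awful": -0.9, "boring": -0.7,
}


def naive_score(sentence):
    """Average the lexicon polarity of the words in the sentence.

    Returns 0.0 when no lexicon word appears. There is no parsing and no
    notion of sarcasm, idiom, or context.
    """
    words = [w.strip(string.punctuation).lower() for w in sentence.split()]
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


# A sarcastic complaint still scores as strongly positive.
print(naive_score("I really enjoyed how quickly it broke."))  # 0.9
```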
Added for those who imagine that even trained classifiers are of much use:
Classify this real entry from a real customer review corpus using any classifier you like trained on any dataset you like:
The worst mood classification I obtained was "totally equivocal", yet humans can easily determine that this is anything but complimentary. This wasn't a randomly picked datum; I selected it because it is clearly negative without containing "hate", "suxz", or similar words.
You're looking for a technique that uses a machine learning classifier to determine whether a piece of text is positive or negative. A number of research teams have attempted this in various ways (e.g. http://research.yahoo.com/pub/2387 and http://lingcog.iit.edu/doc/appraisal_sentiment_cikm.pdf), and we can get about 80% to 90% accuracy at determining whether a product review is positive or negative.
Because your question is so brief, it's not obvious to me whether classifying product reviews as positive or negative is the same task you're trying to accomplish or merely a related one, but I'd suggest starting simple: bag-of-words classification with a Bayesian classifier (which NLTK should be able to handle), then improve your techniques from there depending on how the accuracy turns out.
Unfortunately, I've never used NLTK (nor Python for that matter) so I can't give you a code example of how to use NLTK for this.
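For reference, the suggested bag-of-words Naive Bayes approach can be sketched with NLTK's `NaiveBayesClassifier` roughly as follows, assuming NLTK is installed. The eight reviews are invented placeholders (accuracy figures like the 80% to 90% cited above come from thousands of real labeled reviews), and `bag_of_words` is a hypothetical helper that uses `str.split` so no NLTK data downloads are needed.

```python
import string

from nltk.classify import NaiveBayesClassifier, accuracy


def bag_of_words(text):
    # Hypothetical feature extractor: each lowercased, punctuation-stripped
    # token becomes a boolean feature; word order is ignored.
    tokens = (w.strip(string.punctuation).lower() for w in text.split())
    return {w: True for w in tokens if w}


# Invented placeholder reviews; substitute a real labeled corpus.
TRAIN = [
    ("Great product, works perfectly", "pos"),
    ("Excellent quality and fast shipping", "pos"),
    ("I love it, highly recommend", "pos"),
    ("Terrible product, broke after a day", "neg"),
    ("Poor quality and slow shipping", "neg"),
    ("I hate it, do not recommend", "neg"),
]
# Held-out examples, used only to estimate accuracy.
TEST = [
    ("Works great, excellent value", "pos"),
    ("Broke quickly, terrible value", "neg"),
]

train_set = [(bag_of_words(text), label) for text, label in TRAIN]
test_set = [(bag_of_words(text), label) for text, label in TEST]

classifier = NaiveBayesClassifier.train(train_set)

print("held-out accuracy:", accuracy(classifier, test_set))
classifier.show_most_informative_features(5)
print(classifier.classify(bag_of_words("Excellent, I love it")))  # pos
```

On real data, the held-out accuracy printed here is the number to watch when deciding whether the simple bag-of-words features are good enough or need improving.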