Can I use NLTK to determine whether a comment is positive or negative?

Posted 2024-09-14 03:31:36, 125 characters, 5 views, 0 comments

Can you show me a simple example that uses NLTK (http://www.nltk.org/code) to determine whether a string expresses a happy or an upset mood?

Comments (4)

做个ˇ局外人 2024-09-21 03:31:36

NLTK cannot do this out of the box, but if you are looking for related research in this area, take a look at this paper on Offensive Language Detection. The same methods could be adapted to classify comments not as offensive/inoffensive but as happy/unhappy. The primary software package used in that project for text classification is WEKA. It uses multiple classifiers, trained on previous examples, to determine whether language is offensive or not (with a tunable threshold in this method).
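To make the "tunable threshold" idea concrete, here is a minimal sketch in plain Python. It is not taken from the paper or from WEKA: the averaging rule, the `decide` function, and the example probabilities are all assumptions made for illustration.

```python
def decide(probabilities, threshold=0.5):
    """Average each classifier's P(unhappy) and compare it with the threshold.

    `probabilities` holds one hypothetical P(unhappy) per classifier.
    Averaging is only one possible way to combine them (an assumption).
    """
    mean = sum(probabilities) / len(probabilities)
    return "unhappy" if mean >= threshold else "happy"


# Made-up outputs of three classifiers for a single comment.
scores = [0.5, 0.75, 0.25]

print(decide(scores))                  # unhappy (mean 0.5 meets the default 0.5)
print(decide(scores, threshold=0.75))  # happy (a stricter threshold flags less)
```

Raising the threshold trades missed unhappy comments for fewer false alarms, which is presumably why it is left tunable.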

月牙弯弯 2024-09-21 03:31:36

Pattern is also worth a test drive: you can see two opinion mining experiments right on the project homepage.

http://www.clips.ua.ac.be/pages/pattern-examples-100days

http://www.clips.ua.ac.be/pages/pattern-examples-elections

半岛未凉 2024-09-21 03:31:36

Nopey.

This is a task far beyond the capabilities of NLTK or any grammatical parser that is known or can be realistically imagined. Look at the NLTK Book to see what sorts of tasks it can accomplish; they are far, far from your stated purpose.

As a cheap example:

I really enjoyed using your paper to train my dog.

Parse that up with NLTK and you can get

[('I', 'PRP'), ('really', 'RB'), ('enjoyed', 'VBD'),
 ('using', 'VBG'), ('your', 'PRP$'), ('paper', 'NN'),
 ('to', 'TO'), ('train', 'VB'), ('my', 'PRP$'), ('dog', 'NN')]

Where the parse tree would tell me that 'enjoyed' is the central (past-tense) verb of the simple sentence. To enjoy something is good. To train something is generally a good thing. Gerunds, nouns, comparatives, and such are relatively neutral. So give this a Good score of 0.90.

Except I really mean that I either hit my dog with your paper or let it excrete on the paper which you'd probably consider a not Good thing.
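The failure mode is easy to reproduce with a toy word-polarity scorer. The sketch below is plain Python, not NLTK; the `POLARITY` lexicon and its weights are invented for illustration, so the exact score differs from the 0.90 above, but the sign of the result is the point.

```python
# Hypothetical polarity lexicon: the words and weights are made up.
POLARITY = {"enjoyed": 1.0, "train": 0.5, "hate": -1.0, "annoying": -0.8}


def naive_score(sentence):
    """Average the polarity of the words found in the lexicon (0.0 if none)."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    hits = [POLARITY[w] for w in words if w in POLARITY]
    return sum(hits) / len(hits) if hits else 0.0


# Scores as clearly positive, although the intended meaning is an insult.
print(naive_score("I really enjoyed using your paper to train my dog."))  # 0.75
```

Nothing in the individual words signals the sarcasm, so any scorer that only adds up word polarities will rate the sentence as praise.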

Hire a person for this recognition task.

Added for those who imagine that even trained classifiers are of much use:

Classify this real entry from a real customer review corpus using any classifier you like trained on any dataset you like:

This camera keeps on autofocussing in auto mode with a buzzing sound which can't be stopped. It would be really good if they have given an option to stop this autofocussing. If you want to have the date and time on the image, it's only through their software which reads the image's date and time from the image's meta-data. So if you use your card reader and copy images - you got to once again open them through their software to put the date and time. In that too, there isn't a direct way to add date and time - you got to say 'print images' to a different directory in which there is an option to specify the date and time. Even the slightest of the shakes totally distorts your image. Indoor images weren't so clear. You got to have flash 'on' to get it even though your room is well lit. The lens cap is a really annoying. the movie clips taken will always have some 'noise' in it - you can't avoid that.

The worst mood classification I obtained was "totally equivocal" yet humans can easily determine that this is anything but complimentary. This wasn't a randomly picked datum, rather one that was selected for negative bias without "hate" or "suxz" or similar.

故人如初 2024-09-21 03:31:36

You're looking for a technique that uses a machine learning classifier to determine whether a piece of text is positive or negative. A number of research teams have attempted this in various ways (e.g. http://research.yahoo.com/pub/2387 and http://lingcog.iit.edu/doc/appraisal_sentiment_cikm.pdf), and we can get about 80% to 90% accuracy at determining whether a product review is positive or negative.

Due to the brevity of your question, it's not obvious to me whether classifying a product review as positive or negative is the same task you're trying to accomplish or merely a related one, but I'd suggest starting simple with bag-of-words classification using a Bayesian classifier (which NLTK should be able to handle), and then improving your technique from there depending on how the accuracy turns out.

Unfortunately, I've never used NLTK (nor Python for that matter) so I can't give you a code example of how to use NLTK for this.
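As a starting point, the bag-of-words approach with a Bayesian classifier can be sketched in plain Python, with no NLTK required. Everything specific here is an assumption for illustration: the six training sentences, the `pos`/`neg` labels, and the function names. NLTK's `nltk.NaiveBayesClassifier` provides a comparable train-then-classify workflow over feature dictionaries.

```python
import math
from collections import Counter, defaultdict

# Tiny made-up training set; a real one needs thousands of labeled reviews.
TRAIN = [
    ("i love this camera the pictures are great", "pos"),
    ("great battery and a very happy purchase", "pos"),
    ("happy with the lens it works great", "pos"),
    ("i hate the noise this camera is annoying", "neg"),
    ("terrible pictures and an annoying lens cap", "neg"),
    ("upset with the battery it is terrible", "neg"),
]


def tokenize(text):
    """Bag of words: lowercase, split on whitespace, ignore word order."""
    return text.lower().split()


def train(examples):
    """Count documents per label and word occurrences per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def classify(text, model):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # skip words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
print(classify("i am happy with this great camera", model))   # pos
print(classify("the noise is terrible and annoying", model))  # neg
```

With a realistic corpus, hold out part of the labeled data to measure accuracy before refining the features.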
