
Can I use NLTK to determine whether a comment is positive or negative?

Posted 2024-09-14 03:31:36

Can you show me a simple example that uses http://www.nltk.org/code to determine whether a string expresses a happy or an upset mood?


做个ˇ局外人 2024-09-21 03:31:36

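NLTK can't do this out of the box, but if you are looking for related research in this area, take a look at offensive-language detection. The same approach can be adapted to detect happy/unhappy comments instead of offensive/non-offensive ones. The main package that project uses for text classification is WEKA; it uses several classifiers, trained on previous examples, to decide whether language is offensive (with an adjustable threshold).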


月牙弯弯 2024-09-21 03:31:36

Pattern is also worth a try: there are two opinion-mining experiments on the project homepage.

http://www.clips.ua.ac.be/pages/pattern-examples-100days

http://www.clips.ua.ac.be/pages/pattern-examples-elections


半岛未凉 2024-09-21 03:31:36

Nopey.

This is a task far beyond the capabilities of NLTK or any grammatical parser that is known or can be realistically imagined. Look at the NLTK Book to see what sorts of tasks it can accomplish which are far, far from your stated purpose.

As a cheap example:

I really enjoyed using your paper to train my dog.

Parse that up with NLTK and you can get

[('I', 'PRP'), ('really', 'RB'), ('enjoyed', 'VBD'), 
 ('using', 'VBG'), ('your', 'PRP$'), ('paper', 'NN'), 
 ('to', 'TO'), ('train', 'VB'), ('my', 'PRP$'), ('dog', 'NN')]

Where the parse tree would tell me that 'enjoyed' is the central (past-tense) verb of the simple sentence. To enjoy something is good. To train something is generally a good thing. Gerunds, nouns, comparatives, and such are relatively neutral. So give this a Good score of 0.90.

Except I really mean that I either hit my dog with your paper or let it excrete on the paper which you'd probably consider a not Good thing.
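The scoring logic described above can be sketched as a toy lexicon lookup. This is only an illustration and not NLTK functionality: the POLARITY table, its weights, and the naive_score function are hypothetical names invented here, and the weights are chosen so that the result matches the 0.90 quoted above.

```python
# Hypothetical mini-lexicon: the words and weights are invented for illustration.
POLARITY = {"enjoyed": 1.0, "train": 0.8, "annoying": -1.0, "distorts": -0.8}


def naive_score(sentence: str) -> float:
    """Average the polarity of the words found in the lexicon; 0.0 if none match."""
    words = [w.strip(".,!?'\"").lower() for w in sentence.split()]
    hits = [POLARITY[w] for w in words if w in POLARITY]
    return sum(hits) / len(hits) if hits else 0.0


sentence = "I really enjoyed using your paper to train my dog."
score = naive_score(sentence)
print(f"{score:.2f}")  # prints 0.90
```

A word-level scorer like this sees only "enjoyed" and "train"; nothing in it can represent what was actually done with the paper.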

Hire a person for this recognition task.

Added for those who imagine that even trained classifiers are of much use:

Classify this real entry from a real customer review corpus using any classifier you like trained on any dataset you like:

This camera keeps on autofocussing in auto mode with a buzzing sound which can't be stopped. It would be really good if they have given an option to stop this autofocussing. If you want to have the date and time on the image, it's only through their software which reads the image's date and time from the image's meta-data. So if you use your card reader and copy images - you got to once again open them through their software to put the date and time. In that too, there isn't a direct way to add date and time - you got to say 'print images' to a different directory in which there is an option to specify the date and time. Even the slightest of the shakes totally distorts your image. Indoor images weren't so clear. You got to have flash 'on' to get it even though your room is well lit. The lens cap is a really annoying. the movie clips taken will always have some 'noise' in it - you can't avoid that.

The worst mood classification I obtained was "totally equivocal" yet humans can easily determine that this is anything but complimentary. This wasn't a randomly picked datum, rather one that was selected for negative bias without "hate" or "suxz" or similar.


故人如初 2024-09-21 03:31:36

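You are looking for a technique that uses a machine-learning classifier to determine whether a piece of text is positive or negative. Many research teams have taken different stabs at this (e.g. http://research.yahoo.com/pub/2387 and http://lingcog.iit.edu/doc/valuation_sentiment_cikm.pdf) and have reached accuracies of roughly 80% to 90% when determining whether product reviews are positive or negative.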

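Since your question is brief, it isn't clear to me whether determining if a product review is positive or negative is the same task you want to accomplish or merely a related one, but I would suggest starting with a simple bag-of-words classification using a Bayesian classifier (which NLTK should be able to handle) and then refining your technique based on how accurate the results are.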

Unfortunately, I have never used NLTK (or Python), so I can't give you a code example of how to do this with NLTK.
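As a rough sketch of the bag-of-words Bayesian approach suggested above, here is a small multinomial Naive Bayes classifier in plain Python. It is an assumption-laden toy, not the method of any of the cited papers: the BagOfWordsNB class, the tokenize helper, and the six TRAIN sentences are all made up for illustration, and a real experiment would need a large labeled review corpus. NLTK ships its own nltk.NaiveBayesClassifier, which could take the place of the hand-written class.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,!?'\"").lower() for w in text.split())
    return [w for w in words if w]


class BagOfWordsNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, labeled_texts):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        for text, label in labeled_texts:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def log_scores(self, text):
        """Return the unnormalized log-probability of each label for the text."""
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for w in tokenize(text):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[w] + 1) / denom)
            scores[label] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented training sentences; a real run needs far more labeled data.
TRAIN = [
    ("great camera, sharp pictures", "pos"),
    ("love the battery life", "pos"),
    ("works great and easy to use", "pos"),
    ("terrible lens cap, really annoying", "neg"),
    ("blurry pictures and noisy video", "neg"),
    ("annoying autofocus noise, poor software", "neg"),
]

model = BagOfWordsNB().fit(TRAIN)
print(model.classify("great pictures, easy to use"))      # prints pos
print(model.classify("annoying noise and blurry video"))  # prints neg
```

With so little training data the classifier only reacts to words it has already seen, so the 80% to 90% figures quoted above should not be expected from a toy like this.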
