NLP: Qualitatively "positive" vs "negative" sentences

Posted 2024-07-06 15:40:40 · 151 words · 9 views · 0 comments

I need your help in determining the best approach for classifying industry-specific sentences (e.g. movie reviews) as "positive" or "negative". I've looked at libraries such as OpenNLP before, but they are too low-level: they only give me the basic sentence structure. What I need is something higher-level:
- ideally with wordlists
- ideally trainable on my own data set

Thanks!

Comments (2)

远山浅 2024-07-13 15:40:40

What you are looking for is commonly dubbed Sentiment Analysis. Typically, sentiment analysis is not able to handle delicate subtleties, like sarcasm or irony, but it fares pretty well if you throw a large set of data at it.

Sentiment analysis usually needs quite a bit of pre-processing: at least tokenization, sentence boundary detection and part-of-speech tagging. Sometimes syntactic parsing can be important too. Doing it properly is an entire branch of research in computational linguistics, and I wouldn't advise coming up with your own solution unless you take the time to study the field first.
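To give a rough idea of what the first two pre-processing steps look like, here is a minimal sketch in plain Python. The regular expressions and the function names (`split_sentences`, `tokenize`) are my own illustrative assumptions, not anything taken from OpenNLP or LingPipe, and they will break on abbreviations such as "Dr." or "e.g.". Part-of-speech tagging is left out because it needs a trained model; a real toolkit would handle all three steps far more robustly.

```python
import re

# Naive patterns, chosen for illustration only (assumptions, not a toolkit's rules).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
TOKEN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?|\d+|[^\w\s]")


def split_sentences(text):
    """Split text on whitespace that follows '.', '!' or '?'."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def tokenize(sentence):
    """Lowercase a sentence and split it into word, number and punctuation tokens."""
    return TOKEN.findall(sentence.lower())


# Made-up example review.
review = "The plot was thin. But the acting wasn't bad at all!"
sentences = split_sentences(review)
tokens = [tokenize(s) for s in sentences]

for sentence, sentence_tokens in zip(sentences, tokens):
    print(sentence, "->", sentence_tokens)
```

Note that "wasn't bad" in the second sentence is exactly the kind of negation that a plain wordlist lookup would get wrong, which is one reason the later processing stages matter.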

OpenNLP has some tools to aid sentiment analysis, but if you want something more serious, you should look into the LingPipe toolkit. It has some built-in SA-functionality and a nice tutorial. And you can train it on your own set of data, but don't think that it is entirely trivial :-).

Googling for the term will probably also give you some resources to work with. If you have any more specific questions, just ask; I'm watching the nlp tag closely ;-)

清君侧 2024-07-13 15:40:40

Some approaches to sentiment analysis use strategies that are popular in other text classification tasks. The most common is to transform each film review into a word vector and feed the vectors, together with their labels, into a classifier algorithm as training data. Most popular data mining packages can help you here. You could have a look at this tutorial on sentiment classification, which illustrates how to run an experiment using the open source RapidMiner toolkit.
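As a rough illustration of the word-vector-plus-classifier idea, here is a minimal sketch in plain Python. It is not the RapidMiner workflow from the tutorial: I am assuming a bag-of-words count vector and a multinomial Naive Bayes classifier with add-one smoothing, and the class name `NaiveBayes`, the helper `bag_of_words` and the four training reviews are all made up for the example.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Turn a review into a word-count vector (a Counter keyed by word)."""
    return Counter(text.lower().split())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative sketch)."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.label_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(bag_of_words(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        scores = {}
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for word, n in bag_of_words(text).items():
                if word in self.vocab:  # ignore words never seen in training
                    score += n * math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Made-up toy data; a real experiment needs thousands of labelled reviews.
train_texts = [
    "a wonderful moving film",
    "great acting and a great story",
    "boring plot and terrible acting",
    "a dull terrible mess",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("great story"))
print(model.predict("terrible boring plot"))
```

With so little data the model only reflects which words happened to appear under each label; the point is the shape of the pipeline (text, then vector, then classifier), not the quality of the predictions.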

Incidentally, there is a good data set available for research purposes on detecting opinion in film reviews. It is based on IMDB user reviews, and you can look at the many related research papers in the area to see how they use the data set.

It's worth bearing in mind that the effectiveness of these methods can only be judged statistically, so you can pretty much assume there will be misclassifications and cases where the opinion is hard to detect. As already noted in this thread, detecting things like irony and sarcasm can be very difficult indeed.
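Judging a classifier statistically usually means comparing its predictions against held-out labelled reviews and reporting aggregate figures. A minimal sketch follows, assuming binary labels; the function name `evaluate` and the five gold/predicted labels are invented for illustration and are not results from any real system.

```python
from collections import Counter


def evaluate(gold, predicted, positive_label="positive"):
    """Return accuracy, plus precision and recall for the positive class."""
    outcomes = Counter()
    for g, p in zip(gold, predicted):
        if p == positive_label:
            outcomes["tp" if g == positive_label else "fp"] += 1
        else:
            outcomes["fn" if g == positive_label else "tn"] += 1
    total = sum(outcomes.values())
    predicted_pos = outcomes["tp"] + outcomes["fp"]
    actual_pos = outcomes["tp"] + outcomes["fn"]
    return {
        "accuracy": (outcomes["tp"] + outcomes["tn"]) / total if total else 0.0,
        "precision": outcomes["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": outcomes["tp"] / actual_pos if actual_pos else 0.0,
    }


# Invented labels, purely to exercise the function.
gold = ["positive", "positive", "negative", "negative", "positive"]
predicted = ["positive", "negative", "negative", "positive", "positive"]
metrics = evaluate(gold, predicted)
print(metrics)
```

Looking at the individual misclassified reviews, not just the totals, is often the quickest way to see whether the errors come from sarcasm, negation or simply too little training data.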
