Bucketing sentences by mood

Posted on 2024-11-27 01:56:31

Let's start with a simple problem. Say I have a 350-character sentence and would like to bucket it into either a "Good mood" bucket or a "Bad mood" bucket.

What would be the best way to design an algorithm to bucket the sentence?

Comments (6)

溇涏 2024-12-04 01:56:31

Hand-classify a bunch of sentences by mood. Then feed these into a naive Bayes classifier. Use something like SpamBayes as a starting point.
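
As a rough illustration of that idea (not SpamBayes itself), here is a minimal multinomial naive Bayes sketch in plain Python with add-one smoothing. The class name `NaiveBayesMood`, the `tokenize` helper, and the tiny `TRAINING` set are all made up for this example; a real classifier would need far more hand-labeled sentences.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep runs of letters and apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesMood:
    """Hypothetical minimal naive Bayes classifier over word counts."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training sentences
        self.vocab = set()

    def train(self, labeled_sentences):
        for text, label in labeled_sentences:
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for word in tokenize(text):
                counts[word] += 1
                self.vocab.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        words = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented, hand-labeled toy data (assumption for illustration only).
TRAINING = [
    ("I love this sunny day", "good"),
    ("What a wonderful happy surprise", "good"),
    ("I feel great and happy", "good"),
    ("I hate this awful weather", "bad"),
    ("This is a sad and terrible day", "bad"),
    ("I feel awful and sad", "bad"),
]

classifier = NaiveBayesMood()
classifier.train(TRAINING)
print(classifier.classify("I feel happy"))
print(classifier.classify("sad awful weather"))
```

Working in log space avoids numeric underflow on longer sentences, and the add-one smoothing keeps an unseen word from zeroing out a whole bucket.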

骄傲 2024-12-04 01:56:31

A simple/naive suggestion would be to split each sentence into individual words (or use a regex) and scan for specific words from both a "positive" list (e.g. "like", "happy", "can", "do", etc.) and a "negative" list ("dislike", "sad", "can't", "don't"), work out which list is more prevalent in each sentence, and bucket it accordingly.

Depending on your requirements and data-set this may be adequate, or you might want to investigate more advanced techniques like Bayesian filtering.
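
A minimal sketch of the word-list approach, using only the example words quoted above. The function name `bucket_by_word_lists` and the "undecided" fallback for ties are assumptions added for illustration.

```python
import re

# Word lists taken from the examples in the answer; a real list would be longer.
POSITIVE = {"like", "happy", "can", "do"}
NEGATIVE = {"dislike", "sad", "can't", "don't"}


def bucket_by_word_lists(sentence):
    # Match whole words so that "like" is not counted inside "dislike".
    words = re.findall(r"[a-z']+", sentence.lower())
    positive_hits = sum(word in POSITIVE for word in words)
    negative_hits = sum(word in NEGATIVE for word in words)
    if positive_hits > negative_hits:
        return "good"
    if negative_hits > positive_hits:
        return "bad"
    return "undecided"  # tie or no listed words (hypothetical third outcome)


print(bucket_by_word_lists("I like it and I am happy"))
print(bucket_by_word_lists("I don't like this, it makes me sad"))
```

The second sentence shows the limit of plain counting: "don't like" is scored as one negative and one positive word, so the result depends on the extra "sad" rather than on the negation itself.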

一生独一 2024-12-04 01:56:31

Depending on the domain of the sentences and on the required accuracy, this might be an extremely hard problem. There are many academic papers around sentiment analysis; a good start might be here - a short and classic paper.

The steps I'd suggest would gradually lead to a better and better classifier:

  1. Hand-classify some documents, and use them to train a ready-made algorithm. I'd suggest using SVM (e.g. using LibSVM in WEKA, or SVMLight), but Naive Bayes or decision trees, as suggested above, might work too.

  2. Hand-classify some more documents, and move from a unigram-based model to a more sophisticated one, e.g. bigram or parts-of-speech based. This can be done quite easily with TagHelper tools, which will take your texts and transform them to WEKA-ready files using these techniques. This will add some context to the mood of each term (e.g. "not" and "bad" vs. "not bad").

  3. Finally, you can add custom made rules and dictionaries, which would add domain-specific knowledge to your algorithm. They might be represented as additional features for the same classification engine, or as an additional classification step.
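
To make the unigram-versus-bigram point in step 2 concrete, here is a small sketch of n-gram feature extraction in plain Python. It does not use TagHelper or WEKA; the function name `ngram_features` and its `n_max` parameter are hypothetical, and the resulting counts would still have to be fed to whichever classifier you choose.

```python
import re
from collections import Counter


def ngram_features(text, n_max=2):
    # Count every n-gram of length 1..n_max as a separate feature.
    tokens = re.findall(r"[a-z']+", text.lower())
    features = Counter()
    for n in range(1, n_max + 1):
        for start in range(len(tokens) - n + 1):
            features[" ".join(tokens[start:start + n])] += 1
    return features


unigrams_only = ngram_features("This is not bad", n_max=1)
with_bigrams = ngram_features("This is not bad", n_max=2)
print(sorted(unigrams_only))
print(sorted(with_bigrams))
```

With unigrams only, the classifier sees "not" and "bad" as unrelated features; with bigrams it also gets "not bad" as a feature of its own, which can be learned as a positive signal.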

何止钟意 2024-12-04 01:56:31

This is called Sentiment Analysis, and the Wikipedia article has a good description of available techniques. One easy way out would be to use the Google Prediction API, and train it with a set of positive, negative, and neutral sentiment sentences.

2024-12-04 01:56:31

You can play around with the Weka tool to train some classifier that will work well in your case. I would recommend trying the J48 algorithm which I believe is an implementation of the C4.5 algorithm for training decision trees.

任性一次 2024-12-04 01:56:31

Try machine learning from a bunch of such sentences. Use some features, for example smilies as indicators of mood. Observe the quality and add / modify your feature set.
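
One possible way to turn smilies into features, as a sketch only: the two regular expressions below cover a handful of common ASCII emoticons and are an assumption, not an exhaustive list, and `emoticon_features` is a made-up name. The counts could be appended to whatever other features the learner uses.

```python
import re

# Assumed patterns: :) ;) :-) :D :-D count as happy; :( :-( :'( count as sad.
HAPPY_SMILEY = re.compile(r"[:;]-?[)D]")
SAD_SMILEY = re.compile(r":'?-?\(")


def emoticon_features(sentence):
    # Return simple counts that a classifier can use as numeric features.
    return {
        "happy_smilies": len(HAPPY_SMILEY.findall(sentence)),
        "sad_smilies": len(SAD_SMILEY.findall(sentence)),
    }


print(emoticon_features("Great news :) :-D"))
print(emoticon_features("Missed the bus :("))
```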
