Wondering if a Bayes classifier is the right approach?
I'm wondering if a Bayes classifier makes sense for an application where the same phrase, "served cold" for example, is "good" when associated with some things (beer, soda) but "bad" when associated with other things (steak, pizza, burger).
Specifically, does training a Bayes classifier that "beer cold" and "soda cold" are "good" cancel out training it that "steak served cold" and "burger served cold" are "bad"?
Or, can Bayes (correctly) be trained that "served cold" might be "good" or "bad" depending on what it is associated with?
I found a lot of good info on Bayes, here and elsewhere, but was unable to determine whether it's suitable for this type of application, where the answer to whether a phrase is good or bad is "it depends".
A Naive Bayes classifier assumes independence between attributes. For example, assume you have the following data:
apple fruit red BAD
apple fruit green BAD
banana fruit yellow GOOD
tomato vegetable red GOOD
Independence means that the attributes (name, fruit, color) are independent; for example, that "apple" could be either "fruit" or "vegetable". In this case the attributes "name" and "fruit" are dependent so a Naive Bayes classifier is too naive (it would likely classify "apple fruit yellow" as BAD because it's an apple AND it's a fruit -- but aren't all apples fruits?).
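To see how the table above plays out numerically, here is a minimal sketch of a categorical Naive Bayes in plain Python. The helper name `nb_posterior` and the use of Laplace (add-one) smoothing are my assumptions; the answer doesn't say how unseen values such as "yellow" for BAD should be handled.

```python
from collections import Counter

def nb_posterior(rows, query, alpha=1.0):
    """Categorical Naive Bayes with Laplace smoothing; returns P(label | query)."""
    labels = Counter(label for _, label in rows)
    scores = {}
    for label, n in labels.items():
        score = n / len(rows)  # class prior
        for i, value in enumerate(query):
            # Distinct values seen for attribute i (used by the smoothing term).
            domain = {attrs[i] for attrs, _ in rows}
            hits = sum(1 for attrs, lab in rows if lab == label and attrs[i] == value)
            # Each attribute contributes its own factor, as if independent of the others.
            score *= (hits + alpha) / (n + alpha * len(domain))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

ROWS = [
    (("apple", "fruit", "red"), "BAD"),
    (("apple", "fruit", "green"), "BAD"),
    (("banana", "fruit", "yellow"), "GOOD"),
    (("tomato", "vegetable", "red"), "GOOD"),
]

result = nb_posterior(ROWS, ("apple", "fruit", "yellow"))
print(result)  # BAD comes out ahead, at roughly 0.69
```

Under these assumptions "apple" and "fruit" each push the score toward BAD separately, even though knowing it's an apple already tells you it's a fruit.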
To answer your original question, a Naive Bayes classifier assumes that the class (GOOD or BAD) depends on each attribute independently, which isn't the case here -- I like my pizza hot and my soda cold.
EDIT: If you're looking for a classifier that has some utility but in theory could have numerous Type I and Type II errors, Naive Bayes is such a classifier. Naive Bayes is better than nothing, but there's measurable value in using a less naive classifier.
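To make the pizza and soda point concrete, here is a small sketch, again in plain Python with a hypothetical `nb_posterior` helper and Laplace smoothing as an assumption. The four training rows are invented for illustration. With item and temperature as separate attributes the classifier can't tell the classes apart; treating each (item, temperature) pair as a single attribute is one simple way to be "less naive".

```python
from collections import Counter

def nb_posterior(rows, query, alpha=1.0):
    """Categorical Naive Bayes with Laplace smoothing; returns P(label | query)."""
    labels = Counter(label for _, label in rows)
    scores = {}
    for label, n in labels.items():
        score = n / len(rows)  # class prior
        for i, value in enumerate(query):
            domain = {attrs[i] for attrs, _ in rows}
            hits = sum(1 for attrs, lab in rows if lab == label and attrs[i] == value)
            score *= (hits + alpha) / (n + alpha * len(domain))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

# Illustrative data (an assumption): "good" depends on the combination.
ROWS = [
    (("pizza", "hot"), "GOOD"),
    (("pizza", "cold"), "BAD"),
    (("soda", "cold"), "GOOD"),
    (("soda", "hot"), "BAD"),
]

# Separate attributes: every value is equally common in both classes,
# so the two posteriors tie at 0.5.
separate = nb_posterior(ROWS, ("pizza", "cold"))

# One combined attribute per (item, temperature) pair.
paired_rows = [((f"{item}+{temp}",), label) for (item, temp), label in ROWS]
paired = nb_posterior(paired_rows, ("pizza+cold",))

print(separate)
print(paired)  # BAD is now favoured
```

The catch with paired attributes is that the classifier only knows about combinations it has seen, so it needs training examples for each pairing you care about.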
I wouldn't dismiss Bayes as quickly as Daniel suggested.
The quality (performance, in math-speak) of Bayes depends above all on the amount and quality of the training data, and on the assumptions you make when you develop your algorithm.
To give you a short example, if you feed into it only {'beer cold' => :good, 'pizza cold' => :bad} the word 'cold' won't actually affect classification. It will just decide that all beers are good and all pizzas are bad (see how smart it is? :))
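A quick way to check that claim is a word-level (multinomial) Naive Bayes written out by hand. This is only a sketch in Python, not the Ruby code from my blog; the function name `word_posterior` and the add-one smoothing are assumptions made for the illustration.

```python
from collections import Counter

TRAINING = {"beer cold": "good", "pizza cold": "bad"}

def word_posterior(text, training, alpha=1.0):
    """Multinomial Naive Bayes over words with Laplace smoothing."""
    word_counts = {}
    doc_counts = Counter()
    for phrase, label in training.items():
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(phrase.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    scores = {}
    for label, counts in word_counts.items():
        score = doc_counts[label] / sum(doc_counts.values())  # class prior
        total_words = sum(counts.values())
        for word in text.split():
            score *= (counts[word] + alpha) / (total_words + alpha * len(vocab))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

with_cold = word_posterior("beer cold", TRAINING)
without_cold = word_posterior("beer", TRAINING)
only_cold = word_posterior("cold", TRAINING)

print(with_cold, without_cold, only_cold)
```

Because 'cold' is equally likely under both labels, its factor cancels in the normalisation: "beer cold" and "beer" get the same posterior, and "cold" on its own is a coin flip.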
Anyway, an answer here is too short to explain this in detail. I would recommend reading Paul Graham's essay on how he developed his spam filter; note that he built his own algorithm based on Bayes rather than using an off-the-shelf classifier. In my (so far short) experience, you are better off following his lead and developing a specific version of the algorithm for the specific problem at hand, so that you have control over the various domain-specific assumptions.
You can follow my attempts (in ruby) here if you are interested: http://arubyguy.com/2011/03/03/bayes-classification-update/