Naive Bayesian classifier - multiple decisions
I need to know whether the Naive Bayesian classifier
can be used to generate multiple decisions. I couldn't
find any examples that show it being used to make
multiple decisions. I'm new to this area, so I'm a bit
confused.
I need to develop character recognition software,
in which I have to identify what a given character is.
It seems the Bayesian classifier can be used to decide
whether a given character is a particular character or not,
but it cannot give any other suggestions.
For example, suppose an image of '3' is given (we think it's '3')
and the system cannot identify it as '3'. If it looks like
'2' to the system, the system should return '2'.
My understanding of the Naive Bayesian classifier
is that, once we have trained it on data, we can ask
the system whether a given character is a particular character
or not. E.g., we draw an image of a particular digit and ask
the system whether it's '2' or not.
I also noticed that KNN (k-nearest neighbors) gives multiple decisions:
given a character, it picks the nearest matching
character in the training data.
I would highly appreciate it if someone could explain whether
the Naive Bayesian classifier can be used to make multiple
decisions such as the above.
Comments (2)
The assumption of a Naive Bayesian classifier is that the data dimensions are independent given the class (the naive part) and that the model is generative (the Bayesian part). In other words, you model how data are generated from world states, P(data|world_state), where world_state can be a continuous or a categorical variable (with multiple classes or categories). This is in contrast to discriminative models, which ignore data generation and describe the posterior probability of world states by 'hacking' the data directly: P(world_state|data).
Here are the steps you have to follow to implement a Naive Bayesian classifier:
1. Model your data with a generative model, for example, a Gaussian distribution. Each class has its own Gaussian. In the naive model, you take the product of one-dimensional Gaussians, one per data dimension. In a more complete model, each class has a single Gaussian whose dimensionality equals the dimensionality of the data.
2. Figure out a prior for each of your classes (for example, a categorical distribution with a single probability assigned to each class);
3. Learn parameters by fitting Gaussians to your data;
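4. Evaluate the class of the test data via Bayes' formula:
   P(class|data) = P(data|class) * P(class) / P(data)    [1]
   P(data) = sum over classes of P(data|class) * P(class)    [2]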
The first term in formula 1 is called the posterior, the second is the likelihood, and the last is the prior. The denominator, shown in [2], often gets ignored when you calculate the maximum a posteriori (MAP) estimate, that is, the most probable class responsible for generating the data, because it is the same for every class and does not change which class wins. However, the denominator is very important in understanding how class models work together.
For example, you can create a very sophisticated generative model for each class, yet your posterior may look very simple because, during normalization, one of the likelihoods is reduced to 0. In this case it is better to abandon the Bayesian approach and create a discriminative model with fewer parameters than you put into the generative one. In the diagram below, the vertical axes are probabilities of a world state (class), while the horizontal axes represent data.
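The four steps above can be sketched in a few lines of plain Python. This is only a minimal illustration, not a production implementation: the function names (`fit_gaussian_nb`, `log_joint`, `posterior`), the `var_floor` parameter, and the toy two-dimensional "features" are all assumptions made up for this example; real character images would need a proper feature extraction step first.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(samples, labels, var_floor=1e-6):
    """Steps 1-3: a prior per class plus one 1-D Gaussian per feature."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    model = {}
    for y, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # var_floor (an assumption) keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(zip(*rows), means)
        ]
        model[y] = {"prior": n / len(samples), "mean": means, "var": variances}
    return model

def log_joint(model, x):
    """log P(data|class) + log P(class) for every class (numerator of [1])."""
    scores = {}
    for y, p in model.items():
        s = math.log(p["prior"])
        # Naive assumption: the per-feature log likelihoods simply add up.
        for v, m, var in zip(x, p["mean"], p["var"]):
            s += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        scores[y] = s
    return scores

def posterior(model, x):
    """Step 4: divide by P(data), i.e. normalise over all classes as in [2]."""
    scores = log_joint(model, x)
    top = max(scores.values())  # subtract the max for numerical stability
    unnorm = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(unnorm.values())
    return {y: u / total for y, u in unnorm.items()}

# Toy 2-D features for three classes (made-up numbers, not real image data).
X = [(1.0, 1.2), (0.8, 1.0), (1.2, 0.9),
     (3.0, 3.1), (3.2, 2.9), (2.9, 3.3),
     (5.0, 1.0), (5.2, 1.1), (4.9, 0.8)]
y = ["2", "2", "2", "3", "3", "3", "8", "8", "8"]

model = fit_gaussian_nb(X, y)
probs = posterior(model, (2.9, 3.0))   # one probability per class
best = max(probs, key=probs.get)       # MAP decision
```

Note that `posterior` returns a probability for every class at once, so nothing restricts the classifier to a yes/no question about a single character.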
A Bayes classifier should give a probability for an item to belong to each of several classes. It's definitely possible to have more than two classes.
With the probabilities for the classes, you will normally want to make a decision, which can be done by, e.g., choosing the most likely class. This may be why you're seeing it as providing just one possibility.
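To make that concrete, here is a small sketch of how per-class scores could be turned into a ranked list of suggestions rather than a single answer. The helper name `rank_classes`, the `top_k` parameter, and the score values are hypothetical and chosen only for illustration; in practice the scores would come from whatever trained classifier you use.

```python
import math

def rank_classes(log_scores, top_k=3):
    """Normalise per-class log scores into posteriors; return the top_k, best first."""
    peak = max(log_scores.values())  # subtract the max for numerical stability
    weights = {c: math.exp(s - peak) for c, s in log_scores.items()}
    total = sum(weights.values())
    ranked = sorted(((c, w / total) for c, w in weights.items()),
                    key=lambda item: item[1], reverse=True)
    return ranked[:top_k]

# Hypothetical values of log P(data|class) + log P(class) for one drawn character.
log_scores = {"1": -40.0, "2": -12.0, "3": -12.7, "7": -25.0, "8": -15.0}

suggestions = rank_classes(log_scores, top_k=2)  # [(class, probability), ...]
decision = suggestions[0][0]                     # the single most likely class
```

With these made-up scores the system would answer '2' while still reporting '3' as the runner-up, which matches the scenario described in the question.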