Calculating the posterior distribution of unknown mis-classification probabilities using PRTools in MATLAB
I'm using the PRTools MATLAB library to train some classifiers, generating test data and testing the classifiers.
I have the following details:
- N: total number of test examples
- k: number of mis-classifications for each classifier and class
What I want to do:
Calculate and plot Bayesian posterior distributions of the unknown probabilities of mis-classification (denoted q), that is, as probability density functions over q itself (so, P(q) will be plotted over q, from 0 to 1).
I have that (math formulae, not matlab code!):
Posterior = Likelihood * Prior / Normalization constant =
P(q|k,N) = P(k|q,N) * P(q|N) / P(k|N)
The prior is set to 1, so I only need to calculate the likelihood and normalization constant.
I know that the likelihood can be expressed as (where B(N,k) is the binomial coefficient):
P(k|q,N) = B(N,k) * q^k * (1-q)^(N-k)
... so the normalization constant is simply the integral of the likelihood above, from 0 to 1:
P(k|N) = B(N,k) * integralFromZeroToOne( q^k * (1-q)^(N-k) )
(The Binomial coefficient ( B(N,k) ) can be omitted though as it appears in both the likelihood and normalization constant)
Now, I've heard that the integral for the normalization constant should be possible to calculate as a series ... something like:
k!(N-k)! / (N+1)!
Is that correct? (I have some lecture notes with this series, but can't figure out if it is for the normalization constant integral, or for the overall distribution of mis-classification (q))
Also, hints are welcome on how to practically calculate this (factorials easily create truncation errors, right?) ... AND on how to practically produce the final plot (the posterior distribution over q, from 0 to 1).
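For concreteness, here is roughly the MATLAB sketch I have in mind for the plot (N and k are just placeholder values, and the normalization is done numerically with trapz); I'm not sure whether this approach is numerically sound for large N:

% Placeholder values; in practice N and k come from the PRTools test results
N = 100;   % total number of test examples
k = 5;     % number of mis-classifications

q = linspace(0, 1, 1001);        % grid of q values to plot over
lik = q.^k .* (1-q).^(N-k);      % un-normalized likelihood (B(N,k) omitted)
post = lik ./ trapz(q, lik);     % normalize numerically so the density integrates to 1

plot(q, post);
xlabel('q'); ylabel('P(q | k, N)');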
2 Answers
I really haven't done much with Bayesian posterior distributions (and not for a while), but I'll try to help with what you've given. First, you can calculate the binomial coefficients in Matlab with nchoosek(), though the docs do say there can be accuracy problems for large coefficients. How big are N and k?
Second, according to Mathematica, the integral evaluates to:
integralFromZeroToOne( q^k * (1-q)^(N-k) ) = pi * csc( pi*(k-N) ) * Gamma(k+1) / ( Gamma(k-N) * Gamma(N+2) )
where csc() is the cosecant function and Gamma() is the gamma function. Note that Gamma(x) = (x-1)!, which we'll use in a moment. The problem is that we have Gamma(k-N) on the bottom, and k-N will be negative. However, the reflection formula helps us with that, so we end up with:
integralFromZeroToOne( q^k * (1-q)^(N-k) ) = Gamma(k+1) * Gamma(N-k+1) / Gamma(N+2) = k! * (N-k)! / (N+1)!
Apparently, your notes were correct.
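As a rough sketch of how the normalization constant (and the plot) could be computed without explicit factorials, here is one way using MATLAB's built-in gammaln(), which works in log-space; the N and k values below are only examples:

% Evaluate the normalization constant k!(N-k)!/(N+1)! in log-space to avoid overflow.
% gammaln(x) returns log(Gamma(x)), and Gamma(x) = (x-1)! for integer x.
N = 10000;   % example values; with N this large, factorial(N+1) would overflow
k = 123;

logZ = gammaln(k+1) + gammaln(N-k+1) - gammaln(N+2);   % log of k!(N-k)!/(N+1)!

% Posterior density on a grid, computed in log-space and exponentiated at the end
q = linspace(1e-6, 1-1e-6, 1001);    % avoid exact 0 and 1 so log() stays finite
logPost = k*log(q) + (N-k)*log(1-q) - logZ;
post = exp(logPost);

plot(q, post);
xlabel('q'); ylabel('P(q | k, N)');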
Let q be the probability of mis-classification. Then the probability that you would observe k mis-classifications in N runs is given by the binomial likelihood:
P(k|q,N) = B(N,k) * q^k * (1-q)^(N-k)
You then need to assume a suitable prior for q, which is bounded between 0 and 1. A conjugate prior for the above is the beta distribution. If q ~ Beta(a,b), then the posterior is also a beta distribution. For your info, the posterior is:
q | k,N ~ Beta(a+k, b+N-k)
(With your flat prior, a = b = 1, this is Beta(k+1, N-k+1), whose normalization constant is exactly k!(N-k)!/(N+1)!.) Hope that helps.
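As a minimal sketch (assuming the Statistics Toolbox's betapdf() is available; the N, k, a, b values below are only examples), the posterior can then be plotted directly:

% Posterior under a Beta(a,b) prior is Beta(a+k, b+N-k).
% betapdf() is part of the Statistics and Machine Learning Toolbox.
N = 100;  k = 5;     % example counts from a test run
a = 1;    b = 1;     % uniform (flat) prior, Beta(1,1)

q = linspace(0, 1, 1001);
post = betapdf(q, a + k, b + N - k);   % posterior density over q

plot(q, post);
xlabel('q (probability of mis-classification)');
ylabel('posterior density P(q | k, N)');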