How can I efficiently estimate a probability from a small amount of evidence?

Published 2024-08-10 05:55:29

I've been trying to find an answer to this for months (to be used in a machine learning application). It doesn't seem like it should be a terribly hard problem, but I'm a software engineer, and math was never one of my strengths.

Here is the scenario:

I have a (possibly) unevenly weighted coin and I want to figure out the probability of it coming up heads. I know that coins from the same box that this one came from have an average probability of p, and I also know the standard deviation of these probabilities (call it s).

(If other summary properties of the probabilities of other coins aside from their mean and stddev would be useful, I can probably get them too.)

I toss the coin n times, and it comes up heads h times.

The naive approach is that the probability is just h/n - but if n is small this is unlikely to be accurate.

Is there a computationally efficient way (ie. doesn't involve very very large or very very small numbers) to take p and s into consideration to come up with a more accurate probability estimate, even when n is small?

I'd appreciate it if any answers could use pseudocode rather than mathematical notation since I find most mathematical notation to be impenetrable ;-)


Other answers:
There are some other answers on SO that are similar, but the answers provided are unsatisfactory. For example this is not computationally efficient because it quickly involves numbers way smaller than can be represented even in double-precision floats. And this one turned out to be incorrect.

Comments (5)

望笑 2024-08-17 05:55:29

Unfortunately you can't do machine learning without knowing some basic math---it's like asking somebody for help in programming but not wanting to know about "variables" , "subroutines" and all that if-then stuff.

The better way to do this is called Bayesian integration, but there is a simpler approximation called "maximum a posteriori" (MAP). It's pretty much like the usual thinking except you can put in the prior distribution.

Fancy words, but you may ask, well where did the h/(h+t) formula come from? Of course it's obvious, but it turns out that it is the answer you get when you have "no prior". And the method below is the next level of sophistication up when you add a prior. Going to Bayesian integration would be the next one, but that's harder and perhaps unnecessary.

As I understand it the problem is two fold: first you draw a coin from the bag of coins. This coin has a "headsiness" called theta, so that it gives a head theta fraction of the flips. But the theta for this coin comes from the master distribution which I guess I assume is Gaussian with mean P and standard deviation S.

What you do next is to write down the total unnormalized probability (called likelihood) of seeing the whole shebang, all the data: (h heads, t tails)

L = (theta)^h * (1-theta)^t * Gaussian(theta; P, S).

Gaussian(theta; P, S) = exp( -(theta-P)^2/(2*S^2) ) / sqrt(2*Pi*S^2)

This is the meaning of "first draw 1 value of theta from the Gaussian" and then draw h heads and t tails from a coin using that theta.

The MAP principle says, if you don't know theta, find the value which maximizes L given the data that you do know. You do that with calculus. The trick to make it easy is that you take logarithms first. Define LL = log(L). Wherever L is maximized, then LL will be too.

So
LL = h*log(theta) + t*log(1-theta) - (theta-P)^2 / (2*S^2) - (1/2)*log(2*pi*S^2)

Using calculus to look for extrema, you find the value of theta such that dLL/dtheta = 0.
Since the last term (the one with the log) has no theta in it, you can ignore it.

dLL/dtheta = (h/theta) + (P-theta)/S^2 - (t/(1-theta)) = 0.

If you can solve this equation for theta you will get an answer, the MAP estimate for theta given the number of heads h and the number of tails t.
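
If solving that equation algebraically is a pain, it can also be root-found numerically. Here is a rough Python sketch (mine, not part of the original answer); it uses bisection, which is safe because dLL/dtheta is strictly decreasing on (0, 1), and the function name map_estimate and the tolerance are just illustrative.

    def map_estimate(h, t, P, S, tol=1e-9):
        # MAP estimate of theta under the Gaussian(P, S) prior, found by bisection.
        # Solves dLL/dtheta = h/theta - t/(1-theta) + (P - theta)/S**2 = 0.
        def dll(theta):
            return h / theta - t / (1.0 - theta) + (P - theta) / S**2

        lo, hi = 1e-12, 1.0 - 1e-12       # stay strictly inside (0, 1)
        while hi - lo > tol:
            mid = (lo + hi) / 2.0
            if dll(mid) > 0:              # LL still rising, so the optimum is to the right
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    # Example: 3 heads, 1 tail, box mean 0.5, box stddev 0.1
    print(map_estimate(3, 1, 0.5, 0.1))   # roughly 0.53, pulled toward the prior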

If you want a fast approximation, try doing one step of Newton's method, where you start with your proposed theta at the obvious (called maximum likelihood) estimate of theta = h/(h+t).

And where does that 'obvious' estimate come from? If you do the stuff above but don't put in the Gaussian prior, i.e. solve h/theta - t/(1-theta) = 0, you'll come up with theta = h/(h+t).
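
Here is a sketch of that single Newton step in Python, using the same Gaussian prior; clipping the starting point away from 0 and 1 is my own guard for the h = 0 or t = 0 cases, not something the answer specifies.

    def map_one_newton_step(h, t, P, S, eps=1e-6):
        # One Newton step toward the MAP estimate, starting from the MLE h/(h+t).
        theta = min(max(h / (h + t), eps), 1.0 - eps)    # clip away from 0 and 1

        # First and second derivatives of LL(theta) with the Gaussian(P, S) prior.
        d1 = h / theta - t / (1.0 - theta) + (P - theta) / S**2
        d2 = -h / theta**2 - t / (1.0 - theta)**2 - 1.0 / S**2

        theta = theta - d1 / d2                          # Newton update for d1 = 0
        return min(max(theta, eps), 1.0 - eps)

    print(map_one_newton_step(3, 1, 0.5, 0.1))   # about 0.54, close to the full solve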

If your prior probabilities are really small, as is often the case, rather than near 0.5, then a Gaussian prior on theta is probably inappropriate, as it puts some weight on negative probabilities, which is clearly wrong. More appropriate is a Gaussian prior on log theta (a 'lognormal distribution'). Plug it in the same way and work through the calculus.

东风软 2024-08-17 05:55:29

You can use p as a prior on your estimated probability. This is basically the same as doing pseudocount smoothing. I.e., use

(h + c * p) / (n + c)

as your estimate. When h and n are large, then this just becomes h / n. When h and n are small relative to c, this is approximately c * p / c = p. The choice of c is up to you. You can base it on s, but in the end you have to decide how small is too small.
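
A minimal sketch of this estimator in Python; the helper that picks c by matching a Beta prior's mean and standard deviation to p and s is just one possible choice, not something this answer commits to.

    def smoothed_estimate(h, n, p, c):
        # Pseudocount smoothing: behaves like h/n for large n, like p for tiny n.
        return (h + c * p) / (n + c)

    def c_from_spread(p, s):
        # One hedged way to pick c: match a Beta prior with mean p and standard
        # deviation s, which gives c = p*(1 - p)/s**2 - 1.
        return max(p * (1.0 - p) / s**2 - 1.0, 0.0)

    p, s = 0.5, 0.1
    c = c_from_spread(p, s)                  # 24 "pseudo-flips" for these numbers
    print(smoothed_estimate(3, 4, p, c))     # 3 heads in 4 tosses, pulled toward p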

日裸衫吸 2024-08-17 05:55:29

You don't have nearly enough info in this question.

How many coins are in the box? If it's two, then in some scenarios (for example one coin is always heads, the other always tails) knowing p and s would be useful. If it's more than a few, and especially if some of the coins are only slightly weighted, then it is not useful.

What is a small n? 2? 5? 10? 100? What is the probability of a weighted coin coming up heads/tails? 100/0, 60/40, 50.00001/49.99999? How is the weighting distributed? Does every coin have one of 2 possible weightings? Do they follow a bell curve? etc.

It boils down to this: the differences between a weighted/unweighted coin, the distribution of weighted coins, and the number of coins in your box will all decide what n has to be for you to solve this with high confidence.

The name for what you're trying to do is a Bernoulli trial. Knowing the name should be helpful in finding better resources.


Response to comment:

If you have differences in p that small, you are going to have to do a lot of trials and there's no getting around it.

Assuming a uniform distribution of bias, p will still be 0.5, and all the standard deviation will tell you is that at least some of the coins have a minor bias.

How many tosses, again, will be determined under these circumstances by the weighting of the coins. Even with 500 tosses, you won't get a strong confidence (about 2/3) detecting a .51/.49 split.
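
A quick simulation sketch of that last figure, assuming "detecting the split" just means the 0.51 coin shows more heads than tails in 500 tosses; the interpretation, trial count, and seed are mine.

    import random

    def chance_majority_heads(p=0.51, n=500, trials=100000, seed=0):
        # Estimate the probability that a p-weighted coin shows more heads than
        # tails in n tosses.
        rng = random.Random(seed)
        wins = 0
        for _ in range(trials):
            heads = sum(rng.random() < p for _ in range(n))
            if heads > n - heads:
                wins += 1
        return wins / trials

    print(chance_majority_heads())   # comes out near 0.66, i.e. about 2/3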

℡寂寞咖啡 2024-08-17 05:55:29

In general, what you are looking for is Maximum Likelihood Estimation. The Wolfram Demonstrations Project has an illustration of estimating the probability of a coin landing heads, given a sample of tosses.
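
For a single coin the MLE works out to h/n; a tiny grid-search sketch (my own illustration, not the Wolfram demonstration itself) makes that concrete.

    h, n = 3, 4
    grid = [i / 1000 for i in range(1, 1000)]                   # candidate thetas
    mle = max(grid, key=lambda th: th**h * (1 - th)**(n - h))   # maximize the likelihood
    print(mle)   # 0.75, i.e. h/n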

清音悠歌 2024-08-17 05:55:29

Well I'm no math man, but I think the simple Bayesian approach is intuitive and broadly applicable enough to be worth putting a little thought into. Others above have already suggested this, but perhaps if you're like me you would prefer more verbosity.
In this lingo, you have a set of mutually-exclusive hypotheses, H, and some data D, and you want to find the (posterior) probabilities that each hypothesis Hi is correct given the data. Presumably you would choose the hypothesis that had the largest posterior probability (the MAP as noted above), if you had to choose one. As Matt notes above, what distinguishes the Bayesian approach from only maximum likelihood (finding the H that maximizes Pr(D|H)) is that you also have some PRIOR info regarding which hypotheses are most likely, and you want to incorporate these priors.

So you have from basic probability Pr(H|D) = Pr(D|H)*Pr(H)/Pr(D). You can estimate these Pr(H|D) numerically by creating a series of discrete probabilities Hi for each hypothesis you wish to test, e.g. [0.0, 0.05, 0.1, ..., 0.95, 1.0], and then determining your prior Pr(Hi) for each Hi. Above it is assumed you have a normal distribution of priors, and if that is acceptable you could use the mean and stddev to get each Pr(Hi), or use another distribution if you prefer. With coin tosses, Pr(D|H) is of course determined by the binomial, using the observed number of successes in n trials and the particular Hi being tested. The denominator Pr(D) may seem daunting, but we assume that we have covered all the bases with our hypotheses, so that Pr(D) is the summation of Pr(D|Hi)*Pr(Hi) over all Hi.
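
A hedged sketch of that recipe in Python; the grid size, the unnormalized normal prior built from the box's mean p and stddev s, and the choice to report both the posterior mean and the grid MAP are my own choices; the answer only commits to the general procedure.

    import math

    def posterior_over_grid(h, n, p, s, steps=101):
        # Discrete posterior Pr(Hi | D) over a grid of candidate head-probabilities.
        grid = [i / (steps - 1) for i in range(steps)]

        # Prior: unnormalized normal density with mean p, stddev s
        # (the constant factor cancels when we normalize below).
        prior = [math.exp(-(th - p) ** 2 / (2 * s ** 2)) for th in grid]

        # Likelihood: binomial probability of h heads in n tosses at each Hi.
        like = [math.comb(n, h) * th ** h * (1 - th) ** (n - h) for th in grid]

        unnorm = [pr * li for pr, li in zip(prior, like)]
        z = sum(unnorm)                          # this sum plays the role of Pr(D)
        return grid, [u / z for u in unnorm]

    grid, post = posterior_over_grid(h=3, n=4, p=0.5, s=0.1)
    print(sum(th * w for th, w in zip(grid, post)))   # posterior mean
    print(max(zip(post, grid))[1])                    # MAP on the grid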

Very simple if you think about it a bit, and maybe not so if you think about it a bit more.
