How can I efficiently estimate a probability from a small amount of evidence?



I've been trying to find an answer to this for months (for use in a machine learning application). It doesn't seem like it should be a terribly hard problem, but I'm a software engineer, and math was never one of my strengths.

Here is the scenario:

I have a (possibly) unevenly weighted coin and I want to figure out the probability of it coming up heads. I know that coins from the same box that this one came from have an average probability of p, and I also know the standard deviation of these probabilities (call it s).

(If other summary properties of the probabilities of other coins aside from their mean and stddev would be useful, I can probably get them too.)

I toss the coin n times, and it comes up heads h times.

The naive approach is that the probability is just h/n - but if n is small this is unlikely to be accurate.

Is there a computationally efficient way (i.e. one that doesn't involve very, very large or very, very small numbers) to take p and s into consideration to come up with a more accurate probability estimate, even when n is small?

I'd appreciate it if any answers could use pseudocode rather than mathematical notation since I find most mathematical notation to be impenetrable ;-)


Other answers:
There are some other answers on SO that are similar, but the answers provided are unsatisfactory. For example, one is not computationally efficient because it quickly involves numbers far smaller than can be represented even in double-precision floats, and another turned out to be incorrect.


Answers (5)

望笑 2024-08-17 05:55:29


Unfortunately you can't do machine learning without knowing some basic math---it's like asking somebody for help in programming but not wanting to know about "variables" , "subroutines" and all that if-then stuff.

The better way to do this is called Bayesian integration, but there is a simpler approximation called "maximum a posteriori" (MAP). It's pretty much like the usual thinking except you can put in the prior distribution.

Fancy words, but you may ask, well, where did the h/(h+t) formula come from? Of course it's obvious, but it turns out that it is the answer you get when you have "no prior". The method below is the next level of sophistication up, when you add a prior. Going to Bayesian integration would be the next one, but that's harder and perhaps unnecessary.

As I understand it the problem is twofold: first you draw a coin from the bag of coins. This coin has a "headsiness" called theta, so that it gives heads on a theta fraction of the flips. But the theta for this coin comes from the master distribution, which I will assume is Gaussian with mean P and standard deviation S.

What you do next is to write down the total unnormalized probability (called likelihood) of seeing the whole shebang, all the data: (h heads, t tails)

L = (theta)^h * (1-theta)^t * Gaussian(theta; P, S).

Gaussian(theta; P, S) = exp( -(theta-P)^2/(2*S^2) ) / sqrt(2*Pi*S^2)

This is the meaning of "first draw 1 value of theta from the Gaussian" and then draw h heads and t tails from a coin using that theta.

The MAP principle says, if you don't know theta, find the value which maximizes L given the data that you do know. You do that with calculus. The trick to make it easy is that you take logarithms first. Define LL = log(L). Wherever L is maximized, then LL will be too.

So
LL = h*log(theta) + t*log(1-theta) - (theta - P)^2 / (2*S^2) - (1/2)*log(2*Pi*S^2)

Using calculus to look for extrema, you find the value of theta such that dLL/dtheta = 0.
Since the last term (the log of the normalizing constant) has no theta in it, you can ignore it.

dLL/dtheta = (h/theta) + (P-theta)/S^2 - (t/(1-theta)) = 0.

If you can solve this equation for theta you will get an answer, the MAP estimate for theta given the number of heads h and the number of tails t.

If you want a fast approximation, try doing one step of Newton's method, where you start with your proposed theta at the obvious (called maximum likelihood) estimate of theta = h/(h+t).

And where does that 'obvious' estimate come from? If you do the stuff above but don't put in the Gaussian prior, you get h/theta - t/(1-theta) = 0, and solving that gives theta = h/(h+t).

If your prior probabilities are really small, as is often the case, instead of near 0.5, then a Gaussian prior on theta is probably inappropriate, as it puts some weight on negative values of theta, which is clearly wrong. More appropriate is a Gaussian prior on log theta (a 'lognormal distribution'). Plug it in the same way and work through the calculus.
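
To make that concrete, here is a minimal Python sketch of the MAP estimate, running a few Newton iterations on dLL/dtheta = 0 starting from h/(h+t). The function name, iteration count, and clamping constant are illustrative choices of mine, not part of the answer:

def map_estimate(h, t, P, S, steps=20):
    # MAP estimate of theta under a Gaussian(P, S) prior, found with
    # Newton's method on the log-posterior derived above.
    eps = 1e-6
    # Start at the maximum-likelihood estimate, nudged away from 0 and 1
    # so the divisions below stay finite.
    theta = min(max(h / (h + t), eps), 1.0 - eps)
    for _ in range(steps):
        # First and second derivatives of
        # LL = h*log(theta) + t*log(1-theta) - (theta - P)^2 / (2*S^2)
        d1 = h / theta - t / (1.0 - theta) - (theta - P) / S**2
        d2 = -h / theta**2 - t / (1.0 - theta)**2 - 1.0 / S**2
        step = d1 / d2
        theta = min(max(theta - step, eps), 1.0 - eps)
        if abs(step) < 1e-10:
            break
    return theta

# Example: 3 heads out of 4 tosses; the box of coins averages P = 0.5 with S = 0.05.
# The prior pulls the naive estimate 0.75 back to roughly 0.51.
print(map_estimate(h=3, t=1, P=0.5, S=0.05))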

东风软 2024-08-17 05:55:29


You can use p as a prior on your estimated probability. This is basically the same as doing pseudocount smoothing. I.e., use

(h + c * p) / (n + c)

as your estimate. When h and n are large, then this just becomes h / n. When h and n are small, this is just c * p / c = p. The choice of c is up to you. You can base it on s but in the end you have to decide how small is too small.
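
For concreteness, here is a tiny Python sketch of that formula (the function name and the example value c = 10 are illustrative choices, not part of the answer):

def smoothed_estimate(h, n, p, c):
    # Pseudocount smoothing: shrink the raw h/n toward the box mean p.
    # c says how many "virtual tosses" the prior is worth.
    return (h + c * p) / (n + c)

# With c = 10, four real tosses barely move the estimate off p = 0.5...
print(smoothed_estimate(h=3, n=4, p=0.5, c=10))      # ~0.571
# ...but with plenty of data the real tosses dominate.
print(smoothed_estimate(h=300, n=400, p=0.5, c=10))  # ~0.744

One hedged way to base c on s: if you read the smoothing as a Beta prior with mean p and total pseudocount c, its variance is p*(1-p)/(c+1), so c ≈ p*(1-p)/s^2 - 1 roughly matches the spread of the coins in the box.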

日裸衫吸 2024-08-17 05:55:29


You don't have nearly enough info in this question.

How many coins are in the box? If it's two, then in some scenarios (for example one coin is always heads, the other always tails) knowing p and s would be useful. If there are more than a few, and especially if some of the coins are only slightly weighted, then it is not useful.

What is a small n? 2? 5? 10? 100? What is the probability of a weighted coin coming up heads/tails? 100/0, 60/40, 50.00001/49.99999? How is the weighting distributed? Is every coin one of 2 possible weightings? Do they follow a bell curve? Etc.

It boils down to this: the differences between weighted and unweighted coins, the distribution of weighted coins, and the number of coins in your box will all decide what n has to be for you to solve this with high confidence.

The name for what you're trying to do is a Bernoulli trial. Knowing the name should be helpful in finding better resources.


Response to comment:

If you have differences in p that small, you are going to have to do a lot of trials and there's no getting around it.

Assuming a uniform distribution of bias, p will still be 0.5, and all the standard deviation will tell you is that at least some of the coins have a minor bias.

How many tosses you need will, again, be determined under these circumstances by the weighting of the coins. Even with 500 tosses, you won't get strong confidence (only about 2/3) that you've detected a .51/.49 split.
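
A quick Monte Carlo sketch (the trial count and seed are arbitrary) backs up that rough 2/3 figure:

import random

random.seed(0)  # arbitrary seed, just so the sketch is repeatable
trials = 20_000
right_direction = 0
for _ in range(trials):
    heads = sum(random.random() < 0.51 for _ in range(500))
    right_direction += heads > 250  # did the sample even lean the right way?
print(right_direction / trials)  # roughly 0.66, i.e. about 2/3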

℡寂寞咖啡 2024-08-17 05:55:29


In general, what you are looking for is Maximum Likelihood Estimation. The Wolfram Demonstrations Project has an illustration of estimating the probability of a coin landing heads, given a sample of tosses.
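
As a small, hedged illustration of what maximum likelihood means here (a plain grid search, not the Wolfram demonstration's own code), the binomial log-likelihood peaks at h/n:

import math

def binom_log_likelihood(theta, h, n):
    # Log-likelihood of h heads in n tosses; the n-choose-h term is constant
    # in theta, so it is dropped.
    return h * math.log(theta) + (n - h) * math.log(1 - theta)

h, n = 7, 10
grid = [i / 1000 for i in range(1, 1000)]  # avoid the log(0) endpoints
best = max(grid, key=lambda th: binom_log_likelihood(th, h, n))
print(best)  # 0.7, i.e. h/n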

清音悠歌 2024-08-17 05:55:29


Well, I'm no math man, but I think the simple Bayesian approach is intuitive and broadly applicable enough to be worth putting a little thought into. Others above have already suggested this, but perhaps if you're like me you would prefer more verbosity.
In this lingo, you have a set of mutually-exclusive hypotheses, H, and some data D, and you want to find the (posterior) probabilities that each hypothesis Hi is correct given the data. Presumably you would choose the hypothesis that had the largest posterior probability (the MAP as noted above), if you had to choose one. As Matt notes above, what distinguishes the Bayesian approach from only maximum likelihood (finding the H that maximizes Pr(D|H)) is that you also have some PRIOR info regarding which hypotheses are most likely, and you want to incorporate these priors.

So from basic probability you have Pr(H|D) = Pr(D|H)*Pr(H)/Pr(D). You can estimate these Pr(H|D) numerically by creating a series of discrete probabilities Hi for each hypothesis you wish to test, e.g. [0.0, 0.05, 0.1, ..., 0.95, 1.0], and then determining your prior Pr(Hi) for each Hi -- above it is assumed you have a normal distribution of priors, and if that is acceptable you could use the mean and stdev to get each Pr(Hi) -- or use another distribution if you prefer. With coin tosses, Pr(D|Hi) is of course determined by the binomial, using the observed number of successes in n trials and the particular Hi being tested. The denominator Pr(D) may seem daunting, but we assume that we have covered all the bases with our hypotheses, so Pr(D) is the summation of Pr(D|Hi)*Pr(Hi) over all Hi.
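
Here is a minimal Python sketch of that grid-based update (the grid size, the prior re-weighted over the grid, and the example numbers are my own illustrative choices):

import math

def posterior_over_grid(h, n, prior_mean, prior_sd, grid_size=21):
    # Discrete Bayes as described above: candidate theta values Hi,
    # an (unnormalized) normal prior, a binomial likelihood, and
    # normalization by the total, which plays the role of Pr(D).
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [math.exp(-((th - prior_mean) ** 2) / (2 * prior_sd ** 2)) for th in thetas]
    # The n-choose-h factor is the same for every theta and cancels on normalization.
    # For large n, work in log space and subtract the max before exponentiating
    # so the numbers stay well inside double-precision range.
    like = [th ** h * (1 - th) ** (n - h) for th in thetas]
    unnorm = [pr * li for pr, li in zip(prior, like)]
    total = sum(unnorm)
    return thetas, [u / total for u in unnorm]

thetas, post = posterior_over_grid(h=3, n=4, prior_mean=0.5, prior_sd=0.05)
print(sum(th * pr for th, pr in zip(thetas, post)))  # posterior mean, ~0.51
# max(zip(post, thetas))[1] would give the grid MAP instead.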

Very simple if you think about it a bit, and maybe not so if you think about it a bit more.
