Interview question: about probability
An interview question:
Given a function f(x) that returns 0 one quarter of the time and 1 three quarters of the time.
Write a function g(x) using f(x) that returns 0 half of the time and 1 half of the time.
My implementation is:
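```javascript
// f(x) is given: it returns 0 one quarter of the time and 1 three quarters of the time.
function g(x) {
  if (f(x) == 0) {        // 1/4
    var s = f(x);
    if (s == 1) {         // 3/4 * 1/4
      return s;           // 3/16
    } else {
      return g(x);        // "00": try again
    }
  } else {                // 3/4
    var k = f(x);
    if (k == 0) {         // 1/4 * 3/4
      return k;           // 3/16
    } else {
      return g(x);        // "11": try again
    }
  }
}
```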
Am I right? What's your solution? (You can use any language.)
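If you call f(x) twice in a row, the following outcomes are possible (assuming that successive calls to f(x) are independent, identically distributed trials):

00 (probability 1/4 * 1/4 = 1/16)
01 (probability 1/4 * 3/4 = 3/16)
10 (probability 3/4 * 1/4 = 3/16)
11 (probability 3/4 * 3/4 = 9/16)
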
01 and 10 occur with equal probability. So iterate until you get one of those
cases, then return 0 or 1 appropriately:
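A minimal Python sketch of this loop, assuming f() takes no argument (x plays no role) and is simulated here with the standard random module; mapping "01" to 0 and "10" to 1 is one arbitrary choice:

```python
import random

def f():
    """Stand-in for the given f(x): returns 0 with probability 1/4, 1 with probability 3/4."""
    return 0 if random.random() < 0.25 else 1

def g():
    """Return 0 or 1 with probability 1/2 each, using only calls to f()."""
    while True:
        a, b = f(), f()
        if a != b:
            # "01" and "10" each occur with probability 3/16, so returning the
            # first value of the pair gives 0 and 1 equally often.
            return a
        # "00" or "11": discard the pair and draw again.
```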
It might be tempting to call f(x) only once per iteration and keep track of the two
most recent values, but that won't work. Suppose the very first roll is 1,
with probability 3/4. You'd loop until the first 0, then return 1 (with probability 3/4).
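The bias is easy to see in a simulation. This hypothetical sketch (same simulated f() as an assumption) slides a two-value window over successive calls; the first mismatch always ends the run that began with the very first value, so it returns 1 about 3/4 of the time instead of 1/2:

```python
import random

def f():
    """Stand-in for the given f(x): returns 0 with probability 1/4, 1 with probability 3/4."""
    return 0 if random.random() < 0.25 else 1

def g_sliding():
    """Flawed variant: one new f() call per iteration, comparing the two most recent values."""
    prev = f()
    while True:
        cur = f()
        if prev != cur:
            return prev  # "01" -> 0, "10" -> 1
        prev = cur
```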
The problem with your algorithm is that it repeats itself with high probability. My code:
I've measured the average number of times f(x) was calculated for your algorithm and for mine. For yours, f(x) was calculated around 5.3 times per g(x) calculation. With my algorithm this number dropped to around 3.5. The same is true for the other answers so far, since they are actually the same algorithm as yours.

P.S.: your definition doesn't mention 'random' at the moment, but it is probably assumed. See my other answer.
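A minimal sketch of such a measurement for the question's pair-based algorithm, assuming a simulated f() with a global call counter. Each round costs two calls and ends with probability 3/16 + 3/16 = 3/8, so the expected number of calls is 2 / (3/8) = 16/3 ≈ 5.33, in line with the ~5.3 reported above:

```python
import random

calls = 0

def f():
    """Stand-in for f(x) that also counts how many times it is called."""
    global calls
    calls += 1
    return 0 if random.random() < 0.25 else 1

def g():
    """The question's algorithm written as a loop: draw pairs until they differ."""
    while True:
        a, b = f(), f()
        if a != b:
            return a

def average_calls(trials=100_000):
    """Average number of f() calls per g() result over many trials."""
    global calls
    calls = 0
    for _ in range(trials):
        g()
    return calls / trials
```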
Your solution is correct, if somewhat inefficient and with more duplicated logic. Here is a Python implementation of the same algorithm in a cleaner form.
If f() is expensive you'd want to get more sophisticated with using the match/mismatch information to try to return with fewer calls to it. Here is the most efficient possible solution.
This takes about 2.6 calls to f() on average.

The way that it works is this. We're trying to pick a random number from 0 to 1, but we stop as soon as we know whether the number is below or above 1/2, i.e. whether to return 0 or 1. We start knowing that the number is in the interval (0, 1). 3/4 of the numbers are in the bottom 3/4 of the interval, and 1/4 are in the top 1/4 of the interval. We decide which based on a call to f(x). This means that we are now in a smaller interval.

If we wash, rinse, and repeat enough times, we can determine our number as precisely as we like, and the number is equally likely to end up anywhere in the original interval. In particular, we have an even probability of winding up above or below 0.5.
If you wanted you could repeat the idea to generate an endless stream of bits one by one. This is, in fact, provably the most efficient way of generating such a stream, and is the source of the idea of entropy in information theory.
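One way to sketch the bit stream in Python, assuming the same simulated f(): keep the interval as integers L/D and H/D, emit a bit whenever the interval falls entirely in one half of [0, 1), then zoom that half back up to [0, 1). This is essentially arithmetic decoding. The entropy of f() is H(1/4) = -(1/4)log2(1/4) - (3/4)log2(3/4) ≈ 0.811 bits per call, so in the long run this should use roughly 1/0.811 ≈ 1.23 calls per fair bit:

```python
import random

def f():
    """Stand-in for f(x): returns 0 with probability 1/4, 1 with probability 3/4."""
    return 0 if random.random() < 0.25 else 1

def fair_bits(source=f):
    """Yield an endless stream of fair bits, refining the interval [L/D, H/D) inside [0, 1)."""
    L, H, D = 0, 1, 1
    while True:
        if 2 * H <= D:           # interval lies entirely below 1/2: next binary digit is 0
            yield 0
            D //= 2              # zoom in: [low, high) -> [2*low, 2*high)
        elif 2 * L >= D:         # interval lies entirely at or above 1/2: next digit is 1
            yield 1
            L -= D // 2          # zoom in: [low, high) -> [2*low - 1, 2*high - 1)
            H -= D // 2
            D //= 2
        else:
            # Split at 3/4 of the width; scale everything by 4 so endpoints stay integers.
            cut = L + 3 * H      # numerator of low + 3/4*(high - low) over the new denominator 4*D
            L, H, D = 4 * L, 4 * H, 4 * D
            if source() == 1:    # probability 3/4: keep the bottom 3/4
                H = cut
            else:                # probability 1/4: keep the top 1/4
                L = cut
```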
Taking this statement literally, f(x) called four times will always return 0 once and 1 three times. This is different from saying that f(x) is a probabilistic function whose ratio of 0s to 1s approaches 1 to 3 (1/4 vs 3/4) over many iterations. If the first interpretation is valid, then the only function f(x) that meets the criterion regardless of where in the sequence you start is the sequence 0111 repeating (or 1011, 1101 or 1110, which are the same sequence from a different starting point). Given that constraint,
should suffice.
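A hypothetical sketch under this literal interpretation, not necessarily the code the answer had in mind: if f() cycles through 0, 1, 1, 1 from some starting point, returning the AND of two consecutive f() values works, because every four consecutive calls contain exactly one 0, so of any two consecutive g() calls exactly one sees it:

```python
from itertools import cycle

def make_periodic_f(pattern=(0, 1, 1, 1)):
    """Hypothetical literal f(): repeats the pattern forever, so any 4 consecutive calls hold one 0."""
    values = cycle(pattern)
    return lambda: next(values)

def make_g(f):
    """g() returns the AND of two consecutive f() values."""
    def g():
        a = f()
        b = f()
        return a & b
    return g
```

Under this reading g() simply alternates 0 and 1, which is 50/50 but, as the next answer points out, not random.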
As already mentioned, your definition is not that good regarding probability. Usually it means that not only is the probability right, but the distribution is too. Otherwise you could simply write a g(x) that returns 1,0,1,0,1,0,1,0: it would return each value 50% of the time, but the numbers wouldn't be random.

Another cheating approach might be:
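A hypothetical sketch of a shortcut of this kind (not the answerer's code), assuming a simulated f(): call f() once to pick a starting value, then alternate forever. Over many calls it returns exactly half 0s and half 1s while calling f() only once:

```python
import random

def f():
    """Stand-in for f(x): returns 0 with probability 1/4, 1 with probability 3/4."""
    return 0 if random.random() < 0.25 else 1

def make_cheating_g(source=f):
    """Hypothetical 'cheat': call source() once, then alternate from that value forever."""
    current = None

    def g():
        nonlocal current
        if current is None:
            current = source()      # the only call ever made to f()
        value = current
        current = 1 - value         # flip for the next call
        return value

    return g
```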
This solution will be better than all the others since it calls f(x) only once. But the results will not be very random.
A refinement of the same approach used in btilly's answer, achieving an average of ~1.85 calls to f() per g() result (a further refinement documented below achieves ~1.75; btilly's takes ~2.6, and Jim Lewis's accepted answer ~5.33). Code appears lower in the answer.

Basically, I generate random integers in the range 0 to 3 with even probability: the caller can then test bit 0 for the first 50/50 value, and bit 1 for a second. Reason: the f() probabilities of 1/4 and 3/4 map onto quarters much more cleanly than halves.

Description of algorithm
btilly explained the algorithm, but I'll do so in my own way too...
The algorithm basically generates a random real number x between 0 and 1, then returns a result depending on which "result bucket" that number falls in:

- x in [0, 0.25): result 0
- x in [0.25, 0.5): result 1
- x in [0.5, 0.75): result 2
- x in [0.75, 1): result 3

But generating a random real number given only f() is difficult. We have to start with the knowledge that our x value should be in the range 0..1, which we'll call our initial "possible x" space. We then hone in on an actual value for x:

- each time we call f():
  - if f() returns 0 (probability 1 in 4), we consider x to be in the lower quarter of the "possible x" space, and eliminate the upper three quarters from that space
  - if f() returns 1 (probability 3 in 4), we consider x to be in the upper three quarters of the "possible x" space, and eliminate the lower quarter from that space
- we keep calling f() until the "possible x" space has narrowed x down to the point where we know which result value it should map to, with no need to get a more specific value for x.

It may or may not help to consider this diagram :-):
Code
If helpful, an intermediary to feed out 50/50 results one at a time:
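A minimal Python sketch of both pieces, assuming a simulated f(); the function names are hypothetical. random_0_to_3() narrows the "possible x" interval exactly as described (f() == 0 keeps the lower quarter, f() == 1 the upper three quarters) until it fits inside one result bucket, and fifty_fifty() hands out bit 0 and then bit 1 of each draw as two independent 50/50 results:

```python
import random
from fractions import Fraction

def f():
    """Stand-in for f(x): returns 0 with probability 1/4, 1 with probability 3/4."""
    return 0 if random.random() < 0.25 else 1

def random_0_to_3():
    """Return 0, 1, 2 or 3 with equal probability by narrowing a uniform x in [0, 1)."""
    low, high = Fraction(0), Fraction(1)
    while True:
        # Result buckets: [0, 1/4) -> 0, [1/4, 1/2) -> 1, [1/2, 3/4) -> 2, [3/4, 1) -> 3.
        bucket = int(low * 4)           # bucket containing the lower end of the interval
        if high <= Fraction(bucket + 1, 4):
            return bucket               # the whole interval fits in one bucket
        cutoff = low + (high - low) / 4
        if f() == 0:                    # probability 1/4: keep the lower quarter
            high = cutoff
        else:                           # probability 3/4: keep the upper three quarters
            low = cutoff

def fifty_fifty():
    """Intermediary: yield bit 0 and then bit 1 of each 0..3 draw as two fair results."""
    while True:
        n = random_0_to_3()
        yield n & 1
        yield (n >> 1) & 1
```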
NOTE: This can be further tweaked by having the algorithm switch from considering an f()==0 result to hone in on the lower quarter, to having it hone in on the upper quarter instead, based on which on average resolves to a result bucket more quickly. Superficially, this seemed useful on the third call to f(), when an upper-quarter result would indicate an immediate result of 3, while a lower-quarter result still spans probability point 0.5 and hence results 1 and 2. When I tried it, the results were actually worse. A more complex tuning was needed to see actual benefits, and I ended up writing a brute-force comparison of lower vs upper cutoff for the second through eleventh calls to g(). The best result I found was an average of ~1.75, resulting from the 1st, 2nd, 5th and 8th calls to g() seeking low (i.e. setting low = cutoff).
Here is a solution based on the central limit theorem, originally due to a friend of mine:
Since each return of f() represents a 3/4 chance of TRUE, with some algebra we can just properly balance the odds. What we want is another function x() which returns TRUE with a balancing probability, so that f() && x() returns true 50% of the time.
So let's find the probability of x (p(x)), given p(f) and our desired total probability (1/2): p(f) * p(x) = 1/2, so p(x) = (1/2) / (3/4) = 2/3.
So x() should return TRUE with a probability of 2/3, since 2/3 * 3/4 = 6/12 = 1/2.
Thus the following should work for g():
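A minimal sketch of this idea, assuming a simulated f() and a hypothetical x(). Note the loud assumption: x() here draws from an independent random source with probability 2/3, which the original problem does not provide (only f() is given):

```python
import random

def f():
    """Stand-in for f(): True with probability 3/4."""
    return random.random() < 0.75

def x():
    """Hypothetical helper: True with probability 2/3.
    Assumption: it uses an independent random source, which the original
    problem does not provide."""
    return random.random() < 2 / 3

def g():
    """True with probability 3/4 * 2/3 = 1/2, assuming f() and x() are independent."""
    return f() and x()
```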
Assuming

and requiring a function g[x] with the following assumptions, I believe the following definition of g[x] is sufficient (Mathematica), or alternatively in C.

This is based on the idea that invocations of {f[x], f[x+1]} would produce the following outcomes:

{0,0}, {0,1}, {1,0}, {1,1}

Summing each of the outcomes, we have

0, 1, 1, 2

where a sum of 1 represents 1/2 of the possible sum outcomes, with any other sum making up the other 1/2.
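Two of the four outcomes have a sum of 1, but with a biased f the outcomes are not equally likely. An exact calculation, assuming independent calls with P(0) = 1/4 and P(1) = 3/4, gives P(sum = 1) = 3/16 + 3/16 = 3/8 rather than 1/2:

```python
from fractions import Fraction
from itertools import product

P = {0: Fraction(1, 4), 1: Fraction(3, 4)}   # assumed distribution of a single f[x] call

def pair_distribution():
    """Exact probability of each outcome {f[x], f[x+1]}, assuming independent calls."""
    return {(a, b): P[a] * P[b] for a, b in product((0, 1), repeat=2)}

def prob_sum_is_one():
    """Probability that the two results sum to 1, i.e. the outcome is {0,1} or {1,0}."""
    return sum(p for (a, b), p in pair_distribution().items() if a + b == 1)
```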
Edit.
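As bdk says, {0,0} is less likely than {1,1} because f[x] returns 1 three times as often as 0: {0,0} has probability 1/16, while {1,1} has probability 9/16.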
However, I am confused myself, because given the following definition of f[x] (Mathematica), or alternatively in C,

the results obtained from executing f[x] and g[x] seem to have the expected distribution.
This is much like the Monty Hall paradox.
In general.