Sorting an array using a probability distribution
An array shall be sorted high to low by its values.
<?php
$items = array(
    1 => f(1),
    2 => f(2),
    3 => f(3),
    4 => f(4),
    5 => f(5),
);
?>
After sorting I check which of the items 1, 2, 3, 4, 5 comes first. I repeat this again and again.
Afterwards
- 5 should be the first item five times as often as 1
- 4 should be the first item four times as often as 1
- 3 should be the first item three times as often as 1
- 4 should be the first item twice as often as 2
- ...
One idea is
<?php
// Pseudocode: random() stands for any uniform random number generator.
function f($key) {
    return $key / random();
}
?>
which, for 1'000'000 tries resulted in
key | times on top | ratio with key one | expected ratio
----+--------------+--------------------+---------------
5 | 374'365 | 6.75 | 5
4 | 267'863 | 4.83 | 4
3 | 185'707 | 3.35 | 3
2 | 116'618 | 2.10 | 2
1 | 55'447 | 1 | 1
Looks weird to me, but maybe
- there is a simple problem with f?
- there is a better f?
My implementation:
<?php
abstract class Test {
    // Number of times each key ended up on top, indexed by key.
    private $result;

    // Weighting function under test.
    protected abstract function f($x);

    // One round: compute f for each key, sort high to low, count the winner.
    protected function iteration() {
        $values = array(
            1 => $this->f(1),
            2 => $this->f(2),
            3 => $this->f(3),
            4 => $this->f(4),
            5 => $this->f(5),
        );
        arsort($values);
        $top = key($values);
        if (!isset($this->result[$top])) {
            $this->result[$top] = 1;
        } else {
            $this->result[$top]++;
        }
    }

    public function run($iterations) {
        $this->result = array();
        for ($i = 0; $i < $iterations; $i++) {
            $this->iteration();
        }
        arsort($this->result);
        return $this->result;
    }
}

class MyTest extends Test {
    protected function f($x) {
        // Lower bound 1 avoids a division by zero when rand() returns 0.
        return $x / rand(1, getrandmax());
    }
}

$test = new MyTest();
$result = $test->run(1000 * 1000);
print_r($result);
printf("Ratio of key 5 to 1, which should be 5: %f\n", $result[5] / $result[1]);
?>
I have tried a billion rounds. But again the ratio is 6.75 - the whole point is: why isn't it five?
The results for
<?php
class BetterRandomGeneratorTest extends Test {
    protected function f($x) {
        // Lower bound 1 avoids a division by zero when mt_rand() returns 0.
        return $x / mt_rand(1, mt_getrandmax());
    }
}
?>
are
Array
(
[5] => 3742816
[4] => 2674352
[3] => 1861444
[2] => 1168333
[1] => 553055
)
Ratio of key 5 to 1: 6.767529
Comments (1)
Here is a simple f which will do it.
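The code for this f is not shown here. A minimal sketch consistent with the explanation that follows, assuming f(k) returns the largest of k independent uniform draws (so 1 + 2 + 3 + 4 + 5 = 15 draws per round), might look like this. The helper countTops is hypothetical and only there to check the ratios empirically.

```php
<?php
// Assumed reconstruction: f(k) is the largest of k independent uniform draws.
function f(int $key): float
{
    $max = 0.0;
    for ($i = 0; $i < $key; $i++) {
        // mt_rand() / mt_getrandmax() gives a value in [0, 1].
        $max = max($max, mt_rand() / mt_getrandmax());
    }
    return $max;
}

// Hypothetical helper: count how often each key has the largest f value.
function countTops(int $iterations): array
{
    $tops = array_fill(1, 5, 0);
    for ($n = 0; $n < $iterations; $n++) {
        $values = array();
        for ($key = 1; $key <= 5; $key++) {
            $values[$key] = f($key);
        }
        arsort($values);                      // sort high to low, keeping keys
        $tops[array_key_first($values)]++;    // key of the largest value
    }
    return $tops;
}

$tops = countTops(100000);
print_r($tops);
printf("Ratio of key 5 to 1: %f\n", $tops[5] / $tops[1]);
```

With enough rounds the printed ratio should settle near 5, since key 5 owns 5 of the 15 draws and key 1 owns only 1.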
This is guaranteed to work because the max is equally likely to be any of the 15 random numbers chosen, and 1/3 of the time that number will be in f(5), versus 1/15 for f(1).
As for what was wrong with your f, it is quite simple. Your solution has the nice property that f(1) < f(5) exactly 90% of the time. However, f(1) tends to be bigger than f(5) when f(1) is larger than average and f(5) is smaller than average. Ditto for f(2), f(3) and f(4). However, it is unusual for all of f(2), ..., f(5) to be small at once. This causes correlations that make f(1) the largest less often than you would naively think. Vice versa, the correlations tend to come out in favor of f(5) more often than you would naively think.
If you want to compute the exact probabilities of each number coming out on top, it shouldn't be too hard to compute exact answers with integration. The idea is that you integrate over x from 0 to 1 the probability that f(i) is the maximum, given that x was the value of random() for f(i). (So, for instance, for 5 you would integrate (1-x/5)(1-2x/5)(1-3x/5)(1-4x/5), while for 1 you would integrate a function that is 0 if random() is bigger than 0.2, and otherwise is (1-2x)(1-3x)(1-4x)(1-5x).) The expressions will be complicated, and the ratios won't come out to nice answers.
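The integration idea can be checked numerically. The sketch below assumes f(i) = i / random() with random() uniform on (0, 1): if key i drew the value x, key j loses to it exactly when its own draw exceeds j * x / i, which happens with probability max(0, 1 - j * x / i). The function names winIntegrand and topProbability are hypothetical, and Simpson's rule is just one convenient way to evaluate the integral.

```php
<?php
// Probability that key $i is on top, given that its random() value was $x.
// Key $j loses when j / U_j < i / x, i.e. when U_j > j * x / i.
function winIntegrand(int $i, float $x, int $keys = 5): float
{
    $p = 1.0;
    for ($j = 1; $j <= $keys; $j++) {
        if ($j !== $i) {
            $p *= max(0.0, 1.0 - $j * $x / $i);
        }
    }
    return $p;
}

// Composite Simpson's rule over [0, 1]; $steps must be even.
function topProbability(int $i, int $keys = 5, int $steps = 10000): float
{
    $h = 1.0 / $steps;
    $sum = winIntegrand($i, 0.0, $keys) + winIntegrand($i, 1.0, $keys);
    for ($n = 1; $n < $steps; $n++) {
        $sum += ($n % 2 === 1 ? 4 : 2) * winIntegrand($i, $n * $h, $keys);
    }
    return $sum * $h / 3;
}

for ($i = 5; $i >= 1; $i--) {
    printf("key %d: %.4f\n", $i, topProbability($i));
}
printf("Ratio of key 5 to 1: %.3f\n", topProbability(5) / topProbability(1));
```

Under these assumptions the integrals come out to roughly 0.374 for key 5 and 0.055 for key 1, a ratio of about 6.76 rather than 5, which is in line with the simulated counts in the question.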