How to use random()={0..1} to generate numbers in an arbitrary range while preserving uniformity and density?
Generate a random number in range [x..y] where x and y are any arbitrary floating point numbers. Use function random(), which returns a random floating point number in range [0..1] from P uniformly distributed numbers (call it "density"). Uniform distribution must be preserved and P must be scaled as well.
I think there is no easy solution for such a problem. To simplify it a bit: how would you generate a number in the interval [-0.5 .. 0.5], then in [0 .. 2], then in [-2 .. 0], preserving uniformity and density? Thus, for [0 .. 2] it must generate a random number from P*2 uniformly distributed numbers.
The obvious simple solution random() * (x - y) + y will not generate all possible numbers, because the density is lower in every case where abs(x-y) > 1.0. Many possible values will be missed. Remember that random() returns only one of P possible numbers. If you multiply such a number by Q, you still get only one of P possible values, scaled by Q, but you have to scale the density P by Q as well.
If I understand your problem well, I will provide you a solution, but I would exclude 1 from the range.
If you really want to generate all possible floating point numbers in a given range with uniform numeric density, you need to take into account the floating point format. For each possible value of your binary exponent, you have a different numeric density of codes. A direct generation method will need to deal with this explicitly, and an indirect generation method will still need to take it into account. I will develop a direct method; for the sake of simplicity, the following refers exclusively to IEEE 754 single-precision (32-bit) floating point numbers.
The most difficult case is any interval that includes zero. In that case, to produce an exactly even distribution, you will need to handle every exponent down to the lowest, plus denormalized numbers. As a special case, you will need to split zero into two cases, +0 and -0.
In addition, if you are paying such close attention to the result, you will need to make sure that you are using a good pseudorandom number generator with a large enough state space that you can expect it to hit every value with near-uniform probability. This disqualifies the C/Unix rand() and possibly the *rand48() library functions; you should use something like the Mersenne Twister instead.
The key is to dissect the target interval into subintervals, each of which is covered by a different combination of binary exponent and sign: within each subinterval, floating point codes are uniformly distributed.
The first step is to select the appropriate subinterval, with probability proportional to its size. If the interval contains 0, or otherwise covers a large dynamic range, this may potentially require a number of random bits up to the full range of the available exponent.
In particular, for a 32-bit IEEE-754 number, there are 256 possible exponent values. Each exponent governs a range which is half the size of the next greater exponent, except for the denormalized case, which is the same size as the smallest normal exponent region. Zero can be considered the smallest denormalized number; as mentioned above, if the target interval straddles zero, the probability of each of +0 and -0 should perhaps be cut in half, to avoid doubling its weight.
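Here is a Python sketch of the dissection step, for the positive case only (0 < a < b). Sign splitting, zero and denormals are left out. It cuts [a, b) at powers of two and picks a piece with probability proportional to its length. The function names are hypothetical. random.choices works with floating-point weights, so this shows the bookkeeping rather than a bit-exact selector.

```python
import math
import random

def binade_pieces(a: float, b: float) -> list[tuple[float, float]]:
    """Split [a, b), with 0 < a < b, at powers of two.

    Every float inside one piece shares a single exponent, so its codes are evenly spaced.
    """
    pieces = []
    lo = a
    while lo < b:
        _, e = math.frexp(lo)              # lo lies in [2**(e-1), 2**e)
        hi = min(b, math.ldexp(1.0, e))    # next power of two, or b if that comes first
        pieces.append((lo, hi))
        lo = hi
    return pieces

def pick_piece(a: float, b: float, rng: random.Random) -> tuple[float, float]:
    """Choose one piece with probability proportional to its length."""
    pieces = binade_pieces(a, b)
    weights = [hi - lo for lo, hi in pieces]
    return rng.choices(pieces, weights=weights)[0]
```

For example, [0.75, 5.0) splits into [0.75, 1), [1, 2), [2, 4) and [4, 5), and [2, 4) is chosen with probability 2/4.25.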
If the subinterval chosen covers the entire region governed by a particular exponent, all that is necessary is to fill the mantissa with random bits (23 bits, for 32-bit IEEE-754 floats). However, if the subinterval does not cover the entire region, you will need to generate a random mantissa that covers only that subinterval.
The simplest way to handle both the initial and secondary random steps may be to round the target interval out to include the entirety of all exponent regions partially covered, then reject and retry numbers that fall outside it. This allows the exponent to be generated with simple power-of-2 probabilities (e.g., by counting the number of leading zeroes in your random bitstream), as well as providing a simple and accurate way of generating a mantissa that covers only part of an exponent interval. (This is also a good way of handling the +/-0 special case.)
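Here is a minimal Python sketch of the leading-zero trick for the simplest target: single-precision values in [0, 1). Each zero bit in the stream drops one binade. The denormal region gets the same weight as the smallest normal binade, and the mantissa is filled with 23 random bits. random.Random (a Mersenne Twister) stands in for the good generator. The function name is hypothetical, and struct is used only to reinterpret the 32-bit pattern as a float.

```python
import random
import struct

def random_float32_unit(rng: random.Random) -> float:
    """Uniform float32 in [0, 1); each representable value is weighted by the
    width of the interval it stands for (sketch, positive side only)."""
    exponent = 126                       # biased exponent of [0.5, 1)
    # Each leading zero in the bitstream moves down one binade (probability halves).
    while exponent > 0 and rng.getrandbits(1) == 0:
        exponent -= 1
    # exponent == 0 is the denormal region [0, 2**-126): same width as the
    # smallest normal binade, and reached with the same probability.
    mantissa = rng.getrandbits(23)       # fill the whole mantissa with random bits
    bits = (exponent << 23) | mantissa   # sign bit 0: positive
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```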
As another special case: to avoid inefficient generation (mostly rejected draws) for target intervals that are much smaller than the exponent regions they reside in, note that the "obvious simple" solution will in fact generate fairly uniform numbers for such intervals. If you want exactly uniform distributions, you can generate the sub-interval mantissa by using only enough random bits to cover that sub-interval, while still using the aforementioned rejection method to eliminate values outside the target interval.
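Within one exponent region, the float32 bit patterns of positive numbers are consecutive integers. That makes "only enough random bits" plus rejection easy to sketch. This assumes 0 < x < y, both exactly representable as float32 and in the same exponent region (y may be the next power of two). The helper names are hypothetical.

```python
import random
import struct

def f32_bits(v: float) -> int:
    """Bit pattern of v as an IEEE 754 single-precision float."""
    return struct.unpack("<I", struct.pack("<f", v))[0]

def bits_f32(b: int) -> float:
    """Float value of a 32-bit IEEE 754 single-precision bit pattern."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def uniform_f32_in_binade(x: float, y: float, rng: random.Random) -> float:
    """Uniform float32 in [x, y), assuming 0 < x < y lie in one exponent region."""
    lo, hi = f32_bits(x), f32_bits(y)
    n = hi - lo                              # float32 codes in [x, y)
    nbits = max(1, (n - 1).bit_length())     # just enough bits to index n codes
    while True:
        k = rng.getrandbits(nbits)
        if k < n:                            # reject indices past the target interval
            return bits_f32(lo + k)
```

Because at most half of the draws are rejected, this stays efficient however small the interval is.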
Well,
[0..1] * 2 == [0..2] (still uniform), [0..1] - 0.5 == [-0.5..0.5], etc.
I wonder where you have experienced such an interview?
Update: well, if we want to start caring about losing precision on multiplication (which is weird, because somehow you did not care about that in the original task, and pretend we care about the "number of values"), we can start iterating. In order to do that, we need one more function, which would return uniformly distributed random values in [0..1); this can be done by dropping the 1.0 value whenever it appears. After that, we can slice the whole range into equal parts small enough not to care about losing precision, choose one randomly (we have enough randomness to do that), and choose a number in this bucket using the [0..1) function for all parts but the last one. Or, you can come up with a way to encode enough values to care about and just generate random bits for this code, in which case you don't really care whether it's [0..1] or just {0, 1}.
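Here is a Python sketch of the bucket approach. It assumes that buckets of width at most 1 are "small enough", and that an exact integer generator (here random.Random.randrange) picks the bucket. random01 stands for the closed [0..1] source. Both function names are hypothetical.

```python
import math
import random
from typing import Callable

def drop_one(random01: Callable[[], float]) -> float:
    """Turn a [0, 1] source into a [0, 1) source by rejecting 1.0 whenever it appears."""
    while True:
        r = random01()
        if r < 1.0:
            return r

def bucketed_uniform(x: float, y: float, random01: Callable[[], float],
                     rng: random.Random) -> float:
    """Value in [x, y]: pick a bucket of width <= 1, then a point inside it."""
    m = max(1, math.ceil(y - x))       # number of equal buckets (assumption: width <= 1)
    width = (y - x) / m
    i = rng.randrange(m)               # bucket index from an exact integer generator
    # Half-open [0, 1) inside every bucket but the last, so shared edges are not
    # counted twice; the last bucket uses the closed [0, 1] source to reach y.
    r = drop_one(random01) if i < m - 1 else random01()
    return x + i * width + r * width
```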
Let me rephrase your question:
Let random() be a random number generator with a discrete uniform distribution over [0,1). Let D be the number of possible values returned by random(), each of which is precisely 1/D greater than the previous. Create a random number generator rand(L, U) with a discrete uniform distribution over [L, U) such that each possible value is precisely 1/D greater than the previous.
--
A couple quick notes.
is, if N = 1 there is nothing we can do.
0.0 be one of the possible values for random(). If it is not, then it is possible that the solution below will fail when U - L < 1/D. I'm not particularly worried about that case.
Finally, the good stuff. The key insight here is that the density can be maintained by independently selecting the whole and fractional parts of the result.
First, note that given random() it is trivial to create randomBit(). Then, if we want to select one of {0, 1, 2, ..., 2^N - 1} uniformly at random, that is simple using randomBit(): just generate each of the bits. Call this random2(N). Using random2() we can select one of {0, 1, 2, ..., N - 1}; call this randomInt(N).
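The helper code for this part did not survive. One way to make randomBit() trivial without knowing D is von Neumann's trick: compare two draws and retry on ties. The Python sketch below rests on that assumption and mirrors random2(N) and randomInt(N) with snake_case names. Source is a hypothetical type for a random() that returns k/D.

```python
import random
from typing import Callable

Source = Callable[[], float]   # stand-in for random(): returns k/D for k in {0, ..., D-1}

def random_bit(rnd: Source) -> int:
    """Fair bit from two draws (von Neumann): a < b and a > b are equally likely."""
    while True:
        a, b = rnd(), rnd()
        if a != b:                    # ties carry no information; draw again
            return int(a < b)

def random2(rnd: Source, n: int) -> int:
    """Uniform integer in {0, ..., 2**n - 1}, generated one bit at a time."""
    value = 0
    for _ in range(n):
        value = (value << 1) | random_bit(rnd)
    return value

def random_int(rnd: Source, n: int) -> int:
    """Uniform integer in {0, ..., n - 1}: draw enough bits, reject values >= n."""
    bits = max(1, (n - 1).bit_length())
    while True:
        v = random2(rnd, bits)
        if v < n:
            return v
```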
Now, if D is known, then the problem is trivial, as we can reduce it to simply choosing one of floor((U - L) * D) values uniformly at random, and we can do that with randomInt().
So, let's assume that D is not known. Now, let's first make a function to generate random values in the range [0, 2^N) with the proper density. This is simple. rand2D() is where we require that the difference between consecutive possible values for random() be precisely 1/D. If not, the possible values here would not have uniform density.
Next, we need a function that selects a value in the range [0, V) with the proper density. This is similar to randomInt() above. And finally...
We now may have offset the discrete positions if L * D is not an integer, but that is unimportant.
--
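Here is a self-contained Python sketch of the remaining steps: rand2D, the [0, V) selector, and the final rand(L, U). It again uses the von Neumann bit extractor and a source that returns k/D. It reconstructs the described idea; it is not the answerer's lost code. Note that rand_v relies on 0.0 being a possible value of random() when V < 1/D.

```python
import random
from typing import Callable

Source = Callable[[], float]   # hypothetical random(): returns k/D, k in {0, ..., D-1}

def random_bit(rnd: Source) -> int:
    # Fair bit without knowing D (von Neumann): compare two draws, retry on ties.
    while True:
        a, b = rnd(), rnd()
        if a != b:
            return int(a < b)

def random2(rnd: Source, n: int) -> int:
    # Uniform integer in {0, ..., 2**n - 1}, built bit by bit.
    value = 0
    for _ in range(n):
        value = (value << 1) | random_bit(rnd)
    return value

def rand2d(rnd: Source, n: int) -> float:
    # [0, 2**n) with spacing 1/D: uniform whole part plus one draw as the fraction.
    return random2(rnd, n) + rnd()

def rand_v(rnd: Source, v: float) -> float:
    # [0, v) with spacing 1/D: cover with the smallest 2**n >= v, reject overshoot.
    n = 0
    while 2 ** n < v:
        n += 1
    while True:
        value = rand2d(rnd, n)
        if value < v:
            return value

def rand(rnd: Source, lo: float, hi: float) -> float:
    # Shift into [lo, hi); the possible values are lo + k/D.
    return lo + rand_v(rnd, hi - lo)
```

For example, with D = 4 and [L, U) = [0, 3), rand returns one of the 12 values 0, 0.25, ..., 2.75, each equally likely.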
A last note: you may have noticed that several of these functions may never terminate. That is essentially a requirement. For example, random() may have only a single bit of randomness. If I then ask you to select from one of three values, you cannot do so uniformly at random with a function that is guaranteed to terminate.
Consider this approach:
I'm assuming the base random number generator in the range [0..1] generates among the numbers 0, 1/(p-1), 2/(p-1), ..., (p-2)/(p-1), (p-1)/(p-1).
If the target interval length is less than or equal to 1, return random()*(y-x) + x.
Else, map each number r from the base RNG to an interval in the target range (measured from x): [r*(p-1)*(y-x)/p, (r+1/(p-1))*(p-1)*(y-x)/p] (i.e. for each of the P numbers assign one of P intervals with length (y-x)/p). Then recursively generate another random number in that interval and add it to the interval begin.
Pseudocode:
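Here is a Python sketch of the recursive scheme. It assumes p (the number of base values) is known and at least 2. make_base_rng is a hypothetical stand-in for the base generator. Because neighbouring sub-intervals share their closed endpoints, those endpoints are slightly over-weighted; the description above does not address this detail.

```python
import random
from typing import Callable

def make_base_rng(p: int, rng: random.Random) -> Callable[[], float]:
    """Hypothetical stand-in for random(): one of 0, 1/(p-1), ..., 1, equally likely."""
    return lambda: rng.randrange(p) / (p - 1)

def rand_range(x: float, y: float, base: Callable[[], float], p: int) -> float:
    """Recursive refinement as described above (a sketch; p is assumed known)."""
    if y - x <= 1:
        # Short enough: plain scaling no longer spreads the base values out.
        return base() * (y - x) + x
    i = round(base() * (p - 1))        # index of the base value r = i/(p-1)
    width = (y - x) / p                # each of the p sub-intervals has this length
    begin = x + i * width              # start of the chosen sub-interval
    return begin + rand_range(0.0, width, base, p)
```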
In real math, the solution is just the one provided:
The problem is that even floating point numbers only have a certain resolution. So what you can do is apply the above function and add another random() value scaled to the missing part.
If I make a practical example it becomes clear what I mean:
E.g. take a random() return value from 0..1 with 2 digits of accuracy, i.e. 0.XY, with lower = 100 and upper = 1100.
So with the above algorithm you get as result 0.XY * (1100-100) + 100 = XY0.0 + 100.
You will never see 201 as a result, as the final digit has to be 0.
The solution here would be to generate a random value again and add it *10, so you have accuracy of one decimal digit (here you have to take care that you don't exceed your given range, which can happen; in that case you have to discard the result and generate a new number).
Maybe you have to repeat this; how often depends on how many places the random() function delivers and how much precision you expect in your final result.
A standard IEEE format has limited precision (e.g. 53 bits for a double). So when you generate a number this way, you never need to generate more than one additional number.
But you have to be careful that when you add the new number, you don't exceed your given upper limit. There are multiple solutions to this. First: if you exceed your limit, start over and generate a new number (don't cut off or similar, as this changes the distribution).
The second possibility is to check the interval size of the missing lower bit range, find the middle value, and generate an appropriate value that guarantees that the result will fit.
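The 100..1100 example can be replayed exactly with Python's fractions module. random_2digits is a hypothetical stand-in for a random() with two decimal digits. The second draw is scaled by the gap of 10 that the first draw leaves, so results such as 201 become reachable, with one decimal digit of resolution.

```python
import random
from fractions import Fraction

def random_2digits(rng: random.Random) -> Fraction:
    """Hypothetical low-resolution random(): exactly 0.00, 0.01, ..., 0.99."""
    return Fraction(rng.randrange(100), 100)

def refined_uniform(lo: int, hi: int, rng: random.Random) -> Fraction:
    """Scale one draw into [lo, hi], then fill the gap it leaves with a second draw."""
    step = Fraction(hi - lo, 100)        # spacing of the scaled first draw (10 for 100..1100)
    while True:
        coarse = lo + random_2digits(rng) * (hi - lo)   # 100, 110, ..., 1090 for 100..1100
        fine = random_2digits(rng) * step                # 0.0, 0.1, ..., 9.9 for 100..1100
        value = coarse + fine
        if value <= hi:                  # overshoot: discard and redraw, never clip
            return value
```

Each of the 10,000 draw pairs maps to a distinct value 100 + j/10 (j = 0..9999), so the result is uniform. With these particular numbers the sum never exceeds 1099.9, so the redraw branch is never taken; it is kept because other ranges can overshoot.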
You have to consider the amount of entropy that comes from each call to your RNG. Here is some C# code I just wrote that demonstrates how you can accumulate entropy from low-entropy source(s) and end up with a high-entropy random value.
Since I'm using a 512-bit hash function, that is the maximum amount of entropy that you can get out of the EntropyAccumulator. This could be fixed, if necessary.
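The C# code itself is not shown here. As a hedged illustration of the same idea, the Python sketch below feeds samples into hashlib.sha512 and extracts at most 512 bits. The class and method names are hypothetical. The output carries only as much entropy as the samples that went in.

```python
import hashlib
import struct

class EntropyAccumulator:
    """Hypothetical sketch: mixes many low-entropy samples into one SHA-512 state."""

    MAX_BITS = 512  # digest size: the most bits one extraction can return

    def __init__(self) -> None:
        self._hash = hashlib.sha512()

    def add(self, sample: float) -> None:
        # Feed the exact 64-bit pattern of each sample into the hash.
        self._hash.update(struct.pack("<d", sample))

    def take_bits(self, n: int) -> int:
        if not 0 < n <= self.MAX_BITS:
            raise ValueError("can extract between 1 and 512 bits")
        value = int.from_bytes(self._hash.digest(), "big") >> (self.MAX_BITS - n)
        self._hash = hashlib.sha512()   # start over so output is never reused
        return value
```

For example, if each random() call carried 8 bits (P = 256) and the calls were independent, about seven calls would have to be added before take_bits(53) could yield a full-entropy 53-bit value. Dividing that value by 2**53 gives a double in [0, 1).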
If I understand your problem correctly, it's that rand() generates finely spaced but ultimately discrete random numbers. And if we multiply it by (y-x), which is large, this spreads these finely spaced floating point values out in a way that misses many of the floating point values in the range [x,y]. Is that right?
If so, I think we have a solution already given by Dialecticus. Let me explain why he is right.
First, we know how to generate a random float and then add another floating point value to it. This may produce a round-off error due to the addition, but it will be in the last decimal place only. Use doubles or something with finer numerical resolution if you want better precision. So, with that caveat, the problem is no harder than finding a random float in the range [0,y-x] with uniform density. Let's say y-x = z. Obviously, since z is a floating point number, it may not be an integer. We handle the problem in two steps: first we generate the random digits to the left of the decimal point, and then we generate the random digits to the right of it. Doing both uniformly means their sum is uniformly distributed too, over a range slightly larger than [0,z]; the corner case below deals with the excess. Let w be the largest integer <= z. To answer our simplified problem, we can first pick a random integer from the range {0,1,...,w}. Then, step #2 is to add a random float from the unit interval to this random number. This isn't multiplied by any possibly large values, so it has as fine a resolution as the numerical type can have. (Assuming you're using an ideal random floating point number generator.)
So what about the corner case where the random integer was the largest one (i.e. w) and the random float we added to it was larger than z - w, so that the random number exceeds the allowed maximum? The answer is simple: do all of it again and check the new result. Repeat until you get a number in the allowed range. It's an easy proof that a uniformly generated random number which is tossed out and generated again if it's outside an allowed range results in a uniformly generated random number in the allowed range. Once you make this key observation, you see that Dialecticus met all your criteria.
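In code, the two-step method with rejection might look like this Python sketch. rng.random() stands in for the ideal unit-interval generator the answer assumes (Python's returns 53-bit values in [0, 1)). The function name is hypothetical.

```python
import math
import random

def integer_plus_fraction(x: float, y: float, rng: random.Random) -> float:
    """Uniform value in [x, y]: random whole part, then a full-resolution fraction."""
    z = y - x
    w = math.floor(z)                  # largest integer <= z
    while True:
        whole = rng.randint(0, w)      # one of {0, 1, ..., w}
        value = whole + rng.random()   # the fraction is never scaled up
        if value <= z:                 # overshoot past z: start over
            return x + value
```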
When you generate a random number with random(), you get a floating point number between 0 and 1 with an unknown precision (or density, you name it).
And when you multiply it by a number (NUM), you lose lg(NUM) digits of this precision (lg being the base-10 logarithm). So if you multiply by 1000 (NUM=1000), you lose the last 3 digits (lg(1000) = 3).
You may correct this by adding a smaller random number to the original, which supplies these missing 3 digits. But you don't know the precision, so you can't determine exactly where they are.
I can imagine two scenarios:
(X = range start, Y = range end)
1: you define the precision (PREC, e.g. 20 digits, so PREC=20) and consider it enough to generate a random number, so the expression will be:
with numbers: (X = 500, Y = 1500, PREC = 20)
There are some problems with this:
2: guess the precision by random numbers
you define some number of tries (e.g. 4) to calculate the precision by generating random numbers, and count the precision every time:
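The counting code for scenario 2 is not shown. Here is a rough Python sketch of the idea that counts decimal places in the shortest repr of a few samples. The function names are hypothetical. A sample whose trailing digits happen to be zero shows fewer places, which is why several tries are taken.

```python
import random
from decimal import Decimal
from typing import Callable

def decimal_places(r: float) -> int:
    """Digits after the decimal point in the shortest repr of r."""
    return max(0, -Decimal(repr(r)).as_tuple().exponent)

def estimate_precision(random01: Callable[[], float], tries: int = 4) -> int:
    """Guess the precision of random01() as the most decimal places seen in a few draws."""
    return max(decimal_places(random01()) for _ in range(tries))
```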
That's my idea.