为什么 java.util.Random 使用掩码?

发布于 2024-10-31 12:18:15 字数 437 浏览 5 评论 0原文

简化(即,排除并发性)Random.next(int bits) 看起来像是

protected int next(int bits) {
    seed = (seed * multiplier + addend) & mask;
    return (int) (seed >>> (48 - bits));
}

使用掩码将种子减少到 48 位。为什么它比仅仅更好

protected int next(int bits) {
    seed = seed * multiplier + addend;
    return (int) (seed >>> (64 - bits));
}

?我读过很多关于随机数的文章,但没有发现这样做的原因。

Simplified (i.e., leaving concurrency out) Random.next(int bits) looks like

protected int next(int bits) {
    seed = (seed * multiplier + addend) & mask;
    return (int) (seed >>> (48 - bits));
}

where masking gets used to reduce the seed to 48 bits. Why is it better than just

protected int next(int bits) {
    seed = seed * multiplier + addend;
    return (int) (seed >>> (64 - bits));
}

? I've read quite a lot about random numbers, but see no reason for this.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

撕心裂肺的伤痛 2024-11-07 12:18:15

原因是较低的位往往具有较低的周期(至少对于 Java 使用的算法而言)

来自 Wikipedia -线性同余发生器

如上所示,LCG 并不总是使用它们生成的值中的所有位。 Java 实现每次迭代都会生成 48 位,但仅返回这些值中的 32 个最高有效位。这是因为高阶位比低阶位具有更长的周期(见下文)。使用这种技术的 LCG 比不使用这种技术的 LCG 产生更好的值。

编辑:

进一步阅读后(方便地在维基百科上),a、c 和 m 的值必须满足这些条件才能具有完整的周期:

  1. c 和 m 必须互质

  2. a-1 可被 m 的所有质因数整除

  3. 如果 m 是 4 的倍数,则 a-1 是 4 的倍数< /p>

我能清楚地看出唯一仍然满意的是 #3。 #1 和 #2 需要检查,我有一种感觉,其中一个(或两个)都失败了。

The reason is that the lower bits tend to have a lower period (at least with the algorithm Java uses)

From Wikipedia - Linear Congruential Generator:

As shown above, LCG's do not always use all of the bits in the values they produce. The Java implementation produces 48 bits with each iteration but only returns the 32 most significant bits from these values. This is because the higher-order bits have longer periods than the lower order bits (see below). LCG's that use this technique produce much better values than those that do not.

edit:

after further reading (conveniently, on Wikipedia), the values of a, c, and m must satisfy these conditions to have the full period:

  1. c and m must be relatively primes

  2. a-1 is divisible by all prime factors of m

  3. a-1 is a multiple of 4 if m is a multiple of 4

The only one that I can clearly tell is still satisfied is #3. #1 and #2 need to be checked, and I have a feeling that one (or both) of these fail.

放飞的风筝 2024-11-07 12:18:15

来自 java.util.Random 顶部的文档:

  • 《计算机编程艺术》第 2 卷第 3.2.1 节中进行了描述

  • 该算法在Donald Knuth 的 。它是一个 48 位种子、
  • 线性同余公式。

因此整个算法被设计为操作 48 位种子,而不是 64 位种子。我想你可以向 Knuth 先生提出这个问题;p

From the docs at the top of java.util.Random:

  • The algorithm is described in The Art of Computer Programming,
  • Volume 2 by Donald Knuth in Section 3.2.1. It is a 48-bit seed,
  • linear congruential formula.

So the entire algorithm is designed to operate of 48-bit seeds, not 64 bit ones. I guess you can take it up with Mr. Knuth ;p

雪化雨蝶 2024-11-07 12:18:15

来自 wikipedia (@helloworld922 发布的引文所暗示的引文):

LCG 的另一个问题是,如果 m 设置为 2 的幂,则生成序列的低位比特的周期比整个序列的周期短得多。一般来说,序列中的第 n 个最低有效数字以 b 为基数表示输出序列,其中 bk = m 对于某个整数 k,最多重复周期 bn。

此外,它继续(我的斜体):

当 m 是 2 的幂时,LCG 的低位位无论如何都不应依赖于任何程度的随机性。事实上,简单地用 2n 代替模数项就可以看出低阶位经历的周期非常短。特别是,当 m 为 2 的幂时,任何全周期 LCG 将交替产生奇数和偶数结果。

最后,原因可能是历史性的:Sun 的人们想要可靠地工作的东西,而 Knuth 公式给出了 32 个有效位。请注意 java.util.Random API 是这么说的(我的斜体):

如果使用相同的种子创建两个 Random 实例,并且对每个实例进行相同的方法调用序列,则它们将生成并返回相同的数字序列。 为了保证这个属性,为 Random 类指定了特定的算法。为了 Java 代码的绝对可移植性,Java 实现必须使用此处为 Random 类显示的所有算法。但是,Random 类的子类允许使用其他算法,只要它们遵守通用契约即可对于所有方法。

所以我们坚持将其作为参考实现。然而,这并不意味着您不能使用另一个生成器(以及 Random 子类或创建新类):

来自同一维基百科页面:

MMIX,作者:Donald Knuth m=264 a=6364136223846793005 c=1442695040888963407

有一个适合您的 64 位公式。

随机数很棘手(正如 Knuth 指出的那样),根据您的需要,如果您需要 64 位数字,您可能只需调用 java.util.Random 两次并连接这些位就可以了。如果您确实关心统计属性,请使用类似 Mersenne Twister 的东西,或者如果您关心信息泄漏/不可预测性使用java.security.SecureRandom

From wikipedia (the quote alluded to by the quote that @helloworld922 posted):

A further problem of LCGs is that the lower-order bits of the generated sequence have a far shorter period than the sequence as a whole if m is set to a power of 2. In general, the nth least significant digit in the base b representation of the output sequence, where bk = m for some integer k, repeats with at most period bn.

And furthermore, it continues (my italics):

The low-order bits of LCGs when m is a power of 2 should never be relied on for any degree of randomness whatsoever. Indeed, simply substituting 2n for the modulus term reveals that the low order bits go through very short cycles. In particular, any full-cycle LCG when m is a power of 2 will produce alternately odd and even results.

In the end, the reason is probably historical: the folks at Sun wanted something to work reliably, and the Knuth formula gave 32 significant bits. Note that the java.util.Random API says this (my italics):

If two instances of Random are created with the same seed, and the same sequence of method calls is made for each, they will generate and return identical sequences of numbers. In order to guarantee this property, particular algorithms are specified for the class Random. Java implementations must use all the algorithms shown here for the class Random, for the sake of absolute portability of Java code. However, subclasses of class Random are permitted to use other algorithms, so long as they adhere to the general contracts for all the methods.

So we're stuck with it as a reference implementation. However that doesn't mean you can't use another generator (and subclass Random or create a new class):

from the same Wikipedia page:

MMIX by Donald Knuth m=264 a=6364136223846793005 c=1442695040888963407

There's a 64-bit formula for you.

Random numbers are tricky (as Knuth notes) and depending on your needs, you might be fine with just calling java.util.Random twice and concatenating the bits if you need a 64-bit number. If you really care about the statistical properties, use something like Mersenne Twister, or if you care about information leakage / unpredictability use java.security.SecureRandom.

难得心□动 2024-11-07 12:18:15

这样做似乎没有一个充分的理由。
使用掩模是一种保守的方法,使用经过验证的设计。
忽略它很可能会产生更好的生成器,但是,在不了解数学的情况下,这是一个冒险的步骤。

掩码的另一个小优点是 8 位架构上的速度增益,因为它使用 6 个字节而不是 8 个字节。

It doesn't look like there was a good reason for doing this.
Applying the mask is an conservative approach using a proven design.
Leaving it out most probably leads to a better generator, however, without knowing the math well, it's a risky step.

Another small advantage of masking is a speed gain on 8-bit architectures, since it uses 6 bytes instead of 8.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文