What is the total number of unique values for a double in the range [0.0, 1.0)?

Posted on 2024-10-23 14:40:26

Random.NextDouble() (a Double from the range [0.0,1.0)) is sometimes multiplied by a large Int64 (let Int64 big = 9000000000L), and the result is floored to obtain a random Int64 value larger than what can be obtained from Random.Next() (an Int32 from the range [0,Int32.MaxValue)).

Random r = new Random();
long big = 9000000000L;
long answer = (long) (r.NextDouble() * big);

It seems to me that the total number of unique values for a Double in the range [0.0, 1.0) provides an upper-bound for the number of unique Int64 it can possibly generate. A loose upper-bound, in fact, as many different Doubles will map to the same Int64.
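The many-to-one mapping can be checked directly. The following is a minimal C++ sketch of the C# expression above, with the random draw replaced by an explicit argument; the helper names `scale_to_int64` and `next_double_toward_one` are made up for this illustration.

```cpp
#include <cmath>
#include <cstdint>

// C++ counterpart of the C# expression (long)(r.NextDouble() * big),
// with the random draw replaced by an explicit argument d in [0.0, 1.0).
std::int64_t scale_to_int64(double d, std::int64_t big) {
    return static_cast<std::int64_t>(d * static_cast<double>(big));
}

// The representable double adjacent to d in the direction of 1.0
// (hypothetical helper, a thin wrapper around std::nextafter).
double next_double_toward_one(double d) {
    return std::nextafter(d, 1.0);
}
```

With big = 9000000000, both 0.5 and the double immediately above it should land on the same integer, 4500000000, because adjacent doubles near 0.5 are far closer together than 1/big.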

Hence, I would like to know: what is the total number of unique values for a double in the range [0.0, 1.0)?

Even better if you can tell me what is the largest value "big" can take so that "answer" can be a value from the range [0,big), and whether the distribution of values of "answer" is uniform, assuming that Random.NextDouble() is uniform.

Edit: Double (double) here refers to IEEE 754 floating-point double, while Int64 (long) and Int32 (int) refer to 64-bit and 32-bit signed 2's complement respectively.


Inspired by this question: Generating 10 digits unique random number in java

While I used C#, this question is language-agnostic and is more about discrete mathematics than programming. It bothers me not mainly out of mathematical curiosity, but as a programmer who wants to use a formula only if it does what it is supposed to do, including from a security viewpoint.

Comments (6)

油焖大侠 2024-10-30 14:40:26

An IEEE 754 double has 11 exponent bits and 52 mantissa bits. Assuming the sign bit is 0 (positive), an exponent field from 0x001 to 0x3FE gives a normal floating-point number between 0 and 1. The mantissa is interpreted with a leading 1 that is not stored. For each of these 0x3FE exponent values, there are 2^52 mantissa values. In addition, if the exponent field is 0x000, the mantissa is interpreted without that leading 1, but as if the exponent were 0x001; this covers zero and the subnormal numbers. That makes 0x3FF = 1023 exponent values for which all mantissas are valid, for a total of 1023*2^52 values. Negative zero may also count, which adds one more value.
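The count can be cross-checked against the bit pattern of 1.0. This is a small C++ sketch, assuming `double` is the 64-bit IEEE 754 format; the function names are made up for this illustration.

```cpp
#include <cstdint>
#include <cstring>

// Count derived from the field widths: exponent fields 0x000..0x3FE
// (0x3FF = 1023 of them), each with 2^52 mantissa patterns.
constexpr std::uint64_t count_from_fields() {
    return std::uint64_t{0x3FF} << 52;  // 1023 * 2^52
}

// Bit pattern of a double, copied out with memcpy.
std::uint64_t bit_pattern(double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch assumes a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

Because non-negative doubles are ordered the same way as their bit patterns, `bit_pattern(1.0)` equals the number of doubles in [+0.0, 1.0), and it should match `count_from_fields()`, i.e. 4607182418800017408.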

If random doubles were generated uniformly from all representable values, then this would indeed produce a bias when multiplying in order to generate an Int64. However, any reasonable random library will approximate a uniform distribution on [0, 1), and this will not be biased when turning it into an Int64. The largest value for "big" that will allow all integers in [0, big) to be produced is 2^53, because the resolution of the 2^52 numbers between 1/2 and 1 is 2^(-53). However, these numbers are often produced by dividing random integers by the integer range (usually that of Int32), meaning you can't actually produce more distinct numbers than this source provides. Consider directly combining two Int32s instead, e.g. by shifting one by 32 bits and combining them into an Int64. (Be wary, though: the state space of the generator might only be 32 bits.)
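A minimal C++ sketch of the "combine two 32-bit draws" suggestion follows. It uses `std::mt19937` as a stand-in source of 32-bit values, which is an assumption of this example and not what .NET uses, and it adds rejection sampling to keep the result uniform on [0, big), a step the answer does not spell out. The function names are made up. `std::mt19937` is not a cryptographically secure generator, so this is not suitable where security matters.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Combine two 32-bit draws into one 64-bit value.
std::uint64_t next_u64(std::mt19937& gen) {
    const std::uint64_t hi = gen();  // std::mt19937 yields values below 2^32
    const std::uint64_t lo = gen();
    return (hi << 32) | lo;
}

// Uniform integer in [0, big) via rejection sampling.
std::uint64_t uniform_below(std::mt19937& gen, std::uint64_t big) {
    assert(big > 0);
    // 2^64 mod big, computed with unsigned wrap-around arithmetic.
    const std::uint64_t threshold = (std::uint64_t{0} - big) % big;
    std::uint64_t x;
    do {
        x = next_u64(gen);
    } while (x < threshold);  // the accepted range has a size divisible by big
    return x % big;
}
```

For example, `std::mt19937 gen(std::random_device{}()); auto answer = uniform_below(gen, 9000000000ULL);` draws a value in [0, 9000000000). Seeding from a single `std::random_device` call only provides one 32-bit seed value, which is the kind of limited state the answer warns about.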

烟─花易冷 2024-10-30 14:40:26

As a corollary to your question: the C# Random class internally uses a generator that produces integers in the range 0...Int32.MaxValue - 1. It then divides that integer by Int32.MaxValue (technically, it multiplies by the inverse of that number) to return a double. So in C#, NextDouble() can return only Int32.MaxValue distinct doubles (one for each integer in 0...Int32.MaxValue - 1).
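The consequence for the question's formula can be sketched in C++. This models only the behaviour described in this answer; it is not taken from the .NET source, and the names `sample_to_double` and `scaled_answer` are made up for this illustration.

```cpp
#include <cstdint>

constexpr std::int32_t kInt32Max = 2147483647;

// Model of the described behaviour (an assumption, not .NET source code):
// an integer sample in [0, Int32.MaxValue) is scaled into [0.0, 1.0).
double sample_to_double(std::int32_t sample) {
    return sample * (1.0 / kInt32Max);
}

// The question's formula applied to a given integer sample.
std::int64_t scaled_answer(std::int32_t sample, std::int64_t big) {
    return static_cast<std::int64_t>(sample_to_double(sample) *
                                     static_cast<double>(big));
}
```

Under this model, with big = 9000000000, the samples 0 and 1 map to 0 and 4 respectively, so the integers 1, 2 and 3 can never come out. Only about 2^31 of the 9000000000 possible results are reachable.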

俏︾媚 2024-10-30 14:40:26

IEEE 754 is pretty clear on the precision of doubles:

http://en.wikipedia.org/wiki/IEEE_754-2008

You have 52 bits of precision plus an additional assumed bit.

You have exponents from -1022 to 1023, about 11 bits, including a sign.

The 64th bit is the overall sign for the number.

We'll ignore subnormal numbers.

You're asking about exponents between -1022 and 0. This means you have about 10 of the available 11 bits of exponent available to you.

You have 52+1 bits of mantissa available.

This is about 62 bits of usable precision to represent 2**62 distinct values from [image not preserved]

能否归途做我良人 2024-10-30 14:40:26

@wnoise pretty much nailed it, but here's my two cents.

IEEE floats can be compared and incremented as integers, with some restrictions; see this question for details. So, if we reinterpret the bits of +0.0 and 1.0 as 64-bit integers, we get the number of steps between zero and one:

#include <cstdint>
#include <cstring>
#include <iostream>

int main()
{
        double zero = 0.0;
        double one = 1.0;
        std::uint64_t z, o;
        // Copy the bit patterns with memcpy; dereferencing a
        // reinterpret_cast pointer here would violate strict aliasing.
        std::memcpy(&z, &zero, sizeof z);
        std::memcpy(&o, &one, sizeof o);
        std::cout << z << std::endl;
        std::cout << o << std::endl;
}

This gives me 0 and 4607182418800017408, respectively, i.e. there are 4607182418800017408 unique double values in the range [0.0, 1.0).
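The "incremented as integers" property that this count relies on can be illustrated with a short C++ sketch, assuming a 64-bit IEEE 754 `double` and a non-negative finite input below the largest finite double. The function names are made up for this example.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Step to the next double by incrementing the bit pattern.
// Only valid for non-negative finite inputs below the largest finite double.
double next_by_bits(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    ++bits;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Compare against the standard library's notion of the next double up.
bool matches_nextafter(double d) {
    return next_by_bits(d) ==
           std::nextafter(d, std::numeric_limits<double>::infinity());
}
```

Since each increment of the bit pattern moves to the adjacent double, the difference between the patterns of 1.0 and +0.0 is exactly the number of doubles in [+0.0, 1.0).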

鹤仙姿 2024-10-30 14:40:26

The total number of unique values for a double in the range [0.0, 1.0) depends on the representation of double in the particular environment.

One of the most common representations is the one specified by IEEE 754. That format is, for example, mandated by Java and C# (see 1.3 Types and variables for the latter).

梦里人 2024-10-30 14:40:26

That depends on the implementation of double. There are implementations that do not allow denormalized values and leave out the leading one; determining the number of possible values here is easy:

  • there are a few "special" values (0, +0, -0, +∞, -∞, quiet NaN, signalling NaN) that typically cost you one possible exponent
  • there is no way that shifting the mantissa and modifying the exponent gives an equivalent number

If your implementation allows denormalized values, determining this number becomes a bit more difficult, but I'd start by mapping the possible values in this representation to the equivalent representation with the fixed leading one (which will use one bit less in the mantissa); if you've found an appropriate mapping, this will be injective, and you have reduced the problem to a simpler one.
