What range of numbers can be represented in 16-, 32- and 64-bit IEEE-754 systems?

Posted on 2024-07-19 07:28:29

I know a little bit about how floating-point numbers are represented, but not enough, I'm afraid.

The general question is:

For a given precision (for my purposes, the number of accurate decimal places in base 10), what range of numbers can be represented for 16-, 32- and 64-bit IEEE-754 systems?

Specifically, I'm only interested in the range of 16-bit and 32-bit numbers accurate to +/-0.5 (the ones place) or +/- 0.0005 (the thousandths place).

Comments (7)

违心° 2024-07-26 07:28:29

For a given IEEE-754 floating point number X, if

2^E <= abs(X) < 2^(E+1)

then the distance from X to the next larger representable floating point number (epsilon) is:

epsilon = 2^(E-52)    % For a 64-bit float (double precision)
epsilon = 2^(E-23)    % For a 32-bit float (single precision)
epsilon = 2^(E-10)    % For a 16-bit float (half precision)
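As a rough illustration of these formulas, the Python sketch below computes the spacing for each format. The function name `spacing` and the `FRACTION_BITS` table are invented for this example, and the sketch assumes a nonzero, finite value in the format's normal range (no subnormals).

```python
import math
import sys

# Explicitly stored fraction bits per format (illustrative table).
FRACTION_BITS = {"half": 10, "single": 23, "double": 52}

def spacing(x, fmt):
    """Gap between |x| and the next larger representable value in `fmt`.

    Assumes x is nonzero, finite, and in the format's normal range.
    """
    # math.frexp returns (m, e) with 0.5 <= |m| < 1 and x == m * 2**e,
    # so the E in 2**E <= |x| < 2**(E+1) is e - 1.
    _, e = math.frexp(x)
    return 2.0 ** ((e - 1) - FRACTION_BITS[fmt])

print(spacing(1.0, "double") == sys.float_info.epsilon)
print(spacing(1000.0, "single"))
print(spacing(1000.0, "half"))
```

For doubles, the first line compares the result at 1.0 with `sys.float_info.epsilon`, which is the same quantity as defined by the Python runtime.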

The above equations allow us to compute the following:

  • For half precision...

    If you want an accuracy of +/-0.5 (or 2^-1), the number must be smaller than 2^10. Any X at or above this limit leads to the distance between floating point numbers being greater than 0.5.

    If you want an accuracy of +/-0.0005 (about 2^-11), the number must be smaller than 1. Any X at or above this limit leads to the distance between floating point numbers being greater than 0.0005.

  • For single precision...

    If you want an accuracy of +/-0.5 (or 2^-1), the number must be smaller than 2^23. Any X at or above this limit leads to the distance between floating point numbers being greater than 0.5.

    If you want an accuracy of +/-0.0005 (about 2^-11), the number must be smaller than 2^13. Any X at or above this limit leads to the distance between floating point numbers being greater than 0.0005.

  • For double precision...

    If you want an accuracy of +/-0.5 (or 2^-1), the number must be smaller than 2^52. Any X at or above this limit leads to the distance between floating point numbers being greater than 0.5.

    If you want an accuracy of +/-0.0005 (about 2^-11), the number must be smaller than 2^42. Any X at or above this limit leads to the distance between floating point numbers being greater than 0.0005.
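The six limits above can be reproduced with a small Python sketch. The name `magnitude_limit` and the `FRACTION_BITS` table are hypothetical helpers for this illustration only. The sketch follows the answer's convention of treating the spacing itself as the accuracy, and it ignores subnormals and overflow.

```python
import math

# Explicitly stored fraction bits per format (illustrative table).
FRACTION_BITS = {"half": 10, "single": 23, "double": 52}

def magnitude_limit(tolerance, fmt):
    """Smallest power of two at which the spacing first exceeds `tolerance`."""
    # Largest power of two not exceeding the tolerance: 2**k <= tolerance.
    _, e = math.frexp(tolerance)
    k = e - 1
    # Spacing is 2**(E - p) on [2**E, 2**(E+1)); it stays <= 2**k while
    # E <= k + p, i.e. for every |X| below 2**(k + p + 1).
    return 2.0 ** (k + FRACTION_BITS[fmt] + 1)

for fmt in FRACTION_BITS:
    for tol in (0.5, 0.0005):
        print(fmt, tol, magnitude_limit(tol, fmt))
```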

醉态萌生 2024-07-26 07:28:29

For floating-point integers (I'll give my answer in terms of IEEE double-precision), every integer between 1 and 2^53 is exactly representable. Beyond 2^53, integers that are exactly representable are spaced apart by increasing powers of two. For example:

  • Every 2nd integer between 2^53 + 2 and 2^54 can be represented exactly.
  • Every 4th integer between 2^54 + 4 and 2^55 can be represented exactly.
  • Every 8th integer between 2^55 + 8 and 2^56 can be represented exactly.
  • Every 16th integer between 2^56 + 16 and 2^57 can be represented exactly.
  • Every 32nd integer between 2^57 + 32 and 2^58 can be represented exactly.
  • Every 64th integer between 2^58 + 64 and 2^59 can be represented exactly.
  • Every 128th integer between 2^59 + 128 and 2^60 can be represented exactly.
  • Every 256th integer between 2^60 + 256 and 2^61 can be represented exactly.
  • Every 512th integer between 2^61 + 512 and 2^62 can be represented exactly.
  • ...and so on.

Integers that are not exactly representable are rounded to the nearest representable integer, so the worst case rounding is 1/2 the spacing between representable integers.
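One way to check this pattern is to round-trip integers through a double. The sketch below uses Python, whose `float` is an IEEE double on common platforms. The helper name `is_exact_in_double` is made up for the example.

```python
def is_exact_in_double(n):
    """True if the integer n survives a round trip through a Python float."""
    return int(float(n)) == n

base = 2 ** 53
# Just past 2**53 only every 2nd integer survives the round trip.
print([is_exact_in_double(base + i) for i in range(5)])

# Between 2**54 and 2**55 only every 4th integer survives.
start = 2 ** 54
print(sum(is_exact_in_double(start + i) for i in range(16)))
```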

丶情人眼里出诗心の 2024-07-26 07:28:29

The precision quoted from Peter R's link to the MSDN ref is probably a good rule of thumb, but of course reality is more complicated.

The fact that the "point" in "floating point" is a binary point and not decimal point has a way of defeating our intuitions. The classic example is 0.1, which needs a precision of only one digit in decimal but isn't representable exactly in binary at all.

If you have a weekend to kill, have a look at What Every Computer Scientist Should Know About Floating-Point Arithmetic. You'll probably be particularly interested in the sections on Precision and Binary to Decimal Conversion.
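To make the 0.1 example concrete, here is a short Python sketch using the standard `fractions` and `decimal` modules. Both convert a float exactly, so they show the value that is actually stored.

```python
from decimal import Decimal
from fractions import Fraction

# Fraction(float) and Decimal(float) convert the stored binary value exactly,
# so they reveal what 0.1 becomes once it is parsed into a double.
stored = Fraction(0.1)
print(stored == Fraction(1, 10))       # False
print(stored.denominator == 2 ** 55)   # True: the denominator is a power of two
print(Decimal(0.1))                    # exact decimal expansion of the stored value
print(0.1 + 0.2 == 0.3)                # False
```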

ぃ双果 2024-07-26 07:28:29

First off, IEEE 754-1985 does not define a 16-bit float; IEEE 754-2008 added one (binary16) with a 5-bit exponent and 10-bit fraction. IEEE-754 uses a dedicated sign bit, so the positive and negative ranges are the same. Also, the fraction has an implied 1 in front, so you get an extra bit.

If you want accuracy to the ones place, as in you can represent each integer, the answer is fairly simple: the exponent shifts the binary point to the right end of the fraction. So, a 10-bit fraction gets you ±2^11.

If you want one bit after the binary point, you give up one bit before it, so you have ±2^10.

Single-precision has a 23-bit fraction, so you'd have ±2^24 integers.

How many bits of precision you need after the decimal point depends entirely on the calculations you're doing, and how many you're doing.

  • 2^10 = 1,024
  • 2^11 = 2,048
  • 2^23 = 8,388,608
  • 2^24 = 16,777,216
  • 2^53 = 9,007,199,254,740,992 (double-precision)
  • 2^113 = 10,384,593,717,069,655,257,060,992,658,440,192 (quad-precision)
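These integer limits can be checked with a brief Python sketch. `max_contiguous_integer` and `round_trip` are hypothetical helper names for this example. The round trip relies on the standard `struct` format codes `e` (half), `f` (single) and `d` (double).

```python
import struct

def max_contiguous_integer(fraction_bits):
    """Every integer up to this bound is representable: 2**(fraction bits + 1)."""
    return 2 ** (fraction_bits + 1)

def round_trip(value, code):
    """Pack and unpack a value; code is 'e' (half), 'f' (single) or 'd' (double)."""
    return struct.unpack("<" + code, struct.pack("<" + code, value))[0]

for name, bits in (("half", 10), ("single", 23), ("double", 52), ("quad", 112)):
    print(name, max_contiguous_integer(bits))

# The first integer past the bound does not survive in half or single precision.
print(round_trip(2049.0, "e") == 2049.0)           # False
print(round_trip(16777217.0, "f") == 16777217.0)   # False
```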

恬淡成诗 2024-07-26 07:28:29

See IEEE 754-1985:

v = (-1)^sign * 2^(exponent-exponent_bias) * (1 + fraction)

Note the (1 + fraction) term. As @bendin points out, with binary floating point you cannot express simple decimal values such as 0.1 exactly. The implication is that you can introduce rounding errors by repeating simple additions many times or by calling operations such as truncation. If you need exact decimal precision, the way to achieve it is to use a fixed-point decimal, which is basically a scaled integer.
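As a sketch of how the formula applies to a 64-bit double (11 exponent bits, bias 1023, 52 fraction bits), the Python below rebuilds a value from its bit fields and then contrasts repeated float addition with a scaled integer. `decode_double` is an illustrative name, and the decoder only handles normal numbers.

```python
import struct

def decode_double(x):
    """Rebuild a normal double from its sign, exponent and fraction fields."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF                 # 11-bit biased exponent
    fraction = (bits & ((1 << 52) - 1)) / 2 ** 52   # value in [0, 1)
    # Only valid for normal numbers (exponent field not 0 and not 0x7FF).
    return (-1) ** sign * 2.0 ** (exponent - 1023) * (1 + fraction)

print(decode_double(0.1) == 0.1)   # True: the formula reproduces the stored value
print(decode_double(-6.5))         # -6.5

# Repeated addition of 0.1 accumulates rounding error...
total = 0.0
for _ in range(10):
    total += 0.1
print(total == 1.0)                # False

# ...while a scaled integer counting tenths stays exact.
tenths = 0
for _ in range(10):
    tenths += 1
print(tenths == 10)                # True
```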

所有深爱都是秘密 2024-07-26 07:28:29

If I understand your question correctly, it depends on your language.
For C#, check out the MSDN ref. Float has 7 digits of precision and double has 15-16 digits.
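Those digit counts follow from the significand width: n bits carry roughly n * log10(2) decimal digits. A small Python sketch, where `decimal_digits` is a made-up helper and the widths 11, 24 and 53 include the implied leading bit:

```python
import math
import sys

def decimal_digits(significand_bits):
    """Approximate decimal digits carried by a binary significand."""
    return significand_bits * math.log10(2)

for name, bits in (("half", 11), ("single", 24), ("double", 53)):
    print(name, round(decimal_digits(bits), 2))

# Python reports the digits a double is guaranteed to preserve.
print(sys.float_info.dig)
```

The result is an estimate, which is why the double figure is usually quoted as a range rather than a single number.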

静水深流 2024-07-26 07:28:29

It took me quite a while to figure out that when using doubles in Java, I wasn't losing significant precision in calculations. Floating point actually has a very good ability to represent numbers to quite reasonable precision. The precision I was losing was lost immediately upon converting decimal numbers typed by users to the binary floating point representation that is natively supported. I've recently started converting all my numbers to BigDecimal. BigDecimal is much more work to deal with in the code than floats or doubles, since it's not one of the primitive types. But on the other hand, I'll be able to exactly represent the numbers that users type in.
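The same effect can be sketched in Python, where the standard `decimal.Decimal` type plays a role similar to Java's BigDecimal. This is an analogy rather than the Java code described above, and the input strings are invented sample data.

```python
from decimal import Decimal

# Strings as a user might type them (sample data for illustration).
user_inputs = ["0.10", "0.20", "0.30"]

as_floats = [float(s) for s in user_inputs]      # converted to binary doubles
as_decimals = [Decimal(s) for s in user_inputs]  # kept as exact decimals

print(as_floats[0] + as_floats[1] == as_floats[2])        # False
print(as_decimals[0] + as_decimals[1] == as_decimals[2])  # True
```

Building the `Decimal` from the original string matters: constructing it from a float would carry over the conversion error.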
