当前位置：文江博客话题详情

浮点变量存储的哪个数字可以准确？

发布于 2025-02-10 02:23:20 字数 241 浏览 1 评论 0原文

就我的知识而言，浮点变量可以完全存储（例如0）。

因此，如果我有代码：

{

float var = 0;

printf（“％f”，var）;

}

我会作为输出来：

0.00000000000000000，

但我听说还有其他数字也可以使用浮点变量精确存储。

如何使用浮点变量确定是否可以精确存储一个变量？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

时常饿 2025-02-17 02:23:20

简介

在IEEE-754“双重精度”格式中可以表示有限数，并且仅当它等于 m •2 ^e整数 m 和 e ，因此-2 ⁵³＆lt; m ＆lt; +2 ⁵³和-1074≤em>e ≤971。

例如，可以表示3.125，因为它等于25•2 ^-3，而25是（-2 ⁵³，+2 ⁵³）中的整数，而-3是[-1074，971]中的整数。

讨论

通常，浮点格式表示有限数字为± d ₀。 d ₋₂d _-3… d _{1- p }• b ^e其中：

±表示 +或 - ，
b 是一个为格式固定的正整数基础，通常为2或10，
p ，称为精度，是 base- b b base的数量。 /em>，
d ₀。 d _-1d _{- 2}d _-3… d _{1- p}（示例：1.2345）是浮点数表示的重要意义，每个 d _i是一个具有0≤ d _i＆lt; b 和
e 是一个整数，具有 e _min≤ e e ≤ e _max，其中 e _min和 e _max是格式的固定限制。

注意radix点“。” d ₀之后。这意味着显着性在[0， b ）中，并且在radix点后具有 p -1位。（“ [0， b ）”表示包含0但不包括 b 的半开间隔。，要么在第一个数字之前，。 sub> −2 d _-3… d _{1- p}，或在最后一个数字之后， d ₀d _-1d −2 d _-3… d _{1- p}。这些在数学上是等效的：

d ₀。 -2 d _-3… d _{1- p}•• b ^e=。 d ₀d -1 d _-2d _{_-3… d _{1- p}• b ^{e +1}= d ₀d _-1d _-2}d ₋₃ boundem> d 1-1p> p b ^{e +1- p}。

指数限制 e _min和 e _max将被调整以匹配所使用的位置。在右侧的radix点上，显着性是一个整数，这对于在分析和有关浮点数的证明中使用数字理论很方便。

某些浮点格式可能需要 d ₀为非零。现在这很少见。 d ₀的表示为正常形式，以及其中 d 的表示₀零被认为是否定化。一个非零数，可以以一种格式的符合形式表示，但太小而无法以正常格式表示为 subsormoral formaloral 。

对于IEEE-754 Binary64格式，也称为“双精度”， b 是2， p IS 53， e e _{min 是-1022， e _max是1023。}

使用整数尺度缩放，指数的最小值和最大值为-1074和971。然后，我们可以说一个当且仅当其等于某些整数M m 时，才能以这种格式表示有限的数字。和 e ，以便-2 ⁵³＆lt; m ＆lt; +2 ⁵³和-1074≤em>e ≤971。

对于单精度，binary32格式，-2 ²⁴＆lt; m ＆lt; +2 ²⁴和-149≤em>e ≤104。

这些格式还具有代表-∞， +∞和特殊NAN和特殊NAN（不是数字）值的编码。

不可能的数字

由于3⅓•2 ^p不是任何整数 p ，因此无法表示 3⅓。如果有整数 m 和 e ，以便3⅓= m •2 ^e，我们可以将每一侧乘以3 = 3• m •2 ^e，然后5 = 3• m •2 ^{e -1}。如果 e -1为负，则我们将两面都乘以2 ^{1- e}具有5•2 ^{1- E}= 3• m 。然后是5 = 3• m •2 ^{e -1}或5•2 ^{1- e}= 3• m 是一个具有整数的方程，但是右侧的倍数为3，左侧不具有 arithmetic的基本定理。

Introduction

A finite number can be represented in the IEEE-754 “double precision” format if and only if it equals M•2^e for some integers M and e such that −2⁵³ < M < +2⁵³ and −1074 ≤ e ≤ 971.

For example, 3.125 can be represented because it equals 25•2⁻³, and 25 is an integer in (−2⁵³, +2⁵³), and −3 is an integer in [−1074, 971].

Discussion

Generally, a floating-point format represents finite numbers as ±d₀.d₋₁d₋₂d₋₃…d_1−p•b^e where:

± means either + or −,
b is a positive integer base that is fixed for the format, usually 2 or 10,
p, called the precision, is the number of base-b digits in the significand,
d₀.d₋₁d₋₂d₋₃…d_1−p (example: 1.2345) is the significand of the floating-point representation, and each d_i is an integer with 0 ≤ d_i < b, and
e is an integer with e_min ≤ e ≤ e_max, where e_min and e_max are fixed limits for the format.

Note the radix point “.” after d₀. This means the significand is in [0, b) and has p−1 digits after the radix point. (“[0, b)” denotes a half-open interval that includes 0 but excludes b.) Sometimes floating-point formats are described with the radix point in different positions, either before the first digit, .d₀d₋₁d₋₂d₋₃…d_1−p, or after the last digit, d₀d₋₁d₋₂d₋₃…d_1−p. These are equivalent mathematically:

d₀.d₋₁d₋₂d₋₃…d_1−p•b^e = .d₀d₋₁d₋₂d₋₃…d_1−p•b^e+1 = d₀d₋₁d₋₂d₋₃…d_1−p.•b^e+1−p.

The exponent limits e_min and e_max would be adjusted to match the position used. With the radix-point on the right, the significand is an integer, and this is convenient for using number theory in analysis and proofs about floating-point.

Some floating-point formats may require d₀ to be non-zero. This is rare now. Representations in which d₀ is non-zero are said to be in normal form, and representations in which d₀ is zero are said be denormalized. A non-zero number that can be represented in the denormalized form of a format but is too small to be represented in its normal format is said to be subnormal.

For the IEEE-754 binary64 format, also called “double precision,” b is 2, p is 53, e_min is −1022, and e_max is 1023.

Using the integer-significand scaling, the exponent minimum and maximum are −1074 and 971. Then we can say a finite number can be represented in this format if and only if it equals M•2^e for some integers M and e such that −2⁵³ < M < +2⁵³ and −1074 ≤ e ≤ 971.

For single precision, the binary32 format, −2²⁴ < M < +2²⁴ and −149 ≤ e ≤ 104.

These formats also have encodings that represent −∞, +∞, and special NaN (Not a Number) values.

Example of an Unrepresentable Number

3⅓ cannot be represented because 3⅓•2^p is not an integer for any integer p. If there were integers M and e such that 3⅓ = M•2^e, we could multiply each side by 3 to get 10 = 3•M•2^e, and then 5 = 3•M•2^e−1. If e−1 is negative, we multiply both sides by 2^1−e to have 5•2^1−e = 3•M. Then either 5 = 3•M•2^e−1 or 5•2^1−e = 3•M is an equation having only integers, but the right side has a factor of 3 and the left side does not, which contradicts the fundamental theorem of arithmetic.