Using C: how do you determine the sizes of the components of a floating point number?

Posted 2024-07-08 02:27:10, 145 words, 16 views, 0 comments

I am looking for suggestions on how to find the sizes (in bits) and range of floating point numbers in an architecture independent manner. The code could be built on various platforms (AIX, Linux, HPUX, VMS, maybe Windoze) using different flags - so results should vary. The sign, I've only seen as one bit, but how to measure the size of the exponent and mantissa?

Comments (5)

本王不退位尔等都是臣 2024-07-15 02:27:10

Since you're looking at building for a number of systems, I think you may be looking at using GCC for compilation.

Some good info on floating point - this is what almost all modern architectures use:
http://en.wikipedia.org/wiki/IEEE_754

This details some of the differences that can come up:
http://www.network-theory.co.uk/docs/gccintro/gccintro_70.html

执笏见 2024-07-15 02:27:10

Have a look at the values defined in float.h. Those should give you the values you need.
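
As a minimal sketch of that suggestion, the function below prints the standard `<float.h>` macros for the three floating point types. The function names `report` and `report_float_params` are made up for this illustration; only the macros come from the C standard.

```c
#include <float.h>
#include <stdio.h>

/* Print the <float.h> parameters of one floating point type. */
static void report(const char *name, int mant_dig, int min_exp, int max_exp)
{
    printf("%-12s radix=%d  mantissa digits=%d  exponent range=[%d, %d]\n",
           name, FLT_RADIX, mant_dig, min_exp, max_exp);
}

/* Report float, double and long double; returns the number of types reported. */
int report_float_params(void)
{
    report("float", FLT_MANT_DIG, FLT_MIN_EXP, FLT_MAX_EXP);
    report("double", DBL_MANT_DIG, DBL_MIN_EXP, DBL_MAX_EXP);
    report("long double", LDBL_MANT_DIG, LDBL_MIN_EXP, LDBL_MAX_EXP);
    printf("float range:  %e .. %e\n", FLT_MIN, FLT_MAX);
    printf("double range: %e .. %e\n", DBL_MIN, DBL_MAX);
    return 3;
}
```

Call `report_float_params()` from `main` on each target platform. The mantissa digits are counted in base `FLT_RADIX` and include the hidden bit. Note that `<float.h>` normalizes the mantissa to the interval [1/FLT_RADIX, 1), so `DBL_MAX_EXP` is 1024 on an IEEE 754 system rather than the 1023 usually quoted for double.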

只等公子 2024-07-15 02:27:10

As you follow the links suggested in previous comments, you'll probably see references to "What Every Computer Scientist Should Know About Floating-Point Arithmetic". By all means, take the time to read this paper. It pops up everywhere when floating point is discussed.

毁虫ゝ 2024-07-15 02:27:10

It's relatively easy to find out:

Decimal or binary:

// Myfloat stands for the type under test (float, double, ...)
Myfloat a = 2.0,
        b = 0.0;

for (int i=0; i<20; i++)
  b += 0.1;

// a == b => decimal, otherwise binary

Reason: every binary system can represent 2.0 exactly, but none can represent 0.1 exactly, so each addition carries an error term. Accumulating many such terms makes it much less likely that the error is rounded away, which does happen in simpler tests: e.g. 1.0 == 3.0*(1.0/3.0) holds even in binary systems.

Mantissa length:

Myfloat a = 1.0,
        b = 1.0,
        c,
        inc = 1.0;

int mantissabits = 0;

do {
 mantissabits++;
 inc *= 0.5;   // effectively shift to the right
 c = b+inc;
} while (a != c);

You are adding decreasing terms until you reach the capacity of the mantissa. It gives back 24 bits for float and 53 bits for double, which is correct (the mantissa field itself holds only 23/52 bits, but since the first bit of a normalized value is always one, you get a hidden extra bit).
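
The mantissa probe can be written as compilable C along these lines. This is a sketch, not the answerer's exact code: the function names are invented here, and `volatile` is an added assumption to force each sum to be stored at the type's own precision, so that extended-precision registers (for example x87) do not inflate the count.

```c
/* Count how many halvings of inc it takes until 1.0 + inc == 1.0.
   Assumes a binary format with round-to-nearest. */
int probe_mantissa_bits_double(void)
{
    volatile double one = 1.0, sum;
    double inc = 1.0;
    int bits = 0;

    do {
        bits++;
        inc *= 0.5;          /* effectively a right shift */
        sum = one + inc;     /* stored as a double because sum is volatile */
    } while (sum != one);

    return bits;
}

/* Same probe for float. */
int probe_mantissa_bits_float(void)
{
    volatile float one = 1.0f, sum;
    float inc = 1.0f;
    int bits = 0;

    do {
        bits++;
        inc *= 0.5f;
        sum = one + inc;
    } while (sum != one);

    return bits;
}
```

On a binary system the results should agree with `DBL_MANT_DIG` and `FLT_MANT_DIG` from `<float.h>`, which makes a convenient cross-check.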

Exponent length:

Myfloat a = 1.0;
int max = 0,
    min = 0;

while (true) {
 a *= 2.0;
 if (isfinite(a)) // isfinite() is from <math.h> (C99); pre-C99 or non-IEEE systems need their own overflow test
   max++;
 else
   break;
}

a = 1.0;
while (true) {
 a *= 0.5;
 if (a != 0.0) 
   min--;
 else
   break;
}

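You are shifting 1.0 to the left or to the right until you hit the top or the bottom.
Normally the exponent range is roughly -(max+1) to max (in IEEE 754 formats the smallest normal exponent is -(max-1), because two exponent encodings are reserved).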

If min is smaller than -(max+1), you have (as floats and doubles have) subnormals.
Normally positive and negative values are symmetric (with perhaps one offset), but you can adjust the test by adding negative values.
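
A compilable version of the exponent probe might look like the sketch below. It assumes a C99 compiler (for `isfinite`) and a system where overflow produces infinity instead of trapping. The names `exp_probe`, `probe_exponent_double` and `exponent_bits` are hypothetical helpers introduced for this example.

```c
#include <math.h>

/* Results of probing the exponent range of double. */
struct exp_probe {
    int max;   /* largest n for which 2^n is finite */
    int min;   /* smallest n for which 2^n is nonzero (may be subnormal) */
};

struct exp_probe probe_exponent_double(void)
{
    struct exp_probe r = {0, 0};
    volatile double a = 1.0;   /* volatile: keep a at double precision */

    for (;;) {                 /* shift left until overflow */
        a *= 2.0;
        if (!isfinite(a))
            break;
        r.max++;
    }

    a = 1.0;
    for (;;) {                 /* shift right until underflow to zero */
        a *= 0.5;
        if (a == 0.0)
            break;
        r.min--;
    }
    return r;
}

/* Bits needed to encode 2*(max+1) distinct exponent values. */
int exponent_bits(int max)
{
    int bits = 0;
    for (long span = 2L * (max + 1); span > 1; span /= 2)
        bits++;
    return bits;
}
```

Passing the probed `max` to `exponent_bits` gives the width of the exponent field, and comparing `min` with `-(max+1)` shows whether subnormals are present.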

昔梦 2024-07-15 02:27:10

For IEEE 754 formats, the number of bits used to store each field in a floating point number doesn't change.

                    Sign      Exponent      Fraction      Bias
Single Precision    1 [31]     8 [30-23]    23 [22-00]     127
Double Precision    1 [63]    11 [62-52]    52 [51-00]    1023

EDIT: As Jonathan pointed out in the comments, I left out the long double type. I'll leave its bit decomposition as an exercise for the reader. :)
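
To see the double precision row of the table in action, here is a small sketch that splits a double into its three fields. It assumes the platform's double is IEEE 754 binary64, is 8 bytes wide, and shares byte order with `uint64_t`; the struct and function names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of an IEEE 754 binary64 value. */
struct double_fields {
    unsigned sign;       /* 1 bit, bit 63 */
    unsigned exponent;   /* 11 bits, bits 62-52, biased by 1023 */
    uint64_t fraction;   /* 52 bits, bits 51-0 */
};

struct double_fields split_double(double x)
{
    uint64_t bits;
    struct double_fields f;

    /* memcpy reinterprets the bytes without violating aliasing rules */
    memcpy(&bits, &x, sizeof bits);

    f.sign = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FF);
    f.fraction = bits & ((UINT64_C(1) << 52) - 1);
    return f;
}
```

For example, `split_double(1.0)` has a biased exponent of 1023 and a zero fraction, which matches the bias column above. This approach does not carry over to non-IEEE formats, so the probing techniques from the earlier answer are still needed there.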
