Using C: how do I determine the sizes of the components of a floating-point number?
I am looking for suggestions on how to find the sizes (in bits) and the range of floating-point numbers in an architecture-independent manner. The code could be built on various platforms (AIX, Linux, HPUX, VMS, maybe Windoze) using different flags, so the results should vary. I've only ever seen the sign as one bit, but how do I measure the sizes of the exponent and the mantissa?
Since you're looking at building for a number of systems, I think you may be looking at using GCC for compilation.
Some good info on floating point - this is what almost all modern architectures use:
http://en.wikipedia.org/wiki/IEEE_754
This page details some of the differences that can come up:
http://www.network-theory.co.uk/docs/gccintro/gccintro_70.html
Have a look at the values defined in float.h. Those should give you the values you need.
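As a rough sketch of what float.h offers, the helpers below collect the standard macros for each type. The struct fp_info and the function names are made up for this example; only the macros themselves come from float.h and limits.h. Note that the storage size from sizeof may include padding (long double often does), so it is not always sign + exponent + mantissa.

```c
#include <float.h>   /* FLT_RADIX, *_MANT_DIG, *_MIN_EXP, *_MAX_EXP */
#include <limits.h>  /* CHAR_BIT */
#include <stdio.h>

/* Hypothetical helper type for this sketch; it is not part of float.h. */
struct fp_info {
    int storage_bits; /* sizeof(type) * CHAR_BIT, may include padding bits */
    int mant_digits;  /* significand digits in base FLT_RADIX, counting any implicit leading digit */
    int min_exp;      /* smallest e such that FLT_RADIX^(e-1) is a normalized value */
    int max_exp;      /* largest e such that FLT_RADIX^(e-1) is a finite value */
};

struct fp_info float_info(void) {
    struct fp_info i = { (int)(sizeof(float) * CHAR_BIT),
                         FLT_MANT_DIG, FLT_MIN_EXP, FLT_MAX_EXP };
    return i;
}

struct fp_info double_info(void) {
    struct fp_info i = { (int)(sizeof(double) * CHAR_BIT),
                         DBL_MANT_DIG, DBL_MIN_EXP, DBL_MAX_EXP };
    return i;
}

struct fp_info long_double_info(void) {
    struct fp_info i = { (int)(sizeof(long double) * CHAR_BIT),
                         LDBL_MANT_DIG, LDBL_MIN_EXP, LDBL_MAX_EXP };
    return i;
}

/* Print one line per type; FLT_RADIX applies to all three types. */
void print_fp_info(const char *name, struct fp_info i) {
    printf("%-12s storage=%d bits, significand=%d digits (radix %d), "
           "exponent range=[%d, %d]\n",
           name, i.storage_bits, i.mant_digits, FLT_RADIX,
           i.min_exp, i.max_exp);
}
```

Calling print_fp_info("float", float_info()) and so on from main on each platform should show where the builds differ.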
As you follow the links suggested in previous comments, you'll probably see references to What Every Computer Scientist Should Know About Floating Point Arithmetic. By all means, take the time to read this paper. It pops up everywhere when floating point is discussed.
It's relatively easy to find out:
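Decimal or binary: accumulate 0.1 twenty times and compare the sum with 2.0. If the two are equal, the representation is decimal; otherwise it is binary.
Reason: All binary systems can represent 2.0 exactly, but any binary system has an error term when representing 0.1. By accumulating, you can be sure that this error term will not vanish through rounding, as it can in a single operation: e.g. 1.0 == 3.0*(1.0/3.0) can hold even in binary systems.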
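The radix test could look something like the sketch below. The function name is invented here, and the volatile qualifiers are my own addition: they are meant to force each intermediate sum to be stored as a real double, so that wider registers (old x87 builds, for instance) do not hide the rounding. The float.h include is only there so the result can be compared with FLT_RADIX.

```c
#include <float.h>  /* FLT_RADIX, only for cross-checking the result */

/* Returns 1 if adding 0.1 twenty times yields exactly 2.0,
 * which suggests a decimal radix, and 0 otherwise.
 * Hypothetical helper name, used only for this sketch. */
int looks_decimal_double(void) {
    volatile double target = 2.0;
    volatile double sum = 0.0;
    int i;
    for (i = 0; i < 20; i++) {
        sum += 0.1;  /* each step carries the representation error of 0.1 */
    }
    return sum == target;
}
```

On a binary platform this is expected to return 0, in agreement with FLT_RADIX being 2.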
Mantissa length:
Add ever smaller terms (1/2, 1/4, 1/8, ...) to 1.0 until the sum no longer differs from 1.0, i.e. until you reach the capacity of the mantissa, and count the steps. This gives 24 bits for float and 53 bits for double, which is correct (the stored mantissa contains only 23/52 bits, but since the first bit is always one for normalized values, you get a hidden extra bit).
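A minimal sketch of that counting loop follows, assuming a binary radix and the default round-to-nearest mode. The function names are invented for this example, and volatile is used so that every intermediate result is rounded to the type under test rather than kept in a wider register.

```c
#include <float.h>  /* FLT_MANT_DIG, DBL_MANT_DIG, only for cross-checking */

/* Count how many halvings of the increment it takes until
 * 1.0 + increment is indistinguishable from 1.0 (float version). */
int mantissa_bits_float(void) {
    volatile float one = 1.0f, inc = 1.0f, sum;
    int bits = 0;
    do {
        bits++;
        inc *= 0.5f;      /* 1/2, 1/4, 1/8, ... */
        sum = one + inc;  /* rounded to float by the volatile store */
    } while (sum != one);
    return bits;
}

/* Same loop for double. */
int mantissa_bits_double(void) {
    volatile double one = 1.0, inc = 1.0, sum;
    int bits = 0;
    do {
        bits++;
        inc *= 0.5;
        sum = one + inc;
    } while (sum != one);
    return bits;
}
```

The results include the hidden bit, so they should line up with FLT_MANT_DIG and DBL_MANT_DIG from float.h.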
Exponent length:
You are shifting 1.0 to the left or to the right until you hit the top or the bottom.
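Here max is the number of times 1.0 can be doubled before overflow, and min is the (negative) number of times it can be halved before reaching zero. Normally the exponent range is roughly -(max+1) to max. If min is smaller than -(max+1), you have subnormals (as floats and doubles do). Normally positive and negative values are symmetric (with perhaps an offset of one), but you can adjust the test by adding negative values.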
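The shifting test might be sketched as below for float; a double version would look the same with the types swapped. All function names are invented here. Overflow is detected by checking that doubling can be undone exactly, which avoids relying on a particular infinity representation, and a loop cap guards against platforms where neither loop would otherwise stop. The exponent width estimate assumes a biased exponent with a roughly symmetric range, which is an assumption about the format rather than something the test proves.

```c
#include <float.h>  /* FLT_MAX_EXP, FLT_MIN_EXP, only for cross-checking */

#define SHIFT_CAP 100000  /* safety limit for unusual platforms */

/* Number of times 1.0f can be doubled while staying finite. */
int max_doublings_float(void) {
    volatile float x = 1.0f, next;
    int n = 0;
    while (n < SHIFT_CAP) {
        next = x * 2.0f;
        if (next / 2.0f != x) break;  /* overflowed or saturated */
        x = next;
        n++;
    }
    return n;
}

/* Number of times 1.0f can be halved while staying above zero.
 * With subnormals this goes well past the smallest normal exponent. */
int min_halvings_float(void) {
    volatile float x = 1.0f, next;
    int n = 0;
    while (n < SHIFT_CAP) {
        next = x * 0.5f;
        if (!(next > 0.0f)) break;  /* underflowed to zero */
        x = next;
        n++;
    }
    return n;
}

/* Estimate the exponent field width from the doubling count,
 * ASSUMING a biased exponent with a symmetric range:
 * the smallest width k such that 2^(k-1) > max. */
int exponent_bits_from_max(int max) {
    int bits = 1;
    while ((1L << (bits - 1)) <= max) bits++;
    return bits;
}
```

For comparison with float.h, the doubling count should equal FLT_MAX_EXP - 1, and the halving count should be at least 1 - FLT_MIN_EXP, with any excess coming from subnormals.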
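The number of bits used to store each field in a floating-point number doesn't change for a given format. On platforms that use IEEE 754, single precision has 1 sign bit, 8 exponent bits and 23 fraction bits, and double precision has 1 sign bit, 11 exponent bits and 52 fraction bits.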
EDIT: As Jonathan pointed out in the comments, I left out the long double type. I'll leave its bit decomposition as an exercise for the reader. :)
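For illustration only, here is a sketch of pulling the three fields out of a float. It assumes that float is a 32-bit IEEE 754 single (1 sign bit, 8 exponent bits, 23 fraction bits), which is not guaranteed on every platform mentioned in the question, and the struct and function names are invented for this example.

```c
#include <stdint.h>  /* uint32_t */
#include <string.h>  /* memcpy */

/* Hypothetical holder for the three fields of an IEEE 754 single. */
struct f32_fields {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 8 bits, biased by 127 */
    uint32_t fraction;  /* 23 bits, without the hidden leading one */
};

/* Split a float into its fields.
 * ASSUMPTION: sizeof(float) == 4 and the layout is IEEE 754 binary32.
 * If the size does not match, all fields are returned as zero. */
struct f32_fields split_f32(float f) {
    struct f32_fields out = { 0u, 0u, 0u };
    uint32_t bits;
    if (sizeof(float) != sizeof(uint32_t)) return out;
    memcpy(&bits, &f, sizeof bits);  /* well-defined way to reinterpret the bytes */
    out.sign = (unsigned)(bits >> 31);
    out.exponent = (unsigned)((bits >> 23) & 0xFFu);
    out.fraction = bits & 0x7FFFFFu;
    return out;
}
```

Under that assumption, split_f32(1.0f) has a biased exponent of 127 and a zero fraction.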