Convert from unsigned long long to float with round to nearest even
I need to write a function that rounds from unsigned long long to float, and the rounding should be toward nearest even.
I cannot just do a C++ type-cast, since AFAIK the standard does not specify the rounding.
I was thinking of using boost::numeric, but I could not find any useful lead after reading the documentation. Can this be done using that library?
Of course, if there is an alternative, I would be glad to use it.
Any help would be much appreciated.
EDIT: Adding an example to make things a bit clearer.
Suppose I want to convert 0xffffff7fffffffff to its floating point representation. The C++ standard permits either one of:
- 0x5f7fffff ~ 1.9999999*2^63
- 0x5f800000 = 2^64
Now if you add the restriction of round to nearest even, only the first result is acceptable.
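To make the example concrete, here is a minimal sketch that decodes the two candidate bit patterns and checks which one the input is closer to. It assumes float is IEEE 754 binary32; the helper names float_from_bits and closer_to_lower_neighbor are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as a float (assumes IEEE 754 binary32).
float float_from_bits(std::uint32_t bits) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// For x in [2^63, 2^64), neighbouring floats are 2^40 apart.
// Returns true when x is strictly closer to the float below it.
bool closer_to_lower_neighbor(std::uint64_t x) {
    assert((x >> 63) == 1);                          // only valid in the top binade
    const std::uint64_t spacing = 1ULL << 40;
    const std::uint64_t offset = x & (spacing - 1);  // distance above the lower float
    return offset < spacing / 2;
}
```

For 0xffffff7fffffffff the offset is 0x7fffffffff, just under half the spacing (0x8000000000), so the lower candidate 0x5f7fffff is the nearest float and no tie-breaking is involved.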
Comments (3)
Since the source has so many bits that can't be represented in a float, and you can't (apparently) rely on the language's conversion, you'll have to do it yourself. I devised a scheme that may or may not help you. Basically, there are 31 bits to represent positive numbers in a float (8 exponent bits plus 23 stored significand bits), so I pick up the 31 most significant bits in the source number. Then I save off and mask away all the lower bits. Then, based on the value of the lower bits, I round the "new" LSB up or down, and finally use static_cast to create a float. I left in some couts that you can remove as desired.
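The answer's code is not reproduced here. As a rough sketch of the same "keep the top bits, inspect the rest, round the new LSB" idea, the version below keeps 24 bits (the precision of a float's significand, including the implicit leading one) rather than 31, and assembles the bit pattern directly instead of using static_cast. It assumes float is IEEE 754 binary32; the name u64_to_float_rne is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Convert with round-to-nearest, ties-to-even, using integer operations only.
float u64_to_float_rne(std::uint64_t x) {
    if (x == 0) return 0.0f;
    int msb = 63;                                    // index of the highest set bit
    while (((x >> msb) & 1) == 0) --msb;
    std::uint32_t exponent = 127u + static_cast<std::uint32_t>(msb);  // biased exponent
    std::uint32_t mant;                              // 24-bit significand, leading 1 included
    if (msb <= 23) {
        mant = static_cast<std::uint32_t>(x << (23 - msb));  // fits exactly
    } else {
        const int shift = msb - 23;                  // number of bits that must go
        mant = static_cast<std::uint32_t>(x >> shift);
        const std::uint64_t rest = x & ((1ULL << shift) - 1);  // the masked-off lower bits
        const std::uint64_t half = 1ULL << (shift - 1);
        if (rest > half || (rest == half && (mant & 1u))) {
            ++mant;                                  // round the new LSB up
            if (mant == (1u << 24)) {                // carry out: renormalize
                mant >>= 1;
                ++exponent;
            }
        }
    }
    assert((mant >> 23) == 1u);                      // normalized significand
    const std::uint32_t bits = (exponent << 23) | (mant & 0x7FFFFFu);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

With the question's input, u64_to_float_rne(0xffffff7fffffffffULL) produces the pattern 0x5f7fffff, while the exact tie 0xffffff8000000000 rounds up to 2^64 because the kept significand 0xffffff is odd.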
I did this in Smalltalk for arbitrary-precision integers (LargeInteger), implemented and tested in Squeak/Pharo/Visualworks/Gnu Smalltalk/Dolphin Smalltalk, and even blogged about it, if you can read Smalltalk code: http://smallissimo.blogspot.fr/2011/09/clarifying-and-optimizing.html
The trick for accelerating the algorithm is this: an IEEE 754 compliant FPU correctly rounds the result of an inexact operation. So we can afford one inexact operation and let the hardware round correctly for us. That lets us easily handle the first 48 bits. But we cannot afford two inexact operations, so we sometimes have to take care of the lowest bits differently...
Hope the code is documented enough:
Bonus: this code should round according to your FPU rounding mode, whatever it may be, since we implicitly use the FPU to perform the rounding with the + operation.
However, beware of aggressive optimizations in standards older than C99; who knows when the compiler will use extended precision... (unless you force something like -ffloat-store).
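The following is not the answer's original code, only a minimal sketch of the trick it describes. It first normalizes the input so the leading one sits at bit 63 (a simplification assumed here, so that the lowest bits can always be folded into a sticky bit), splits the value into two 24-bit pieces that each convert to float exactly, and lets a single floating-point addition do the rounding. The name u64_to_float_fpu is hypothetical, and float is assumed to be IEEE 754 binary32.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Let the FPU round: exactly one inexact operation (the addition).
float u64_to_float_fpu(std::uint64_t x) {
    if (x == 0) return 0.0f;
    int shift = 0;                                   // normalize: leading 1 at bit 63
    while ((x & (1ULL << 63)) == 0) { x <<= 1; ++shift; }
    const std::uint32_t high = static_cast<std::uint32_t>(x >> 40);         // bits 63..40
    std::uint32_t mid = static_cast<std::uint32_t>((x >> 16) & 0xFFFFFFu);  // bits 39..16
    if ((x & 0xFFFFu) != 0) mid |= 1u;               // sticky: remember any lower 1 bits
    assert((high >> 23) == 1u);                      // high holds exactly 24 significant bits
    // Both conversions and the scaling by 2^24 are exact; only the + rounds.
    const float sum = static_cast<float>(high) * 16777216.0f + static_cast<float>(mid);
    return std::ldexp(sum, 16 - shift);              // exact power-of-two rescaling
}
```

The exact sum has at most 48 significant bits, so it would also be exact in a double or x87 extended intermediate; the single rounding then happens when the value is narrowed to float. Under the default rounding mode the question's input yields 0x5f7fffff.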
If you always want to round to nearest even, whatever the current rounding mode, then you'll have to increment high bits when:
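The answer's own list of conditions is not reproduced here. The usual round-to-nearest-even rule can be sketched as a small predicate: increment when the first discarded bit (the round bit) is set and either some lower discarded bit is set (above the halfway point) or the kept value is odd (an exact tie). The name rne_rounds_up and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// kept:      the bits that survive (as an integer)
// discarded: the n low bits being dropped
// n:         how many bits are dropped (1..63)
bool rne_rounds_up(std::uint64_t kept, std::uint64_t discarded, int n) {
    assert(n >= 1 && n <= 63);
    const bool round_bit = ((discarded >> (n - 1)) & 1u) != 0;          // the "half" bit
    const bool sticky = (discarded & ((1ULL << (n - 1)) - 1)) != 0;     // anything below it
    const bool kept_is_odd = (kept & 1u) != 0;
    return round_bit && (sticky || kept_is_odd);
}
```

For the question's input, kept is 0xffffff and discarded is 0x7fffffffff with n = 40: the round bit is clear, so the high bits are left alone.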
EDIT:
If you stick to round-to-nearest-even tie breaking, then another solution is to use Shewchuk's EXPANSION-SUM of the non-adjacent parts (fhigh, flow) and (fmid); see http://www-2.cs.cmu.edu/afs/cs/project/quake/public/papers/robust-arithmetic.ps :
This makes a branch-free algorithm with a few more ops. It may work with other rounding modes, but I'll let you analyze the paper to make sure...
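Expansion sums in that paper are built from error-free additions. The sketch below shows only that building block, the branch-free TWO-SUM (due to Knuth), not the full conversion. The names SumWithError and two_sum are chosen here for illustration, and the code assumes float arithmetic is evaluated in single precision with round-to-nearest (no -ffast-math, no extended intermediates).

```cpp
#include <cassert>
#include <cmath>

struct SumWithError {
    float sum;   // the rounded result of a + b
    float err;   // the rounding error, so that a + b == sum + err exactly
};

// Knuth's TWO-SUM: six floating-point operations, no branches.
SumWithError two_sum(float a, float b) {
    assert(std::isfinite(a) && std::isfinite(b));
    const float s = a + b;
    const float b_virtual = s - a;          // the part of b that made it into s
    const float a_virtual = s - b_virtual;  // the part of a that made it into s
    const float b_roundoff = b - b_virtual;
    const float a_roundoff = a - a_virtual;
    return SumWithError{s, a_roundoff + b_roundoff};
}
```

For instance, adding 2^24 and 1 in float gives 2^24 (a tie resolved to the even significand), and two_sum reports the lost 1 as the error term.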
What is possible between 8-byte integers and the float format is straightforward to explain but less so to implement!
The next paragraph concerns what is representable in 8 byte signed integers.
All positive integers between 1 (2^0) and 16777215 (2^24-1) are exactly representable in IEEE 754 single precision (float). Or, to be precise, all numbers between 2^0 and 2^24-2^0 in increments of 2^0. The next range of exactly representable positive integers is 2^1 to 2^25-2^1 in increments of 2^1, and so on up to 2^39 to 2^63-2^39 in increments of 2^39.
Unsigned 8-byte integer values can be expressed up to 2^64-2^40 in increments of 2^40.
The single precision format doesn't stop here but goes on all the way up to the range 2^104 to 2^128-2^104 in increments of 2^104.
For 4-byte integers (long) the highest float range is 2^7 to 2^31-2^7 in 2^7 increments.
On the x86 architecture the largest integer type supported by the floating point instruction set is the 8 byte signed integer. 2^64-1 cannot be loaded by conventional means.
This means that, for a given range increment 2^i (where i is an integer > 0), all integers whose lowest i bits are not all zero (bit patterns 0x1 up to 2^i-1) are not exactly representable within that range in a float.
This means that what you call rounding upwards is actually dependent on what range you are working in. It is of no use to try to round up by 1 (2^0) or 16 (2^4) if the granularity of the range you are in is 2^19.
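The range-dependent granularity described above can be sketched as a pair of small helpers. This is only an illustration, assuming a 24-bit float significand (IEEE 754 binary32); the names float_spacing_at and is_exact_in_float are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Step between consecutive float-representable integers at the magnitude of x.
std::uint64_t float_spacing_at(std::uint64_t x) {
    assert(x != 0);
    int msb = 63;                                // index of the highest set bit
    while (((x >> msb) & 1) == 0) --msb;
    return msb <= 23 ? 1ULL : 1ULL << (msb - 23);
}

// An integer survives the conversion unchanged only if it is a multiple
// of the spacing of the range it falls in.
bool is_exact_in_float(std::uint64_t x) {
    return x == 0 || x % float_spacing_at(x) == 0;
}
```

For example, float_spacing_at(1ULL << 62) is 2^39 and float_spacing_at(~0ULL) is 2^40, matching the ranges listed earlier, while 16777217 (2^24+1) is the first positive integer that is not exact.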
An additional consequence of what you propose to do (rounding 2^63-1 to 2^63) is that it could result in a (long integer format) overflow if you attempt the following conversion: longlong_int=(long long) ((float) 2^63).
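One way to guard against that overflow is to check the range before converting back. The sketch below clamps instead of overflowing; the name float_to_i64_saturating and the choice to saturate are assumptions made for illustration, not something the answer prescribes.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Convert float to long long, clamping values that would not fit.
long long float_to_i64_saturating(float f) {
    assert(!std::isnan(f));                          // NaN has no integer equivalent
    const float two63 = std::ldexp(1.0f, 63);        // 2^63, exactly representable
    if (f >= two63) return std::numeric_limits<long long>::max();
    if (f < -two63) return std::numeric_limits<long long>::min();
    return static_cast<long long>(f);                // in range: truncates toward zero
}
```

Since 2^63-1 rounds to 2^63 as a float, converting it to float and back through this helper gives the largest long long rather than an out-of-range cast.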
Check out this small program I wrote (in C) which should help illustrate what is possible and what isn't.
This program shows the representable integer value ranges. There is overlap between them: for example 2^5 is representable in all ranges with a lower boundary 2^b where 1=