Should we generally use float literals for floats instead of the simpler double literals?

Posted on 2024-12-08 13:35:26

In C++ (or maybe only our compilers VC8 and VC10) 3.14 is a double literal and 3.14f is a float literal.

Now I have a colleague that stated:

We should use float literals for float calculations and double literals for double calculations, as this could have an impact on the precision of a calculation when constants are used in it.

Specifically, I think he meant:

double d1, d2;
float f1, f2;
// ... init and stuff ...
f1 = 3.1415  * f2;
f1 = 3.1415f * f2; // any difference?
d1 = 3.1415  * d2;
d1 = 3.1415f * d2; // any difference?

Or, added by me, even:

d1 = 42    * d2;
d1 = 42.0f * d2; // any difference?
d1 = 42.0  * d2; // any difference?

More generally, the only point I can see for using 2.71828183f is to make sure that the constant I'm trying to specify will actually fit into a float (compiler error/warning otherwise).

Can someone shed some light on this? Do you specify the f postfix? Why?

To quote from an answer something I had implicitly taken for granted:

If you're working with a float variable and a double literal the whole
operation will be done as double and then converted back to float.

Could there possibly be any harm in this? (Other than a very, very theoretical performance impact?)

Further edit: It would be nice if answers containing technical details (appreciated!) could also include how these differences affect general purpose code. (Yes, if you're number crunching, you probably like to make sure your big-n floating point ops are as efficient (and correct) as possible -- but does it matter for general purpose code that's called a few times? Isn't it cleaner if the code just uses 0.0 and skips the -- hard to maintain! -- float suffix?)

Comments (7)

苏佲洛 2024-12-15 13:35:26

Yes, you should use the f suffix. Reasons include:

  1. Performance. When you write float foo(float x) { return x*3.14; }, you force the compiler to emit code that converts x to double, then does the multiplication, then converts the result back to single. If you add the f suffix, then both conversions are eliminated. On many platforms, each of those conversions is about as expensive as the multiplication itself.

  2. Performance (continued). There are platforms (most cellphones, for example), on which double-precision arithmetic is dramatically slower than single-precision. Even ignoring the conversion overhead (covered in 1.), every time you force a computation to be evaluated in double, you slow your program down. This is not just a "theoretical" issue.

  3. Reduce your exposure to bugs. Consider the example float x = 1.2; if (x == 1.2) // something; Is something executed? No, it is not, because x holds 1.2 rounded to a float, but is being compared to the double-precision value 1.2. The two are not equal.
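
The comparison pitfall in point 3 can be reproduced with a short sketch. The function names below are hypothetical and chosen only for illustration; the sketch assumes an ordinary IEEE 754 platform where float is 32-bit and double is 64-bit.

```cpp
#include <cassert>

// x is promoted to double before being compared with the double literal 1.2.
bool equals_double_literal(float x) { return x == 1.2; }

// Both sides are float here, so no promotion takes place.
bool equals_float_literal(float x) { return x == 1.2f; }

// Hypothetical helper that bundles the two checks.
void check_comparisons() {
    float x = 1.2;  // the double value 1.2 is rounded to the nearest float

    // The rounded float, widened back to double, is not the double 1.2.
    assert(!equals_double_literal(x));

    // Rounding 1.2 to float twice yields the same float both times.
    assert(equals_float_literal(x));
}
```

Calling check_comparisons() from main should pass both assertions under these assumptions, because 1.2 has no finite binary expansion and so its float and double roundings differ.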

人生百味 2024-12-15 13:35:26

I did a test.

I compiled this code:

float f1(float x) { return x*3.14; }            
float f2(float x) { return x*3.14F; }   

Using gcc 4.5.1 for i686 with optimization -O2.

This was the assembly code generated for f1:

pushl   %ebp
movl    %esp, %ebp
subl    $4, %esp # Allocate 4 bytes on the stack
fldl    .LC0     # Load a double-precision floating point constant
fmuls   8(%ebp)  # Multiply by parameter
fstps   -4(%ebp) # Store single-precision result on the stack
flds    -4(%ebp) # Load single-precision result from the stack
leave
ret

And this is the assembly code generated for f2:

pushl   %ebp
flds    .LC2          # Load a single-precision floating point constant
movl    %esp, %ebp
fmuls   8(%ebp)       # Multiply by parameter
popl    %ebp
ret

So the interesting thing is that for f1, the compiler stored the value and re-loaded it just to make sure that the result was rounded to single precision.

If we use the -ffast-math option, then this difference is significantly reduced (f1 first, then f2):

pushl   %ebp
fldl    .LC0             # Load double-precision constant
movl    %esp, %ebp
fmuls   8(%ebp)          # multiply by parameter
popl    %ebp
ret


pushl   %ebp
flds    .LC2             # Load single-precision constant
movl    %esp, %ebp
fmuls   8(%ebp)          # multiply by parameter
popl    %ebp
ret

But there is still the difference between loading a single or double precision constant.

Update for 64-bit

These are the results with gcc 5.2.1 for x86-64 with optimization -O2:

f1:

cvtss2sd  %xmm0, %xmm0       # Convert arg to double precision
mulsd     .LC0(%rip), %xmm0  # Double-precision multiply
cvtsd2ss  %xmm0, %xmm0       # Convert to single-precision
ret

f2:

mulss     .LC2(%rip), %xmm0  # Single-precision multiply
ret

With -ffast-math, the results are the same.

挽心 2024-12-15 13:35:26

I suspect something like this: If you're working with a float variable and a double literal the whole operation will be done as double and then converted back to float.

If you use a float literal, notionally speaking, the computation will be done at float precision, even though some hardware will convert it to double anyway to do the calculation.

淡莣 2024-12-15 13:35:26

Typically, I don't think it will make any difference, but it is worth
pointing out that 3.1415f and 3.1415 are (typically) not equal. On
the other hand, you don't normally do any calculations in float
anyway, at least on the usual platforms. (double is just as fast, if
not faster.) About the only time you should see float is when there
are large arrays, and even then, all of the calculations will typically
be done in double.
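
To make the "not equal" remark concrete, here is a minimal sketch that revisits the d1 cases from the question. The function names are hypothetical, and the sketch assumes IEEE 754 float and double with expressions evaluated in their own type (as on x86-64 with SSE).

```cpp
#include <cassert>

// The literal is a double; the multiplication is an ordinary double multiply.
double scale_with_double_literal(double d) { return 3.1415 * d; }

// The literal is first rounded to float, then widened to double for the multiply.
double scale_with_float_literal(double d) { return 3.1415f * d; }

// 42 is exactly representable as int, float and double alike.
double times_42_int(double d)    { return 42 * d; }
double times_42_float(double d)  { return 42.0f * d; }
double times_42_double(double d) { return 42.0 * d; }

// Hypothetical helper that bundles the checks.
void check_literals() {
    // The nearest float to 3.1415 is a different value from the nearest double.
    assert(static_cast<double>(3.1415f) != 3.1415);
    assert(scale_with_float_literal(1.0) != scale_with_double_literal(1.0));

    // All three spellings of 42 convert to the same double, so the products agree.
    assert(times_42_int(0.1) == times_42_float(0.1));
    assert(times_42_float(0.1) == times_42_double(0.1));
}
```

So in a double calculation the f suffix can silently cost precision for a constant like 3.1415, while for an exactly representable constant like 42 the spelling does not change the result.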

隐诗 2024-12-15 13:35:26

There is a difference: If you use a double constant and multiply it with a float variable, the variable is converted into double first, the calculation is done in double, and then the result is converted into float. While precision isn't really a problem here, this might lead to confusion.
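
The conversion sequence described above can be spelled out explicitly. This is only a sketch with hypothetical function names; it assumes a platform that evaluates double expressions in double (for example x86-64 with SSE), since x87-style extended precision may behave differently.

```cpp
#include <cassert>
#include <type_traits>

// The type of the product is decided by the usual arithmetic conversions.
static_assert(std::is_same<decltype(1.0f * 3.14), double>::value,
              "float times double literal has type double");
static_assert(std::is_same<decltype(1.0f * 3.14f), float>::value,
              "float times float literal has type float");

// What one typically writes: the conversions are implicit.
float scale_implicit(float x) { return x * 3.14; }

// The same steps written out one by one.
float scale_explicit(float x) {
    double widened = static_cast<double>(x);  // float -> double
    double product = widened * 3.14;          // double multiply
    return static_cast<float>(product);       // double -> float
}

// Hypothetical helper comparing the two forms on one sample input.
void check_equivalence() {
    assert(scale_implicit(2.5f) == scale_explicit(2.5f));
}
```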

安稳善良 2024-12-15 13:35:26

I personally tend to use the f suffix as a matter of principle, to make it as obvious as I can that this is a float rather than a double.

My two cents

眼眸里的那抹悲凉 2024-12-15 13:35:26

From the C++ Standard (Working Draft), section 5 on binary operators:

Many binary operators that expect operands of arithmetic or
enumeration type cause conversions and yield result types in a similar
way. The purpose is to yield a common type, which is also the type of
the result. This pattern is called the usual arithmetic conversions,
which are defined as follows: — If either operand is of scoped
enumeration type (7.2), no conversions are performed; if the other
operand does not have the same type, the expression is ill-formed. —
If either operand is of type long double, the other shall be converted
to long double. — Otherwise, if either operand is double, the other
shall be converted to double. — Otherwise, if either operand is float,
the other shall be converted to float.

And also section 4.8

A prvalue of floating point type can be converted to a prvalue of
another floating point type. If the source value can be exactly
represented in the destination type, the result of the conversion is
that exact representation. If the source value is between two adjacent
destination values, the result of the conversion is an
implementation-defined choice of either of those values. Otherwise, the
behavior is undefined.

The upshot of this is that you can avoid unnecessary conversions by specifying your constants in the precision dictated by the destination type, provided that you will not lose precision in the calculation by doing so (i.e., your operands are exactly representable in the precision of the destination type).
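
One way to check the "exactly representable" proviso is a round trip through float. The helper below is a hypothetical illustration, not something from the Standard; it assumes the constant is finite and within float's range, so that the conversion falls under the first two cases of the section 4.8 wording quoted above.

```cpp
// Hypothetical helper: true when c survives a double -> float -> double
// round trip unchanged, i.e. when the float type can hold c exactly.
constexpr bool fits_float_exactly(double c) {
    return static_cast<double>(static_cast<float>(c)) == c;
}

// Small integers and power-of-two fractions are exact in float, so the
// f suffix loses nothing for them.
static_assert(fits_float_exactly(42.0), "42 is exact in float");
static_assert(fits_float_exactly(0.5), "0.5 is exact in float");

// 3.1415 is not exact in float, so 3.1415f and 3.1415 are different values.
static_assert(!fits_float_exactly(3.1415), "3.1415 is rounded in float");
```

For constants where the check is false, choosing between the float and the double spelling is a choice between two different values, not just two notations.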
