Should we generally use float literals for floats instead of the simpler double literals?

Posted 2024-12-08 13:35:26


In C++ (or maybe only with our compilers, VC8 and VC10), 3.14 is a double literal and 3.14f is a float literal.

Now I have a colleague that stated:

We should use float literals for float calculations and double literals for double calculations, as this could have an impact on the precision of a calculation when constants are used in it.

Specifically, I think he meant:

double d1, d2;
float f1, f2;
... init and stuff ...
f1 = 3.1415  * f2;
f1 = 3.1415f * f2; // any difference?
d1 = 3.1415  * d2;
d1 = 3.1415f * d2; // any difference?

Or, added by me, even:

d1 = 42    * d2;
d1 = 42.0f * d2; // any difference?
d1 = 42.0  * d2; // any difference?
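A minimal sketch of the snippets above as compilable functions, assuming IEEE 754 single and double precision (as on VC8/VC10 and common GCC targets). The function names are hypothetical, chosen only for illustration. It makes two points checkable: in the double cases, 3.1415f is a different constant from 3.1415 once widened, while 42, 42.0f and 42.0 all denote exactly the same double value.

```cpp
#include <cassert>

// Hypothetical wrappers around the question's expressions.
// float * double literal: f2 is widened to double, the product is rounded back to float.
float f_times_double_literal(float f2) { return 3.1415 * f2; }
// float * float literal: the whole expression is float.
float f_times_float_literal(float f2) { return 3.1415f * f2; }

// double * double literal.
double d_times_double_literal(double d2) { return 3.1415 * d2; }
// double * float literal: 3.1415f is widened to double, but it was already
// rounded to float, so it is a slightly different constant from 3.1415.
double d_times_float_literal(double d2) { return 3.1415f * d2; }

// 42 is exactly representable as int, float and double, so all three
// expressions multiply d2 by the same double value 42.0.
double d_times_int_42(double d2) { return 42 * d2; }
double d_times_float_42(double d2) { return 42.0f * d2; }
double d_times_double_42(double d2) { return 42.0 * d2; }
```

For the float cases there is no blanket answer: the two versions use different constants and round at different points, so they agree for some inputs and not for others.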

More generally, the only point I can see for using 2.71828183f is to make sure that the constant I'm trying to specify will actually fit into a float (compiler error/warning otherwise).
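A small sketch of what the f suffix does and does not check, assuming a C++11 compiler and IEEE 754 floats. The literal 2.71828183f compiles without complaint even though a float holds only about 7 significant decimal digits: it is silently rounded. Brace-initializing a float from the double constant is not rejected as narrowing either, because that rule only forbids constants outside float's range. The variable names are illustrative.

```cpp
#include <cassert>

// The f suffix rounds the literal to the nearest float; losing digits
// does not by itself require a diagnostic.
constexpr float e_float = 2.71828183f;
constexpr double e_double = 2.71828183;

// Not a narrowing error: the source is a constant expression whose value is
// within float's range, even though it cannot be represented exactly.
constexpr float e_braced{2.71828183};

static_assert(static_cast<double>(e_float) != e_double,
              "2.71828183 is not exactly representable as a float");
```

What the suffix can catch is range: a literal such as 1e40f does not fit in a float, and compilers typically warn about or reject it.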

Can someone shed some light on this? Do you specify the f postfix? Why?

To quote from an answer what I implicitly took for granted:

If you're working with a float variable and a double literal, the whole operation will be done as double and then converted back to float.

Could there possibly be any harm in this? (Other than a very, very theoretical performance impact?)

Further edit: It would be nice if answers containing technical details (appreciated!) could also include how these differences affect general purpose code. (Yes, if you're number crunching, you probably like to make sure your big-n floating point ops are as efficient (and correct) as possible -- but does it matter for general purpose code that's called a few times? Isn't it cleaner if the code just uses 0.0 and skips the -- hard to maintain! -- float suffix?)
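For the general-purpose-code part of the question, one common pattern (a sketch, not the only option) is to write constants as T(...) inside templates, so the constant follows the caller's floating-point type instead of pulling float arguments up to double. circle_area is a hypothetical example, not taken from any answer.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical generic helper: the constant is spelled once, as T(...),
// and is rounded once to whatever floating-point type the caller uses.
template <typename T>
T circle_area(T r) {
    static_assert(std::is_floating_point<T>::value, "T must be a floating-point type");
    const T pi = T(3.14159265358979323846);
    return pi * r * r;  // stays in T: no double literal in the expression
}
```

Called with a float, the whole computation stays in float; called with a double, it stays in double, and no suffix has to be maintained by hand.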


苏佲洛 2024-12-15 13:35:26

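Yes, you should use the f suffix. Reasons include:

1. Performance. When you write float foo(float x) { return x*3.14; }, you force the compiler to emit code that converts x to double, then does the multiplication, then converts the result back to single precision. If you add the f suffix, both conversions are eliminated. On many platforms, each of those conversions is about as expensive as the multiplication itself.
2. Performance (continued). On some platforms (most cellphones, for example), double-precision arithmetic is much slower than single-precision arithmetic. Even ignoring the conversion overhead (covered in 1.), every time you force a computation to be evaluated in double, you slow your program down. This is not just a "theoretical" issue.
3. Reduced exposure to bugs. Consider this example: float x = 1.2; if (x == 1.2) // something; Is something executed? No, it is not, because x holds 1.2 rounded to a float, but it is being compared with the double-precision value 1.2. The two are not equal.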
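The bug-exposure example above, as a minimal compilable sketch assuming IEEE 754 floats. The two helper names are hypothetical.

```cpp
#include <cassert>

// x holds 1.2 rounded to float. Comparing it with the double literal 1.2
// widens x back to double, and the two values differ.
bool equals_double_literal(float x) { return x == 1.2; }

// Comparing with the float literal compares two identically rounded floats.
bool equals_float_literal(float x) { return x == 1.2f; }
```

Initializing with float x = 1.2; and calling both helpers shows the first returning false and the second returning true.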


人生百味 2024-12-15 13:35:26

I did a test.

I compiled this code:

float f1(float x) { return x*3.14; }            
float f2(float x) { return x*3.14F; }

Using gcc 4.5.1 for i686 with optimization -O2.

This was the assembly code generated for f1:

pushl   %ebp
movl    %esp, %ebp
subl    $4, %esp # Allocate 4 bytes on the stack
fldl    .LC0     # Load a double-precision floating point constant
fmuls   8(%ebp)  # Multiply by parameter
fstps   -4(%ebp) # Store single-precision result on the stack
flds    -4(%ebp) # Load single-precision result from the stack
leave
ret

And this is the assembly code generated for f2:

pushl   %ebp
flds    .LC2          # Load a single-precision floating point constant
movl    %esp, %ebp
fmuls   8(%ebp)       # Multiply by parameter
popl    %ebp
ret

So the interesting thing is that for f1, the compiler stored the value and reloaded it just to make sure that the result was rounded to single precision.

If we use the -ffast-math option, then this difference is significantly reduced:

f1:

pushl   %ebp
fldl    .LC0             # Load double-precision constant
movl    %esp, %ebp
fmuls   8(%ebp)          # multiply by parameter
popl    %ebp
ret

f2:

pushl   %ebp
flds    .LC2             # Load single-precision constant
movl    %esp, %ebp
fmuls   8(%ebp)          # multiply by parameter
popl    %ebp
ret

But there is still the difference between loading a single or double precision constant.

Update for 64-bit

These are the results with gcc 5.2.1 for x86-64 with optimization -O2:

f1:

cvtss2sd  %xmm0, %xmm0       # Convert arg to double precision
mulsd     .LC0(%rip), %xmm0  # Double-precision multiply
cvtsd2ss  %xmm0, %xmm0       # Convert to single-precision
ret

f2:

mulss     .LC2(%rip), %xmm0  # Single-precision multiply
ret

With -ffast-math, the results are the same.
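The listings show the cost side. A quick sketch for checking that f1 and f2 (the answer's functions, unchanged) are also not numerically interchangeable: 3.14 and 3.14F are different constants, so the rounded float results differ for some inputs. count_mismatches is a hypothetical helper; the exact count depends on the range scanned, so no particular number is claimed here.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

float f1(float x) { return x * 3.14; }   // double literal: computed in double
float f2(float x) { return x * 3.14F; }  // float literal: computed in float

// Count how many of `n` consecutive floats starting at `start`
// give different results under f1 and f2.
int count_mismatches(float start, int n) {
    int mismatches = 0;
    float x = start;
    for (int i = 0; i < n; ++i) {
        if (f1(x) != f2(x)) ++mismatches;
        x = std::nextafter(x, std::numeric_limits<float>::infinity());
    }
    return mismatches;
}
```

Scanning a few thousand floats starting at 1.0f is enough to find inputs where the two functions disagree in the last bit.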


挽心 2024-12-15 13:35:26

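I suspect it's something like this: if you're working with a float variable and a double literal, the whole operation will be done as double and then converted back to float.

If you use a float literal, then, notionally speaking, the computation will be done at float precision, even though some hardware will convert it to double anyway to do the calculation.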
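Whether a given compiler and target evaluate float expressions in a wider type can be queried. As a sketch: FLT_EVAL_METHOD from <cfloat> (defined by C99 and available in C++11) reports the evaluation method. Typical SSE-based x86-64 builds report 0, while classic x87 code generation, as in the i686 listings in an earlier answer, typically reports 2. describe_float_eval_method is a hypothetical helper.

```cpp
#include <cassert>
#include <cfloat>
#include <string>

// Hypothetical helper: maps a FLT_EVAL_METHOD value to the meaning C99 gives it.
std::string describe_float_eval_method(int method) {
    switch (method) {
        case 0:  return "each type is evaluated in its own range and precision";
        case 1:  return "float and double are evaluated as double";
        case 2:  return "float and double are evaluated as long double";
        case -1: return "indeterminable";
        default: return "implementation-defined";
    }
}

// The value this particular compiler and target report.
int current_float_eval_method() { return FLT_EVAL_METHOD; }
```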


淡莣 2024-12-15 13:35:26

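Generally, I don't think it will make any difference, but it's worth pointing out that 3.1415f and 3.1415 are (typically) not equal. On the other hand, you don't normally do any calculations in float anyway, at least on the usual platforms. (double is just as fast, if not faster.) About the only time you should see float is when there are large arrays, and even then, all of the calculations will typically be done in double.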
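A sketch of the "store as float, compute in double" pattern described above. The helper names are hypothetical; widening each float element to double is exact, so only the accumulator's precision differs between the two loops.

```cpp
#include <cassert>
#include <vector>

// Data stored as float (half the memory of double), running total kept in double.
double sum_in_double(const std::vector<float>& values) {
    double total = 0.0;
    for (float v : values) total += v;  // v is widened exactly to double
    return total;
}

// Same loop with a float accumulator, for comparison.
float sum_in_float(const std::vector<float>& values) {
    float total = 0.0f;
    for (float v : values) total += v;  // each addition is rounded to float
    return total;
}
```

With 1e8f followed by one hundred 1.0f values, the double total is exactly 100000100, while on a typical SSE-based x86-64 build the float total stays at 1e8: float's spacing near 1e8 is 8, so each added 1.0f rounds straight back.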


隐诗 2024-12-15 13:35:26

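There is a difference: if you use a double constant and multiply it with a float variable, the variable is first converted to double, the calculation is done in double, and the result is then converted to float. While precision isn't really the problem here, this can lead to confusion.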
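A short sketch of where that confusion tends to show up, assuming C++14 for deduced return types: an expression that mixes a float with a double literal has type double, so anything that deduces its type from it (auto, templates, overload resolution) ends up with double. half_of and half_of_f are hypothetical names.

```cpp
#include <cassert>
#include <type_traits>

auto half_of(float x) { return x * 0.5; }     // float * double -> returns double
auto half_of_f(float x) { return x * 0.5f; }  // float * float  -> returns float

static_assert(std::is_same<decltype(half_of(1.0f)), double>::value,
              "mixing in a double literal makes the result double");
static_assert(std::is_same<decltype(half_of_f(1.0f)), float>::value,
              "a float literal keeps the result float");
```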


安稳善良 2024-12-15 13:35:26

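I personally tend to use the f postfix notation as a matter of principle, to make it as obvious as possible that this is a float type rather than a double.

My two cents.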


眼眸里的那抹悲凉 2024-12-15 13:35:26

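From the C++ Standard (working draft), section 5, on binary operators:

Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:
- If either operand is of scoped enumeration type (7.2), no conversions are performed; if the other operand does not have the same type, the expression is ill-formed.
- If either operand is of type long double, the other shall be converted to long double.
- Otherwise, if either operand is double, the other shall be converted to double.
- Otherwise, if either operand is float, the other shall be converted to float.

And also section 4.8.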
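The quoted rules can be checked at compile time. A minimal sketch: each static_assert restates one of the bullets for the literal combinations from the question, including the int case (42 * d2), where the int operand is converted to double.

```cpp
#include <cassert>
#include <type_traits>

static_assert(std::is_same<decltype(1.0f * 2.0L), long double>::value,
              "float with long double -> long double");
static_assert(std::is_same<decltype(1.0f * 2.0), double>::value,
              "float with double -> double");
static_assert(std::is_same<decltype(42 * 2.0), double>::value,
              "int with double -> double");
static_assert(std::is_same<decltype(42 * 1.0f), float>::value,
              "int with float -> float");
```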