Why is there no warning like C4738 for double?
Visual C++ can emit warning C4738:
storing 32-bit float result in memory, possible loss of performance
It is issued when a 32-bit float is about to be stored in memory instead of being kept in a register.
The description further says that using double resolves the issue. I don't get why the latter is true.
Why does storing a float in memory result in a performance loss when storing a double does not?
The warning combines two issues:
Using double resolves the second issue (at least partially; 64 bits are still less precise than 80 bits), but it has no impact on the possible performance loss. That is why the warning description mentions two remedies:
While I'm not 100% sure of the cause, here's my guess.
When compiling for x86 without SSE2 enabled, the compiler must use the x87 FP stack for all floating-point registers. On MSVC, the FP mode is set by default to 53-bit precision rounding. (I think. I'm not 100% sure on this.)
Therefore, all operations done on the FP stack are at double precision.
However, when something is cast down to a float, the precision needs to be rounded to single precision. The only way to do this is to store it to memory via the fstp instruction with a 4-byte memory operand, and then reload it.
Let's look at the example on the C4738 warning page you linked to:
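The example itself is not reproduced here. As a rough sketch of the kind of code being discussed, the block below reuses the names d, f, f1, f2 and func() from the explanation that follows. Everything else is an assumption: the body of func(), the wrapper sum_mixed() and the helper check_sum_mixed() are made up for illustration. I have not verified that this exact code triggers C4738. Whether it does will depend on the target (x87 rather than SSE2), the floating-point and optimization flags, and on the warning being enabled.

```cpp
#include <cassert>

// Hypothetical stand-in for the func() named in the answer: it takes a
// float, so a double argument has to be narrowed at the call site.
float func(float f)
{
    return f;
}

// d starts life as a double but is narrowed to float twice.
// (Wrapper name is an assumption made for this sketch.)
float sum_mixed(double d)
{
    float f1 = func(static_cast<float>(d)); // narrowing needed for the call
    float f2 = static_cast<float>(d);       // explicit narrowing
    float f = f1 + f2;                      // the line the answer refers to
    return f;
}

// Small self-check; call it from main() to run the sketch.
void check_sum_mixed()
{
    // 0.25 is exactly representable as float and double, so the sum is exact.
    assert(sum_mixed(0.25) == 0.5f);
}
```

Each narrowing of d is a point where a strict floating-point mode has to produce a genuinely single-precision value, rather than reuse the wider value already sitting in a register.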
When you call func(), d is probably stored in an x87 register. However, the call to func() requires that the precision be lowered to single precision. This will cause d to be rounded/stored to memory. It is then reloaded and re-promoted to double precision on the line f = f1 + f2;.
However, if you use double the whole way, the compiler can keep d in a register, thus bypassing the overhead of going to and from memory.
As for why it could make you run out of registers... I have no idea. It's possible that the semantics of the program result in both a double-precision and a single-precision copy of the same value being live at once, which would require an extra register.
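To make the rounding step concrete, here is a minimal portable sketch of the store-and-reload round trip guessed at above. The function names are hypothetical, and the volatile float is only a device to force a real 4-byte store and reload in the sketch. It is not a claim about the exact instructions MSVC emits.

```cpp
#include <cassert>

// Narrow a double to single precision and widen it again.
// The volatile float forces an actual store to memory and a reload,
// mimicking the round trip described in the answer.
double round_trip_through_float(double d)
{
    volatile float narrowed = static_cast<float>(d); // rounded to 24-bit significand
    return static_cast<double>(narrowed);            // widened back, precision already lost
}

// Small self-check; call it from main() to run the sketch.
void check_round_trip()
{
    // 0.5 is exactly representable in both types, so nothing changes.
    assert(round_trip_through_float(0.5) == 0.5);
    // 0.1 is not exactly representable; its float and double roundings differ.
    assert(round_trip_through_float(0.1) != 0.1);
}
```

A value that stays double the whole way never needs this step, which matches the answer's point that using double throughout lets the compiler keep it in a register.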