Floating point vs integer performance on modern chips
Consider a Viterbi decoder on an additive model. It spends its time doing additions and comparisons. Now consider two versions: one using C/C++ float as the data type, and another using int. On modern chips, would you expect int to run significantly faster than float? Or will the wonders of pipelining (and the absence of multiplication and division) make it all come out about even?
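To make the question concrete, here is a minimal sketch of the kind of inner loop being asked about: one add-compare-select step, templated on the metric type so the same code can be built for int and float. The function name acs_step, the row-major transition layout, and the choice to maximize a score (rather than minimize a cost) are assumptions for illustration, not something the question specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical add-compare-select (ACS) step for a Viterbi decoder on an
// additive model. T is the metric type under test: int or float.
// trans is a row-major n x n matrix, indexed as trans[from * n + to].
template <typename T>
void acs_step(const std::vector<T>& prev,
              const std::vector<T>& trans,
              const std::vector<T>& emit,
              std::vector<T>& next,
              std::vector<std::size_t>& backptr) {
    const std::size_t n = prev.size();
    assert(n > 0 && trans.size() == n * n && emit.size() == n);
    next.resize(n);
    backptr.resize(n);
    for (std::size_t to = 0; to < n; ++to) {
        T best = prev[0] + trans[0 * n + to];
        std::size_t arg = 0;
        for (std::size_t from = 1; from < n; ++from) {
            const T cand = prev[from] + trans[from * n + to];  // add
            if (cand > best) {                                  // compare
                best = cand;                                    // select
                arg = from;
            }
        }
        next[to] = best + emit[to];
        backptr[to] = arg;
    }
}
```

Instantiating this once with std::vector<int> and once with std::vector<float>, then timing both on the target machine, is the most direct way to answer the question for a particular chip and compiler.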
Comments (1)
Depends on what you mean by significantly. I usually expect to see ints perform about 2x faster, but it all depends on what else is going on. Modern processors that can handle the AMD64 (AMD/Core2) instruction set can usually do effectively 1 float operation per cycle if they can keep the pipeline fed.
They can also usually do 2 or 3 integer operations in the same amount of time, and can even do both at once.
But it's not that hard to write code that stalls the pipeline. You have to avoid using the result of a calculation immediately after it is computed; otherwise the pipeline stalls and you get more like 3 cycles per multiply rather than 1.
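As a rough illustration of that dependency problem, compare a sum where every add waits on the previous one with a version that keeps four independent accumulators in flight. The function names and the unroll factor of four are my own choices for the sketch, and the actual speedup (if any) depends on the processor and compiler. Note also that the second version adds the floats in a different order, so its rounding can differ from the first; that is why compilers generally won't make this transformation for you without relaxed floating-point options.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One accumulator: each add depends on the result of the one before it,
// so the loop is limited by add latency rather than add throughput.
float sum_chained(const std::vector<float>& x) {
    float acc = 0.0f;
    for (float v : x) acc += v;
    return acc;
}

// Four independent accumulators: consecutive adds do not depend on each
// other, which gives the pipeline independent work to overlap.
float sum_unrolled(const std::vector<float>& x) {
    float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
    const std::size_t n = x.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += x[i];
        a1 += x[i + 1];
        a2 += x[i + 2];
        a3 += x[i + 3];
    }
    for (; i < n; ++i) a0 += x[i];  // leftover elements
    return (a0 + a1) + (a2 + a3);
}
```

The same trick applies to int, where reordering the additions does not change the result.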
Instructions per cycle on the PowerPC are the same as or better than on AMD/Intel in most cases.
Addendum:
By the way, you may discover that the comparisons (or rather the branches that the comparisons imply) end up costing a lot more than the additions. Mispredicted branches are expensive, especially on the Pentium 4 processor.
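One way to reduce that cost is to write the compare-and-select so the compiler can use a conditional move (or a max instruction) instead of a jump. The sketch below is only that: best_candidate is a hypothetical helper, and whether the conditional expressions actually compile to branch-free code depends on the compiler, flags, and target, so check the generated assembly before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns the largest candidate and its index (first one wins on ties).
// The update is written as conditional expressions on a single flag rather
// than an if-block, which compilers commonly turn into conditional moves.
template <typename T>
std::pair<T, std::size_t> best_candidate(const std::vector<T>& cand) {
    assert(!cand.empty());
    T best = cand[0];
    std::size_t arg = 0;
    for (std::size_t i = 1; i < cand.size(); ++i) {
        const bool take = cand[i] > best;
        best = take ? cand[i] : best;
        arg = take ? i : arg;
    }
    return {best, arg};
}
```

It works the same for int and float metrics, so it drops into either version of the decoder.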