Is float slower than double? Do 64-bit programs run faster than 32-bit programs?

Posted 2024-11-02 18:14:51

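Is using float slower than using double?

I have heard that modern Intel and AMD CPUs can perform calculations on doubles faster than on floats.

What about the standard math functions (sqrt, pow, log, sin, cos, etc.)? Computing them in single precision should be considerably faster, because it should require fewer floating-point operations. For example, a single-precision sqrt can use a simpler mathematical formula than a double-precision sqrt. I have also heard that the standard math functions are faster in 64-bit mode (when compiled and run on a 64-bit OS). What is the definitive answer on this?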

爱的那么颓废 2024-11-09 18:14:51

The classic x86 architecture uses a floating-point unit (FPU) to perform floating-point calculations. The FPU performs all calculations in its internal registers, each of which has 80-bit precision. Every time you work with a float or a double, the variable is first loaded from memory into an internal register of the FPU. This means that there is no difference in the speed of the actual calculations, since in either case they are carried out with full 80-bit precision. The only thing that might differ is the speed of loading the value from memory and storing the result back to memory. Naturally, on a 32-bit platform it might take longer to load/store a double than a float. On a 64-bit platform there shouldn't be any difference.

Modern x86 architectures support extended instruction sets (SSE/SSE2) with new instructions that can perform the very same floating-point calculations without involving the "old" FPU instructions. However, again, I wouldn't expect to see any difference in calculation speed for float and double. And since these modern platforms are 64-bit ones, the load/store speed is supposed to be the same as well.

On a different hardware platform the situation could be different. But normally a smaller floating-point type should not provide any performance benefits. The main purpose of smaller floating-point types is to save memory, not to improve performance.

Edit: (To address @MSalters comment)
What I said above applies to fundamental arithmetical operations. When it comes to library functions, the answer will depend on several implementation details. If the platform's floating-point instruction set contains an instruction that implements the functionality of the given library function, then what I said above will normally apply to that function as well (that would normally include functions like sin, cos, sqrt). For other functions, whose functionality is not immediately supported in the FP instruction set, the situation might prove to be significantly different. It is quite possible that float versions of such functions can be implemented more efficiently than their double versions.

套路撩心 2024-11-09 18:14:51

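Your first question has already been answered here.

Your second question depends entirely on the "size" of the data you are working with. It all comes down to the low-level architecture of the system and how it handles large values. 64 bits of data on a 32-bit system would require 2 cycles to access 2 registers. The same data on a 64-bit system should take only 1 cycle to access 1 register.

Everything always depends on what you are doing. I have found no hard and fast rules, so you need to analyze the task at hand and choose whatever best fits its specific needs.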

泛滥成性 2024-11-09 18:14:51

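While on most systems double is the same speed as float for individual values, you are right that computing functions such as sqrt, sin, etc. in single precision should be a lot faster than computing them in double precision. In C99, you can call the sqrtf, sinf, etc. functions even if your variables are double, and still get the benefit.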

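Another issue I have seen mentioned is memory (and likewise storage device) bandwidth. If you have millions or billions of values to process, float will almost certainly be twice as fast as double, since everything will be memory-bound or I/O-bound. In some cases this is a good reason to use float as the type in an array or in on-disk storage, but I would not consider it a good reason to use float for the variables you do your computations with.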

疧_╮線 2024-11-09 18:14:51

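Based on some research and empirical measurements I have made in Java:

basic arithmetic operations on doubles and floats perform essentially identically on Intel hardware, with the exception of division;
on the other hand, on the Cortex-A8 as used in the iPhone 4 and iPad, even "basic" arithmetic on doubles takes around twice as long as on floats (a register FP addition on a float takes around 4 ns, versus around 9 ns on a double);
I have made some timings of methods on java.lang.Math (trigonometric functions etc.) which may be of interest: in principle, some of these may well be faster on floats, since fewer terms are needed to calculate to the precision of a float; on the other hand, many of them end up being "not as bad as you'd think".

It is also true that there may be special circumstances in which, for example, memory bandwidth issues outweigh "raw" calculation times.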
