Is float slower than double? Do 64-bit programs run faster than 32-bit programs?

Posted on 2024-11-02 18:14:51

Is using float type slower than using double type?

I heard that modern Intel and AMD CPUs can do calculations with doubles faster than with floats.

What about standard math functions (sqrt, pow, log, sin, cos, etc.)? Computing them in single precision should be considerably faster because it should require fewer floating-point operations. For example, single-precision sqrt can use a simpler math formula than double-precision sqrt. Also, I heard that standard math functions are faster in 64-bit mode (when compiled and run on a 64-bit OS). What is the definitive answer on this?

Comments (6)

爱的那么颓废 2024-11-09 18:14:51

The classic x86 architecture uses a floating-point unit (FPU) to perform floating-point calculations. The FPU performs all calculations in its internal registers, each of which has 80-bit precision. Every time you work with a float or a double, the variable is first loaded from memory into an internal register of the FPU. This means that there is absolutely no difference in the speed of the actual calculations, since in either case they are carried out with full 80-bit precision. The only thing that might differ is the speed of loading the value from memory and storing the result back to memory. Naturally, on a 32-bit platform it might take longer to load/store a double than a float. On a 64-bit platform there shouldn't be any difference.

Modern x86 architectures support extended instruction sets (SSE/SSE2) with new instructions that can perform the very same floating-point calculations without involving the "old" FPU instructions. However, again, I wouldn't expect to see any difference in calculation speed for float and double. And since these modern platforms are 64-bit ones, the load/store speed is supposed to be the same as well.

On a different hardware platform the situation could be different. But normally a smaller floating-point type should not provide any performance benefits. The main purpose of smaller floating-point types is to save memory, not to improve performance.

Edit: (To address @MSalters' comment)
What I said above applies to fundamental arithmetical operations. When it comes to library functions, the answer will depend on several implementation details. If the platform's floating-point instruction set contains an instruction that implements the functionality of the given library function, then what I said above will normally apply to that function as well (that would normally include functions like sin, cos, sqrt). For other functions, whose functionality is not immediately supported in the FP instruction set, the situation might prove to be significantly different. It is quite possible that float versions of such functions can be implemented more efficiently than their double versions.
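Whether intermediate results are actually kept in a wider format, as described above for the classic FPU, can be checked from C99's `FLT_EVAL_METHOD` macro in `<float.h>`. The following is only a minimal sketch: the helper names `describe_eval_method` and `print_fp_evaluation_info` are made up for illustration, and the value reported depends on the compiler and its flags, not just on the CPU.

```c
#include <float.h>
#include <stdio.h>

/* Map a C99 FLT_EVAL_METHOD value to a short description.
   (Hypothetical helper, not part of any library.) */
const char *describe_eval_method(int method)
{
    switch (method) {
    case 0:  return "each operation is evaluated in its own type";
    case 1:  return "float and double operations are evaluated as double";
    case 2:  return "all operations are evaluated as long double";
    default: return "indeterminable or implementation-defined";
    }
}

/* Print what this compiler/target combination reports. */
void print_fp_evaluation_info(void)
{
    printf("FLT_EVAL_METHOD = %d: %s\n",
           (int)FLT_EVAL_METHOD, describe_eval_method((int)FLT_EVAL_METHOD));
    printf("mantissa bits: float=%d, double=%d, long double=%d\n",
           FLT_MANT_DIG, DBL_MANT_DIG, LDBL_MANT_DIG);
}
```

Calling `print_fp_evaluation_info()` from `main` typically reports method 2 for x87 code generation and method 0 for SSE2 code generation, but that is worth verifying on the toolchain in question.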

套路撩心 2024-11-09 18:14:51

Your first question has already been answered here on SO.

Your second question depends entirely on the "size" of the data you are working with. It all boils down to the low-level architecture of the system and how it handles large values. 64 bits of data on a 32-bit system would require 2 cycles to access 2 registers. The same data on a 64-bit system should only take 1 cycle to access 1 register.

Everything always depends on what you're doing. I find there are no hard and fast rules, so you need to analyze the current task and choose what works best for your needs for that specific task.

泛滥成性 2024-11-09 18:14:51

While on most systems double will be the same speed as float for individual values, you're right that computing functions like sqrt, sin, etc. in single-precision should be a lot faster than computing them to double-precision. In C99, you can use the sqrtf, sinf, etc. functions even if your variables are double, and get the benefit.

Another issue I've seen mentioned is memory (and likewise storage device) bandwidth. If you have millions or billions of values to deal with, float will almost certainly be twice as fast as double, since everything will be memory-bound or I/O-bound. This is a good reason to use float as the type in an array or in on-disk storage in some cases, but I would not consider it a good reason to use float for the variables you do your computations with.
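One way to read that advice is "store as float, compute in double". The sketch below assumes that interpretation; `float_array_bytes`, `double_array_bytes`, and `sum_floats` are hypothetical names used only here.

```c
#include <stddef.h>

/* Bytes needed to store n values of each type. On common platforms
   sizeof(float) is 4 and sizeof(double) is 8, so the float array
   moves half as much data through memory or disk. */
size_t float_array_bytes(size_t n)  { return n * sizeof(float); }
size_t double_array_bytes(size_t n) { return n * sizeof(double); }

/* The array is stored as float to save bandwidth, but the running
   total is kept in a double so the accumulation itself does not
   lose precision. */
double sum_floats(const float *values, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; ++i) {
        total += values[i];   /* each float is widened to double here */
    }
    return total;
}
```

The widening on each load is cheap compared with a cache miss, which is why the storage type tends to matter more than the type of the local variables in bandwidth-bound loops.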

疧_╮線 2024-11-09 18:14:51

From some research and empirical measurements I have made in Java:

  • basic arithmetic operations on doubles and floats essentially perform identically on Intel hardware, with the exception of division;
  • on the other hand, on the Cortex-A8 as used in the iPhone 4 and iPad, even "basic" arithmetic on doubles takes around twice as long as on floats (a register FP addition on a float taking around 4 ns vs a register FP addition on a double taking around 9 ns);
  • I've made some timings of methods on java.lang.Math (trigonometric functions etc.) which may be of interest -- in principle, some of these may well be faster on floats, as fewer terms would be required to calculate to the precision of a float; on the other hand, many of these end up being "not as bad as you'd think";

It is also true that there may be special circumstances in which e.g. memory bandwidth issues outweigh "raw" calculation times.

儭儭莪哋寶赑 2024-11-09 18:14:51

The "native" internal floating-point representation in the x86 FPU is 80 bits wide. This is different from both float (32 bits) and double (64 bits). Every time a value moves in or out of the FPU, a conversion is performed. There is only one FPU instruction that performs a sin operation, and it works on the internal 80-bit representation.

Whether this conversion is faster for float or for double depends on many factors, and must be measured for a given application.

沉默的熊 2024-11-09 18:14:51

It depends on the processor. If the processor has native double-precision instructions, it'll usually be faster to just do double-precision arithmetic than to be given a float, convert it to a double, do the double-precision arithmetic, then convert it back to a float.
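The round trip described here can be written out explicitly. This is only an illustrative sketch with invented names (`scale_via_double`, `scale_in_float`); compilers may emit different instructions than the source suggests.

```c
/* Float in, float out, but the arithmetic is done in double:
   two widening conversions before the multiply and one narrowing
   conversion after it. */
float scale_via_double(float x, float factor)
{
    double wide = (double)x * (double)factor;
    return (float)wide;
}

/* The same operation kept in float throughout, with no conversions
   on targets that have native single-precision instructions. */
float scale_in_float(float x, float factor)
{
    return x * factor;
}
```

On hardware with native double-precision instructions and data that is already `double`, the conversions in the first function disappear entirely, which is the case the answer has in mind.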
