32-bit operations vs. 64-bit operations on a 64-bit machine/OS

Posted on 2024-08-28 02:11:40

Which operation, a 32-bit operation or a 64-bit operation (such as masking a 32-bit flag or a 64-bit flag), would be cheaper on a 64-bit machine?

Comments (3)

記憶穿過時間隧道 2024-09-04 02:11:40

As you don't specify an architecture, I can only suggest a general answer, because it depends on the operation and on the processor architecture in question. Once the data is in a CPU register, most operations will usually take the same amount of time regardless of whether the value was originally 32 or 64 bits.

However, there can be some differences on some architectures in how the data gets into a register. Here are some situations where a "native" value may be faster than a smaller value on some hardware:

Fetching data

  • Fetching a "native-sized" value may be faster than fetching a smaller value. That is, the processor may need to fetch 64 bits regardless, and then mask/shift off 32 bits of it to "load" a 32-bit value. This masking/shifting is not required when working on a 64-bit value, so it can possibly be loaded faster. (This goes against the intuitive idea that something twice as big might take twice as long to load.)

  • Alternatively, if the bus can handle half-width fetches, then a 32-bit value may be loaded in the same time as a 64-bit value.

  • To confuse matters more, the CPU caches can change results as well. Usually when you read one value from memory, a "line" of several memory locations is read into the cache, so that subsequent reads can be supplied from fast cache memory instead of requiring a full fetch from RAM. In that case, using 32-bit values will work out faster if you are accessing many values in sequence, as twice as many of them fit in each cache line, resulting in fewer cache misses.
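The cache point above comes down to simple arithmetic, which the following sketch illustrates. The 64-byte line size is an assumption (it is common, but not universal), and the helper names `elements_per_line` and `lines_touched` are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: 64-byte cache lines. Common, but check your target CPU. */
#define ASSUMED_CACHE_LINE_BYTES 64u

/* Number of whole elements of the given size that fit in one assumed line. */
size_t elements_per_line(size_t element_size)
{
    return ASSUMED_CACHE_LINE_BYTES / element_size;
}

/* Number of assumed lines touched when scanning `count` contiguous elements,
   assuming the array starts on a line boundary. */
size_t lines_touched(size_t count, size_t element_size)
{
    size_t bytes = count * element_size;
    return (bytes + ASSUMED_CACHE_LINE_BYTES - 1) / ASSUMED_CACHE_LINE_BYTES;
}
```

Under that assumption, `elements_per_line(sizeof(uint32_t))` is 16 and `elements_per_line(sizeof(uint64_t))` is 8, so a sequential scan over 1024 elements touches 64 lines with 32-bit values and 128 lines with 64-bit values. Whether this shows up as a measurable difference depends on the access pattern and on whether the data is already cached.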

Computation

  • The processor hardware may be optimised for dealing with 64-bit values, so calculating with 32-bit values may cause it more trouble and thus slow things down. For example, it might be able to process a double (64-bit) value "natively" but have to convert a float (32-bit) value into a double before it can process it, then convert the result back to a float afterwards.

  • Alternatively, there may be 32-bit and 64-bit paths through the CPU, or the CPU may be able to do any conversions required in a way that does not affect the overall execution time of the instruction, in which case they may be calculated at the same speed.

  • This may affect complex operations (floating point) but is unlikely to be a problem with simple operations (AND, OR, etc.).
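For the flag-masking case in the question, the two widths look like this in C. This is a minimal sketch: the flag names and bit positions are hypothetical. On typical 64-bit targets each function is expected to reduce to a single AND/test once the value is in a register, but that is an expectation about the compiler, not a guarantee.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag definitions, chosen only for illustration. */
#define FLAG32_READY (UINT32_C(1) << 3)   /* bit 3 of a 32-bit flag word  */
#define FLAG64_READY (UINT64_C(1) << 40)  /* bit 40 of a 64-bit flag word */

/* Returns true if any bit of `mask` is set in the 32-bit `flags`. */
bool test_flag32(uint32_t flags, uint32_t mask)
{
    return (flags & mask) != 0;
}

/* Same test on a 64-bit flag word. */
bool test_flag64(uint64_t flags, uint64_t mask)
{
    return (flags & mask) != 0;
}
```

Comparing the compiler's assembly output for the two functions on your own target is the most direct way to check whether they differ there.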

此岸叶落 2024-09-04 02:11:40

Generally speaking, a 64-bit operation and a 32-bit operation would have the same cost. The 32-bit operation might end up taking an extra instruction, depending on whether the compiler needs to ensure that the upper 32 bits of a 64-bit register are cleared (or sign-extended), but that operation generally has little cost.

There might be some difference in instruction encoding that might make one take more space than the other, but that (and which way the advantage would lie) would depend on a number of factors.
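The clearing and sign-extension mentioned above correspond to ordinary integer conversions in C. The sketch below shows where a compiler may need them; the function names are made up for this example, and whether an extra instruction is actually emitted depends on the target and the compiler.

```c
#include <stdint.h>

/* Unsigned widening: the upper 32 bits of the result are zero. */
uint64_t widen_unsigned(uint32_t x)
{
    return (uint64_t)x;
}

/* Signed widening: the sign bit is copied into the upper 32 bits. */
int64_t widen_signed(int32_t x)
{
    return (int64_t)x;
}

/* 32-bit addition wraps modulo 2^32 before the result is widened. */
uint64_t add32_then_widen(uint32_t a, uint32_t b)
{
    return (uint64_t)(uint32_t)(a + b);
}

/* Widening first keeps the carry out of bit 31. */
uint64_t widen_then_add64(uint32_t a, uint32_t b)
{
    return (uint64_t)a + (uint64_t)b;
}
```

For example, `add32_then_widen(0xFFFFFFFF, 1)` yields 0, while `widen_then_add64(0xFFFFFFFF, 1)` yields 0x100000000. The compiler has to preserve that difference, which is where the occasional extra instruction comes from.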

活雷疯 2024-09-04 02:11:40

It depends -- masking a flag will normally use an AND instruction, which will execute quickly (~1 cycle) once the data is in a register. Loading 64 bits of data from memory will generally be slower than loading 32 bits of data -- but if you're using more than 32 flags, you'll have to load more than 32 bits of data anyway, and handling the masking in a single instruction will be faster than doing it in two or three. Whether any of this makes a difference to overall speed will generally depend on the surrounding instructions -- for example, if the data is already in the cache, you may not need to load it from memory.

In other words, it's difficult to make generalizations -- you just about have to look at a specific code sequence (not just one instruction, but a whole sequence) to say anything -- and the result for that sequence may not mean much about another sequence that initially looks almost identical.
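The "more than 32 flags" case can be sketched as follows. The `flags_2x32` layout and all function names are hypothetical, introduced only to contrast one 64-bit mask test with the same test split across two 32-bit words.

```c
#include <stdbool.h>
#include <stdint.h>

/* All flags in a single 64-bit word: one AND and one compare. */
bool all_set_64(uint64_t flags, uint64_t mask)
{
    return (flags & mask) == mask;
}

/* Hypothetical layout: the same flags split across two 32-bit words. */
typedef struct {
    uint32_t lo; /* flags 0..31  */
    uint32_t hi; /* flags 32..63 */
} flags_2x32;

/* Splits a 64-bit flag word into the two-word layout above. */
flags_2x32 split64(uint64_t v)
{
    flags_2x32 r = { (uint32_t)v, (uint32_t)(v >> 32) };
    return r;
}

/* Two-word version: each half is masked and compared separately. */
bool all_set_2x32(flags_2x32 flags, flags_2x32 mask)
{
    return (flags.lo & mask.lo) == mask.lo
        && (flags.hi & mask.hi) == mask.hi;
}
```

Both versions return the same answer for the same flags; the 64-bit form simply expresses it as one mask operation. As the answer says, whether that matters in practice depends on the surrounding code, so it is worth measuring the whole sequence rather than reasoning from this one function.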
