Are 64-bit programs bigger and faster than 32-bit versions?
I suppose I am focussing on x86, but I am generally interested in the move from 32 to 64 bit.
Logically, I can see that constants and pointers will, in some cases, be larger, so programs are likely to be larger. And the desire to allocate memory on word boundaries for efficiency would mean more padding between allocations.
I have also heard that 32 bit mode on the x86 has to flush its cache when context switching due to possible overlapping 4G address spaces.
So, what are the real benefits of 64 bit?
And as a supplementary question, would 128 bit be even better?
Edit:
I have just written my first 32/64 bit program. It makes linked lists/trees of 16 byte (32b version) or 32 byte (64b version) objects and does a lot of printing to stderr - not a really useful program, and not something typical, but it is my first.
Size: 81128(32b) v 83672(64b) - so not much difference
Speed: 17s(32b) v 24s(64b) - running on 32 bit OS (OS-X 10.5.8)
Update:
I note that a new hybrid x32 ABI (Application Binary Interface) is being developed that is 64b but uses 32b pointers. For some tests it results in smaller code and faster execution than either 32b or 64b.
Comments (9)
I typically see a 30% speed improvement for compute-intensive code on x86-64 compared to x86. This is most likely because we have 16 x 64-bit general-purpose registers and 16 x SSE registers instead of 8 x 32-bit general-purpose registers and 8 x SSE registers. This is with the Intel ICC compiler (11.1) on x86-64 Linux; results with other compilers (e.g. gcc) or other operating systems (e.g. Windows) may of course be different.
Unless you need to access more memory than 32b addressing allows, the benefits will be small, if any.
When running on a 64b CPU, you get the same memory interface whether you are running 32b or 64b code (you are using the same cache and the same bus).
While the x64 architecture has more registers, which allows easier optimization, this is often counteracted by the fact that pointers are now larger, and using any structures with pointers results in higher memory traffic. I would estimate the increase in overall memory usage for a 64b application compared to a 32b one to be around 15-30%.
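As a rough illustration of the pointer-size effect described above, here is a minimal Python sketch using `ctypes`. The `Node` layout (two child pointers plus a 32-bit key) and the helper `node_size_for` are hypothetical and not taken from any program in this thread; the sketch assumes natural alignment, as used by the common x86 and x86-64 ABIs.

```python
import ctypes


class Node(ctypes.Structure):
    """Hypothetical binary-tree node: two child pointers and a 32-bit key."""


# Fields are assigned after the class exists so the node can point to itself.
Node._fields_ = [
    ("left", ctypes.POINTER(Node)),
    ("right", ctypes.POINTER(Node)),
    ("key", ctypes.c_int32),
]


def node_size_for(pointer_bytes: int) -> int:
    """Size of Node for a given pointer width, assuming natural alignment."""
    raw = 2 * pointer_bytes + 4                  # two pointers + one int32
    align = max(pointer_bytes, 4)                # struct aligns to its widest member
    return (raw + align - 1) // align * align    # round up to the alignment


if __name__ == "__main__":
    print("pointer size in this interpreter:", ctypes.sizeof(ctypes.c_void_p))
    print("actual Node size here:", ctypes.sizeof(Node))
    print("with 32-bit pointers:", node_size_for(4), "bytes")   # 12
    print("with 64-bit pointers:", node_size_for(8), "bytes")   # 24
```

Under these assumptions the node doubles from 12 to 24 bytes although its payload is unchanged. Structures that hold fewer pointers relative to other data grow proportionally less, which is consistent with an overall estimate well below 100%.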
Regardless of the benefits, I would suggest that you always compile your program for the system's default word size (32-bit or 64-bit), since if you compile a library as a 32-bit binary and provide it on a 64-bit system, you will force anyone who wants to link with your library to provide their library (and any other library dependencies) as a 32-bit binary, when the 64-bit version is the default available. This can be quite a nuisance for everyone. When in doubt, provide both versions of your library.
As to the practical benefits of 64-bit... the most obvious is that you get a bigger address space, so if you mmap a file, you can address more of it at once (and load larger files into memory). Another benefit is that, assuming the compiler does a good job of optimizing, many of your arithmetic operations can be parallelized (for example, placing two pairs of 32-bit numbers in two registers and performing two adds in a single add operation), and big-number computations will run more quickly. That said, the whole 64-bit vs 32-bit thing won't help you with asymptotic complexity at all, so if you are looking to optimize your code, you should probably be looking at the algorithms rather than at constant factors like this.
EDIT:
Please disregard my statement about the parallelized addition. This is not performed by an ordinary add instruction... I was confusing that with some of the vectorized/SSE instructions. A more accurate benefit, aside from the larger address space, is that there are more general-purpose registers, which means more local variables can be kept in the CPU register file, which is much faster to access than variables placed on the program stack (which usually means going out to the L1 cache).
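To put numbers on the "bigger address space" point, here is a minimal Python sketch. The helper names `address_space_bytes` and `fits_in_address_space` are hypothetical, and the result is only an upper bound: the kernel and the program itself already occupy part of the address space, and x86-64 CPUs commonly implement fewer than 64 virtual address bits (typically 48).

```python
import sys


def address_space_bytes(pointer_bits: int) -> int:
    """Number of distinct byte addresses a pointer of this width can name."""
    return 2 ** pointer_bits


def fits_in_address_space(file_bytes: int, pointer_bits: int) -> bool:
    """Upper bound only: could a file of this size be mapped in one piece?"""
    return file_bytes < address_space_bytes(pointer_bits)


if __name__ == "__main__":
    GIB = 2 ** 30
    print(address_space_bytes(32) // GIB, "GiB with 32-bit pointers")
    print(fits_in_address_space(8 * GIB, 32))   # an 8 GiB file, 32-bit pointers
    print(fits_in_address_space(8 * GIB, 64))   # the same file, 64-bit pointers
    # sys.maxsize exceeds 2**32 only on a 64-bit build of the interpreter.
    print("64-bit interpreter:", sys.maxsize > 2 ** 32)
```

A 32-bit process can still handle such a file by mapping it one window at a time, but that bookkeeping is exactly what the larger address space removes.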
I'm coding a chess engine named foolsmate. The best move extraction using a minimax-based tree search to depth 9 (from a certain position) took:
on the Win32 configuration: ~17.0s;
after switching to the x64 configuration: ~10.3s.
This is a 41% acceleration!
In addition to having more registers, 64-bit has SSE2 by default. This means that you can indeed perform some calculations in parallel. The SSE extensions had other goodies too. But I guess the main benefit is not having to check for the presence of the extensions. If it's x64, it has SSE2 available. ...If my memory serves me correctly.
In the specific case of x86 to x86_64, the 64-bit program will be about the same size, if not slightly smaller, use a bit more memory, and run faster. Mostly this is because x86_64 doesn't just have 64-bit registers, it also has twice as many. x86 does not have enough registers to make compiled languages as efficient as they could be, so x86 code spends a lot of instructions and memory bandwidth shifting data back and forth between registers and memory. x86_64 has much less of that, and so it takes a little less space and runs faster. Floating-point and bit-twiddling vector instructions are also much more efficient in x86_64.
In general, though, 64 bit code is not necessarily any faster, and is usually larger, both for code and memory usage at runtime.
The only justification for moving your application to 64 bit is the need for more memory in applications like large databases or ERP applications with at least hundreds of concurrent users, where the 2 GB limit will be exceeded fairly quickly when applications cache for better performance. This is especially the case on Windows, where int and long are still 32 bit (there is a new type, _int64) and only pointers are 64 bit. In fact, WOW64 is highly optimised on Windows x64, so 32-bit applications run with a low penalty on 64-bit Windows. My experience on Windows x64 is that the 32-bit version of an application runs 10-15% faster than the 64-bit version, since in the former case, at least for proprietary in-memory databases, you can use pointer arithmetic for maintaining the b-tree (the most processor-intensive part of database systems). Computation-intensive applications that require large decimals for the highest accuracy, which double does not provide on either a 32-bit or a 64-bit operating system, can use _int64 natively instead of software emulation. Of course, large disk-based databases will also show an improvement over 32 bit, simply due to the ability to use large memory for caching query plans and so on.
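The remark that on 64-bit Windows int and long stay 32 bits wide while only pointers grow describes what is usually called the LLP64 data model; most 64-bit Unix-like systems use LP64, where long is 64 bits as well. The following is a minimal Python sketch using `ctypes` that reports which model the running interpreter was built for. The function name `data_model` is hypothetical.

```python
import ctypes


def data_model() -> str:
    """Classify the C data model of the platform this interpreter runs on."""
    sizes = (
        ctypes.sizeof(ctypes.c_int),
        ctypes.sizeof(ctypes.c_long),
        ctypes.sizeof(ctypes.c_void_p),
    )
    return {
        (4, 4, 4): "ILP32",   # typical 32-bit build
        (4, 4, 8): "LLP64",   # 64-bit Windows: only pointers widen
        (4, 8, 8): "LP64",    # 64-bit Linux/macOS: long widens too
    }.get(sizes, "other")


if __name__ == "__main__":
    print("data model:", data_model())
    # long long is the portable way to ask for a 64-bit integer in C.
    print("sizeof(long long):", ctypes.sizeof(ctypes.c_longlong))
```

Code that stores pointers in a long works under LP64 but truncates them under LLP64, which is one reason ports to 64-bit Windows need extra care.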
Any applications that require CPU usage such as transcoding, display performance and media rendering, whether it be audio or visual, will certainly require (at this point) and benefit from using 64 bit versus 32 bit due to the CPU's ability to deal with the sheer amount of data being thrown at it. It's not so much a question of address space as it is the way the data is being dealt with. A 64 bit processor, given 64 bit code, is going to perform better, especially with mathematically difficult things like transcoding and VoIP data - in fact, any sort of 'math' applications should benefit by the usage of 64 bit CPUs and operating systems. Prove me wrong.
On my machine, the same h265 encode runs almost twice as fast using VirtualDub_x64 (with an x64 h265 library) as with VirtualDub_x32 (the regular x32 h265 library). That's probably because operations on longint (64-bit) numbers, e.g. an add, can be done in a single instruction on x64 but need two on 32 bit: add the lower part, and then add (with carry) the higher part. So unless the integer maths is limited to 32-bit integers, most of it will take more time under x32.
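The two-step addition described above can be sketched in Python. This is an emulation rather than what a compiler emits: Python integers are arbitrary precision, so the masks stand in for fixed-width registers, and the function names are hypothetical. On 32-bit x86 the two steps correspond to an `add` followed by an `adc` (add with carry).

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def add64_with_32bit_ops(a: int, b: int) -> int:
    """Add two unsigned 64-bit values using only 32-bit-wide additions."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    lo_sum = a_lo + b_lo              # step 1: add the low halves
    carry = lo_sum >> 32              # 1 if the low add overflowed 32 bits
    hi_sum = a_hi + b_hi + carry      # step 2: add the high halves plus carry

    return ((hi_sum & MASK32) << 32) | (lo_sum & MASK32)


def add64_native(a: int, b: int) -> int:
    """What a single 64-bit add computes: the sum modulo 2**64."""
    return (a + b) & MASK64


if __name__ == "__main__":
    # The low halves overflow here, so the carry must reach the high half.
    print(hex(add64_with_32bit_ops(0xFFFFFFFF, 1)))   # 0x100000000
```

Both functions return the same result for every pair of 64-bit inputs; the difference on real hardware is that the 32-bit version needs two dependent instructions and two registers per operand.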