32-bit pointers with the x86-64 ISA: why not?

Posted 2025-01-04 15:30:48

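The x86-64 instruction set adds more registers and other improvements that help streamline executable code. In many applications, however, the larger pointer size is a burden. The extra, unused bytes in every pointer clog the cache and might even overflow RAM. GCC itself, for example, is built with the -m32 flag, and I assume this is the reason.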

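It is possible to load a 32-bit value and treat it as a pointer. This needs no extra instructions: just load or compute the 32 bits, then load from the resulting address. The trick is not portable, though, because platforms have different memory maps. On Mac OS X, the entire low 4 GiB of the address space is reserved. Still, for one program I wrote, hackishly adding 0x100000000L to the 32-bit "addresses" before use improved performance greatly compared with both true 64-bit addresses and compiling with -m32.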
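A minimal sketch of that idea in C, assuming every node lives in a single arena so that a 32-bit byte offset from a fixed base is enough to find it. The names `ref32`, `arena`, `node_at`, `ref_of`, `push_front`, and `list_sum` are hypothetical. In the Mac OS X setup described above, the base would be the constant 0x100000000 rather than the address of an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit "pointer": a byte offset from the arena base.
   Offset 0 is reserved as the null reference. */
typedef uint32_t ref32;

struct node {
    int32_t value;
    ref32   next;   /* 4 bytes instead of an 8-byte struct node * */
};

#define ARENA_NODES 1024u

/* Slot 0 is never handed out, so offset 0 can mean "no node". */
static struct node arena[ARENA_NODES];
static uint32_t arena_used = 1;

/* Decode: add the 32-bit offset to the base address. */
static struct node *node_at(ref32 r)
{
    return r ? (struct node *)((char *)arena + r) : NULL;
}

/* Encode: distance in bytes from the arena base. */
static ref32 ref_of(const struct node *n)
{
    return (ref32)((const char *)n - (const char *)arena);
}

/* Take the next free node from the arena and link it in front of head. */
static ref32 push_front(ref32 head, int32_t value)
{
    assert(arena_used < ARENA_NODES);
    struct node *n = &arena[arena_used++];
    n->value = value;
    n->next = head;
    return ref_of(n);
}

/* Walk the list through its 32-bit links and add up the values. */
static int64_t list_sum(ref32 head)
{
    int64_t sum = 0;
    for (struct node *n = node_at(head); n != NULL; n = node_at(n->next))
        sum += n->value;
    return sum;
}
```

With a native pointer, the same node would occupy 16 bytes on a typical LP64 target (4 bytes of value, 4 of padding, 8 of pointer), so this layout fits twice as many nodes into each cache line.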

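Is there any fundamental impediment to having a 32-bit x86-64 platform? I suppose that supporting such a chimera would add complexity to any operating system, and that anyone who wants that last 20% should just Make it Work™, but it still seems like the best fit for a variety of computationally intensive programs.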


酒与心事 2025-01-11 15:30:48

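There is an ABI called "x32" in development for Linux. It is a mix of x86_64 and ia32, similar to what you describe: a 32-bit address space combined with the full 64-bit register set. It needs a custom kernel, binutils, and gcc.

Some SPEC runs indicate a performance improvement of about 30% for some benchmarks. For more information, see https://sites.google.com/site/x32abi/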
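As a small illustration, the sketch below reports which data model a translation unit was compiled for. It assumes GCC's predefined macros `__x86_64__`, `__ILP32__`, and `__i386__` (the first two are both set when compiling with -mx32). The function names `data_model` and `pointer_bits` are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Name of the data model this translation unit targets.
   Assumes GCC-style predefined macros; other compilers fall through
   to "other". */
static const char *data_model(void)
{
#if defined(__x86_64__) && defined(__ILP32__)
    return "x32 (64-bit registers, 32-bit pointers)";
#elif defined(__x86_64__)
    return "x86-64 (64-bit pointers)";
#elif defined(__i386__)
    return "ia32 (32-bit registers and pointers)";
#else
    return "other";
#endif
}

/* Width of a data pointer in bits on the current target. */
static int pointer_bits(void)
{
    int bits = (int)(sizeof(void *) * CHAR_BIT);
    assert(bits == 32 || bits == 64);
    return bits;
}
```

Built with plain `gcc`, this reports 64-bit pointers on an x86-64 host. Built with `gcc -mx32` on a toolchain and kernel that support the ABI, the first branch is the one that compiles.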


夜清冷一曲。 2025-01-11 15:30:48

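Yes, you can limit the program to the first 2 or 4 GB of the address space, or use a 64-bit base with a 32-bit (or smaller) offset.

As Mysticial commented above, ICC can even do this automatically. It has the -auto-ilp32 / /Qauto-ilp32 option, which uses 32-bit pointers in 64-bit mode where applicable:

Instructs the compiler to analyze the program to determine if there are 64-bit pointers that can be safely shrunk into 32-bit pointers and if there are 64-bit longs (on Linux* systems) that can be safely shrunk into 32-bit longs.

If you don't have access to ICC, or want more control over the generated code, then on Linux there is the x32 ABI, as others have mentioned.

Windows has no equivalent of the x32 ABI, but you can still use 32-bit pointers by disabling the /LARGEADDRESSAWARE flag, which is enabled by default for x86-64 binaries: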

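By default, 64-bit Microsoft Windows-based applications have a user-mode address space of several terabytes. For precise values, see Memory Limits for Windows and Windows Server Releases. However, applications can specify that the system should allocate all memory for the application below 2 gigabytes. This feature is beneficial for 64-bit applications if the following conditions are true:
A 2 GB address space is sufficient.
The code has many pointer truncation warnings.
Pointers and integers are freely mixed.
The code has polymorphism using 32-bit data types.
All pointers are still 64-bit pointers, but the system ensures that every memory allocation occurs below the 2 GB limit, so that if the application truncates a pointer, no significant data is lost. Pointers can be truncated to 32-bit values, then extended to 64-bit values by either sign extension or zero extension.
(Quoted from "Virtual Address Space")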
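The truncate-then-extend step in that quote can be sketched with plain integer arithmetic in C. The helper names below are hypothetical, and the addresses are ordinary numbers that are never dereferenced. The sign-extending cast relies on two's-complement conversion, which is what mainstream compilers provide.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low 32 bits of a 64-bit address. */
static uint32_t truncate_addr(uint64_t addr)
{
    return (uint32_t)addr;
}

/* Widen by filling the upper 32 bits with zeros. */
static uint64_t zero_extend(uint32_t low)
{
    return (uint64_t)low;
}

/* Widen by copying bit 31 into the upper 32 bits. */
static uint64_t sign_extend(uint32_t low)
{
    return (uint64_t)(int64_t)(int32_t)low;
}

/* True if addr survives truncation followed by either kind of extension. */
static int round_trips_both_ways(uint64_t addr)
{
    uint32_t low = truncate_addr(addr);
    return zero_extend(low) == addr && sign_extend(low) == addr;
}

/* Checked truncation: refuses addresses at or above 2 GB, which would
   not round-trip under sign extension. */
static uint32_t pack_checked(uint64_t addr)
{
    assert(addr < UINT64_C(0x80000000));
    return (uint32_t)addr;
}
```

Below 2 GB, bit 31 is clear, so sign extension and zero extension give the same result. Between 2 GB and 4 GB only zero extension recovers the original address, which is consistent with the 2 GB limit in the quoted text.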

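Of course, there is no direct compiler support comparable to GCC's -mx32 option, so you may need to handle pointers manually every time you store a pointer to memory or dereference it. The simplest solution is to write a class that wraps a 32-bit pointer and takes care of this. Fortunately, Microsoft also has experience with mixing 32-bit and 64-bit pointers on the same architecture, so there are several supporting keywords and macros: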

POINTER_32/__ptr32
POINTER_64/__ptr64
POINTER_SIGNED/__sptr
POINTER_UNSIGNED/__uptr

You can also force all memory allocations to happen below the 4 GB mark and handle everything manually.
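A minimal sketch of the manual route in C, assuming Linux on x86-64 with glibc, where the `MAP_32BIT` flag of `mmap` asks for a mapping in the low 2 GB of the address space. The names `ptr32`, `alloc_low`, `ptr32_pack`, and `ptr32_unpack` are hypothetical. They are a portable stand-in for a wrapper type, not the MSVC keywords listed above.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS and MAP_32BIT in glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical wrapper: a pointer stored in 4 bytes. */
typedef struct { uint32_t bits; } ptr32;

/* Map len bytes of zeroed memory in the low part of the address space.
   Returns NULL if the platform lacks MAP_32BIT or the mapping fails. */
static void *alloc_low(size_t len)
{
#ifdef MAP_32BIT
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
    return p == MAP_FAILED ? NULL : p;
#else
    (void)len;
    return NULL;
#endif
}

/* Store a pointer in 32 bits; it must already lie below 4 GB. */
static ptr32 ptr32_pack(void *p)
{
    uintptr_t addr = (uintptr_t)p;
    assert(addr <= UINT32_MAX);
    ptr32 r = { (uint32_t)addr };
    return r;
}

/* Recover the full pointer by zero extension. */
static void *ptr32_unpack(ptr32 r)
{
    return (void *)(uintptr_t)r.bits;
}
```

Every allocation that is to be stored as a `ptr32` has to come from `alloc_low` or an allocator built on top of it. Pointers from `malloc` or the stack carry no such guarantee, which is why `ptr32_pack` asserts.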

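In any case, limiting yourself to the first 2 or 4 GB of the address space might not be feasible, either because that is not enough memory or because it reduces the effectiveness of ASLR. Instead, you can tell the OS to allocate memory around some 64-bit base address. That way you can have multiple bases and cover an address space larger than 4 GB.

Google's V8 engine uses this to compress pointers to 32 bits, which saves memory and improves performance. See their comparison of the memory and performance improvements. They even discuss a nice optimization: keeping the base in the FS/GS segment register, which frees up another general-purpose register.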
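The base-plus-offset scheme can be sketched in C as follows, assuming a POSIX system with `mmap`. The address passed as a hint is only a request, so the sketch uses whatever address the OS returns as its base. The `cage` type and its functions are hypothetical names, and this is not V8's actual implementation.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS in glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical "cage": one contiguous region addressed by 32-bit offsets. */
struct cage {
    char  *base;   /* full 64-bit base, kept in one place */
    size_t size;   /* at most 4 GB so that every offset fits in 32 bits */
};

/* Ask the OS for size bytes near hint. Returns 0 on success, -1 on failure. */
static int cage_init(struct cage *c, uintptr_t hint, size_t size)
{
    assert(size > 0 && size <= UINT32_MAX);
    void *p = mmap((void *)hint, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    c->base = p;
    c->size = size;
    return 0;
}

/* Compress: full pointer to 32-bit offset from the cage base. */
static uint32_t cage_compress(const struct cage *c, const void *p)
{
    ptrdiff_t off = (const char *)p - c->base;
    assert(off >= 0 && (size_t)off < c->size);
    return (uint32_t)off;
}

/* Decompress: base plus zero-extended offset. */
static void *cage_decompress(const struct cage *c, uint32_t off)
{
    assert(off < c->size);
    return c->base + off;
}
```

Several such cages, each with its own base, would cover more than 4 GB in total, at the cost of knowing which cage a given offset belongs to.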

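Or, if your pointers are always aligned, you can drop the low bits to address a larger amount of memory, as the JVM does with its "compressed Oops", which always refer to 8-byte-aligned objects.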
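The arithmetic behind dropping the low bits can be sketched in C. This assumes 8-byte alignment and a single base address. The names `oop_encode`, `oop_decode`, and `oop_reach` are hypothetical, addresses are treated as plain 64-bit numbers, and the code illustrates the technique rather than the JVM's implementation.

```c
#include <assert.h>
#include <stdint.h>

enum { ALIGN_SHIFT = 3 };   /* 8-byte alignment: the low 3 bits are zero */

/* Compress: drop the always-zero low bits of the offset from base. */
static uint32_t oop_encode(uint64_t base, uint64_t addr)
{
    assert(addr >= base);
    uint64_t off = addr - base;
    assert((off & ((1u << ALIGN_SHIFT) - 1)) == 0);   /* aligned */
    assert((off >> ALIGN_SHIFT) <= UINT32_MAX);        /* within reach */
    return (uint32_t)(off >> ALIGN_SHIFT);
}

/* Decompress: widen first, then shift, so no high bits are lost. */
static uint64_t oop_decode(uint64_t base, uint32_t ref)
{
    return base + ((uint64_t)ref << ALIGN_SHIFT);
}

/* Bytes reachable from base with a 32-bit reference: 2^32 * 8 = 32 GiB. */
static uint64_t oop_reach(void)
{
    return (UINT64_C(1) << 32) << ALIGN_SHIFT;
}
```

Each extra bit of guaranteed alignment doubles the reach, so 8-byte alignment turns a 4 GiB window into a 32 GiB one at the cost of a shift on every decode.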

See also: How does the compressed pointer implementation in V8 differ from JVM's compressed Oops?



双马尾 2025-01-11 15:30:48

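I don't expect it to be difficult to support such a model in the OS. The only thing that needs to change for processes using this model is page management: pages must be allocated below the 4 GB point. If the kernel passes buffers to the application, it must also allocate them from the first 4 GB of the virtual address space. The same applies to the loader that loads and starts applications. Other than that, a 64-bit kernel should be able to handle such applications without major modifications.

Compiler support shouldn't be a big issue either. It is mostly a matter of generating code that can use the extra CPU registers and their full 64 bits, and of adding the proper REX prefixes where needed.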
