Memory alignment on 32-bit Intel processors
Intel's 32-bit processors, such as the Pentium, have a 64-bit-wide data bus and therefore fetch 8 bytes per access. Based on this, I'm assuming that the physical addresses these processors emit on the address bus are always multiples of 8.
Firstly, is this conclusion correct?
Secondly, if it is correct, then one should align data structure members on an 8-byte boundary. But I've seen people use 4-byte alignment instead on these processors.
How can they be justified in doing so?
The usual rule of thumb (straight from Intel's and AMD's optimization manuals) is that every data type should be aligned to its own size. An int32 should be aligned on a 32-bit boundary, an int64 on a 64-bit boundary, and so on. A char will fit just fine anywhere. Another rule of thumb is, of course, that the compiler has already been told about the alignment requirements. You don't need to worry about it, because the compiler knows to add the right padding and offsets to allow efficient access to data.
The only exception is when working with SIMD instructions, where you have to manually ensure alignment on most compilers.
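As a rough illustration of the padding the compiler inserts, here is a small sketch using Python's standard ctypes module, which lays out a Structure the same way the platform's C compiler would. The Record type and its field names are hypothetical. Exact offsets depend on the ABI: on x86-64 the int64 field typically lands at offset 16, whereas the 32-bit x86 System V ABI typically aligns 8-byte integers inside structs to only 4 bytes.

```python
import ctypes


# Hypothetical record whose fields are declared in a padding-heavy order.
class Record(ctypes.Structure):
    _fields_ = [
        ("flag", ctypes.c_char),    # 1 byte
        ("count", ctypes.c_int32),  # 4 bytes
        ("tag", ctypes.c_char),     # 1 byte
        ("total", ctypes.c_int64),  # 8 bytes
    ]


def layout(struct_type):
    """Return (name, offset, size, alignment) for each field, as the native ABI places it."""
    rows = []
    for name, ctype in struct_type._fields_:
        field = getattr(struct_type, name)  # CField descriptor created by ctypes
        rows.append((name, field.offset, field.size, ctypes.alignment(ctype)))
    return rows


if __name__ == "__main__":
    for name, offset, size, align in layout(Record):
        print(f"{name:>5}: offset={offset:2d} size={size} align={align}")
    print("sizeof(Record) =", ctypes.sizeof(Record))
```

Every offset comes out as a multiple of that field's alignment, and sizeof(Record) is rounded up to a multiple of the largest one; declaring the fields from largest to smallest would remove most of the padding.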
I don't see how that makes a difference. The CPU can simply issue a read for the 64-bit block that contains those 4 bytes. That means it either gets 4 extra bytes before the requested data, or after it. But in both cases, it only takes a single read. 32-bit alignment of 32-bit-wide data ensures that it won't cross a 64-bit boundary.
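The argument above can be checked mechanically. This sketch assumes a bus that transfers aligned 8-byte blocks (BUS_BYTES is an assumption matching the 64-bit bus in the question) and, for every possible placement at a given alignment, counts how many blocks an access touches. Both helper names are hypothetical.

```python
BUS_BYTES = 8  # assumed width of the external data bus in bytes (64 bits)


def blocks_touched(addr, size, bus_bytes=BUS_BYTES):
    """Number of aligned bus blocks covered by the bytes addr .. addr+size-1."""
    first = addr // bus_bytes
    last = (addr + size - 1) // bus_bytes
    return last - first + 1


def worst_case_blocks(size, alignment, span=256, bus_bytes=BUS_BYTES):
    """Worst case over every start address in [0, span) that is a multiple of `alignment`."""
    return max(blocks_touched(a, size, bus_bytes) for a in range(0, span, alignment))


if __name__ == "__main__":
    for size, alignment in [(4, 4), (4, 2), (8, 4), (8, 8)]:
        print(f"{size}-byte value, {alignment}-byte aligned:",
              worst_case_blocks(size, alignment), "block(s)")
```

A 4-byte value needs only 4-byte alignment to stay inside one 64-bit block, while an 8-byte value placed on a 4-byte boundary can straddle two blocks; that second case is what 8-byte alignment of int64 avoids.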
Physical bus is 64 bits wide ... addresses are multiples of 8 --> yes.
HOWEVER, there are two more factors to consider:
They are justified in doing so because changing to 8-byte alignment would constitute an ABI change, and the marginal performance improvement is not worth the trouble.
As someone else already said, cache lines matter. All accesses on the actual memory bus are made in whole cache lines (64 bytes on modern x86; the original Pentium used 32-byte lines). See the "What Every Programmer Should Know About Memory" paper that was mentioned already. So the actual memory traffic is aligned to the cache-line size.
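To see why cache lines, rather than the 8-byte bus, are the granularity that matters, this sketch counts how many elements of a contiguous array straddle a cache-line boundary. The 64-byte line size is an assumption (pass 32 to model the original Pentium), split_elements is a hypothetical helper, and the array is assumed to start at `base`, which defaults to a line boundary.

```python
LINE_BYTES = 64  # assumed cache-line size; pass 32 to model the original Pentium


def split_elements(stride, count, line_bytes=LINE_BYTES, base=0):
    """Count elements of an array (element size `stride`, starting at `base`)
    whose bytes span two cache lines, so that one load touches two lines."""
    split = 0
    for i in range(count):
        start = base + i * stride
        end = start + stride - 1  # address of the element's last byte
        if start // line_bytes != end // line_bytes:
            split += 1
    return split


if __name__ == "__main__":
    # e.g. a 12-byte record (an int32 plus a 4-byte-aligned int64) versus a 16-byte one
    for stride in (12, 16):
        print(f"stride {stride}: {split_elements(stride, 64)} of 64 elements split")
```

A stride that divides the line size never splits an element, whereas a 12-byte stride periodically puts one element across two lines, so loading it can cost two line fills instead of one.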
For random access, as long as the data is not misaligned (i.e. does not cross a boundary), I don't think it matters much; the correct block address and the offset within it can be found with a simple AND construct in hardware. It gets slow when one read access is not sufficient to get one value. That's also why compilers usually pack small values (bytes etc.) together, since they don't have to be at a specific offset; shorts should be on even addresses, 32-bit values on 4-byte-aligned addresses, and 64-bit values on 8-byte-aligned addresses.
Note that if caching and linear data access are involved, things will be different.
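The "simple AND construct" from this answer can be modelled in a few lines. This toy Memory class is a hypothetical model, not how any particular CPU is wired: it only allows aligned 8-byte reads, and load() splits an address into a block address and a byte offset with AND masks, issuing a second read only when the value crosses a block boundary. It assumes little-endian byte order, as on x86, and values no larger than one block.

```python
BLOCK = 8  # assumed bus width in bytes (64-bit data bus)


class Memory:
    """Toy byte-addressable memory that can only be read in aligned BLOCK-byte chunks."""

    def __init__(self, data):
        self.data = bytes(data)
        self.reads = 0  # number of bus reads issued so far

    def read_block(self, aligned_addr):
        if aligned_addr % BLOCK:
            raise ValueError("bus reads must be block-aligned")
        self.reads += 1
        return self.data[aligned_addr:aligned_addr + BLOCK]

    def load(self, addr, size):
        """Load a little-endian unsigned integer of `size` (<= BLOCK) bytes at `addr`."""
        if not 0 < size <= BLOCK:
            raise ValueError("size must be between 1 and BLOCK bytes")
        base = addr & ~(BLOCK - 1)   # block address: clear the offset bits
        offset = addr & (BLOCK - 1)  # byte position inside that block
        chunk = self.read_block(base)
        if offset + size > BLOCK:    # value straddles two blocks: a second read is needed
            chunk += self.read_block(base + BLOCK)
        return int.from_bytes(chunk[offset:offset + size], "little")


if __name__ == "__main__":
    mem = Memory(range(32))  # the byte at address n holds the value n
    print(hex(mem.load(4, 4)), "reads:", mem.reads)  # aligned: one read
    print(hex(mem.load(6, 4)), "reads:", mem.reads)  # misaligned: two more reads
```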
The 64-bit bus you refer to feeds the caches. The CPU always reads and writes entire cache lines over it. The size of a cache line is always a multiple of 8, and its physical address is indeed aligned on an 8-byte boundary.
Cache-to-register transfers do not use the external databus, so the width of that bus is irrelevant.
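To connect the bus width with the cache-line size, this sketch lists the 8-byte-aligned bus transfers that would fill the cache line containing a given address. The 32-byte line and 8-byte bus are assumptions modelled on the original Pentium, line_fill is a hypothetical helper, and real hardware may issue the chunks in a different order (for example, the requested chunk first); the sketch simply uses ascending order.

```python
def line_fill(addr, line_bytes=32, bus_bytes=8):
    """Addresses of the bus-width chunks that make up the cache line holding `addr`.

    Assumes power-of-two sizes; chunks are listed in ascending order for simplicity.
    """
    line_base = addr & ~(line_bytes - 1)  # cache lines are aligned to their own size
    return [line_base + off for off in range(0, line_bytes, bus_bytes)]


if __name__ == "__main__":
    # A 4-byte load from 0x1234 never appears on the bus by itself:
    # a miss brings in the whole line, here as four 8-byte transfers.
    print([hex(a) for a in line_fill(0x1234)])  # ['0x1220', '0x1228', '0x1230', '0x1238']
```

With line_bytes=64 the same call yields eight transfers, matching the 64-byte lines of later x86 processors; in both cases every address on the bus is a multiple of 8, regardless of how the data inside the line is aligned.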