InterlockedExchange and memory alignment
I am confused: Microsoft says memory alignment is required for InterlockedExchange, but Intel's documentation says memory alignment is not required for LOCK.
Am I missing something?
Thanks.
From the Microsoft MSDN Library:
Platform SDK: DLLs, Processes, and Threads
InterlockedExchange
The variable pointed to by the Target parameter must be aligned on a 32-bit boundary; otherwise, this function will behave unpredictably on multiprocessor x86 systems and any non-x86 systems.
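On Windows, the documented prototype is `LONG InterlockedExchange(LONG volatile *Target, LONG Value)`, which returns the initial value of Target. So the sketch compiles without `<windows.h>`, it swaps in C11 `<stdatomic.h>` as a portable stand-in. The names `shared_flag`, `is_aligned` and `exchange32` are hypothetical. The sketch shows one way to meet the MSDN requirement: state the 32-bit alignment in the declaration and, if you want, check it at run time.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared variable playing the role of InterlockedExchange's
   Target: a 32-bit value explicitly placed on a 32-bit (4-byte) boundary. */
static alignas(4) _Atomic int32_t shared_flag = 0;

/* True if addr is a multiple of n; n must be a power of two. */
static bool is_aligned(uintptr_t addr, uintptr_t n)
{
    return (addr & (n - 1)) == 0;
}

/* Portable stand-in for InterlockedExchange: atomically stores new_value
   into *target and returns the value it held before the exchange. */
static int32_t exchange32(_Atomic int32_t *target, int32_t new_value)
{
    return atomic_exchange(target, new_value);
}
```

For example, `exchange32(&shared_flag, 7)` returns the old value 0 and leaves 7 in `shared_flag`.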
From the Intel Software Developer's Manual:
LOCK instruction
Causes the processor’s LOCK# signal to be asserted during execution of the accompanying instruction (turns the instruction into an atomic instruction). In a multiprocessor environment, the LOCK# signal insures that the processor has exclusive use of any shared memory while the signal is asserted. The integrity of the LOCK prefix is not affected by the alignment of the memory field. Memory locking is observed for arbitrarily misaligned fields.
Memory Ordering in P6 and More Recent Processor Families
Locked instructions have a total order.
Software Controlled Bus Locking
The integrity of a bus lock is not affected by the alignment of the memory field. The LOCK semantics are followed for as many bus cycles as necessary to update the entire operand. However, it is recommended that locked accesses be aligned on their natural boundaries for better system performance:
• Any boundary for an 8-bit access (locked or otherwise).
• 16-bit boundary for locked word accesses.
• 32-bit boundary for locked doubleword accesses.
• 64-bit boundary for locked quadword accesses.
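Intel's list reduces to one rule: an access of 1, 2, 4 or 8 bytes is naturally aligned when its address is a multiple of its size. Here is a minimal sketch of that check. It assumes power-of-two sizes, and `naturally_aligned` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* An access of `size` bytes (1, 2, 4 or 8) at `addr` is on its natural
   boundary when addr is a multiple of size. size is assumed to be a power
   of two, so masking off the low bits is enough. */
static bool naturally_aligned(uintptr_t addr, size_t size)
{
    return (addr & (uintptr_t)(size - 1)) == 0;
}
```

For example, 0x1002 is a valid boundary for a word access but not for a doubleword access. Any address is fine for a byte access, matching the first bullet.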
Answers (4)
Once upon a time, Microsoft supported Windows NT on processors other than x86, such as MIPS, PowerPC, and Alpha. These processors all require alignment for their interlocked instructions, so Microsoft put the requirement in its spec to ensure that these primitives would be portable across architectures.
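In practice, a common way for a Target to end up misaligned is a packed structure. The sketch below uses hypothetical struct names and relies on `#pragma pack`, which MSVC, GCC and Clang all accept. It contrasts a packed layout, where the 32-bit counter sits at offset 1, with the default layout, where the compiler pads the counter onto its natural boundary. Only the second layout is safe to pass to an interlocked call on every architecture.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header laid out with no padding: `counter` lands at
   offset 1, off its 4-byte boundary. */
#pragma pack(push, 1)
struct packed_header {
    char    tag;
    int32_t counter;
};
#pragma pack(pop)

/* Same fields with the default layout: the compiler inserts padding after
   `tag` so that `counter` is naturally aligned. */
struct natural_header {
    char    tag;
    int32_t counter;
};

static size_t packed_counter_offset(void)
{
    return offsetof(struct packed_header, counter);
}

static size_t natural_counter_offset(void)
{
    return offsetof(struct natural_header, counter);
}
```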
Even though the LOCK prefix doesn't require memory to be aligned, and the cmpxchg instruction that's probably used to implement InterlockedExchange() doesn't require alignment either, cmpxchg will raise an alignment-check exception (#AC) when executed with unaligned operands if the OS has enabled alignment checking. (Alignment checking applies only at privilege level 3, with both CR0.AM and EFLAGS.AC set.) Check the docs for cmpxchg and similar instructions, looking at the list of protected-mode exceptions. I don't know for sure that Windows enables alignment checking, but it wouldn't surprise me.
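For reference, here is roughly what an exchange built on compare-and-swap looks like. It is written with C11 atomics rather than inline CMPXCHG, and `exchange_via_cas` is a hypothetical name. This is only a sketch of the technique the answer mentions. In practice, MSVC's `_InterlockedExchange` intrinsic and GCC/Clang's `atomic_exchange` typically compile to a single `XCHG` on x86, which is implicitly locked. Whichever instruction is used, keeping the operand naturally aligned means an alignment-check fault cannot occur.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Exchange implemented as a compare-and-swap retry loop. */
static int32_t exchange_via_cas(_Atomic int32_t *target, int32_t new_value)
{
    int32_t expected = atomic_load_explicit(target, memory_order_relaxed);
    /* On failure, `expected` is refreshed with the current value, so the
       loop simply retries until the swap succeeds. */
    while (!atomic_compare_exchange_weak(target, &expected, new_value)) {
    }
    return expected; /* the value that was replaced */
}
```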
Hey, I answered a few questions related to this; also keep in mind:
I nearly forgot: Intel's TBB defines 8-bit loads/stores without implicit or explicit locking (in some cases).
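TBB's own code is not reproduced here. The idea can be sketched with C11 atomics instead: a one-byte flag published with a release store and read with an acquire load. On x86, both typically compile to plain `MOV` instructions with no `LOCK` prefix. This works because byte accesses are inherently atomic there, and x86 ordering already gives ordinary loads and stores acquire/release semantics. `ready`, `publish` and `observe` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical one-byte flag shared between threads. */
static _Atomic uint8_t ready = 0;

/* Release store: typically a plain MOV on x86, no LOCK prefix. */
static void publish(uint8_t value)
{
    atomic_store_explicit(&ready, value, memory_order_release);
}

/* Acquire load: likewise a plain MOV on x86. */
static uint8_t observe(void)
{
    return atomic_load_explicit(&ready, memory_order_acquire);
}
```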
Anyhow, I hope that clears at least some of this up for you.
I don't understand where your Intel information is coming from.
To me, it's pretty clear that Intel cares A LOT about alignment and/or spanning cache lines.
For example, on a Core i7 processor, you STILL have to make sure your data doesn't span cache lines, or else the operation is NOT guaranteed to be atomic.
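The cache-line condition is easy to express in code. Here is a minimal sketch. It assumes a 64-byte line, which is typical for Core i7, though real code should query the line size (for example via CPUID) rather than hard-code it. `splits_cache_line` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size in bytes. */
#define CACHE_LINE_SIZE 64u

/* True if an access of `size` bytes starting at `addr` touches two
   cache lines. */
static bool splits_cache_line(uintptr_t addr, size_t size)
{
    return (addr % CACHE_LINE_SIZE) + size > CACHE_LINE_SIZE;
}
```

A naturally aligned access of 1, 2, 4 or 8 bytes can never split a 64-byte line. That property is one reason natural alignment is the safe default.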
In Volume 3-I, System Programming, for x86/x64, Intel clearly states: