Does NASM have unaligned access problems?

Posted 2025-01-12 07:19:26 · 240 words · 1 view · 0 comments

I know what unaligned access is in C and that it can cause UB on some processors.

I wonder if the same problem exists in code like this, written in NASM assembly:

    section .text
        global _start
    _start:
        mov [arr], word "abcd"

    section .data
    arr: db 1, 2, 3, 4, 5, 6, 7


Comments (1)

白色秋天 2025-01-19 07:19:26

Generally no problem, x86 allows unaligned accesses for any size (with some limitations for 16-byte unaligned).

Some other ISAs don't (e.g. SPARC, MIPS before MIPS32r6, etc.) and C caters to those by not defining the behaviour when a T* pointer has less than alignof(T) alignment. In GNU C you can use __attribute__((aligned(1))) to typedef types that have well-defined behaviour at any alignment.
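As a minimal sketch of the two ways to get a well-defined unaligned load in C: the helper names `load_u32_memcpy` and `load_u32_typedef` are made up for this illustration, and the typedef relies on GNU C attributes, so it assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* GNU C extension (assumes GCC or Clang): a 32-bit integer type that only
   promises 1-byte alignment. may_alias additionally lets it read bytes
   that were written through another type. */
typedef uint32_t unaligned_u32 __attribute__((aligned(1), may_alias));

/* Portable ISO C: copy the bytes and let the compiler pick the instruction. */
static uint32_t load_u32_memcpy(const void *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* GNU C: dereference through the under-aligned typedef. */
static uint32_t load_u32_typedef(const void *p)
{
    assert(p != NULL);
    return *(const unaligned_u32 *)p;
}
```

On x86 both versions typically compile to a single plain load, but the C source no longer promises the compiler an alignment that the pointer might not have.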


The .data section will be aligned to at least 4 bytes by default under Linux, so a 2-byte (word) store to [arr] is an aligned store; the address is guaranteed to be even (unless you use special linker options / a linker script to tell it to start .data at an odd address). Your arr starts at the start of your .data section.
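The "aligned store" reasoning is just arithmetic on the address: an n-byte access is naturally aligned when the address is a multiple of n. A small sketch of that check, where `is_aligned` is a hypothetical helper name and not something from NASM or libc:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p is a multiple of n bytes. n must be a power of two. */
static int is_aligned(const void *p, size_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    return ((uintptr_t)p & (n - 1)) == 0;
}
```

With a 4-byte-aligned section start, `is_aligned(arr, 2)` holds, while the same 2-byte store to `arr + 1` would be unaligned (still allowed on x86, just not aligned).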

Also, "abcd" is a 4-byte constant that will have to be truncated to fit in a word. I guess you missed that when you tested your example to see that it happened to work on your own computer, before asking if it was safe in general?
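To make the truncation concrete, here is a sketch in C that models it. It assumes NASM packs a character constant with the first character in the least significant byte, so "abcd" would be 0x64636261; `pack_chars_le` and `store_word_le` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack up to four characters little-endian: the first character ends up
   in the least significant byte (assumed to match NASM's packing). */
static uint32_t pack_chars_le(const char *s)
{
    uint32_t v = 0;
    assert(s != NULL);
    for (int i = 0; i < 4 && s[i] != '\0'; i++)
        v |= (uint32_t)(unsigned char)s[i] << (8 * i);
    return v;
}

/* Keep only the low 16 bits and store them least significant byte first,
   like a 2-byte (word) store on x86. */
static void store_word_le(uint8_t *dst, uint32_t value)
{
    uint16_t w = (uint16_t)value;
    assert(dst != NULL);
    dst[0] = (uint8_t)(w & 0xFF);
    dst[1] = (uint8_t)(w >> 8);
}
```

Under that assumption only 'a' and 'b' reach memory and the rest of arr keeps its old contents. Storing all four characters would need a dword store instead.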

Regarding "it can cause UB on some processors":

No, it's always UB in ISO C. See "Why does unaligned access to mmap'ed memory sometimes segfault on AMD64?" for an example and links. Note that Undefined Behaviour doesn't mean it does crash, just that the optimizer can assume it doesn't happen and the results can be unpredictable.

The behaviour is always well-defined in x86, like for most ISAs. Hardware vendors have to specify exactly what happens even in cases that raise exceptions, so OSes can be written to maintain control of the machine when user-space causes faults. (So in asm, what you're really looking for isn't defined-behaviour, but guaranteed non-faulting.)

Any misalignment is fine for any access size other than 16 bytes. (Assuming the AC bit is cleared, which is the case in normal systems. glibc memcpy for example would fault if you set it, for small unaligned copies. Unless you specifically set AC yourself as a way to detect unintentional unaligned accesses, you can assume it's cleared. There are also performance counters for split-loads and split-stores on modern CPUs which you can use instead to detect problematic ones.)

For 16-byte accesses, legacy-SSE memory operands require natural alignment by default (e.g. SSE2 pxor xmm0, [rdi] requires alignment), except for instructions like movdqu unaligned load/store. (Other sizes like 8-byte don't require alignment, e.g. punpckldq mm0, [rdi] is alignment-safe because MMX registers are only 8 bytes wide, even though punpck instructions annoyingly do full-width loads instead of just the half that they shuffle into the destination.)
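From C, the same distinction shows up in which intrinsic you pick. A hedged sketch, where `xor16_unaligned` is a made-up name and the byte loop is only a fallback for targets without SSE2:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* XOR 16 bytes from src into dst; neither pointer needs any alignment. */
static void xor16_unaligned(uint8_t *dst, const uint8_t *src)
{
    assert(dst != NULL && src != NULL);
#if defined(__SSE2__)
    /* The loadu/storeu intrinsics ask for movdqu-style accesses, which
       are allowed at any address. */
    __m128i a = _mm_loadu_si128((const __m128i *)dst);
    __m128i b = _mm_loadu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)dst, _mm_xor_si128(a, b));
#else
    /* Fallback for non-SSE2 targets: plain byte-wise XOR. */
    for (size_t i = 0; i < 16; i++)
        dst[i] ^= src[i];
#endif
}
```

Had this used `_mm_load_si128` instead, the code would promise 16-byte alignment, and a compiler targeting legacy SSE would be free to fold the load into a memory operand of pxor, which faults on a misaligned address.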

With AVX / AVX-512 encodings (VEX / EVEX), unaligned is the default (e.g. vaddps xmm0, xmm1, [rdi] doesn't require alignment), and only special alignment-required instructions like vmovntps-stores or vmovdqa load/store will fault on misalignment.

The behaviour of alignment-required accesses is well-defined even for misaligned addresses: #GP fault for SSE/AVX misalignment, or #AC if you set the AC bit and did something that required 2, 4, or 8 bytes of alignment but didn't meet that requirement. (https://xem.github.io/minix86/manual/intel-x86-and-64-manual-vol3/o_fe12b1e2a880e0ce-231.html excerpts the relevant page of Intel's SDM PDFs.)

Under GNU/Linux, a user-space process will receive a SIGSEGV (segmentation fault) if it generates a #GP exception. IIRC, #AC might get the kernel to deliver a SIGBUS (bus error).


The only problems with unaligned access in x86 are performance

(Except as mentioned with legacy-SSE memory operands.)
