自定义ASM脚本中的movdqa segfault
我有以下代码段( https://godbolt.org/z/ce1qe9fvv )幼稚& DOT产品的矢量化版本。
我决定将矢量化版本编译在独立ASM文件中,如下所示:
extern exit
section .text
global _start
_start:
mov rax, 8589934593
mov QWORD [rsp-72], rax
mov rax, 17179869187
mov QWORD [rsp-64], rax
mov rax, 25769803781
mov QWORD [rsp-56], rax
mov rax, 34359738375
mov QWORD [rsp-48], rax
mov rax, 85899345930
mov QWORD [rsp-40], rax
mov rax, 171798691870
mov QWORD [rsp-32], rax
mov rax, 257698037810
mov QWORD [rsp-24], rax
mov rax, 343597383750
mov QWORD [rsp-16], rax
movdqa xmm1, [rsp-72]
movdqa xmm0, [rsp-24]
pmulld xmm1, [rsp-40]
pmulld xmm0, [rsp-56]
paddd xmm0, xmm1
movdqa xmm1, xmm0
psrldq xmm1, 8
paddd xmm0, xmm1
movdqa xmm1, xmm0
psrldq xmm1, 4
paddd xmm0, xmm1
movd eax, xmm0
.exit:
call exit
我使用以下来构建:nasm -f elf64 dot_product.asm.asm&&& gcc -g -g -no -pie -nostartfiles -o dot_product dot_product.o
上述代码segfault at movdqdqa xmm0,xmmword ptr [rsp -72]
,这可能意味着数据不是16 - Bytes对齐。但是,以下屏幕截图似乎表明相反:
我在误会某些东西吗?
I have the following code snippet (https://godbolt.org/z/cE1qE9fvv) which contains a naive & vectorized version of a dot product.
I decided to make the vectorized version compile in standalone asm file as following:
extern exit
section .text
global _start
_start:
mov rax, 8589934593
mov QWORD [rsp-72], rax
mov rax, 17179869187
mov QWORD [rsp-64], rax
mov rax, 25769803781
mov QWORD [rsp-56], rax
mov rax, 34359738375
mov QWORD [rsp-48], rax
mov rax, 85899345930
mov QWORD [rsp-40], rax
mov rax, 171798691870
mov QWORD [rsp-32], rax
mov rax, 257698037810
mov QWORD [rsp-24], rax
mov rax, 343597383750
mov QWORD [rsp-16], rax
movdqa xmm1, [rsp-72]
movdqa xmm0, [rsp-24]
pmulld xmm1, [rsp-40]
pmulld xmm0, [rsp-56]
paddd xmm0, xmm1
movdqa xmm1, xmm0
psrldq xmm1, 8
paddd xmm0, xmm1
movdqa xmm1, xmm0
psrldq xmm1, 4
paddd xmm0, xmm1
movd eax, xmm0
.exit:
call exit
I use the following to build: nasm -f elf64 dot_product.asm && gcc -g -no-pie -nostartfiles -o dot_product dot_product.o
The above code segfault at movdqa xmm0, XMMWORD PTR [rsp-72]
which probably means that the data is not 16-bytes aligned. However, the following screenshot seems to indicate the opposite:
Am I misunderstanding something ?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论