g++ 4.2 inline assembly of SSE instructions wraps user asm code with aligned XMM register copies
I have a function using inline assembly:
vec8w x86_sse_ldvwu(const vec8w* m) {
    vec8w rd;
    asm("movdqu %[m],%[rd]" : [rd] "=x" (rd) : [m] "xm" (*m));
    return rd;
}
It gets compiled to the following assembly code:
sub    $0x1c,%esp          # make room for a 16-byte stack temporary
mov    0x24(%esp),%eax     # eax = m
movdqa (%eax),%xmm0        # compiler-inserted aligned load of *m (faults if m is not 16-byte aligned)
movdqu %xmm0,%xmm0         # the user template, reduced to a register-to-register no-op
movdqa %xmm0,(%esp)        # spill rd to the aligned stack slot
movdqa (%esp),%xmm0        # reload rd as the return value
add    $0x1c,%esp
ret
The code isn't terribly efficient, but that isn't my concern. As you can see, the inline assembler inserts a movdqa instruction copying from the address in %eax into xmm0. The problem is that the pointer vec8w* m is not 16-byte aligned, so I get a seg fault when the movdqa is executed.
My question is whether there is a way to instruct the inline assembler to use movdqu instead of the movdqa it uses by default. I tried to look for a workaround using g++'s SSE intrinsic functions, but somehow I cannot find movdqu in the xmmintrin.h file (where, I suppose, it should be declared).
Unfortunately, I cannot modify the code so that the function is always called for an aligned argument m.
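For reference, a minimal sketch of one possible workaround, on the assumption that the aligned register copy comes from the "x" alternative in the "xm" constraint: with a memory-only "m" constraint, the operand stays in memory and the movdqu in the template performs the (unaligned-safe) load itself.

vec8w x86_sse_ldvwu(const vec8w* m) {
    vec8w rd;
    /* "m" instead of "xm": the compiler can no longer stage *m
       through an XMM register with an aligned movdqa */
    asm("movdqu %[m],%[rd]" : [rd] "=x" (rd) : [m] "m" (*m));
    return rd;
}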
1 Answer
The intrinsic that you are looking for is _mm_loadu_si128. It is defined in emmintrin.h, which is SSE2; the xmmintrin.h header contains only SSE(1) intrinsics. http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_sse2_int_load.htm
_mm_loadu_si128 will emit the movdqu instruction which you are looking for. It seems that's exactly what you are trying to accomplish with your inline assembly function (a misaligned load).
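A minimal sketch of the intrinsic version, assuming vec8w is a 16-byte vector of eight 16-bit words (the question never shows its definition):

#include <emmintrin.h>   // SSE2: __m128i, _mm_loadu_si128, _mm_storeu_si128

// Assumed definition of vec8w -- eight 16-bit words in one 128-bit vector.
typedef short vec8w __attribute__((vector_size(16)));

vec8w x86_sse_ldvwu(const vec8w* m) {
    // movdqu: unaligned 128-bit load, safe for any address
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(m));
    vec8w rd;
    // rd is a properly aligned local, but the unaligned store is harmless
    _mm_storeu_si128(reinterpret_cast<__m128i*>(&rd), v);
    return rd;
}

With the intrinsic, the compiler can also see through the load and fold it into surrounding code, which an opaque asm block prevents.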