g++ 4.2 inline assembly of SSE instructions wraps user asm code with aligned XMM register copies
I have a function using inline assembly:
vec8w x86_sse_ldvwu(const vec8w* m) {
    vec8w rd;
    asm("movdqu %[m],%[rd]" : [rd] "=x" (rd) : [m] "xm" (*m));
    return rd;
}
It gets compiled to the following assembly code:
sub    $0x1c,%esp          # make room for a 16-byte stack temporary
mov    0x24(%esp),%eax     # eax = m
movdqa (%eax),%xmm0        # compiler-inserted aligned load of *m (faults if m is not 16-byte aligned)
movdqu %xmm0,%xmm0         # the user template, reduced to a register-to-register no-op
movdqa %xmm0,(%esp)        # spill rd to the aligned stack slot
movdqa (%esp),%xmm0        # reload rd as the return value
add    $0x1c,%esp
ret
The code isn't terribly efficient, but that isn't my concern. As you can see, the inline assembler inserts a movdqa instruction copying from the address in %eax into xmm0. The problem is that the pointer vec8w* m is not 16-byte aligned, so I get a seg fault when the movdqa is executed.
My question is whether there is a way to instruct the inline assembler to use movdqu instead of the movdqa it uses by default. I tried to look for a workaround using g++'s SSE intrinsic functions, but somehow I cannot find movdqu in the xmmintrin.h file (where, I suppose, it should be declared).
Unfortunately, I cannot modify the code so that the function is always called for an aligned argument m.
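For reference, a minimal sketch of one possible workaround, on the assumption that the aligned register copy comes from the "x" alternative in the "xm" constraint: with a memory-only "m" constraint, the operand stays in memory and the movdqu in the template performs the (unaligned-safe) load itself.

vec8w x86_sse_ldvwu(const vec8w* m) {
    vec8w rd;
    /* "m" instead of "xm": the compiler can no longer stage *m
       through an XMM register with an aligned movdqa */
    asm("movdqu %[m],%[rd]" : [rd] "=x" (rd) : [m] "m" (*m));
    return rd;
}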
1 Answer
The intrinsic that you are looking for is _mm_loadu_si128. It is defined in emmintrin.h, which is SSE2; the xmmintrin.h header contains only SSE(1) intrinsics. http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_sse2_int_load.htm
_mm_loadu_si128 will emit the movdqu instruction which you are looking for. It seems that's exactly what you are trying to accomplish with your inline assembly function (a misaligned load).
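A minimal sketch of the intrinsic version, assuming vec8w is a 16-byte vector of eight 16-bit words (the question never shows its definition):

#include <emmintrin.h>   // SSE2: __m128i, _mm_loadu_si128, _mm_storeu_si128

// Assumed definition of vec8w -- eight 16-bit words in one 128-bit vector.
typedef short vec8w __attribute__((vector_size(16)));

vec8w x86_sse_ldvwu(const vec8w* m) {
    // movdqu: unaligned 128-bit load, safe for any address
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(m));
    vec8w rd;
    // rd is a properly aligned local, but the unaligned store is harmless
    _mm_storeu_si128(reinterpret_cast<__m128i*>(&rd), v);
    return rd;
}

With the intrinsic, the compiler can also see through the load and fold it into surrounding code, which an opaque asm block prevents.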