g++ 4.2 inline asm of SSE instructions wraps user asm code with aligned XMM register copies



I have a function using inline assembly:

  vec8w x86_sse_ldvwu(const vec8w* m) { 
     vec8w rd; 
     asm("movdqu %[m],%[rd]" : [rd] "=x" (rd) : [m] "xm" (*m)); 
     return rd; 
  } 

It gets compiled to the following assembly code:

  sub    $0x1c,%esp
  mov    0x24(%esp),%eax
  movdqa (%eax),%xmm0 
  movdqu %xmm0,%xmm0
  movdqa %xmm0,(%esp)
  movdqa (%esp),%xmm0
  add    $0x1c,%esp
  ret 

The code isn't terribly efficient, but that isn't my concern. As you can see, the compiler inserts a movdqa instruction copying from the address in %eax to xmm0. The problem is that the pointer vec8w* m is not 16-byte (128-bit) aligned, so I get a seg fault when the movdqa is executed.
My question is whether there is a way to instruct the inline assembler to use movdqu instead of movdqa (which it uses by default)? I tried to look for a workaround using g++'s SSE intrinsics, but somehow I cannot find movdqu in the xmmintrin.h header (where I suppose it should be declared).
Unfortunately, I cannot modify the code so that the function is always called with an aligned argument m.
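
For context, the aligned load comes from the "xm" constraint rather than from the asm template itself: "xm" lets the compiler satisfy the operand with an XMM register, and it materializes *m into that register with movdqa, since vec8w is a 16-byte-aligned type as far as the compiler knows. A minimal sketch of a constraint-only fix that keeps the inline asm, assuming vec8w is defined as in the question:

  /* Restrict the input to a memory operand ("m" instead of "xm"), so that
     movdqu itself performs the load and the compiler never inserts an
     aligned register copy of *m. */
  vec8w x86_sse_ldvwu(const vec8w* m) {
     vec8w rd;
     asm("movdqu %[m],%[rd]" : [rd] "=x" (rd) : [m] "m" (*m));
     return rd;
  }

With a plain "m" constraint the operand is always an address, so the only access to *m is the movdqu in the template.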


1 Answer

物价感观 2024-12-20 17:25:07


The intrinsic that you are looking for is _mm_loadu_si128. It is defined in emmintrin.h, which is the SSE2 header; the xmmintrin.h header contains only SSE(1) intrinsics.

http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_sse2_int_load.htm

_mm_loadu_si128 will emit the movdqu instruction you are looking for. It seems that is exactly what you are trying to accomplish with your inline assembly function: an unaligned load.
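
A minimal sketch of the loader rewritten with the intrinsic; the vec8w typedef below is an assumption (eight 16-bit words), so substitute the real definition from your code base:

  #include <emmintrin.h>  /* SSE2 header: declares _mm_loadu_si128 */

  /* Assumed definition of vec8w; the real one may differ. */
  typedef short vec8w __attribute__((vector_size(16)));

  vec8w x86_sse_ldvwu(const vec8w* m) {
     /* _mm_loadu_si128 compiles to movdqu and has no alignment requirement. */
     __m128i v = _mm_loadu_si128((const __m128i*)m);
     return (vec8w)v;  /* same-size vector cast, allowed by GCC's vector extensions */
  }

Casting between vector types of the same size is supported by GCC's vector extensions, so the __m128i result can be returned directly as a vec8w.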
