SIMD constant floats
I've been trying my hand at optimising some code using Microsoft's SSE intrinsics. One of the biggest problems is the LHS (load-hit-store) that happens whenever I want to use a constant. There seems to be some info on generating certain constants (here and here - section 13.4), but it's all assembly (which I would rather avoid).
The problem is that when I try to implement the same thing with intrinsics, MSVC complains about incompatible types etc. Does anyone know of any equivalent tricks using intrinsics?
Example - Generate {1.0,1.0,1.0,1.0}
//pcmpeqw xmm0,xmm0
__m128 t = _mm_cmpeq_epi16( t, t );
//pslld xmm0,25
_mm_slli_epi32(t, 25);
//psrld xmm0,2
return _mm_srli_epi32(t, 2);
This generates a bunch of errors about incompatible types (__m128 vs __m128i). I'm pretty new to this, so I'm pretty sure I'm missing something obvious. Can anyone help?
tl;dr - How do I generate an __m128 vector filled with single-precision constant floats using MS intrinsics?
Thanks for reading :)
Comments (2)
Try _mm_set_ps, _mm_set_ps1 (https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_set_ps1&expand=4584,4587) or _mm_set1_ps (https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_set1_ps&expand=4584,4587,4634,4587,4634).
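For illustration, here is a minimal sketch of how those set intrinsics could be used. The function names splat_one, ramp and to_array are hypothetical helpers made up for this example, and whether the compiler turns these calls into a memory load or something else is up to the compiler.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_set1_ps, _mm_set_ps, _mm_storeu_ps */

/* Broadcast one value to all four lanes: {1.0f, 1.0f, 1.0f, 1.0f}. */
static __m128 splat_one(void)
{
    return _mm_set1_ps(1.0f);
}

/* _mm_set_ps takes its arguments from the highest lane down to the lowest,
   so storing this vector to memory gives 1.0f, 2.0f, 3.0f, 4.0f in order. */
static __m128 ramp(void)
{
    return _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
}

/* Hypothetical helper: copy the four lanes out so they can be inspected. */
static void to_array(__m128 v, float out[4])
{
    _mm_storeu_ps(out, v);  /* unaligned store, so 'out' needs no special alignment */
}
```

_mm_set_ps1 is an alternative spelling of the same broadcast as _mm_set1_ps, so either name should work for the {1.0,1.0,1.0,1.0} case.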
Simply cast __m128i to __m128 using _mm_castsi128_ps. Also, the second line should be
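A rough sketch of how the cast approach might look when applied to the code in the question. This is one possible reading, not necessarily what the answer had in mind: the name ones_from_bits is hypothetical, and starting from _mm_setzero_si128 (instead of comparing an uninitialised variable with itself) is an assumption made here so that the code has defined behaviour. The integer work is done in an __m128i, each shift result is assigned back, and only the final value is reinterpreted as __m128.

```c
#include <emmintrin.h>  /* SSE2: __m128i, integer compare/shift, _mm_castsi128_ps */

/* Hypothetical helper: builds {1.0f, 1.0f, 1.0f, 1.0f} without a float constant. */
static __m128 ones_from_bits(void)
{
    __m128i t = _mm_setzero_si128();  /* assumption: any initialised value works here */
    t = _mm_cmpeq_epi16(t, t);        /* pcmpeqw: every bit set, 0xFFFFFFFF per lane  */
    t = _mm_slli_epi32(t, 25);        /* pslld 25: 0xFE000000 per lane                */
    t = _mm_srli_epi32(t, 2);         /* psrld 2:  0x3F800000, the bit pattern of 1.0f */
    return _mm_castsi128_ps(t);       /* reinterpret the bits, no conversion is done  */
}
```

_mm_castsi128_ps only changes the type seen by the compiler, so it should not generate an instruction of its own.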