64-bit specific SIMD intrinsics
I am using the following union declaration in SSE2.
    typedef unsigned long uli;
    typedef uli v4si __attribute__ ((vector_size(16)));
    typedef union
    {
        v4si v;
        uli data[2];
    } uliv;
    uliv a, b, c;
The idea is to assign two unsigned long variables (64 bits each) to each of a and b, XOR them, and place the result in c.
An explicit assignment (a.data[0] = something) works here, but it takes more time.
I plan to use intrinsics. If I use _mm_set_epi64 (unsigned long x, unsigned long y), it asks for __m64 variables. If I cast the variables with (__m64)x, the code compiles, but it gives the wrong result.
    for (k = 0; k < 10; k++)
    {
        simda.v = _mm_set_epi64 (_mulpre1[u1][k], _mulpre2[u2][k]);
        simdb.v = _mm_set_epi64 (res1[i+k], res2[i+k]);
        simdc.v = _mm_xor_si128 (simda.v, simdb.v);
    }
The above code gives the error:
    /usr/lib/gcc/x86_64-linux-gnu/4.4.3/include/emmintrin.h:578: note: expected ‘__m64’ but argument is of type ‘long unsigned int’
Can you please suggest some alternative intrinsics?
Comments (1)
Are you sure that unsigned long is 64 bits on your system? It's probably safer to use unsigned long long or, better yet, uint64_t from <stdint.h>.
On my system _mm_set_epi64 takes two unsigned long long parameters and returns an __m128i.
It's not clear from your question whether you just want to (a) XOR two 64-bit values or (b) XOR two vectors of 2 x 64-bit values.
For case (a) just use scalar code, e.g.
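A minimal sketch of what the scalar version might look like, assuming the operands are uint64_t. The function name xor_u64 is made up for illustration and does not come from the question:

```c
#include <stdint.h>

/* Scalar XOR of two 64-bit values (hypothetical helper name).
   A single pair needs no SIMD; the compiler emits one xor instruction. */
static uint64_t xor_u64(uint64_t x, uint64_t y)
{
    return x ^ y;
}
```

Using uint64_t rather than unsigned long keeps the width at 64 bits regardless of the platform's data model.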
For case (b) you don't need unions etc, just do this:
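One possible sketch of the vector version, not necessarily the exact code the answer had in mind. In GCC's emmintrin.h, _mm_set_epi64 takes __m64 arguments (which matches the error in the question), while _mm_set_epi64x takes plain 64-bit integers, so the sketch uses the latter. The function name xor_2x64 and its parameter names are assumptions for illustration:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* XOR two vectors of 2 x 64-bit values (hypothetical helper).
   out[0] receives the low lane, out[1] the high lane. */
static void xor_2x64(uint64_t a_hi, uint64_t a_lo,
                     uint64_t b_hi, uint64_t b_lo,
                     uint64_t out[2])
{
    /* _mm_set_epi64x takes 64-bit integers, high lane first. */
    __m128i va = _mm_set_epi64x((long long)a_hi, (long long)a_lo);
    __m128i vb = _mm_set_epi64x((long long)b_hi, (long long)b_lo);
    __m128i vc = _mm_xor_si128(va, vb);

    /* Unaligned store, so out needs no 16-byte alignment. */
    _mm_storeu_si128((__m128i *)out, vc);
}
```

If the two 64-bit operands already sit next to each other in memory, loading them with _mm_loadu_si128 is likely cheaper than building the vector from two scalars. In the question's loop the values come from separate arrays, which is why the sketch builds each vector with _mm_set_epi64x.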