SIMD: How to add corresponding values from 2 vectors with different element widths (char or uint8_t added to int)
Please tell me how I can add values from SIMD vectors of the same type (__m128i) when the values themselves occupy a different number of bytes in those vectors.
Here's an example:
```cpp
#include <immintrin.h>  // __m128i, _mm_loadu_si128 (SSE2)

int main()
{
    //--------------------------------------------------------------
    // 16 ints: one __m128i holds only 4 of them, so four vectors are needed
    int my_int_sequence[16] = { 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 };
    __m128i my_int_sequence_m128i_1 = _mm_loadu_si128((const __m128i*)&my_int_sequence[0]);
    __m128i my_int_sequence_m128i_2 = _mm_loadu_si128((const __m128i*)&my_int_sequence[4]);
    __m128i my_int_sequence_m128i_3 = _mm_loadu_si128((const __m128i*)&my_int_sequence[8]);
    __m128i my_int_sequence_m128i_4 = _mm_loadu_si128((const __m128i*)&my_int_sequence[12]);
    //--------------------------------------------------------------
    // 16 one-byte values fit in a single __m128i
    char my_char_mask[16] = { 1,0,1,1,0,1,0,1,1,1,0,1,0,1,0,1 };
    __m128i my_char_mask_m128i = _mm_loadu_si128((const __m128i*)&my_char_mask[0]);
    //-----------------------------------------------------------------------
}
```
That is, I have 16 int values in the my_int_sequence array. Since all 16 ints will not fit in one __m128i vector (it holds only four 32-bit ints), I load them 4 at a time into four __m128i vectors.
I also have an array of 16 bytes, which I loaded into the single my_char_mask_m128i vector.
Now I want to add to each 4-byte value in the my_int_sequence_m128i_x vectors the corresponding 1-byte value from the my_char_mask_m128i vector.
The obvious problem is that I need to add elements of different widths. Is this possible?
Perhaps I need to widen each byte of my_char_mask_m128i to 4 bytes. How can I do that?
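For reference, the operation being asked about is presumably the following plain scalar loop. This is a sketch, not code from the question. The function name add_bytes_to_ints_scalar is hypothetical. The bytes are taken as signed char so the result does not depend on whether plain char is signed on a given platform.

```cpp
#include <cstddef>  // std::size_t

// Hypothetical scalar reference: out[i] = ints[i] + bytes[i] for i in [0, n).
// Each signed char is promoted (sign-extended) to int before the addition,
// which is the widening that a SIMD version has to do explicitly.
void add_bytes_to_ints_scalar(const int* ints, const signed char* bytes,
                              int* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = ints[i] + bytes[i];
    }
}
```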
Answer
You're looking for the SSE4.1 intrinsic _mm_cvtepi8_epi32(), which takes the lowest 4 (signed) 8-bit integers in an SSE vector and sign-extends them into 32-bit integers. Combine that with some shifting to move the next 4 bytes into place for the next extension.

Note that you usually have to tell your compiler to generate SSE4.1 instructions. With g++ and clang++, use the appropriate -march=XXXX option or -msse4.1.
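Here is a minimal sketch of the shift-based approach. It assumes the 16 ints and 16 bytes from the question are passed in as arrays. The function name add16_bytes_to_ints_shift and the SSE41_TARGET macro are hypothetical. The macro uses the GCC/Clang target attribute so the example builds without -msse4.1, but the CPU running it must still support SSE4.1.

```cpp
#include <immintrin.h>  // _mm_* intrinsics (SSE2 and SSE4.1)

// Hypothetical helper macro: on GCC/Clang, enable SSE4.1 for one function so the
// file builds without -msse4.1 (the CPU running it must still support SSE4.1).
#if defined(__GNUC__) || defined(__clang__)
#define SSE41_TARGET __attribute__((target("sse4.1")))
#else
#define SSE41_TARGET
#endif

// out[i] = ints[i] + bytes[i] for i = 0..15, using one 16-byte load of `bytes`
// and a 4-byte right shift of that vector between the four widening steps.
SSE41_TARGET
void add16_bytes_to_ints_shift(const int* ints, const signed char* bytes, int* out) {
    __m128i mask = _mm_loadu_si128(reinterpret_cast<const __m128i*>(bytes));
    for (int k = 0; k < 4; ++k) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(ints + 4 * k));
        // Sign-extend the lowest 4 bytes of `mask` to four 32-bit ints.
        __m128i wide = _mm_cvtepi8_epi32(mask);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + 4 * k), _mm_add_epi32(v, wide));
        // Shift the whole register right by 4 bytes so the next group is at the bottom.
        mask = _mm_srli_si128(mask, 4);
    }
}
```

If the bytes are really uint8_t (values 0 to 255), use _mm_cvtepu8_epi32 instead of _mm_cvtepi8_epi32. It zero-extends rather than sign-extends. For the 0/1 mask in the question, both give the same result.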
Peter Cordes suggested an alternative version for compilers new enough to have _mm_loadu_si32().
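The following sketch shows one way to use such a 4-byte load. It is not necessarily the version referred to above. Instead of shifting one 16-byte vector, it loads each group of 4 bytes straight from memory into the low 32 bits of a register and widens that. To stay compilable on older compilers, the hypothetical helper load4_bytes uses memcpy plus _mm_cvtsi32_si128. Where _mm_loadu_si32() is available, it can replace that helper directly. The names add16_bytes_to_ints_load, load4_bytes and SSE41_TARGET are hypothetical.

```cpp
#include <immintrin.h>  // _mm_* intrinsics (SSE2 and SSE4.1)
#include <cstring>      // std::memcpy

// Hypothetical helper macro: enable SSE4.1 per function on GCC/Clang.
#if defined(__GNUC__) || defined(__clang__)
#define SSE41_TARGET __attribute__((target("sse4.1")))
#else
#define SSE41_TARGET
#endif

// Hypothetical helper: load 4 bytes from p into the low 32 bits of a vector,
// zeroing the rest. Same effect as _mm_loadu_si32(p) where that intrinsic exists;
// memcpy keeps the unaligned load well-defined in C++.
static inline __m128i load4_bytes(const void* p) {
    int tmp;
    std::memcpy(&tmp, p, sizeof tmp);
    return _mm_cvtsi32_si128(tmp);
}

// out[i] = ints[i] + bytes[i] for i = 0..15, loading the bytes 4 at a time.
SSE41_TARGET
void add16_bytes_to_ints_load(const int* ints, const signed char* bytes, int* out) {
    for (int k = 0; k < 4; ++k) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(ints + 4 * k));
        // Sign-extend the 4 freshly loaded bytes to four 32-bit ints.
        __m128i wide = _mm_cvtepi8_epi32(load4_bytes(bytes + 4 * k));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + 4 * k), _mm_add_epi32(v, wide));
    }
}
```

This trades the byte shifts for four narrow loads. Compilers can often fold such a 4-byte load into the memory operand of pmovsxbd, so which variant is faster depends on the surrounding code.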