计算 128 位 avx 向量中唯一值的数量,或检测所有元素是否相等?

发布于 2025-01-20 13:25:08 字数 956 浏览 2 评论 0原文

我正在优化代码库中的热路径,并且已经转向矢量化。请记住,我对所有这些 SIMD 内容还很陌生。这是我试图解决的问题,

inline int count_unique(int c1, int c2, int c3, int c4)
{
    return 4 - (c2 == c1)
             - ((c3 == c1) || (c3 == c2))
             - ((c4 == c1) || (c4 == c2) || (c4 == c3));
}

使用 -O3 编译后的汇编输出使用非 SIMD 实现:

count_unique:
        xor     eax, eax
        cmp     esi, edi
        mov     r8d, edx
        setne   al
        add     eax, 3
        cmp     edi, edx
        sete    dl
        cmp     esi, r8d
        sete    r9b
        or      edx, r9d
        movzx   edx, dl
        sub     eax, edx
        cmp     edi, ecx
        sete    dl
        cmp     r8d, ecx
        sete    dil
        or      edx, edi
        cmp     esi, ecx
        sete    cl
        or      edx, ecx
        movzx   edx, dl
        sub     eax, edx
        ret

将 c1,c2,c3,c4 存储为16字节整数向量?

I'm optimizing a hot path in my codebase and i have turned to vectorization. Keep in mind, I'm still quite new to all of this SIMD stuff. Here is the problem I'm trying to solve, implemented using non-SIMD

inline int count_unique(int c1, int c2, int c3, int c4)
{
    return 4 - (c2 == c1)
             - ((c3 == c1) || (c3 == c2))
             - ((c4 == c1) || (c4 == c2) || (c4 == c3));
}

the assembly output after compiling with -O3:

count_unique:
        xor     eax, eax
        cmp     esi, edi
        mov     r8d, edx
        setne   al
        add     eax, 3
        cmp     edi, edx
        sete    dl
        cmp     esi, r8d
        sete    r9b
        or      edx, r9d
        movzx   edx, dl
        sub     eax, edx
        cmp     edi, ecx
        sete    dl
        cmp     r8d, ecx
        sete    dil
        or      edx, edi
        cmp     esi, ecx
        sete    cl
        or      edx, ecx
        movzx   edx, dl
        sub     eax, edx
        ret

How would something like this be done when storing c1,c2,c3,c4 as a 16byte integer vector?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

倾其所爱 2025-01-27 13:25:08

对于您的简化问题(测试所有 4 个通道是否相等),我会稍微不同,方法如下。这样整个测试只需要 3 条指令。

// True when the input vector has the same value in all 32-bit lanes
inline bool isSameValue( __m128i v )
{
    // Rotate vector by 4 bytes
    __m128i v2 = _mm_shuffle_epi32( v, _MM_SHUFFLE( 0, 3, 2, 1 ) );
    // The XOR outputs zero for equal bits, 1 for different bits
    __m128i xx = _mm_xor_si128( v, v2 );
    // Use PTEST instruction from SSE 4.1 set to test the complete vector for all zeros
    return (bool)_mm_testz_si128( xx, xx );
}

For your simplified problem (test all 4 lanes for equality), I would do it slightly differently, here’s how. This way it only takes 3 instructions for the complete test.

// True when the input vector has the same value in all 32-bit lanes
inline bool isSameValue( __m128i v )
{
    // Rotate vector by 4 bytes
    __m128i v2 = _mm_shuffle_epi32( v, _MM_SHUFFLE( 0, 3, 2, 1 ) );
    // The XOR outputs zero for equal bits, 1 for different bits
    __m128i xx = _mm_xor_si128( v, v2 );
    // Use PTEST instruction from SSE 4.1 set to test the complete vector for all zeros
    return (bool)_mm_testz_si128( xx, xx );
}
把回忆走一遍 2025-01-27 13:25:08

好的,我已经“简化”了这个问题,因为我使用唯一计数的唯一情况是它是否为 1,但这与检查所有元素是否相同相同,这可以通过比较输入来完成与自身,但使用 _mm_alignr_epi8 函数移动一个元素(4 个字节)。

inline int is_same_val(__m128i v1) {
    __m128i v2 = _mm_alignr_epi8(v1, v1, 4);
    __m128i vcmp = _mm_cmpeq_epi32(v1, v2);
    return ((uint16_t)_mm_movemask_epi8(vcmp) == 0xffff);
}

Ok, I have "simplified" the problem, because the only case when i was using the unique count, was if it was 1, but that is the same as checking if all elements are the same, which can be done by comparing the input with itself, but shifted over by one element (4 bytes) using the _mm_alignr_epi8 function.

inline int is_same_val(__m128i v1) {
    __m128i v2 = _mm_alignr_epi8(v1, v1, 4);
    __m128i vcmp = _mm_cmpeq_epi32(v1, v2);
    return ((uint16_t)_mm_movemask_epi8(vcmp) == 0xffff);
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文