Compiler-optimized C function vs. a function hand-written with SIMD intrinsics
I'm exploring SIMD instructions and have written a small function (see below) that finds the first index of an integer s in a vector v of 16 64-bit integers using AVX512F intrinsics. The function looks pretty straightforward.
Is it possible to write the function in plain C such that GCC or Clang would be able to optimize it well? I've used Compiler Explorer (godbolt.org) to compare the assembly output of find() and findc(), and the C variant is always much more complicated. Am I on a fool's errand?
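#include <stdio.h>
#include <stdint.h>
#include <immintrin.h>

/* AVX512F version: v must point to 16 int64_t values, 64-byte aligned.
   Returns the first index i with v[i] == s, or 32 if there is none. */
int
find(int64_t s, int64_t *v)
{
    __m512i _s = _mm512_set1_epi64(s);
    __m512i _v0 = _mm512_load_epi64(&v[0]);   /* aligned load of v[0..7] */
    __m512i _v1 = _mm512_load_epi64(&v[8]);   /* aligned load of v[8..15] */
    __mmask8 m0 = _mm512_cmpeq_epi64_mask(_s, _v0);
    __mmask8 m1 = _mm512_cmpeq_epi64_mask(_s, _v1);
    uint32_t matches = ((uint32_t)m1 << 8) | m0;   /* bit i set iff v[i] == s */
    return matches == 0 ? 32 : __builtin_ctz(matches);
}

/* Plain C version with the same contract as find(). */
int
findc(int64_t s, int64_t *v)
{
    uint32_t __attribute__ ((aligned(16))) matches = 0;
    for (int i = 0; i < 16; ++i)
    {
        matches |= (uint32_t)(s == v[i]) << i;
    }
    return matches == 0 ? 32 : __builtin_ctz(matches);
}

int
main(void)
{
    /* _mm512_load_epi64 in find() needs a 64-byte aligned address. */
    int64_t v[] __attribute__ ((aligned(64))) =
        {40, 41, 42, 43, 44, 45, 46, 47,
         48, 49, 50, 51, 52, 53, 54, 55};
    printf("find=%d\n", find(49, v));
    printf("findc=%d\n", findc(49, v));
    return 0;
}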
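
For comparison, one more plain C formulation that might be worth pasting into Compiler Explorer next to findc() is a select-style loop: instead of building a bitmask and calling __builtin_ctz, it walks the array backwards and keeps the lowest matching index. This is only a sketch under the same contract as find() and findc() (first matching index, 32 if nothing matches). The names find_select, FIND_N and FIND_NONE are made up for illustration, and nothing here is a claim about what a particular GCC or Clang version will emit for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { FIND_N = 16, FIND_NONE = 32 };

/* Hypothetical plain C variant (not from the original post).
   Walks the array from the last element to the first, so the lowest
   matching index is the one written last and therefore returned. */
int
find_select(int64_t s, const int64_t *v)
{
    assert(v != NULL);
    int idx = FIND_NONE;
    for (int i = FIND_N - 1; i >= 0; --i)
    {
        idx = (v[i] == s) ? i : idx;   /* compare + select, no early exit */
    }
    return idx;
}
```

Called on the array from main(), find_select(49, v) should return the same index as find() and findc(). It has no alignment requirement and uses no compiler builtins, so it can also serve as a reference when checking the other two.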