_mm_loadu_si32 not recognized by GCC on Ubuntu

Posted on 2025-02-12 13:21:59

When I try to use _mm_loadu_si32, VS Code gives me the error message:
a value of type "int" cannot be used to initialize an entity of type "__m128i"
When trying to compile, I get the error message:
implicit declaration of function '_mm_loadu_si32'

The weird part is that a couple of lines before _mm_loadu_si32, I'm using _mm_loadu_si128 without any problems. _mm_loadu_si64 also works.
Also, on Windows, my program compiles.

I ran sudo apt-get update and sudo apt-get upgrade, so the problem isn't outdated software. Is this some kind of gcc bug restricted to Ubuntu?

OS: Ubuntu 20.04
gcc: 9.4.0

Comments (1)

孤凫 2025-02-19 13:21:59

Your GCC is too old: you need GCC11 for immintrin.h to define _mm_loadu_si32.

And you need GCC11.3 or GCC12 for a non-broken version that puts the loaded bytes in the correct place in the resulting vector, and that is alignment / strict-aliasing safe. See GCC bug 99754.
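As a rough sketch of how code could cope with these version differences, the wrapper below uses the intrinsic only on GCC 11.3 or later and otherwise falls back to a memcpy-based load. The names USE_LOADU_SI32, load_si32_compat and check_load_si32_compat are hypothetical, and sending every non-GCC compiler down the fallback path is a conservative assumption, not something the versions above require.

```c
#include <assert.h>
#include <string.h>      /* memcpy */
#include <immintrin.h>   /* _mm_cvtsi32_si128, _mm_loadu_si32 where available */

/* Only trust the intrinsic on real GCC 11.3 or later; clang and ICC also
   define __GNUC__, so they are excluded and take the memcpy path here. */
#if defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER) && \
    (__GNUC__ > 11 || (__GNUC__ == 11 && __GNUC_MINOR__ >= 3))
#define USE_LOADU_SI32 1
#else
#define USE_LOADU_SI32 0
#endif

/* Hypothetical wrapper name, not part of any header. */
__m128i load_si32_compat(const void *p)
{
#if USE_LOADU_SI32
    return _mm_loadu_si32(p);
#else
    int tmp;
    memcpy(&tmp, p, sizeof(tmp));   /* unaligned, aliasing-safe */
    return _mm_cvtsi32_si128(tmp);  /* value goes into the low element */
#endif
}

/* Small self-check: the loaded value must come back from the low element. */
void check_load_si32_compat(void)
{
    int x = 0x01020304;
    assert(_mm_cvtsi128_si32(load_si32_compat(&x)) == 0x01020304);
}
```

Either branch should give the same result on a correct compiler, so callers do not need to know which one was chosen.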

GCC and/or clang sometimes miss defining some "helper" intrinsics, only eventually getting around to them. This is one of those cases, and even worse, the first attempt at adding it was buggy. There are GCC versions out there (GCC11.0 through 11.2) which support it but mis-compile it (shuffling the dword or word into the top element after loading, instead of the bottom, because they used _mm_set instead of _mm_setr in the header implementation.)
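To make the _mm_set versus _mm_setr difference concrete, here is a small illustration of the argument ordering. It is not the code from GCC's header, only a sketch of the same kind of mix-up; the function names are made up for this example.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2: _mm_set_epi32, _mm_setr_epi32, _mm_cvtsi128_si32 */

/* _mm_set_epi32 takes its arguments from the highest element down to the
   lowest, so the first argument lands in the top dword (element 3). */
int low_dword_after_set(int x)
{
    __m128i v = _mm_set_epi32(x, 0, 0, 0);
    return _mm_cvtsi128_si32(v);   /* reads element 0, which is 0 here */
}

/* _mm_setr_epi32 takes its arguments in memory order, lowest element
   first, so the first argument lands in the bottom dword (element 0). */
int low_dword_after_setr(int x)
{
    __m128i v = _mm_setr_epi32(x, 0, 0, 0);
    return _mm_cvtsi128_si32(v);   /* reads element 0, which is x */
}

/* Self-check of the two orderings. */
void check_set_vs_setr(void)
{
    assert(low_dword_after_set(42) == 0);
    assert(low_dword_after_setr(42) == 42);
}
```

A narrow load is expected to behave like the setr variant: the loaded value in the bottom element and zeros above it.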


The FP equivalent 4-byte load, __m128 _mm_load_ss(float*), has been defined forever, but is still not alignment or strict-aliasing safe in GCC's implementation like it is in other compilers. GCC's header derefs the float*, instead of using memcpy or an __attribute__((aligned(1),may_alias)) pointer type. That's GCC bug PR84508.

So unfortunately, in GCC, it's not safe to use _mm_castps_si128( _mm_load_ss( (float*)ptr )) either.
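For reference, the attribute-based pointer type mentioned above could look roughly like this in GNU C (GCC and clang; MSVC does not accept this syntax). The typedef and function names are hypothetical, and this is only a sketch of the technique, not code from any header.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2: _mm_cvtsi32_si128, _mm_cvtsi128_si32 */

/* GNU C extension: an int type with alignment 1 that may alias any other
   type, so dereferencing it is allowed at any address. */
typedef int unaligned_aliasing_int __attribute__((aligned(1), may_alias));

__m128i movd_load_gnu(const void *p)
{
    const unaligned_aliasing_int *ip = (const unaligned_aliasing_int *)p;
    return _mm_cvtsi32_si128(*ip);   /* value goes into the low dword */
}

/* Self-check with a deliberately misaligned address. */
void check_movd_load_gnu(void)
{
    unsigned char buf[8] = {0, 0x78, 0x56, 0x34, 0x12, 0, 0, 0};
    __m128i v = movd_load_gnu(buf + 1);
    assert(_mm_cvtsi128_si32(v) == 0x12345678);   /* x86 is little-endian */
}
```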


Portable implementation for older compilers

Your best bet for an aliasing-safe unaligned 4-byte load is probably this portable implementation:

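#include <string.h>      // memcpy
#include <immintrin.h>   // _mm_cvtsi32_si128

__m128i movd_load(const void *p)
{
    int tmp;                       // int32_t on implementations that support intrinsics
    memcpy(&tmp, p, sizeof(tmp));  // unaligned aliasing-safe load
    return _mm_cvtsi32_si128(tmp); // zero-extends tmp into the low element
}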

This compiles nicely on GCC/clang/MSVC (Godbolt showing all), with both old and new versions of GCC and clang: I tested GCC4.7 and GCC12, and got just the expected movd xmm0, [rdi] / ret.

But it compiles stupidly on ICC, loading into EAX and then either store/reload or movd xmm0, eax, instead of a memory source operand for movd.
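A short usage sketch for the memcpy-based load, assuming an x86 target with SSE2. It repeats movd_load so the snippet stands alone; is_low_dword_only and check_movd_load are hypothetical helpers added only to show what the result looks like at misaligned offsets.

```c
#include <assert.h>
#include <string.h>      /* memcpy */
#include <emmintrin.h>   /* SSE2 intrinsics */

__m128i movd_load(const void *p)   /* same idea as the function above */
{
    int tmp;
    memcpy(&tmp, p, sizeof(tmp));
    return _mm_cvtsi32_si128(tmp);
}

/* Hypothetical helper: returns 1 if the vector is {expected, 0, 0, 0}. */
int is_low_dword_only(__m128i v, int expected)
{
    __m128i want = _mm_setr_epi32(expected, 0, 0, 0);
    return _mm_movemask_epi8(_mm_cmpeq_epi8(v, want)) == 0xFFFF;
}

/* Loads at offsets 0, 1 and 3 of a byte buffer; none need be 4-aligned. */
void check_movd_load(void)
{
    unsigned char buf[7] = {0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77};
    assert(is_low_dword_only(movd_load(buf + 0), 0x44332211));
    assert(is_low_dword_only(movd_load(buf + 1), 0x55443322));
    assert(is_low_dword_only(movd_load(buf + 3), 0x77665544));
}
```

The checks also confirm that the upper three elements are zero, which is what a movd load is expected to produce.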


This is also useful as a building-block for pmovzx / pmovsx loads (one of the significant use-cases for narrow loads into __m128i, especially unaligned and aliasing-safe loads), such as

#if defined(__SSE4_1__) || defined (_MSC_VER)
__m128i pmovzxbd_load(void *p)
{
    __m128i v = movd_load(p);
    return _mm_cvtepu8_epi32(v);  // folds the load with GCC9 or later
    // but not ICC or MSVC, or earlier GCC: they all movd into an XMM reg and pmovzxbd xmm0,xmm0
    // clang gets this right, with a mem src pmovzxbd
}
#endif

# GCC8.5 -O2 -march=skylake -mno-avx
# and MSVC19.14.  ICC 2021 is even worse, going through EAX
pmovzxbd_load:
        movd    xmm0, DWORD PTR [rdi]
        pmovzxbd        xmm0, xmm0
        ret
# GCC9.5 -O2 -march=skylake -mno-avx
# and clang
pmovzxbd_load:
        pmovzxbd        xmm0, DWORD PTR [rdi]
        ret
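When SSE4.1 is not enabled, so the guarded pmovzxbd_load above is compiled out, the same zero-extension can be sketched with plain SSE2 unpacks. This swaps in a different technique (punpcklbw / punpcklwd against zero rather than pmovzxbd) and is an addition for illustration; the names zext_u8x4_to_u32x4_sse2 and check_zext are hypothetical.

```c
#include <assert.h>
#include <string.h>      /* memcpy */
#include <emmintrin.h>   /* SSE2 intrinsics */

__m128i movd_load(const void *p)   /* same idea as the function above */
{
    int tmp;
    memcpy(&tmp, p, sizeof(tmp));
    return _mm_cvtsi32_si128(tmp);
}

/* SSE2-only zero-extension of 4 bytes to 4 dwords. */
__m128i zext_u8x4_to_u32x4_sse2(const void *p)
{
    __m128i zero = _mm_setzero_si128();
    __m128i v = movd_load(p);             /* bytes b0..b3 in the low dword */
    v = _mm_unpacklo_epi8(v, zero);       /* b0..b3 as 16-bit words */
    return _mm_unpacklo_epi16(v, zero);   /* b0..b3 as 32-bit dwords */
}

/* Self-check: each input byte should come back as its own dword. */
void check_zext(void)
{
    unsigned char bytes[4] = {1, 2, 200, 255};
    int out[4];
    _mm_storeu_si128((__m128i *)out, zext_u8x4_to_u32x4_sse2(bytes));
    assert(out[0] == 1 && out[1] == 2 && out[2] == 200 && out[3] == 255);
}
```

This costs two shuffle instructions instead of one, so it is a fallback rather than a replacement for the SSE4.1 version.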