How to use the SSE instruction set on a C6678 DSP?

Posted on 2025-01-09 02:39:27

SSE can only be used on x86 and x86-64 CPUs. I am having trouble using the SpeexDSP library on a TI C6678. I have never used SSE instructions; I have tried many approaches but cannot get the code to work on the DSP.

Is it possible to modify SSE instructions to normal C++ instructions? How would I modify them?
Looking forward to your reply.
Example:

static inline double interpolate_product_double(const float* a, const float* b, unsigned int len, const spx_uint32_t oversample, float* frac) {
    int i;
    double ret;
    __m128d sum;
    __m128d sum1 = _mm_setzero_pd();
    __m128d sum2 = _mm_setzero_pd();
    __m128 f = _mm_loadu_ps(frac);
    __m128d f1 = _mm_cvtps_pd(f);
    __m128d f2 = _mm_cvtps_pd(_mm_movehl_ps(f, f));
    __m128 t;
    for (i = 0; i < len; i += 2)
    {
        t = _mm_mul_ps(_mm_load1_ps(a + i), _mm_loadu_ps(b + i * oversample));
        sum1 = _mm_add_pd(sum1, _mm_cvtps_pd(t));
        sum2 = _mm_add_pd(sum2, _mm_cvtps_pd(_mm_movehl_ps(t, t)));

        t = _mm_mul_ps(_mm_load1_ps(a + i + 1), _mm_loadu_ps(b + (i + 1) * oversample));
        sum1 = _mm_add_pd(sum1, _mm_cvtps_pd(t));
        sum2 = _mm_add_pd(sum2, _mm_cvtps_pd(_mm_movehl_ps(t, t)));
    }
    sum1 = _mm_mul_pd(f1, sum1);
    sum2 = _mm_mul_pd(f2, sum2);
    sum = _mm_add_pd(sum1, sum2);
    sum = _mm_add_sd(sum, _mm_unpackhi_pd(sum, sum));
    _mm_store_sd(&ret, sum);
    return ret;
}

Comments (3)

小鸟爱天空丶 2025-01-16 02:39:27

Yes, you can use SIMD Everywhere (SIMDe). It provides portable implementations of many intrinsics, including all of the ones in your code. Full disclosure: I am the lead developer.

Edit: replying to phuclv here since it's a bit long for a comment.

SIMDe doesn't currently use the c6x intrinsics to implement functions the way we often do for NEON, AltiVec/VSX, WASM SIMD, etc. There is nothing preventing it, and patches are very welcome, but they're not there yet.

However, every function in SIMDe has fallback implementations all the way back to standard C. Usually things don't get that far, though; even discounting the architecture-specific implementations mentioned above, if the compiler supports them the operations are also implemented using GNU C vector extensions, and even the portable fallbacks are annotated with OpenMP SIMD directives. Conversion functions use compiler built-ins like __builtin_convertvector, and functions which require shuffling data around will use __builtin_shuffle / __builtin_shufflevector.

Basically, SIMDe goes to great lengths to get the compiler to vectorize the code whenever possible, even if SIMDe doesn't actually know how to do it. The functions above are all pretty straightforward; I don't know enough about c6x SIMD to know what kind of operations are supported in hardware, but GCC and clang (which the TI compilers are based on) generally do a very good job with all the information SIMDe gives them. Honestly, the thing I'm most worried about here is whether the c6x supports double-precision floating point in SIMD (which the code above uses)... there is a pretty good chance it only supports single-precision floats.
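To make the GNU C vector extension fallback mentioned in this answer a little more concrete, here is a minimal sketch. It is not SIMDe's actual source. The names `f32x4`, `load4`, `splat4` and `mul4` are hypothetical stand-ins for `__m128`, `_mm_loadu_ps`, `_mm_load1_ps` and `_mm_mul_ps`, and the sketch assumes a compiler that accepts `__attribute__((vector_size))` (GCC and clang do; whether a given TI compiler does would need checking).

```c
#include <string.h>

/* GNU C vector extension: four packed single-precision floats. */
typedef float f32x4 __attribute__((vector_size(16)));

/* Hypothetical stand-in for _mm_loadu_ps: unaligned load of four floats. */
static f32x4 load4(const float *p)
{
    f32x4 v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical stand-in for _mm_load1_ps: broadcast one float to all lanes. */
static f32x4 splat4(const float *p)
{
    f32x4 v = { *p, *p, *p, *p };
    return v;
}

/* Hypothetical stand-in for _mm_mul_ps: lane-wise multiply. The compiler
   decides which target instructions (SIMD or scalar) implement it. */
static f32x4 mul4(f32x4 x, f32x4 y)
{
    return x * y;
}
```

With helpers like these, the line `t = _mm_mul_ps(_mm_load1_ps(a + i), _mm_loadu_ps(b + i * oversample));` from the question would read `t = mul4(splat4(a + i), load4(b + i * oversample));`, and no x86 header is involved.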

分分钟 2025-01-16 02:39:27

Is it possible to modify SSE instructions to normal C++ instructions?

There's no such thing as "C++ instructions", because C++ is a high-level language with statements, not instructions. But yes, it's possible to convert SSE intrinsics to C++ expressions, because they're simply multiple operations performed in parallel.
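As a rough illustration of that point, here is a minimal scalar sketch of the question's `interpolate_product_double` in plain C (it should also compile as C++). It is not taken from SpeexDSP: the name `interpolate_product_double_scalar` is made up, `uint32_t` stands in for `spx_uint32_t`, and it assumes `len` is even, as the SSE loop's step of 2 implies. Each of the four SSE lanes gets its own double accumulator, and the final sum follows the same pairing order as the SSE code.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar sketch of the SSE routine in the question.
   Assumptions: len is even; b has at least (len - 1) * oversample + 4
   readable floats; frac points to 4 floats. */
static double interpolate_product_double_scalar(const float *a, const float *b,
                                                unsigned int len,
                                                uint32_t oversample,
                                                const float *frac)
{
    double accum[4] = { 0.0, 0.0, 0.0, 0.0 };  /* one accumulator per lane */
    unsigned int i;
    int k;

    for (i = 0; i < len; i++) {
        const float *row = b + (size_t)i * oversample;
        for (k = 0; k < 4; k++) {
            /* _mm_mul_ps: single-precision product of a[i] and four b values */
            float t = a[i] * row[k];
            /* _mm_cvtps_pd + _mm_add_pd: widen to double, then accumulate */
            accum[k] += (double)t;
        }
    }

    /* sum1 * f1 + sum2 * f2, then the horizontal add of the two halves */
    return (accum[0] * frac[0] + accum[2] * frac[2])
         + (accum[1] * frac[1] + accum[3] * frac[3]);
}
```

A version like this gives the compiler a simple loop to optimize for the target, and it can serve as a reference to check any later hand-vectorized C66x version against.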

SSE is one of many SIMD instruction sets, so just convert it to the corresponding SIMD operations on the target architecture. In your case, the TI C6678 does have SIMD support:

The C64x+ and C674x DSPs support 2-way SIMD operations for 16-bit data and 4-way SIMD operations for 8-bit data. On the C66x DSP, the vector processing capability is improved by extending the width of the SIMD instructions. C66x DSPs can execute instructions that operate on 128-bit vectors.

最美的太阳 2025-01-16 02:39:27

The C66x architecture indeed supports a number of SIMD instructions, somewhat comparable to those of Intel's SSE.

You need to be aware of the processor's register set in both architectures and compare the available instructions.

For example, _mm_add_ps performs four simultaneous additions of single-precision floats, which are packed four to an SSE register. The DSP has a similar DADDSP instruction that performs only two such additions. Hence you will need to translate one _mm_add_ps into two DADDSP instructions.

Read the manuals (the instruction set references are available online), understand what the instructions do, and find the equivalents. If you hit a dead end, you can still fall back on good old scalar operations, like C[0] = A[0] + B[0]; C[1] = A[1] + B[1];
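The splitting described in this answer can be sketched in portable C as follows. The type `pair_f32` and the functions `add2` and `add4` are hypothetical names used only for illustration. `add2` is written with scalar additions so the sketch runs anywhere; on the TI toolchain it would presumably be replaced by the compiler's intrinsic for DADDSP, whose exact name and operand type should be taken from TI's compiler manual.

```c
/* Hypothetical 2-lane type standing in for a pair of packed floats. */
typedef struct { float lo, hi; } pair_f32;

/* Stand-in for one DADDSP: two single-precision additions. */
static pair_f32 add2(pair_f32 x, pair_f32 y)
{
    pair_f32 r = { x.lo + y.lo, x.hi + y.hi };
    return r;
}

/* Equivalent of one _mm_add_ps on four floats: two 2-lane additions. */
static void add4(const float *a, const float *b, float *c)
{
    pair_f32 a01 = { a[0], a[1] }, a23 = { a[2], a[3] };
    pair_f32 b01 = { b[0], b[1] }, b23 = { b[2], b[3] };
    pair_f32 c01 = add2(a01, b01);   /* lanes 0 and 1 */
    pair_f32 c23 = add2(a23, b23);   /* lanes 2 and 3 */

    c[0] = c01.lo; c[1] = c01.hi;
    c[2] = c23.lo; c[3] = c23.hi;
}
```

The same pattern applies to the other 4-wide single-precision intrinsics in the question, such as _mm_mul_ps: one 4-lane operation becomes two 2-lane operations on the low and high halves.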
