Loading 8-bit values with NEON/ARM

Posted 2024-12-29 12:14:23

I'm trying to load an array of char values into NEON registers, and then treat them as 16-bit or 32-bit integer values. So something like this...

void SubVector(short* c, const unsigned char* a, const unsigned char* b, int n)
{
    for(int i = 0; i < n; i++)
    {
        c[i] = (short)a[i] - (short)b[i];
    }
}

I'm not sure how to load the data. Should I load the 8-bit data into lanes, and then reinterpret the registers as shorts? Or load and convert? What would be the fastest way?

Does anyone have an example of how they would do this with NEON intrinsics?

Thanks!

多谢你的绝情让我学会死心 2025-01-05 12:14:23

NEON has addition and subtraction instructions that can widen values from 8->16, 16->32 or 32->64 bits. You can do 8 at a time like this:

uint8x8_t u88_a, u88_b;
uint16x8_t u168_diff;

u88_a = vld1_u8(a); // load 8 unsigned chars from a[]
u88_b = vld1_u8(b); // load 8 unsigned chars from b[]
u168_diff = vsubl_u8(u88_a, u88_b); // widening subtract: 8-bit inputs, 16-bit results (modulo 2^16)
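
For completeness, here is a rough sketch of how the snippet above might be dropped into the SubVector function from the question. It is one possible arrangement, not the only one. It assumes that short is 16 bits wide (so a short* can be passed to vst1q_s16), that c does not overlap a or b, and that a scalar loop is acceptable for the leftover elements when n is not a multiple of 8. Because vsubl_u8 produces an unsigned result that wraps modulo 2^16, the bits are reinterpreted as signed with vreinterpretq_s16_u16 before storing, which gives the same values as the scalar code. The preprocessor guard is only there so the file still builds on targets without NEON.

```c
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Assumption: short is 16 bits, as on the usual ARM ABIs. */
void SubVector(short* c, const unsigned char* a, const unsigned char* b, int n)
{
    int i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Process 8 elements per iteration. */
    for (; i + 8 <= n; i += 8)
    {
        uint8x8_t va = vld1_u8(a + i);        /* load 8 unsigned chars from a[] */
        uint8x8_t vb = vld1_u8(b + i);        /* load 8 unsigned chars from b[] */
        uint16x8_t diff = vsubl_u8(va, vb);   /* widening subtract, wraps modulo 2^16 */

        /* Same bits viewed as signed 16-bit values: negative differences come out right. */
        vst1q_s16(c + i, vreinterpretq_s16_u16(diff));
    }
#endif

    /* Scalar tail (and the whole loop when NEON is not available). */
    for (; i < n; i++)
    {
        c[i] = (short)(a[i] - b[i]);
    }
}
```

If 32-bit results are needed instead, one option is to widen the 16-bit differences a second time, for example with vmovl_s16 applied to the vget_low_s16 and vget_high_s16 halves of the signed vector.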
