Loading 8-bit values with NEON/ARM
I'm trying to load an array of char values into NEON registers, and then treat them as 16-bit or 32-bit integer values. So something like this...
void SubVector(short* c, const unsigned char* a, const unsigned char* b, int n)
{
    for(int i = 0; i < n; i++)
    {
        c[i] = (short)a[i] - (short)b[i];
    }
}
I'm not sure how to load the data. Should I load the 8-bit data into lanes, and then reinterpret the registers as shorts? Or load and convert? What would be the fastest way?
Does anyone have an example of how to do this with NEON intrinsics?
Thanks!
Comments (1)
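NEON has addition and subtraction instructions that widen their operands from 8 to 16, 16 to 32, or 32 to 64 bits as part of the operation, so no separate convert step is needed after the load. With the 8-to-16-bit widening subtract you can process 8 elements at a time.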
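The answer's own code is not preserved in this copy, so what follows is only a minimal sketch of the approach it describes, not the answerer's original. It assumes the standard `arm_neon.h` intrinsics `vld1_u8`, `vsubl_u8`, `vreinterpretq_s16_u16` and `vst1q_s16`. The scalar tail loop for leftover elements and the non-NEON fallback are additions made here so the function handles any `n` and compiles on other targets.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0   /* assumption: fall back to plain C on non-NEON targets */
#endif

void SubVector(short* c, const unsigned char* a, const unsigned char* b, int n)
{
    int i = 0;

#if HAVE_NEON
    /* Main loop: 8 elements per iteration. */
    for (; i + 8 <= n; i += 8)
    {
        /* Load 8 unsigned bytes from each input into 64-bit D registers. */
        uint8x8_t av = vld1_u8(a + i);
        uint8x8_t bv = vld1_u8(b + i);

        /* Widening subtract: each 8-bit lane is zero-extended to 16 bits
           before subtracting. The result is computed modulo 2^16, so
           reinterpreting it as signed gives the true difference, which
           always lies in [-255, 255]. */
        int16x8_t cv = vreinterpretq_s16_u16(vsubl_u8(av, bv));

        /* Store 8 signed 16-bit results. */
        vst1q_s16(c + i, cv);
    }
#endif

    /* Scalar tail: handles the last n % 8 elements, or every element
       when NEON is not available. */
    for (; i < n; i++)
    {
        c[i] = (short)((short)a[i] - (short)b[i]);
    }
}
```

Calling `SubVector(c, a, b, n)` works the same way as the scalar version in the question. If 32-bit results are needed, one option is to widen each half of the 16-bit vector afterwards with `vmovl_s16(vget_low_s16(cv))` and `vmovl_s16(vget_high_s16(cv))`.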