ARM NEON Assembler - Usage and Understanding
I am new to assembler and NEON programming.
My task is to convert part of an algorithm from C to ARM Assembler using NEON instructions.
The algorithm takes an int32 array, loads different values from this array, does some bit shifting and XOR, and writes the result to another array.
Later I will use an array with 64-bit values, but for now I am just trying to rewrite the code.
C Pseudo code:
out_array[index] = shiftSome( in_array[index] ) ^ shiftSome( in_array[index] );
So here are my questions regarding NEON Instructions:
1.) If I load a register like this:
vld1.32 d0, [r1]
will it load only 32 bits from memory, or 2x32 bits to fill the 64-bit NEON D register?
2.) How can I access the 2/4/8 (i32, i16, i8) parts of the D register?
3.) I am trying to load different values from the array with an offset, but it doesn't seem to work. What am I doing wrong? Here is my code
(it is an integer array, so I'm trying to load, for example, the third element, which should have an offset of 64 bits = 8 bytes):
asm volatile(
"vld1.32 d0, [%0], #8 \n"
"vst1.32 d0, [%1]" : : "r" (a), "r" (out): "d0", "r5");
where "a" is the array and "out" is a pointer to an integer (for debugging).
4.) After I load a value from the array I need to shift it to the right but it doesn't seem to work:
vshr.u32 d0, d0, #24 // C code: x >> 24;
5.) Is it possible to load only 1 byte into a NEON register, so that I don't have to shift/mask something to get only the one byte I need?
6.) I need to use inline assembler, but I am not sure what the last line is for:
input list : output list : what is this for?
7.) Do you know any good NEON References with code examples?
The program should run on a Samsung Galaxy S2 with a Cortex-A9 processor, if that makes any difference. Thanks for the help.
----------------edit-------------------
That is what I found out:
- It will always load the full register (64 bits).
- You can use the "vmov" instruction to transfer part of a NEON register to an ARM register.
- The offset should be in an ARM register and will be added to the base address after the memory access.
- It is the "clobbered reg list". Every register that is used but is neither in the input nor the output list should be written here.
Comments (2)
I can answer most of your questions: (update: clarified "lane" issue)
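1) Written as vld1.32 d0, [r1], the load fills the entire 64-bit D register, i.e. two 32-bit elements (a Q register load fills 128 bits). VLD1 also has a single-lane form, e.g. vld1.32 {d0[0]}, [r1], which loads only one element. There is also a MOV instruction variant (VMOV) that allows single "lanes" to be moved to or from ARM registers.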
2) You can use the NEON MOV instruction to affect single lanes. Performance will suffer when doing too many single element operations. NEON instructions benefit application performance by doing parallel operations on vectors (groups of floats/ints).
3) Immediate offsets in ARM assembly language are in bytes, not elements or registers. NEON load/store instructions allow post-increment by a register, not by an immediate value. For normal ARM instructions, your post-increment of 8 would add 8 (bytes) to the source pointer. Note that a post-increment is applied after the access, so the load itself still reads from the original address.
4) Shifts in NEON affect all elements of a vector. A shift right of 24 bits using vshr.u32 on a D register will shift both 32-bit unsigned elements by 24 bits and throw away the bits that get shifted out.
5) NEON instructions allow moving single elements in and out of normal ARM registers. A single element can also be loaded from memory directly into one lane with the lane form of VLD1, e.g. vld1.8 {d0[0]}, [r1], which leaves the other lanes unchanged.
6) ?
7) Start here: http://blogs.arm.com/software-enablement/161-coding-for-neon-part-1-load-and-stores/ (the ARM site has a good tutorial on NEON).
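To make points 3 and 4 concrete, here is a minimal sketch rather than a definitive implementation. The helper name load_shift_store and its parameters are made up for illustration. It assumes GCC-style inline assembly on 32-bit ARM with NEON enabled; on any other target it falls back to plain C that models the same behaviour. The element offset is applied with ordinary C pointer arithmetic before the load, so no post-increment is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: loads in[index] and in[index + 1] into one D register,
   shifts both 32-bit lanes right by 24, and stores the two results to out. */
static void load_shift_store(const uint32_t *in, size_t n, size_t index,
                             uint32_t out[2])
{
    assert(index + 2 <= n);            /* a D-register load reads two elements */
    const uint32_t *src = in + index;  /* pointer arithmetic scales by 4 bytes */

#if defined(__arm__) && (defined(__ARM_NEON) || defined(__ARM_NEON__))
    __asm__ volatile(
        "vld1.32  {d0}, [%[src]]  \n\t"  /* load 64 bits = two 32-bit lanes */
        "vshr.u32 d0, d0, #24     \n\t"  /* each lane: x >> 24 */
        "vst1.32  {d0}, [%[dst]]  \n\t"  /* store both lanes */
        :                                /* no register outputs */
        : [src] "r" (src), [dst] "r" (out)
        : "d0", "memory");               /* d0 is scratch; out[] is written */
#else
    /* Portable model of the same three steps. */
    uint32_t lanes[2];
    memcpy(lanes, src, sizeof lanes);
    out[0] = lanes[0] >> 24;
    out[1] = lanes[1] >> 24;
#endif
}
```

Calling load_shift_store(a, n, 2, out) processes the third and fourth elements, i.e. the data starting 8 bytes into the array. Both lanes are shifted, so out[1] holds the shifted neighbour whether you need it or not.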
6) Clobbered registers.
If you are using registers that have not been passed as operands, you need to inform
the compiler about this. The following code will adjust a value to a multiple of four. It
uses r3 as a scratch register and lets the compiler know about this by specifying r3 in the
clobber list. Furthermore the CPU status flags are modified by the ands instruction.
Adding the pseudo register cc to the clobber list will keep the compiler informed about
this modification as well.
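The code this answer refers to is not reproduced above, so the following is only a sketch of the idea under stated assumptions. The helper name round_up_to_4 is hypothetical, the choice to round up (rather than down) is an assumption, and the instruction sequence is my own, not necessarily the original one. It assumes GCC-style inline assembly in ARM or Thumb-2 mode, and falls back to plain C on other targets. Note that GCC's operand order is outputs : inputs : clobbers.

```c
#include <stdint.h>

/* Hypothetical helper: rounds v up to the next multiple of four. */
static uint32_t round_up_to_4(uint32_t v)
{
#if defined(__arm__)
    __asm__(
        "add  %[v], %[v], #3  \n\t"  /* v += 3 */
        "ands r3, %[v], #3    \n\t"  /* r3 = v & 3, also updates the flags */
        "sub  %[v], %[v], r3  \n\t"  /* clear the two low bits */
        : [v] "+r" (v)               /* outputs: v is read and written */
        :                            /* inputs: none besides v */
        : "r3", "cc");               /* clobbers: scratch r3 and the flags */
#else
    /* Portable equivalent of the three instructions above. */
    v = (v + 3u) & ~UINT32_C(3);
#endif
    return v;
}
```

Here r3 appears in neither the output nor the input list, so it has to be named in the clobber list. The "cc" entry is there because ands changes the condition flags.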