ARM assembly: auto-increment register on store
Is it possible to auto-increment the base address register on an STR with `[Rn]!`? I've peered through the documentation but haven't been able to find a definitive answer, mainly because the command syntax is presented for both LDR and STR. In theory it should work for both, but I couldn't find any examples of auto-incrementing on a store (loading works OK).
I've made a small program that stores two numbers in a vector. When it's done, the contents of `out` should be `{1, 2}`, but the second store overwrites the first element, as if the auto-increment isn't working.
```cpp
#include <stdio.h>
int main()
{
    int out[] = {0, 0};
    asm volatile (
        "mov r0, #1 \n\t"
        "str r0, [%0]! \n\t"
        "add r0, r0, #1 \n\t"
        "str r0, [%0] \n\t"
        :: "r"(out)
        : "r0" );
    printf("%d %d\n", out[0], out[1]);
    return 0;
}
```
EDIT: While the answer was right for regular loads and stores, I found that the optimizer messes up auto-increment on vector instructions such as vldm/vstm. For instance, the following program
```cpp
#include <stdio.h>
int main()
{
    volatile int *in = new int[16];
    volatile int *out = new int[16];
    for (int i = 0; i < 16; i++) in[i] = i;
    asm volatile (
        "vldm %0!, {d0-d3} \n\t"
        "vldm %0, {d4-d7} \n\t"
        "vstm %1!, {d0-d3} \n\t"
        "vstm %1, {d4-d7} \n\t"
        :: "r"(in), "r"(out)
        : "memory" );
    for (int i = 0; i < 16; i++) printf("%d\n", out[i]);
    return 0;
}
```
compiled with

```shell
g++ -O2 -march=armv7-a -mfpu=neon main.cpp -o main
```
will produce gibberish for the last 8 printed values, because the optimizer keeps the incremented pointer register and uses it for the printf. In other words, `out[i]` is actually `out[i+8]`. The first 8 printed values are therefore the last 8 from the vector, and the rest come from out-of-bounds memory.
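You can check the arithmetic behind "`out[i]` is actually `out[i+8]`" without any assembly. Each D register is 8 bytes, so a writeback over `{d0-d3}` moves the base register by 4 * 8 = 32 bytes, which is 8 32-bit ints. The plain C++ model below is only a sketch under that assumption. `writeback_elements`, `advance_like_vldm_writeback` and `read_through_advanced` are hypothetical names, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Each NEON D register is 64 bits wide (8 bytes).
constexpr std::size_t kDRegBytes = 8;

// Hypothetical helper: how many elements of type T a VLDM/VSTM with
// writeback over `num_dregs` D registers moves the base register.
template <typename T>
constexpr std::size_t writeback_elements(std::size_t num_dregs) {
    return num_dregs * kDRegBytes / sizeof(T);
}

// The {d0-d3} writeback in the question: 4 * 8 = 32 bytes = 8 int32_t.
static_assert(writeback_elements<std::int32_t>(4) == 8, "32 bytes = 8 ints");

// Hypothetical model of the hidden side effect: after the asm, the register
// the compiler still treats as "out" actually holds out + 8.
template <typename T>
T* advance_like_vldm_writeback(T* base, std::size_t num_dregs) {
    return base + writeback_elements<T>(num_dregs);
}

// Indexing through the advanced pointer reads buf[i + 8], not buf[i].
std::int32_t read_through_advanced(const std::int32_t* buf, std::size_t i) {
    const std::int32_t* p = advance_like_vldm_writeback(buf, 4);
    return p[i];
}
```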
I've tried different combinations of the `volatile` keyword throughout the code. The behavior changes only if I compile with the `-O0` flag, or if I use a volatile array instead of a pointer and `new`, like `volatile int out[16];`
Answers (3)
For store and load you do this:
Whatever you put at the end (4 in this case) is added to the base register: r1 in the ldr example and r2 in the str example. The addition happens after the register is used for the address but before the instruction has completed. It is very much like a plain ldr/str with no offset followed by an add of that value to the base register.
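The small C++ model below shows what the two writeback forms do to the base register. It simulates the effect rather than running ARM code. `Machine`, `str_post_indexed`, `str_pre_indexed_writeback` and `load_word` are hypothetical names.

- Post-indexed `str rt, [rn], #imm` stores at `rn` and then adds `imm` to `rn`.
- Pre-indexed `str rt, [rn, #imm]!` stores at `rn + imm` and writes that address back to `rn`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical simulator state: "mem" is a small byte array and "rn" is a
// byte offset into it that stands in for the base register.
struct Machine {
    unsigned char mem[16] = {};
    std::uint32_t rn = 0;
};

// Store a 32-bit word at byte offset `addr` (memcpy avoids aliasing issues).
inline void store_word(Machine& m, std::uint32_t addr, std::uint32_t value) {
    assert(addr + 4 <= sizeof m.mem);
    std::memcpy(m.mem + addr, &value, sizeof value);
}

inline std::uint32_t load_word(const Machine& m, std::uint32_t addr) {
    std::uint32_t v;
    std::memcpy(&v, m.mem + addr, sizeof v);
    return v;
}

// Models "str rt, [rn], #imm": the address is rn, then rn += imm.
inline void str_post_indexed(Machine& m, std::uint32_t rt, std::uint32_t imm) {
    store_word(m, m.rn, rt);
    m.rn += imm;
}

// Models "str rt, [rn, #imm]!": the address is rn + imm, written back to rn.
inline void str_pre_indexed_writeback(Machine& m, std::uint32_t rt, std::uint32_t imm) {
    m.rn += imm;
    store_word(m, m.rn, rt);
}
```

With an offset of 0, the pre-indexed form writes back an unchanged base, so two stores land on the same word. I am assuming, without having verified it here, that an assembler which accepts `[Rn]!` with no offset treats it as `[Rn, #0]!`. If so, that could contribute to the symptom in the question, and the post-indexed `[%0], #4` is the form that actually advances the pointer.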
EDIT: you need to look at the disassembly to see what is going on, if anything. I am using the latest Code Sourcery toolchain (now Sourcery CodeBench Lite from Mentor Graphics):
arm-none-linux-gnueabi-gcc (Sourcery CodeBench Lite 2011.09-70) 4.6.1
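The disassembly breaks down as follows:

- The first instructions of main allocate the two local ints `out[0]` and `out[1]`.
- The next ones are there because those ints are initialized to zero.
- Then comes the inline assembly, followed by the printf.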
Now it is clear why it didn't work: you didn't declare `out` as volatile. You gave the code no reason to go back to RAM to get the values of `out[0]` and `out[1]` for the printf. The compiler knows that r4 holds the value of both `out[0]` and `out[1]`. There is so little code in this function that it never had to evict r4 and reuse it, so it used r4 for the printf.
If you change `out` to be volatile, you should get the desired result, because the preparation for the printf then reads the values from RAM.
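Besides making `out` volatile, GCC-style extended asm can state the side effects directly. The sketch below assumes GCC or Clang:

- The post-indexed `[%0], #4` performs the increment.
- The `"+r"` constraint tells the compiler that the pointer register is modified.
- The `"memory"` clobber tells it that the asm writes memory, so it reloads values it had cached in registers (like r4 above) after the asm.

The asm branch is only compiled in 32-bit ARM (A32) state. Elsewhere a plain C++ fallback with the same effect is used, so the sketch builds on any host. `store_one_two` is a hypothetical name.

```cpp
#include <cassert>

// Hypothetical helper: writes 1 and 2 into out[0] and out[1] using a
// post-indexed STR. `p` is a local copy, so the caller's pointer is untouched.
void store_one_two(int* out) {
    int* p = out;
#if defined(__GNUC__) && defined(__arm__) && !defined(__thumb__)
    __asm__ volatile(
        "mov r0, #1       \n\t"
        "str r0, [%0], #4 \n\t"   // store at p, then p += 4 bytes
        "add r0, r0, #1   \n\t"
        "str r0, [%0]     \n\t"   // store at the advanced p
        : "+r"(p)                 // p is both read and modified by the asm
        :
        : "r0", "memory");        // r0 is scratch; the asm writes memory
#else
    // Portable fallback with the same effect on non-ARM (or Thumb) builds.
    *p++ = 1;
    *p = 2;
#endif
}
```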
GCC inline assembler requires that all modified registers and non-volatile variables be listed as outputs or clobbers. In the second example, GCC may (and does) assume that the registers allocated to `in` and `out` do not change. A correct approach would be:
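The following sketch declares both pointers as read/write outputs. It assumes GCC-compatible inline asm and a NEON-enabled 32-bit ARM build (for example `-mfpu=neon` as in the question). `copy16_advance` is a hypothetical helper that takes the pointers by reference, so the caller can observe that each one ends up advanced by 8 ints. d0-d7 are listed as clobbers, and `"memory"` covers the loads and stores. Without NEON, a plain C++ loop stands in and reproduces the same net pointer movement.

```cpp
#include <cassert>

// Hypothetical helper: copies 16 ints and, like the asm in the question,
// leaves both pointers advanced by 8 elements (the first writeback only).
void copy16_advance(const int*& in, int*& out) {
#if defined(__GNUC__) && defined(__arm__) && defined(__ARM_NEON)
    __asm__ volatile(
        "vldm %0!, {d0-d3} \n\t"  // load in[0..7], in += 8 ints
        "vldm %0, {d4-d7}  \n\t"  // load in[8..15], no writeback
        "vstm %1!, {d0-d3} \n\t"  // store out[0..7], out += 8 ints
        "vstm %1, {d4-d7}  \n\t"  // store out[8..15], no writeback
        : "+r"(in), "+r"(out)     // read/write: GCC knows both change
        :
        : "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7", "memory");
#else
    // Portable stand-in: same copy, same net pointer movement.
    for (int i = 0; i < 16; ++i) out[i] = in[i];
    in += 8;
    out += 8;
#endif
}
```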
I found this question while searching for the answer to a similar question: how to bind an input/output register. The GCC documentation on inline assembler constraints says that the `+` modifier designates an operand that is both read and written, and such operands go in the output operand list.

In the example, it seems to me that you would prefer to preserve the original value of the variable `out`. Nevertheless, if you want to use the post-increment (`!`) variant of the instructions, I think you should declare the parameters as read/write. Declaring them that way worked on my Raspberry Pi 2. The semantics of the code are then clear to the compiler: both the `in` and `out` pointers will be changed (incremented by 8 elements).

Disclaimer: I do not know if the ARM ABI allows a function to freely clobber the NEON registers d0 through d7. In this simple example it probably does not matter.
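If you would rather keep `in` and `out` unchanged, one option is to let the asm advance copies of the pointers. In this sketch `copy16_preserve` is a hypothetical helper. Its parameters are already passed by value, so the copies are strictly redundant here. `src` and `dst` show the pattern you would use with the asm placed directly in `main()`, as in the question.

On the disclaimer: my understanding of the AAPCS is that d0-d7 are caller-saved scratch registers, and only d8-d15 must be preserved across calls. Within inline asm, though, what matters is listing them as clobbers so GCC does not keep its own values there across the asm.

```cpp
#include <cassert>

// Hypothetical helper: same 16-int copy, but the asm advances local copies
// of the pointers, so the values the caller sees in in/out never move.
void copy16_preserve(const int* in, int* out) {
    const int* src = in;
    int* dst = out;
#if defined(__GNUC__) && defined(__arm__) && defined(__ARM_NEON)
    __asm__ volatile(
        "vldm %0!, {d0-d3} \n\t"  // load 8 ints, src += 32 bytes
        "vldm %0, {d4-d7}  \n\t"  // load the next 8 ints
        "vstm %1!, {d0-d3} \n\t"  // store 8 ints, dst += 32 bytes
        "vstm %1, {d4-d7}  \n\t"  // store the next 8 ints
        : "+r"(src), "+r"(dst)    // only the copies are advanced
        :
        : "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7", "memory");
#else
    // Portable fallback for hosts without NEON.
    for (int i = 0; i < 16; ++i) dst[i] = src[i];
#endif
}
```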