ARM assembly: auto-increment register on store

Posted on 2025-01-02 08:15:03

Is it possible to auto-increment the base register of an STR using the [Rn]! syntax? I've looked through the documentation but couldn't find a definitive answer, mainly because the instruction syntax is presented for LDR and STR together. In theory it should work for both, but I couldn't find any examples of auto-increment on a store (it works fine for loads).

I've written a small program that stores two numbers in an array. When it finishes, the contents of out should be {1, 2}, but the second store overwrites the first element, as if the auto-increment isn't working.

#include <stdio.h>

int main()
{
        int out[]={0, 0};
        asm volatile (
        "mov    r0, #1          \n\t"
        "str    r0, [%0]!       \n\t"
        "add    r0, r0, #1      \n\t"
        "str    r0, [%0]        \n\t"
        :: "r"(out)
        : "r0" );
        printf("%d %d\n", out[0], out[1]);
        return 0;
}

EDIT:
While the answer was right for regular loads and stores, I found that the optimizer messes up auto-increment on vector instructions such as vldm/vstm. For instance, the following program

#include <stdio.h>

int main()
{
        volatile int *in = new int[16];
        volatile int *out = new int[16];

        for (int i=0;i<16;i++) in[i] = i;

        asm volatile (
        "vldm   %0!, {d0-d3}            \n\t"
        "vldm   %0,  {d4-d7}            \n\t"
        "vstm   %1!, {d0-d3}            \n\t"
        "vstm   %1,  {d4-d7}            \n\t"
        :: "r"(in), "r"(out)
        : "memory" );

        for (int i=0;i<16;i++) printf("%d\n", out[i]);
        return 0;
}

compiled with

g++ -O2 -march=armv7-a -mfpu=neon main.cpp -o main

prints garbage for the last 8 values, because the optimizer keeps the incremented pointer and uses it for the printf. In other words, out[i] is actually out[i+8], so the first 8 printed values are the last 8 from the vector and the rest are reads from out-of-bounds memory.

I've tried with different combinations of the volatile keyword throughout the code, but the behavior changes only if I compile with the -O0 flag or if I use a volatile vector instead of a pointer and new, like

volatile int out[16];

银河中√捞星星 2025-01-09 08:15:03

For store and load you do this:

ldr r0,[r1],#4
str r0,[r2],#4

Whatever you put at the end (4 in this case) is added to the base register (r1 in the ldr example, r2 in the str example) after the register has been used for the address but before the instruction completes. It is very much like:

unsigned int a,*b,*c;
...
a = *b++;
*c++ = a;
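
As a rough, portable illustration of that C analogy, the sketch below models a loop of post-indexed ldr/str pairs with pointer post-increment. The helper name copy_words is hypothetical (it is not from the answer), and the "+= 4" comments assume a 4-byte unsigned int.

```cpp
#include <cstddef>

// Hypothetical helper: copies n words the way a loop of
//   ldr r0, [r1], #4
//   str r0, [r2], #4
// would, using pointer post-increment in place of post-indexed addressing.
// Returns the final destination pointer, i.e. what r2 would hold afterwards.
unsigned int *copy_words(const unsigned int *src, unsigned int *dst, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        unsigned int a = *src++;  // ldr r0, [r1], #4 : load, then r1 += 4
        *dst++ = a;               // str r0, [r2], #4 : store, then r2 += 4
    }
    return dst;
}
```

The returned pointer is the point of the exercise: after each access the base has already moved on to the next word, which is exactly the state the base register is left in.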

EDIT: you need to look at the disassembly to see what is actually going on. I am using the latest CodeSourcery toolchain, now Sourcery CodeBench Lite from Mentor Graphics.

arm-none-linux-gnueabi-gcc (Sourcery CodeBench Lite 2011.09-70) 4.6.1

#include <stdio.h>
int main ()
{
        int out[]={0, 0};
        asm volatile (
        "mov    r0, #1          \n\t"
        "str    r0, [%0], #4       \n\t"
        "add    r0, r0, #1      \n\t"
        "str    r0, [%0]        \n\t"
        :: "r"(out)
        : "r0" );
        printf("%d %d\n", out[0], out[1]);
        return 0;
}


arm-none-linux-gnueabi-gcc str.c -O2  -o str.elf

arm-none-linux-gnueabi-objdump -D str.elf > str.list


00008380 <main>:
    8380:   e92d4010    push    {r4, lr}
    8384:   e3a04000    mov r4, #0
    8388:   e24dd008    sub sp, sp, #8
    838c:   e58d4000    str r4, [sp]
    8390:   e58d4004    str r4, [sp, #4]
    8394:   e1a0300d    mov r3, sp
    8398:   e3a00001    mov r0, #1
    839c:   e4830004    str r0, [r3], #4
    83a0:   e2800001    add r0, r0, #1
    83a4:   e5830000    str r0, [r3]
    83a8:   e59f0014    ldr r0, [pc, #20]   ; 83c4 <main+0x44>
    83ac:   e1a01004    mov r1, r4
    83b0:   e1a02004    mov r2, r4
    83b4:   ebffffe5    bl  8350 <_init+0x20>
    83b8:   e1a00004    mov r0, r4
    83bc:   e28dd008    add sp, sp, #8
    83c0:   e8bd8010    pop {r4, pc}
    83c4:   0000854c    andeq   r8, r0, ip, asr #10

so the

sub sp, sp, #8

is to allocate the two local ints out[0] and out[1]

mov r4,#0
str r4,[sp]
str r4,[sp,#4]

is because they are initialized to zero, then comes the inline assembly

8398:   e3a00001    mov r0, #1
839c:   e4830004    str r0, [r3], #4
83a0:   e2800001    add r0, r0, #1
83a4:   e5830000    str r0, [r3]

and then the printf:

83a8:   e59f0014    ldr r0, [pc, #20]   ; 83c4 <main+0x44>
83ac:   e1a01004    mov r1, r4
83b0:   e1a02004    mov r2, r4
83b4:   ebffffe5    bl  8350 <_init+0x20>

Now it is clear why it didn't work: you didn't declare out as volatile. You gave the compiler no reason to go back to RAM to get the values of out[0] and out[1] for the printf. The compiler knows that r4 holds the value of both out[0] and out[1] (zero), and there is so little code in this function that it never had to evict r4 and reuse it, so it passed r4 to printf.

If you change it to be volatile

    volatile int out[]={0, 0};

Then you should get the desired result:

83a8:   e59f0014    ldr r0, [pc, #20]   ; 83c4 <main+0x44>
83ac:   e59d1000    ldr r1, [sp]
83b0:   e59d2004    ldr r2, [sp, #4]
83b4:   ebffffe5    bl  8350 <_init+0x20>

Now the setup for printf reads the values back from RAM.
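
An alternative to volatile that the GCC documentation describes is to tell the compiler what the asm statement does: a "memory" clobber says the asm writes memory, and a "+r" operand says the asm modifies that register. The following is only a sketch under those assumptions, not the code from this answer. The function name store_one_two and the scratch pointer p are hypothetical, and the asm path is compiled only for 32-bit ARM targets with GCC-style inline asm; elsewhere a plain C++ fallback is used so the sketch still builds.

```cpp
// Stores 1 into out[0] and 2 into out[1].
void store_one_two(int *out)
{
#if defined(__arm__) && defined(__GNUC__)
    int *p = out;  // scratch copy: the asm advances p, the caller's out is untouched
    __asm__ volatile (
        "mov    r0, #1          \n\t"
        "str    r0, [%0], #4    \n\t"  // store, then advance p by 4 bytes
        "add    r0, r0, #1      \n\t"
        "str    r0, [%0]        \n\t"
        : "+r"(p)            // read/write operand: the asm changes this register
        :
        : "r0", "memory");   // r0 is used as scratch; memory is written through p
#else
    // Portable fallback so the sketch builds on non-ARM hosts.
    int *p = out;
    *p++ = 1;
    *p = 2;
#endif
}
```

With the clobber in place the compiler has to reload out[0] and out[1] after the asm, so a following printf of both elements should no longer be fed stale register values.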

渔村楼浪 2025-01-09 08:15:03

GCC inline assembler requires that all modified registers and non-volatile variables are listed as outputs or clobbers. In the second example GCC may and does assume that the registers allocated to in and out do not change.

A correct approach would be:

out_temp = out;
asm volatile ("..." : "+r"(in), "+r"(out_temp) :: "memory" );
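
Filled in for the vldm/vstm example from the question, that approach might look like the sketch below. The function name copy16 and the in_temp variable are hypothetical, and listing d0 to d7 as clobbers is an addition that is not part of this answer. The asm path is guarded so it is compiled only for 32-bit ARM with NEON enabled; other targets fall back to memcpy.

```cpp
#include <cstring>

// Hypothetical helper: copies 16 ints from in to out.
void copy16(const int *in, int *out)
{
#if defined(__arm__) && (defined(__ARM_NEON) || defined(__ARM_NEON__))
    const int *in_temp = in;   // scratch copies: the asm advances these,
    int *out_temp = out;       // so in and out keep their original values
    __asm__ volatile (
        "vldm   %0!, {d0-d3}    \n\t"  // load 32 bytes, write back in_temp
        "vldm   %0,  {d4-d7}    \n\t"  // load the next 32 bytes
        "vstm   %1!, {d0-d3}    \n\t"  // store 32 bytes, write back out_temp
        "vstm   %1,  {d4-d7}    \n\t"  // store the next 32 bytes
        : "+r"(in_temp), "+r"(out_temp)
        :
        : "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7", "memory");
#else
    // Portable fallback so the sketch builds on hosts without NEON.
    std::memcpy(out, in, 16 * sizeof(int));
#endif
}
```

Because only the temporaries are modified, the caller can keep indexing out from 0 after the call.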

娇俏 2025-01-09 08:15:03

I found this question while searching for the answer to a similar question: how to bind an input/output register. The GCC documentation of the inline assembler constraints says that the + modifier on an operand in the output list designates a read/write (input/output) operand.

In the example, it seems to me that you would prefer to preserve the original value of the variable out. Nevertheless, if you want to use the post-increment (!) variant of the instructions, I think that you should declare the parameters as read/write. The following worked on my Raspberry Pi 2:

#include <stdio.h>

int main()
{
  int* in = new int[16];           // array of 16 ints; new int(16) would allocate a single int
  volatile int* out = new int[16];

  for (int i=0; i<16; i++) in[i]=i;

  asm volatile(
    "vldm %0!, {d0-d3}\n\t"
    "vldm %0, {d4-d7}\n\t"
    "vstm %1!, {d0-d3}\n\t"
    "vstm %1, {d4-d7}\n\t"
    :"+r"(in), "+r"(out) :: "memory");

  for (int i=0; i<16; i++) printf("%d\n", out[i-8]);
  return 0;
}

In this way, the semantics of the code is clear to the compiler: both the in and out pointers will be changed (incremented by 8 elements).
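
The figure of 8 elements follows from the register list: four D registers of 8 bytes each are 32 bytes, which is 8 ints when int is 4 bytes wide. A small portable sketch of that arithmetic, with hypothetical helper names, also shows why indexing with out[i-8] recovers the original elements after the writeback:

```cpp
#include <cstddef>

// Number of int-sized elements a base register advances by when a
// vldm/vstm with writeback transfers `dregs` 64-bit D registers.
constexpr std::size_t writeback_elements(std::size_t dregs)
{
    return dregs * 8 / sizeof(int);  // each D register is 8 bytes
}

// Models the pointer value left behind by "vstm %1!, {d0-d3}" when
// called with dregs == 4.
inline volatile int *after_writeback(volatile int *p, std::size_t dregs)
{
    return p + writeback_elements(dregs);
}
```

Here after_writeback(buf, 4) points at buf[8] on a target with 4-byte int, so element i of the original buffer is reached as out[i-8].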

Disclaimer: I do not know if the ARM ABI allows a function to freely clobber the NEON registers d0 through d7. In this simple example it probably does not matter.
