Why does GCC emit an extra adds instruction after the ldr when loading a .rodata pointer on the ARM Thumb instruction set?

Posted on 2025-02-04 03:52:47

This code:

const char padding[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, };

const char myTable[] = { 1, 2, 3, 4 };

int keepPadding() {
  return (int)(&padding);
}

int foo() {
  return (int)(&myTable);  // <-- this is the part I'm looking at
}

compiles to the following assembly for the Thumb instruction set (abbreviated for clarity). Note in particular the adds as the second instruction of foo:

...
foo:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    ldr r0, .L5
    @ sp needed
    adds    r0, r0, #10
    bx  lr
.L6:
    .align  2
.L5:
    .word   .LANCHOR0
    .size   foo, .-foo
    .align  1
    .global bar
    .syntax unified
    .code   16
    .thumb_func
    .type   bar, %function

...
myTable:
    .ascii  "\001\002\003\004"

It looks like it's loading a pointer (ldr) to the top of .rodata and then programmatically offsetting to the location of myTable (adds). But why not just load the address of the table itself directly?

Note: when I remove the const, it seems to do it without the adds instruction (with myTable in .data).

The context of the question is that I'm trying to hand-optimize some C firmware and noticed this adds instruction that seems to be superfluous, so I'm wondering if there's a way to restructure my code to get rid of it.

Note: this is all compiled for the ARM thumb instruction set as follows (using arm-none-eabi-gcc version 11.2.1):

arm-none-eabi-gcc -Os -c -mcpu=cortex-m0 -mthumb temp.c -S

Also note: the example code here is intended to represent a snippet of a larger codebase. If myTable were the only thing compiled, it would land at offset 0 in .rodata and the adds instruction would disappear, but that is not the typical case in a real-world scenario. To represent the typical real-world scenario that produces this assembly, I added padding before the table.

It is also reproduced on Godbolt.

Comments (1)

牵强ㄟ 2025-02-11 03:52:47

The question originally contained just this:

const char myTable[] = { 1, 2, 3, 4 };
int foo() {
  return (int)(&myTable);
}


arm-none-eabi-gcc -Os -c -mthumb so.c -o so.o
arm-none-eabi-objdump -D so.o

but it did not produce the adds:

Disassembly of section .text:

00000000 <foo>:
   0:   4800        ldr r0, [pc, #0]    ; (4 <foo+0x4>)
   2:   4770        bx  lr
   4:   00000000    andeq   r0, r0, r0

Disassembly of section .rodata:

00000000 <myTable>:
   0:   04030201    streq   r0, [r3], #-513 ; 0xfffffdff

The question has since been edited to show a reproducible example, and this answer has been edited as a result. I will still work up to the same solution step by step, since it may be of interest that reaching the anchor-based code takes a few extra pieces to keep the problem from being optimized out.

So from your question and this:

const char padding[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, };
const char myTable[] = { 1, 2, 3, 4 };
int foo() {
  return (int)(&myTable);
}

It is obvious why myTable is at an offset of 10.

But padding is optimized out so you still end up with the same result.

So:

const char padding[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, };
const char myTable[] = { 1, 2, 3, 4 };
int keepPadding() {
  return (int)(&padding);
}
int foo() {
  return (int)(&myTable);
}

The name of that function implies you know all of this already and know what it took to make a minimum example, etc.

arm-none-eabi-gcc -Os -c -mthumb so.c -S


foo:
    @ Function supports interworking.
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    ldr r0, .L5
    @ sp needed
    adds    r0, r0, #10
    bx  lr
.L6:
    .align  2
.L5:
    .word   .LANCHOR0
    .size   foo, .-foo
    .global myTable
    .global padding
    .section    .rodata
    .set    .LANCHOR0,. + 0
    .type   padding, %object
    .size   padding, 10
padding:
    .space  10
    .type   myTable, %object
    .size   myTable, 4
myTable:
    .ascii  "\001\002\003\004"
    .ident  "GCC: (GNU) 11.2.0"

It is generating an anchor then referencing from the anchor rather than directly to the label.

I suspect it is to allow for an optimization of the ldr. Let's try:

 arm-none-eabi-gcc -Os -c -mthumb -mcpu=cortex-m4 so.c -S

foo:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    ldr r0, .L5
    bx  lr
.L6:
    .align  2
.L5:
    .word   .LANCHOR0+10
    .size   foo, .-foo

00000008 <foo>:
   8:   4800        ldr r0, [pc, #0]    ; (c <foo+0x4>)
   a:   4770        bx  lr
   c:   0000000a    .word   0x0000000a

Yeah, so that fixed it. But what about linking the earlier build (the one with the adds)?

Disassembly of section .rodata:

00000000 <padding>:
    ...

0000000a <myTable>:
   a:   04030201    streq   r0, [r3], #-513 ; 0xfffffdff

Disassembly of section .text:

00000010 <keepPadding>:
  10:   4800        ldr r0, [pc, #0]    ; (14 <keepPadding+0x4>)
  12:   4770        bx  lr
  14:   00000000    andeq   r0, r0, r0

00000018 <foo>:
  18:   4801        ldr r0, [pc, #4]    ; (20 <foo+0x8>)
  1a:   300a        adds    r0, #10
  1c:   4770        bx  lr
  1e:   46c0        nop         ; (mov r8, r8)
  20:   00000000    andeq   r0, r0, r0

Nope. I was hoping that the linker would replace the pc-relative load and turn it into a mov r0,#0, saving the load, which is (or might be) an optimization for systems that are not cortex-m (or even for cortex-m).

Note: this also works:

arm-none-eabi-gcc -Os -c -mthumb -fno-section-anchors so.c -o so.o

00000008 <foo>:
   8:   4800        ldr r0, [pc, #0]    ; (c <foo+0x4>)
   a:   4770        bx  lr
   c:   00000000    andeq   r0, r0, r0

foo:
    @ Function supports interworking.
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    ldr r0, .L5
    @ sp needed
    bx  lr
.L6:
    .align  2
.L5:
    .word   myTable
    .size   foo, .-foo
    .global myTable
    .section    .rodata
    .type   myTable, %object
    .size   myTable, 4
myTable:
    .ascii  "\001\002\003\004"
    .global padding
    .type   padding, %object
    .size   padding, 10

The anchor was not used so the address of myTable was used directly.

From my perspective, the "why" is that an anchor was used, and the padding in front put myTable at an offset from the anchor. So the ldr loads the anchor address, and the adds then gets you from the anchor to the table.
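
The load-then-add pattern can be modelled in plain C. This is only a sketch for illustration, not what GCC does internally: the struct rodata_model and the helper names load_anchor, table_via_anchor, table_direct and check_anchor_model are hypothetical, and the struct merely imitates the .rodata layout from the listing (10 bytes of padding followed by the 4-byte table).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the .rodata layout shown in the listing:
   10 bytes of padding followed by the 4-byte table. */
struct rodata_model {
    char padding[10];
    char myTable[4];
};

static const struct rodata_model rodata = { {0}, {1, 2, 3, 4} };

/* Plays the role of "ldr r0, .L5": fetch the anchor address,
   which here is the start of the modelled section. */
static const char *load_anchor(void) {
    return (const char *)&rodata;
}

/* Plays the role of "adds r0, r0, #10": step from the anchor
   to the table by its offset within the section. */
const char *table_via_anchor(void) {
    return load_anchor() + offsetof(struct rodata_model, myTable);
}

/* Plays the role of the -fno-section-anchors output, where the
   literal pool holds the address of myTable itself. */
const char *table_direct(void) {
    return rodata.myTable;
}

/* Both routes must arrive at the same address. */
void check_anchor_model(void) {
    assert(offsetof(struct rodata_model, myTable) == 10);
    assert(table_via_anchor() == table_direct());
}
```

Calling check_anchor_model() from a test or from main confirms that the anchor plus 10 and the direct symbol address are the same pointer, which is all the adds is doing.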

Why the anchor? Exercise for the reader, or someone else.
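
One possible answer to that exercise, offered as a hedged sketch rather than a definitive explanation: GCC's documentation for -fsection-anchors describes the option as reducing the number of symbolic address calculations by addressing nearby objects through a shared anchor symbol. When a function touches several objects in the same section, the compiler may then need only one address load instead of one per object. The globals a, b, c and the functions sum_abc and check_sum_abc below are illustrative names, not taken from the question.

```c
#include <assert.h>

/* Three globals that the compiler is likely to place close
   together in the same section. */
int a = 1;
int b = 2;
int c = 3;

/* With section anchors enabled, GCC may load one anchor address
   and reach a, b and c as small offsets from it, instead of
   keeping three separate symbol addresses in the literal pool.
   Whether it actually does so depends on target and options. */
int sum_abc(void) {
    return a + b + c;
}

/* Simple self-check of the C-level behaviour, which is the same
   with or without anchors. */
void check_sum_abc(void) {
    assert(sum_abc() == 6);
}
```

Compiling this with and without -fno-section-anchors, using the same command lines as above, and comparing the literal pools is one way to see whether the anchor pays off on a given target. In the single-object case from the question there is nothing to share, so the anchor only costs the extra adds.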
