Query on the -ffunction-sections & -fdata-sections options of gcc

Posted on 2024-10-05 02:34:59

The GCC documentation says the following about the function-sections and data-sections options:

-ffunction-sections
-fdata-sections
Place each function or data item into its own section in the output file if the target supports arbitrary sections. The name of the function or the name of the data item determines the section's name in the output file.
Use these options on systems where the linker can perform optimizations to improve locality of reference in the instruction space. Most systems using the ELF object format and SPARC processors running Solaris 2 have linkers with such optimizations. AIX may have these optimizations in the future.
Only use these options when there are significant benefits from doing so. When you specify these options, the assembler and linker will create larger object and executable files and will also be slower. You will not be able to use gprof on all systems if you specify this option and you may have problems with debugging if you specify both this option and -g.

I was under the impression that these options will help in reducing the executable file size. Why does this page say that it will create larger executable files? Am I missing something?

多像笑话 2024-10-12 02:34:59

Interestingly, using -fdata-sections can make the literal pools of your functions, and thus the functions themselves, larger. I've noticed this on ARM in particular, but it's likely to be true elsewhere. The binary I was testing grew by only a quarter of a percent, but it did grow. Looking at the disassembly of the changed functions, it was clear why.

If all of the BSS (or DATA) entries in your object file are allocated to a single section, then the compiler can store the address of that section in the function's literal pool and access your data with loads at known offsets from that address. But if you enable -fdata-sections, it puts each piece of BSS (or DATA) data into its own section. Since the compiler doesn't know which of these sections might be garbage collected later, or in what order the linker will place them in the final executable image, it can no longer load data using offsets from a single address. Instead, it has to allocate one literal pool entry per data item used. Once the linker has figured out what is going into the final image and where, it fixes up these literal pool entries with the actual addresses of the data.

So yes, even with -Wl,--gc-sections the resulting image can be larger because the actual function text is larger.
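For context, the size reduction usually associated with these options comes from the linker discarding sections that nothing references. The following is a rough sketch of that case; the file and the names entry, used_helper and unused_helper are hypothetical and not taken from the GCC page. With -ffunction-sections on an ELF target, each function gets its own section, so a link with -Wl,--gc-sections can drop the unreferenced one.

```c
/* Hypothetical translation unit, names invented for illustration.
   With -ffunction-sections each function below is placed in its own
   section, named after the function (for example .text.used_helper). */

/* Referenced from entry(), so its section is kept. */
int used_helper(int x)
{
    return x * 2;
}

/* Never referenced: with -Wl,--gc-sections the linker is free to
   discard the section holding this function. */
int unused_helper(int x)
{
    return x + 100;
}

/* Assumed to be reachable from the program's entry point. */
int entry(void)
{
    return used_helper(21);
}
```

Whether that saving outweighs the literal pool growth described above depends on how much unreferenced code and data the program actually contains.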

Below is a minimal example.

The code below is enough to see the behavior I'm talking about. Please don't be thrown off by the volatile declaration and use of global variables, both of which are questionable in real code. Here they ensure the creation of two data sections when -fdata-sections is used.

static volatile int head;
static volatile int tail;

int queue_empty(void)
{
    return head == tail;
}

The version of GCC used for this test is:

gcc version 6.1.1 20160526 (Arch Repository)

First, without -fdata-sections we get the following.

> arm-none-eabi-gcc -march=armv6-m \
                    -mcpu=cortex-m0 \
                    -mthumb \
                    -Os \
                    -c \
                    -o test.o \
                    test.c

> arm-none-eabi-objdump -dr test.o

00000000 <queue_empty>:
 0: 4b03     ldr   r3, [pc, #12]   ; (10 <queue_empty+0x10>)
 2: 6818     ldr   r0, [r3, #0]
 4: 685b     ldr   r3, [r3, #4]
 6: 1ac0     subs  r0, r0, r3
 8: 4243     negs  r3, r0
 a: 4158     adcs  r0, r3
 c: 4770     bx    lr
 e: 46c0     nop                   ; (mov r8, r8)
10: 00000000 .word 0x00000000
             10: R_ARM_ABS32 .bss

> arm-none-eabi-nm -S test.o

00000000 00000004 b head
00000000 00000014 T queue_empty
00000004 00000004 b tail

From arm-none-eabi-nm we see that queue_empty is 20 bytes long (0x14), and the arm-none-eabi-objdump output shows a single relocated word at the end of the function: the address of the BSS section (the section for uninitialized data). The first instruction in the function loads that value (the address of the BSS) into r3. The next two instructions load relative to r3, at offsets of 0 and 4 bytes respectively. These two loads fetch the values of head and tail. We can see those offsets in the first column of the arm-none-eabi-nm output. The nop at the end of the function is there to word-align the literal pool.
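The 20-byte figure can be checked by hand from the listing: seven 16-bit Thumb instructions, one 16-bit nop used as padding, and one 32-bit literal pool word. A small helper makes the arithmetic explicit. The name function_size is made up for this sketch, and the widths are simply read off the encodings shown in the disassembly.

```c
/* Byte widths read off the objdump listing: every instruction shown
   has a 16-bit encoding, and every literal pool entry is a 32-bit word. */
enum { THUMB_INSN_BYTES = 2, LITERAL_WORD_BYTES = 4 };

/* Size of a function laid out as: instructions, optional padding
   halfwords, then literal pool words. */
unsigned function_size(unsigned insns, unsigned pad_halfwords,
                       unsigned literal_words)
{
    return (insns + pad_halfwords) * THUMB_INSN_BYTES
         + literal_words * LITERAL_WORD_BYTES;
}
```

Here function_size(7, 1, 1) evaluates to 20, matching the 0x14 that nm reports.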

Next we'll see what happens when -fdata-sections is added.

> arm-none-eabi-gcc -march=armv6-m \
                    -mcpu=cortex-m0 \
                    -mthumb \
                    -Os \
                    -fdata-sections \
                    -c \
                    -o test.o \
                    test.c

> arm-none-eabi-objdump -dr test.o

00000000 <queue_empty>:
 0: 4b03     ldr   r3, [pc, #12]    ; (10 <queue_empty+0x10>)
 2: 6818     ldr   r0, [r3, #0]
 4: 4b03     ldr   r3, [pc, #12]    ; (14 <queue_empty+0x14>)
 6: 681b     ldr   r3, [r3, #0]
 8: 1ac0     subs  r0, r0, r3
 a: 4243     negs  r3, r0
 c: 4158     adcs  r0, r3
 e: 4770     bx    lr
    ...
             10: R_ARM_ABS32 .bss.head
             14: R_ARM_ABS32 .bss.tail

> arm-none-eabi-nm -S test.o

00000000 00000004 b head
00000000 00000018 T queue_empty
00000000 00000004 b tail

Immediately we see that queue_empty has grown by four bytes to 24 bytes (0x18), and that there are now two relocations to be done in queue_empty's literal pool. These relocations correspond to the addresses of the two BSS sections that were created, one for each global variable. Two addresses are needed because the compiler can't know where the linker will place the two sections relative to each other. Looking at the instructions at the beginning of queue_empty, we see an extra load: the compiler has to generate a separate pair of loads for each variable, one to get the address of its section and one to get the value stored there. The extra instruction in this version of queue_empty doesn't make the body of the function longer, because it just takes the slot that was previously a nop, but that won't be the case in general.
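One way to explore this further is to group related variables into a single struct. This is a sketch under the assumption that a struct is emitted as one object in one section, so its members stay at fixed offsets from a single address even with -fdata-sections. The names queue_state, q, queue_empty_grouped and queue_advance_tail are hypothetical, and the variant was not part of the measurements above.

```c
/* Hypothetical variant of the example: both indices live in one struct. */
struct queue_state {
    volatile int head;
    volatile int tail;
};

/* One object, so one section (for example .bss.q) even with
   -fdata-sections. */
static struct queue_state q;

int queue_empty_grouped(void)
{
    /* Both members sit at fixed offsets from the address of q. */
    return q.head == q.tail;
}

/* Helper added only so the sketch can be exercised. */
void queue_advance_tail(void)
{
    q.tail = q.tail + 1;
}
```

Compiling this with the same flags and comparing the objdump output should show whether a single literal pool entry is enough again on a given toolchain.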
