Query on the -ffunction-sections and -fdata-sections options of GCC
The following is mentioned on the GCC page for the function-sections and data-sections options:
-ffunction-sections -fdata-sections
Place each function or data item into its own section in the output file if the target supports arbitrary sections. The name of the function or the name of the data item determines the section's name in the output file.
Use these options on systems where the linker can perform optimizations to improve locality of reference in the instruction space. Most systems using the ELF object format and SPARC processors running Solaris 2 have linkers with such optimizations. AIX may have these optimizations in the future.
Only use these options when there are significant benefits from doing so. When you specify these options, the assembler and linker will create larger object and executable files and will also be slower. You will not be able to use gprof on all systems if you specify this option and you may have problems with debugging if you specify both this option and -g.
I was under the impression that these options will help in reducing the executable file size. Why does this page say that it will create larger executable files? Am I missing something?
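To make the naming rule in the quoted documentation concrete, here is a minimal sketch. The file name and all identifiers are hypothetical. The comments give the section each item typically lands in when GCC compiles for an ELF target with -ffunction-sections -fdata-sections; without the flags, the items would share plain .text, .data, .bss and .rodata. The build command in the comment is an assumption, and readelf -S (or objdump -h) on the object file shows what your toolchain actually emitted.

```c
/* sections_demo.c: hypothetical file used only to show section naming.
 *
 * Assumed build (GNU toolchain, ELF target):
 *   gcc -O2 -ffunction-sections -fdata-sections -c sections_demo.c
 *   readelf -S sections_demo.o      (lists the per-item sections)
 *
 * Comments give the section each item typically lands in with the two
 * flags; without them, items share plain .text/.data/.bss/.rodata. */

int counter = 5;                      /* initialised data -> .data.counter  */
int scratch[4] = {0};                 /* zero-initialised -> .bss.scratch   */
const int lookup[4] = {1, 2, 4, 8};   /* read-only data   -> .rodata.lookup */

int add(int a, int b)                 /* function         -> .text.add      */
{
    return a + b;
}

int record(int i, int value)          /* function         -> .text.record   */
{
    scratch[i & 3] = value;           /* store value in slot (i & 3)        */
    counter += lookup[i & 3];         /* bump counter by 1, 2, 4 or 8       */
    return scratch[i & 3];
}
```

Having each item in its own section is what later lets the linker drop items one by one; the cost is one section header (and its alignment) per item, which is where the larger object files mentioned in the documentation come from.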
Interestingly, using -fdata-sections can make the literal pools of your functions, and thus your functions themselves, larger. I've noticed this on ARM in particular, but it's likely to be true elsewhere. The binary I was testing only grew by a quarter of a percent, but it did grow. Looking at the disassembly of the changed functions, it was clear why.
If all of the BSS (or DATA) entries in your object file are allocated to a single section, then the compiler can store the address of that section in the function's literal pool and generate loads with known offsets from that address to access your data. But if you enable -fdata-sections, it puts each piece of BSS (or DATA) data into its own section, and since it doesn't know which of these sections might be garbage collected later, or in what order the linker will place them in the final executable image, it can no longer load data using offsets from a single address. Instead, it has to allocate one literal-pool entry per piece of data used, and once the linker has figured out what goes into the final image and where, it fixes up these literal-pool entries with the actual addresses of the data.
So yes, even with -Wl,--gc-sections the resulting image can be larger, because the actual function text is larger.
Below I've added a minimal example.
The code below is enough to see the behavior I'm talking about. Please don't be thrown off by the volatile declaration and use of global variables, both of which are questionable in real code. Here they ensure the creation of two data sections when -fdata-sections is used.
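The answer's own listing is not reproduced here, so the block below is a hypothetical stand-in, not the author's code. It assumes only what the text states: two volatile globals, head and tail, living in .bss, and a queue_empty function that compares them. The build and inspection commands in the comment are assumptions too; the author's exact flags are unknown, and you would add whatever -mcpu/-mthumb options your target needs.

```c
/* queue.c: hypothetical stand-in for the answer's example.
 *
 * Assumed build and inspection steps:
 *   arm-none-eabi-gcc -Os -c queue.c                 (one .bss section)
 *   arm-none-eabi-gcc -Os -fdata-sections -c queue.c (.bss.head, .bss.tail)
 *   arm-none-eabi-nm -S queue.o        (symbol values and sizes)
 *   arm-none-eabi-objdump -dr queue.o  (disassembly plus relocations)
 */

/* Explicitly zero-initialised globals are placed in .bss. volatile makes
 * every read in queue_empty a real load from memory. */
volatile unsigned int head = 0;
volatile unsigned int tail = 0;

int queue_empty(void)
{
    /* In the answer's ARM build: without -fdata-sections, one literal-pool
     * word held the .bss base and head/tail were loaded at base+0/base+4;
     * with -fdata-sections, the pool held one address per variable. */
    return head == tail;
}
```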
The version of GCC used for this test is:
First, without -fdata-sections we get the following.
From arm-none-eabi-nm we see that queue_empty is 20 bytes long (14 hex), and the arm-none-eabi-objdump output shows that there is a single relocation word at the end of the function: the address of the BSS section (the section for uninitialized data). The first instruction in the function loads that value (the address of the BSS) into r3. The next two instructions load relative to r3, at offsets of 0 and 4 bytes respectively. These two loads are the loads of the values of head and tail. We can see those offsets in the first column of the output from arm-none-eabi-nm. The nop at the end of the function is there to word-align the address of the literal pool.
Next we'll see what happens when -fdata-sections is added.
Immediately we see that the length of queue_empty has increased by four bytes to 24 bytes (18 hex), and that there are now two relocations to be done in queue_empty's literal pool. These relocations correspond to the addresses of the two BSS sections that were created, one for each global variable. There need to be two addresses here because the compiler can't know the relative positions in which the linker will end up placing the two sections. Looking at the instructions at the beginning of queue_empty, we see that there is an extra load: the compiler has to generate a separate pair of loads for each variable, one to get the address of its section and one to get the value of the variable in that section. The extra instruction in this version of queue_empty doesn't make the body of the function longer, because it just takes the spot that was previously a nop, but that won't be the case in general.
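The two sizes can be checked with simple arithmetic. This is a toy model that assumes every instruction is a 16-bit Thumb encoding (the answer does not say which instruction set was used) and that the literal pool must start on a 4-byte boundary. Under those assumptions, 20 bytes with one pool word means 7 instructions plus a padding nop, and 24 bytes with two pool words means 8 instructions and no nop. These instruction counts are inferred from the reported sizes, not read from the disassembly, and function_bytes is a hypothetical helper.

```c
/* Toy model of a Thumb function followed by its literal pool.
 * ASSUMPTION: every instruction is a 16-bit Thumb encoding. */
unsigned function_bytes(unsigned thumb_insns, unsigned literal_words)
{
    unsigned code   = thumb_insns * 2u;      /* 2 bytes per instruction       */
    unsigned padded = (code + 3u) & ~3u;     /* nop pads pool to 4-byte bound */
    return padded + literal_words * 4u;      /* 4 bytes per pool entry        */
}
```

With these inputs, function_bytes(7, 1) gives the 20 bytes seen without -fdata-sections and function_bytes(8, 2) the 24 bytes seen with it. If the original function had had no spare nop slot, for example function_bytes(8, 1) growing to function_bytes(9, 2), the extra load and the extra pool word would together add 8 bytes (20 to 28), which is the general case the answer refers to.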
When using those compiler options, you can add the linker option -Wl,--gc-sections, which will remove the unused code (and, with -fdata-sections, the unused data).
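A minimal sketch of what --gc-sections does with the per-item sections. The file names gc_demo.c and main.c and all identifiers are hypothetical; main.c stands for any program that calls only used_entry(). --print-gc-sections is a GNU ld option that lists each section it discards, so with GNU ld you would expect .text.never_called and .rodata.unused_table to appear in that list.

```c
/* gc_demo.c: hypothetical example for -ffunction-sections,
 * -fdata-sections and --gc-sections.
 *
 * Assumed commands (GNU toolchain, ELF target):
 *   gcc -Os -ffunction-sections -fdata-sections -c gc_demo.c main.c
 *   gcc gc_demo.o main.o -Wl,--gc-sections -Wl,--print-gc-sections -o app
 */
#include <string.h>

/* Reachable from the (hypothetical) program's main, so its section is kept. */
int used_entry(const char *s)
{
    return (int)strlen(s);
}

/* Each of these sits in its own section (.rodata.unused_table,
 * .text.never_called). Only never_called refers to the table, and nothing
 * reachable refers to never_called, so the linker can discard both. */
const char unused_table[256] = {1};

int never_called(int i)
{
    return unused_table[i & 0xff];
}
```

Without -ffunction-sections and -fdata-sections, never_called and unused_table would share .text and .rodata with code that is used, and --gc-sections could not remove them.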
You can use -ffunction-sections and -fdata-sections on static libraries. This will increase the size of the static library, as each function and global data variable will be put in a separate section. Then use -Wl,--gc-sections on the program linking with this static library, which will remove the unused sections. Thus, the final binary will be smaller than without those flags.
Be careful, though, as -Wl,--gc-sections can break things.
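The static-library case can be sketched the same way. All names here (mathlib.c, libmathlib.a, app.c, square, cube) are hypothetical, and the commands in the comment are an assumed GNU toolchain setup. The point is granularity: the linker extracts whole object files from an archive, so the section flags are what let --gc-sections drop cube() even though its object file is pulled in for square().

```c
/* mathlib.c: hypothetical static-library source with two functions in
 * ONE object file.
 *
 * Assumed build (GNU toolchain, ELF target):
 *   gcc -Os -ffunction-sections -fdata-sections -c mathlib.c
 *   ar rcs libmathlib.a mathlib.o
 *   gcc -Os app.c -L. -lmathlib -Wl,--gc-sections -o app
 * where app.c (also hypothetical) calls only square().
 *
 * Because square() is needed, the whole mathlib.o member is extracted,
 * cube() included. Without the section flags, cube() shares .text with
 * square() and stays in the binary; with them it lives in .text.cube,
 * which --gc-sections can discard. */

int square(int x) { return x * x; }      /* -> .text.square (kept)    */
int cube(int x)   { return x * x * x; }  /* -> .text.cube (removable) */
```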
I get better results by adding an additional step and building an .a archive:
1. Compile with the -ffunction-sections and -fdata-sections flags
2. Put all the .o objects into an .a archive with ar rcs file.a *.o
3. Link with the -Wl,-gc-sections,-u,main options
4. Optimize with -Os
I tried it a while back, and looking at the results, it seems the size increase comes from the order of objects with different alignments. Normally the linker sorts objects to keep the padding between them small, but it looks like that only works within a section, not across the individual sections. So you often get extra padding between the data sections for each function, increasing the overall space.
For a static library with -Wl,-gc-sections, though, the removal of unused sections will most likely more than make up for the small increase.
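The padding effect described in this answer can be illustrated with a toy model. layout_size is a hypothetical helper that places items one after another, each aligned to its own requirement; it is not how GNU ld or any other linker is implemented. It only shows why the order of differently aligned objects changes the total size.

```c
#include <stddef.h>

/* Toy model only: one object to be placed, with its size and alignment.
 * align must be a non-zero power of two. */
struct item { size_t size; size_t align; };

/* Place items in the given order, padding each up to its alignment,
 * and return the end offset of the last one (no trailing padding). */
size_t layout_size(const struct item *items, size_t n)
{
    size_t offset = 0;
    for (size_t i = 0; i < n; i++) {
        size_t a = items[i].align;
        offset = (offset + a - 1) & ~(a - 1);   /* pad up to alignment */
        offset += items[i].size;
    }
    return offset;
}
```

Placing a 1-byte, an 8-byte (8-aligned), a 1-byte and another 8-byte object in that order ends at offset 32, because 7 + 7 bytes of padding are needed; grouping the two 8-aligned objects first ends at offset 18 with no padding at all.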