Do zero-initialized variables in the .bss section take up space in the ELF file?

Posted on 2024-07-14 16:14:23

If I understand correctly, the .bss section in ELF files is used to allocate space for zero-initialized variables. Our toolchain produces ELF files, hence my question: does the .bss section actually have to contain all those zeroes? It seems an awful waste of space if, say, allocating a global ten-megabyte array results in ten megabytes of zeroes in the ELF file. What am I missing here?

Comments (4)

烟火散人牵绊 2024-07-21 16:14:23

It has been some time since I worked with ELF, but I think I still remember this. No, the file does not physically contain those zeros. If you look at the program headers of an ELF file, you will see that each one has two sizes: the size in the file, and the size the segment has when it is loaded into virtual memory (readelf -l ./a.out):

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x08048034 0x08048034 0x000e0 0x000e0 R E 0x4
  INTERP         0x000114 0x08048114 0x08048114 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.2]
  LOAD           0x000000 0x08048000 0x08048000 0x00454 0x00454 R E 0x1000
  LOAD           0x000454 0x08049454 0x08049454 0x00104 0x61bac RW  0x1000
  DYNAMIC        0x000468 0x08049468 0x08049468 0x000d0 0x000d0 RW  0x4
  NOTE           0x000128 0x08048128 0x08048128 0x00020 0x00020 R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4

Headers of type LOAD describe the segments that are mapped into virtual memory when the file is loaded for execution. The other headers carry other information, such as which shared libraries are needed. As you can see, FileSiz and MemSiz differ significantly for the header that contains the .bss section (the second LOAD):

0x00104 (file-size) 0x61bac (mem-size)

For this example code:

int a[100000];
int main() { }

The ELF specification says that when a segment's mem-size is greater than its file-size, the extra bytes are simply filled with zeros in virtual memory. The segment-to-section mapping for the second LOAD header looks like this:

03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss

So there are some other sections in there too: .ctors and .dtors for C++ constructors and destructors, and .jcr for the same kind of thing in Java. Then it contains the .dynamic section and other data useful for dynamic linking (I believe this is where the needed shared libraries are listed, among other things). After that comes the .data section, which contains initialized globals and local static variables. At the end comes the .bss section, which is filled with zeros at load time because the file-size does not cover it.
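
The size of the zero-filled tail follows directly from the two numbers in the program header. The following is a minimal sketch, not a real ELF parser: `segment_sizes` and `zero_fill_bytes` are hypothetical names, and the struct holds only the FileSiz and MemSiz columns that readelf prints rather than the actual program header layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: only the two size columns readelf prints
   for a program header, not the real ELF header layout. */
struct segment_sizes {
    size_t filesz; /* bytes stored in the ELF file */
    size_t memsz;  /* bytes occupied once loaded into memory */
};

/* Number of bytes that exist only in memory and are zero-filled at load time. */
size_t zero_fill_bytes(struct segment_sizes seg)
{
    /* A loadable segment is never smaller in memory than in the file. */
    assert(seg.memsz >= seg.filesz);
    return seg.memsz - seg.filesz;
}

/* The second LOAD segment from the readelf output above. */
static const struct segment_sizes second_load = { 0x00104, 0x61bac };
```

For the second LOAD segment, `zero_fill_bytes(second_load)` gives 0x61aa8 (400040) bytes, just over the 400000 bytes that `a` needs with a 4-byte `int`, while only 0x104 bytes are taken from the file.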

By the way, you can see which output section a particular symbol is placed in by using the -M linker option. With gcc, use -Wl,-M to pass the option through to the linker. For the example above, the map shows that a is allocated within .bss. This can help you verify that your uninitialized objects really end up in .bss and not somewhere else:

.bss            0x08049560    0x61aa0
 [many input .o files...]
 *(COMMON) 
 *fill*         0x08049568       0x18 00
 COMMON         0x08049580    0x61a80 /tmp/cc2GT6nS.o
                0x08049580                a
                0x080ab000                . = ALIGN ((. != 0x0)?0x4:0x1) 
                0x080ab000                . = ALIGN (0x4) 
                0x080ab000                . = ALIGN (0x4) 
                0x080ab000                _end = .
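
The numbers in this map can be cross-checked against the source. The following is a minimal sketch, assuming a 4-byte `int` (the target here is 32-bit x86); `array_bytes` and `end_address` are hypothetical helper names used only for this check.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for `count` elements of `elem_size` bytes each. */
size_t array_bytes(size_t count, size_t elem_size)
{
    assert(elem_size != 0);
    return count * elem_size;
}

/* End address of an object that starts at `start` and spans `bytes` bytes. */
unsigned long end_address(unsigned long start, size_t bytes)
{
    return start + bytes;
}
```

Under that assumption, `array_bytes(100000, 4)` is 400000, which is 0x61a80, the size the map lists for `a`. Adding it to the start address 0x08049580 with `end_address` gives 0x080ab000, the value of `_end`.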

By default, older GCC versions (before GCC 10) keep uninitialized globals in a COMMON section, for compatibility with old compilers that allow a global to be defined twice in a program without a multiple-definition error. Use -fno-common (the default since GCC 10) to make GCC place them in the .bss section of the object file. This makes no difference to the final linked executable because, as you can see, COMMON symbols end up in the .bss output section anyway. That placement is controlled by the linker script, which you can display with ld -verbose. None of this should worry you; it is just an internal detail. See the gcc man page.

征棹 2024-07-21 16:14:23

The .bss section in an ELF file is used for static data that has no explicit initializer but is guaranteed to be zero at runtime. Here's a small example that shows the difference.

int main() {
    static int bss_test1[100];
    static int bss_test2[100] = {0};
    return 0;
}

In this case bss_test1 is placed into .bss since it has no initializer. bss_test2, however, is placed into the .data section along with a bunch of zeros. The runtime loader allocates the amount of space reserved for .bss and zeroes it out before any userland code begins executing.

You can see the difference using objdump, nm, or similar utilities:

moozletoots$ objdump -t a.out | grep bss_test
08049780 l     O .bss   00000190              bss_test1.3
080494c0 l     O .data  00000190              bss_test2.4

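This is usually one of the first surprises that embedded developers run into: with a toolchain that behaves like the one above, never initialize statics to zero explicitly. The runtime loader (usually) takes care of that. As soon as you initialize anything explicitly, you are telling the compiler and linker to include the data in the executable image. Note that current GCC versions enable -fzero-initialized-in-bss by default, which places explicitly zero-initialized statics such as bss_test2 in .bss as well.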
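
Leaving out the initializer costs nothing at runtime, because C guarantees that objects with static storage duration start out as zero. The following is a minimal sketch of that guarantee; `all_zero` and `bss_test1_is_zero` are hypothetical helper names, and the array mirrors `bss_test1` from the example above.

```c
#include <assert.h>
#include <stddef.h>

/* No initializer: static storage duration guarantees this starts as all zeros. */
static int bss_test1[100];

/* Returns 1 if every element of p[0..n) is zero, 0 otherwise. */
int all_zero(const int *p, size_t n)
{
    assert(p != NULL);
    for (size_t i = 0; i < n; i++) {
        if (p[i] != 0)
            return 0;
    }
    return 1;
}

/* Convenience wrapper that checks the array declared above. */
int bss_test1_is_zero(void)
{
    return all_zero(bss_test1, sizeof bss_test1 / sizeof bss_test1[0]);
}
```

Calling `bss_test1_is_zero()` before anything writes to the array returns 1, even though the source never assigns a value to it.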

一杯敬自由 2024-07-21 16:14:23

The contents of a .bss section are not stored in the executable file. Of the most common sections (.text, .data, .bss), only .text (the actual code) and .data (initialized data) occupy space in an ELF file.

别念他 2024-07-21 16:14:23

That is correct: .bss is not physically present in the file. Only the information about its size is present, so that the dynamic loader can allocate the .bss section for the application program.
As a rule of thumb, only the LOAD and TLS segments get memory for the application program; the rest are used by the dynamic loader.

For static executables, the .bss section is also given space in the executable. This is common in embedded applications where there is no loader.

Suman
