
Relationship between object files and shared object files

Posted on 2024-07-30 15:13:20

What is the relationship between shared object (.so) files and object (.o) files?

Can you explain it with an example?

荒岛晴空 2024-08-06 15:13:25

.so files are analogous to .dll files on Windows, and .o files play the same role as .obj files under Visual Studio.

要走就滚别墨迹 2024-08-06 15:13:24

Suppose you have the following C source file; call it name.c:

#include <stdio.h>
#include <stdlib.h>

void print_name(const char * name)
{
    printf("My name is %s\n", name);
}

When you compile it with cc -c name.c, you generate name.o. (Plain cc name.c would also try to link a complete program, and fail because there is no main.) The .o contains the compiled code and data for all functions and variables defined in name.c, as well as an index associating their names with the actual code. If you look at that index, say with the nm tool (available on Linux and many other Unixes), you'll notice two entries:

00000000 T print_name
         U printf

What this means: there are two symbols (names of functions or variables, but not names of classes, structs, or any types) stored in the .o. The first, marked T, is actually defined in name.o. The other, marked U, is merely a reference. The code for print_name can be found here, but the code for printf cannot. To build a complete program or a complete library, every referenced symbol must eventually be matched with its definition in some other object file or library, so that everything can be linked together. An object file is therefore the definitions found in the source file, converted to binary form and ready to be placed into a full program.
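To make the T/U distinction concrete, here is a toy model of symbol resolution: each object file lists the names it defines and the names it only references, and a link is complete only when every reference is defined somewhere in the set. The structs and the function count_unresolved are hypothetical simplifications, not how nm or ld actually store symbols.

```c
#include <stddef.h>
#include <string.h>

/* Toy model of an object file's symbol index. Real ELF symbol tables
 * also record type, binding, section and size; all names here are
 * hypothetical. */
struct symbol {
    const char *name;
    int defined;  /* 1: defined in this object ("T"), 0: only referenced ("U") */
};

struct object_file {
    const char *file_name;
    const struct symbol *symbols;
    size_t count;
};

/* The nm output for name.o shown above, expressed in this model. */
static const struct symbol name_o_syms[] = {
    { "print_name", 1 },  /* 00000000 T print_name */
    { "printf",     0 },  /*          U printf     */
};

/* Stand-in for the C library, where printf is actually defined. */
static const struct symbol libc_syms[] = {
    { "printf", 1 },
};

/* Returns 1 if some object in objs[0..n) defines `name`, else 0. */
static int defined_somewhere(const struct object_file *objs, size_t n,
                             const char *name)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < objs[i].count; j++)
            if (objs[i].symbols[j].defined &&
                strcmp(objs[i].symbols[j].name, name) == 0)
                return 1;
    return 0;
}

/* Counts references ("U" entries) that no object in the set defines;
 * a real linker would report each as an "undefined reference" error. */
size_t count_unresolved(const struct object_file *objs, size_t n)
{
    size_t unresolved = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < objs[i].count; j++)
            if (!objs[i].symbols[j].defined &&
                !defined_somewhere(objs, n, objs[i].symbols[j].name))
                unresolved++;
    return unresolved;
}
```

With only name.o in the set, printf stays unresolved; adding the libc stand-in resolves it.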

You can link .o files together one by one, but you usually don't: there are generally a lot of them, and they are an implementation detail. You'd rather have them collected into bundles of related objects with well-recognized names. These bundles are called libraries, and they come in two forms: static and dynamic.

A static library (in Unix) is almost always suffixed with .a; examples include libc.a, the C core library, and libm.a, the C math library. Continuing the example, you'd build your static library with ar rc libname.a name.o. If you run nm on libname.a you'll see this:

name.o:
00000000 T print_name
         U printf

As you can see, it is essentially a collection of object files plus an index for finding all the names in them. Just like an object file, it lists both the symbols defined in each .o and the symbols those members refer to. If you were to add another .o (e.g. date.o, defining print_date), you'd see another entry like the one above.

When you link a static library into an executable, the linker copies in each member .o that defines a symbol the program still needs, not necessarily the whole archive. For those members it is just like linking in the individual .o files. As you can imagine, this can still make your program very large, especially if you are using (as most modern applications are) a lot of libraries.
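Which archive members end up in the executable can be modeled as a small fixed-point loop: start from the symbols the program still needs, extract every member that defines one of them, add that member's own references, and repeat until nothing new is extracted. Members that resolve nothing are left out. This is a toy sketch: struct member, select_members and the contents of date.o are hypothetical, and it ignores details such as how the command-line order of several archives affects a real link.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NEEDED 32  /* arbitrary limit for this sketch */

/* Hypothetical archive member: what it defines and what it references.
 * Each list holds at most 3 names; unused slots are NULL terminators. */
struct member {
    const char *name;
    const char *defines[4];
    const char *references[4];
};

/* Made-up archive: name.o matches the nm output above; date.o and
 * its references are invented for illustration. */
static const struct member libname_a[] = {
    { "name.o", { "print_name" }, { "printf" } },
    { "date.o", { "print_date" }, { "printf", "time" } },
};

static int in_list(const char *const *list, const char *sym)
{
    for (; *list != NULL; list++)
        if (strcmp(*list, sym) == 0)
            return 1;
    return 0;
}

/* Sets pulled[i] = 1 for each member the link needs, starting from the
 * NULL-terminated list of symbols the program references. Returns how
 * many members were extracted. */
size_t select_members(const struct member *ar, size_t n,
                      const char *const *program_refs, int *pulled)
{
    const char *needed[MAX_NEEDED];  /* symbols referenced so far */
    size_t n_needed = 0, n_pulled = 0;
    int changed = 1;

    for (; *program_refs != NULL && n_needed < MAX_NEEDED; program_refs++)
        needed[n_needed++] = *program_refs;
    for (size_t i = 0; i < n; i++)
        pulled[i] = 0;

    while (changed) {  /* repeat until no new member is extracted */
        changed = 0;
        for (size_t i = 0; i < n; i++) {
            if (pulled[i])
                continue;
            for (size_t k = 0; k < n_needed; k++) {
                if (in_list(ar[i].defines, needed[k])) {
                    pulled[i] = 1;
                    n_pulled++;
                    changed = 1;
                    /* The extracted member may need further symbols. */
                    for (const char *const *r = ar[i].references;
                         *r != NULL && n_needed < MAX_NEEDED; r++)
                        needed[n_needed++] = *r;
                    break;
                }
            }
        }
    }
    return n_pulled;
}
```

A program that only calls print_name pulls in name.o and leaves date.o out of the executable.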

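A dynamic or shared library is suffixed with .so. Like its static analogue, it holds the compiled code of all the object files that went into it, but merged into a single linked file rather than kept as separate members. You'd build it with cc -shared -o libname.so name.o, where name.o has been compiled as position-independent code (cc -c -fPIC name.c), which shared libraries generally require. Looking at it with nm is quite a bit different from the static library, though. On my system it contains about two dozen symbols, of which print_name and printf are only two: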

00001498 a _DYNAMIC
00001574 a _GLOBAL_OFFSET_TABLE_
         w _Jv_RegisterClasses
00001488 d __CTOR_END__
00001484 d __CTOR_LIST__
00001490 d __DTOR_END__
0000148c d __DTOR_LIST__
00000480 r __FRAME_END__
00001494 d __JCR_END__
00001494 d __JCR_LIST__
00001590 A __bss_start
         w __cxa_finalize@@GLIBC_2.1.3
00000420 t __do_global_ctors_aux
00000360 t __do_global_dtors_aux
00001588 d __dso_handle
         w __gmon_start__
000003f7 t __i686.get_pc_thunk.bx
00001590 A _edata
00001594 A _end
00000454 T _fini
000002f8 T _init
00001590 b completed.5843
000003c0 t frame_dummy
0000158c d p.5841
000003fc T print_name
         U printf@@GLIBC_2.0

A shared library differs from a static library in one very important way: it does not embed itself in your final executable. Instead, the executable contains a reference to the shared library, which the dynamic linker resolves at run time rather than at link time. This has a number of advantages:

Your executable is much smaller. It only contains the code you explicitly linked via the object files. The external libraries are references and their code does not go into the binary.
You can share (hence the name) one library's bits among multiple executables.
You can, if you are careful about binary compatibility, update the code in the library between runs of the program, and the program will pick up the new library without you needing to change it.
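Normally the dynamic linker resolves these references automatically when the program starts. A related mechanism makes the run-time step explicit: the POSIX dlopen/dlsym API, which loads a shared library and looks up a symbol by name while the program is running, so the library file is only read at that moment. The sketch assumes a libname.so built from name.c as described above; call_print_name is a hypothetical helper, and on glibc older than 2.34 you need to link with -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Signature of print_name as defined in name.c. */
typedef void (*print_name_fn)(const char *);

/* Loads the shared library at `path` (e.g. the hypothetical
 * "./libname.so"), looks up print_name and calls it with `who`.
 * Returns 0 on success, -1 if the library or symbol is missing. */
int call_print_name(const char *path, const char *who)
{
    void *handle = dlopen(path, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlerror();  /* clear any stale error before calling dlsym */
    void *sym = dlsym(handle, "print_name");
    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err);
        dlclose(handle);
        return -1;
    }

    /* ISO C leaves this conversion undefined, but POSIX requires
     * converting a dlsym result to a function pointer to work. */
    print_name_fn print_name = (print_name_fn)sym;
    print_name(who);

    dlclose(handle);
    return 0;
}
```

If libname.so is present, call_print_name("./libname.so", "Alice") should print My name is Alice; if you rebuild the library between runs, the next run picks up the new code without relinking the program.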

There are some disadvantages:

It takes time to link a program together. With shared libraries, some of that work is deferred to each run of the executable.
The process is more complex. All the additional symbols in the shared library are part of the infrastructure needed to make the library link up at run-time.
You run the risk of subtle incompatibilities between differing versions of the library. On Windows this is called "DLL hell".
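One technique behind the deferred work is lazy binding: a call goes through a table slot that initially points to a resolver; the first call looks the symbol up and patches the slot, so later calls go straight to the target. On ELF systems the dynamic linker does this with the PLT and GOT (the _GLOBAL_OFFSET_TABLE_ entry in the nm listing above is part of that machinery). The sketch below only models the idea with an ordinary function pointer; lookup, call_len and my_strlen are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Toy model of lazy binding: one function pointer plays the role of
 * a GOT slot. All names are hypothetical. */
typedef size_t (*len_fn)(const char *);

static int lookups = 0;  /* how many times the "slow" lookup ran */

/* The function the program actually wants; imagine it lives in a .so. */
static size_t my_strlen(const char *s) { return strlen(s); }

/* Stand-in for the dynamic linker's search for a symbol by name. */
static len_fn lookup(const char *name)
{
    lookups++;
    if (strcmp(name, "my_strlen") == 0)
        return my_strlen;
    return NULL;
}

static size_t resolve_and_call(const char *s);

/* The slot starts out pointing at the resolver, not at the target. */
static len_fn slot = resolve_and_call;

/* First call only: look the symbol up, patch the slot, forward the call.
 * (A real resolver would report an error if the lookup failed.) */
static size_t resolve_and_call(const char *s)
{
    slot = lookup("my_strlen");
    return slot(s);
}

/* Every call site goes through the slot, so only the first call pays. */
size_t call_len(const char *s) { return slot(s); }

int lookup_count(void) { return lookups; }
```

After any number of calls to call_len, lookup_count() stays at 1: the lookup cost is paid once, at run time, instead of at link time.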

(If you think about it, many of these are the same reasons programs do or do not use references and pointers instead of directly embedding objects of a class into other objects. The analogy is pretty direct.)
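The analogy can be sketched in C: embedding a struct copies all of it into every owner, while holding a pointer keeps each owner small and lets every holder see an in-place update, much as programs pick up a replaced shared library while a statically linked copy does not change. The types and the upgrade function are hypothetical illustrations.

```c
/* Hypothetical "library": large, like real library code. */
struct library {
    char code[4096];  /* stands in for the library's machine code */
    int version;
};

/* Static-linking analogue: each program embeds its own full copy. */
struct program_static {
    struct library lib;
};

/* Shared-library analogue: each program holds only a reference. */
struct program_shared {
    struct library *lib;
};

/* "Updating the installed library" in place. */
void upgrade(struct library *lib) { lib->version++; }
```

Programs holding a pointer see the new version immediately; the embedded copy keeps the old one until it is rebuilt.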

OK, that's a lot of detail, and I've skipped a lot, such as how the linking process actually works. I hope you can follow it. If not, ask for clarification.
