What do linkers do?
I've always wondered. I know that compilers convert the code you write into binaries, but what do linkers do? They've always been a mystery to me.
I roughly understand what 'linking' is. It is when references to libraries and frameworks are added to the binary. I don't understand anything beyond that. For me it "just works". I also understand the basics of dynamic linking but nothing too deep.
Could someone explain the terms?
To understand linkers, it helps to first understand what happens "under the hood" when you convert a source file (such as a C or C++ file) into an executable file (an executable file is a file that can be executed on your machine or someone else's machine running the same machine architecture).
Under the hood, when a program is compiled, the compiler converts the source file into object code. Object code consists of binary machine instructions that only your computer's architecture understands (not human-readable mnemonics, and not byte code for a virtual machine), plus information about the symbols the file defines and uses. Traditionally, these files have an .OBJ extension; on Unix-like systems they usually end in .o.
After the object file is created, the linker comes into play. More often than not, a real program that does anything useful will need to reference code in other files. In C, for example, even a simple program that prints your name to the screen calls printf, a function it does not define itself.
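A minimal sketch of such a program follows. It is written as a function rather than a full main so it can be called from anywhere; the name greet and the greeting text are illustrative, not from the original answer.

```c
#include <stdio.h>

/* printf is declared in <stdio.h> but not defined in this file.
   The compiler only records an unresolved reference to it in the
   object file; the linker later resolves it against the C library. */
int greet(const char *name)
{
    /* printf returns the number of characters it wrote */
    return printf("Hello %s!\n", name);
}
```

On typical Unix-like toolchains, compiling this file alone (for example with cc -c) and inspecting the result with nm lists printf as an undefined symbol (U); only the link step supplies its definition.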
When the compiler compiles your program into an OBJ file, it simply records a reference to the printf function. The linker resolves this reference. Most programming languages have a standard library of routines that cover the basic functionality expected of the language, and the linker links your OBJ file with this standard library. The linker can also link your OBJ file with other OBJ files: you can create OBJ files containing functions that are called from another OBJ file. The linker works almost like a word processor's copy and paste: it "copies" out all the necessary functions that your program references and creates a single executable. Sometimes the libraries it copies from depend on yet other OBJ or library files, so a linker sometimes has to work quite recursively to do its job.
Note that not all operating systems put everything into a single executable. Windows, for example, uses DLLs, which keep these functions together in a single file separate from your executable. This reduces the size of your executable, but makes it dependent on those specific DLLs. DOS used to use things called overlays (.OVL files). These served several purposes; one was to keep commonly used functions together in one file. Another, in case you're wondering, was to fit large programs into memory: DOS had limited memory, and an overlay could be "unloaded" from memory and another overlay "loaded" on top of that memory, hence the name "overlays". Linux has shared libraries, which are basically the same idea as DLLs (hard-core Linux users I know would tell me there are MANY BIG differences).
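The recursive part can be pictured as a fixed-point loop: keep pulling in object files that define something an already-included file needs, until nothing new is needed. This is a simplified sketch, not how any particular linker is written; struct object, pull_in and the file and symbol names used with it are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_SYMS 4

/* Hypothetical model of an object file: the symbols it defines and the
   symbols it references. Unused slots are NULL. */
struct object {
    const char *name;
    const char *defines[MAX_SYMS];
    const char *references[MAX_SYMS];
};

static bool defines_symbol(const struct object *o, const char *sym)
{
    for (int i = 0; i < MAX_SYMS && o->defines[i]; i++)
        if (strcmp(o->defines[i], sym) == 0)
            return true;
    return false;
}

/* Starting from objs[root], repeatedly include any object that defines a
   symbol referenced by an already-included object, until a pass adds
   nothing. Sets included[i] for each linked object and returns the count. */
int pull_in(const struct object *objs, int n, int root, bool *included)
{
    for (int i = 0; i < n; i++)
        included[i] = false;
    included[root] = true;

    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < n; i++) {
            if (!included[i])
                continue;
            for (int r = 0; r < MAX_SYMS && objs[i].references[r]; r++)
                for (int j = 0; j < n; j++)
                    if (!included[j] && defines_symbol(&objs[j], objs[i].references[r])) {
                        included[j] = true;
                        changed = true;
                    }
        }
    }

    int count = 0;
    for (int i = 0; i < n; i++)
        count += included[i];
    return count;
}
```

With main.o referencing printf, printf.o referencing vfprintf, and an unrelated qsort.o in the mix, this pulls in three objects and leaves qsort.o out, mirroring how a static library member is only copied in when something needs it.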
Address relocation minimal example
Address relocation is one of the crucial functions of linking.
So let's have a look at how it works with a minimal example.
0) Introduction
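Summary: relocation edits the .text section of object files to translate object file addresses into the final addresses of the executable.
This must be done by the linker because the compiler only sees one input file at a time, but we must know about all object files at once to decide how to:
- resolve references to symbols that one object file uses but another one defines
- avoid clashes between the .text and .data sections of multiple object files
Prerequisites: minimal understanding of x86-64 assembly and of the global structure of an ELF file.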
Linking has nothing to do with C or C++ specifically: compilers just generate the object files. The linker then takes them as input without ever knowing what language compiled them. It might as well be Fortran.
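So, to cut out the noise, let's study a NASM x86-64 ELF Linux hello world, assembled with NASM 2.10.09 and then linked with ld. Its .data section holds the hello_world string, and its .text section passes the address of that string to the write system call.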
1) .text of .o
First we disassemble the .text section of the object file. The crucial line is:
    movabs $0x0,%rsi
which should move the address of the hello world string into the rsi register, which is passed to the write system call.
But wait! How can the compiler possibly know where "Hello world!" will end up in memory when the program is loaded? Well, it can't, especially after we link a bunch of .o files together with multiple .data sections. Only the linker can do that, since only it will have all those object files.
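So the compiler just:
- puts a placeholder value of 0x0 in the compiled output
- gives the linker some extra information about how to modify the compiled code with the right addresses
This "extra information" is contained in the .rela.text section of the object file.
2) .rela.text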
.rela.text stands for "relocation of the .text section". The word relocation is used because the linker will have to relocate the address from the object file into the executable.
Dumping the .rela.text section of the object file shows its relocation entries. The format of this section is fixed and documented at: http://www.sco.com/developers/gabi/2003-12-17/ch4.reloc.html
Each entry tells the linker about one address that needs to be relocated; here we have only one, for the string.
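To make "entry" concrete, here is a sketch of a struct that mirrors the field layout of the ELF64 Rela entry described in the gABI page above. On Linux, <elf.h> provides the real Elf64_Rela type and the ELF64_R_INFO, ELF64_R_SYM and ELF64_R_TYPE macros; the names rela64, rela_info, rela_sym and rela_type below are hypothetical stand-ins, and the symbol index 1 used with them is an assumed value.

```c
#include <stdint.h>

/* Same field layout as an ELF64 relocation-with-addend entry. */
struct rela64 {
    uint64_t r_offset;  /* offset within the section to patch */
    uint64_t r_info;    /* symbol table index (high 32 bits), type (low 32 bits) */
    int64_t  r_addend;  /* the constant addend A */
};

#define R_X86_64_64 1u  /* relocation type number from the AMD64 psABI */

/* Pack and unpack r_info the way ELF64_R_INFO / _SYM / _TYPE do. */
static inline uint64_t rela_info(uint32_t sym, uint32_t type)
{
    return ((uint64_t)sym << 32) | type;
}

static inline uint32_t rela_sym(uint64_t info)  { return (uint32_t)(info >> 32); }
static inline uint32_t rela_type(uint64_t info) { return (uint32_t)info; }
```

The entry discussed next would then be roughly { 0xC, rela_info(index of the .data section symbol, R_X86_64_64), 0 }.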
Simplifying a bit, for this particular entry we have the following information:
- Offset = C: the first byte of .text that this entry changes. Looking back at the disassembly, it falls exactly inside the critical movabs $0x0,%rsi, and those who know x86-64 instruction encoding will notice that it is the 64-bit address part (the immediate operand) of the instruction.
- Name = .data: the address points into the .data section.
- Type = R_X86_64_64, which specifies exactly what calculation has to be done to translate the address. This field is processor dependent, and is therefore documented in the AMD64 System V ABI extension, section 4.4 "Relocation".
That document says that R_X86_64_64 does:
- Field = word64: 8 bytes, thus the 00 00 00 00 00 00 00 00 at address 0xC
- Calculation = S + A
  - S is the value of the symbol the relocation refers to, here the .data section symbol, whose value in the final executable is the start address of .data
  - A is the addend, which is 0 here. This is a field of the relocation entry.
So S + A is the start address of .data plus 0, and the 8-byte field will be relocated to point at the very first address of the .data section.
3) .text of .out
Now let's look at the text area of the executable that ld generated for us. The only thing that changed from the object file is the critical line, which now points to the address 0x6000d8 (d8 00 60 00 00 00 00 00 in little-endian) instead of 0x0.
Is this the right location for the hello_world string? To decide, we have to check the program headers, which tell Linux where to load each segment of the file.
Dumping the program headers tells us that the second one, which holds the .data section, starts at VirtAddr = 0x6000d8. And the only thing in the data section is our hello world string.
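Putting sections 2 and 3 together, the linker's work for this single R_X86_64_64 entry amounts to writing S + A as 8 little-endian bytes at offset 0xC of .text. The sketch below assumes S = 0x6000d8 (the .data start address found above) and A = 0. The function names are hypothetical, and the test buffer assumes the movabs instruction starts at 0xA, because its two opcode bytes 48 BE come right before the 8-byte immediate at 0xC.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 64-bit value little-endian (x86-64 byte order) at buf[offset]. */
void put_le64(uint8_t *buf, size_t offset, uint64_t value)
{
    for (int i = 0; i < 8; i++)
        buf[offset + i] = (uint8_t)(value >> (8 * i));
}

/* Read a little-endian 64-bit value from buf[offset]. */
uint64_t get_le64(const uint8_t *buf, size_t offset)
{
    uint64_t value = 0;
    for (int i = 0; i < 8; i++)
        value |= (uint64_t)buf[offset + i] << (8 * i);
    return value;
}

/* Apply one R_X86_64_64 relocation: the word64 field at `offset` receives
   S + A, where S is the final value of the referenced symbol and A is the
   addend stored in the relocation entry. */
void apply_r_x86_64_64(uint8_t *text, size_t offset, uint64_t S, int64_t A)
{
    put_le64(text, offset, S + (uint64_t)A);
}
```

After the call, the bytes at 0xC read d8 00 60 00 00 00 00 00, matching what the executable's disassembly shows.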
Bonus level
PIE linking: What is the -fPIE option for position-independent executables in gcc and ld?
_start entry point: What is global _start in assembly language?
etext, edata and end: Where are the symbols etext, edata and end defined?
In languages like 'C', individual modules of code are traditionally compiled separately into blobs of object code, which are ready to execute in every respect except that all the references a module makes outside itself (i.e. to libraries or to other modules) have not yet been resolved (i.e. they're blank, pending someone coming along and making all the connections).
What the linker does is to look at all the modules together, look at what each module needs to connect to outside itself, and look at all the things it is exporting. It then fixes that all up, and produces a final executable, which can then be run.
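The "look at what each module needs and what it exports" step can be sketched as matching every import against every export and reporting what is left over. This is a toy model under stated assumptions: struct module and count_unresolved are hypothetical, and the message only imitates the kind of error a linker prints.

```c
#include <stdio.h>
#include <string.h>

#define MAX_SYMS 4

/* Hypothetical module summary: symbols it exports (defines) and symbols
   it imports (uses but does not define). Unused slots are NULL. */
struct module {
    const char *name;
    const char *exports[MAX_SYMS];
    const char *imports[MAX_SYMS];
};

/* Returns 1 if any module exports `sym`, else 0. */
static int is_exported(const struct module *mods, int n, const char *sym)
{
    for (int m = 0; m < n; m++)
        for (int i = 0; i < MAX_SYMS && mods[m].exports[i]; i++)
            if (strcmp(mods[m].exports[i], sym) == 0)
                return 1;
    return 0;
}

/* Prints every import that no module provides and returns how many
   there were; zero means every reference can be connected. */
int count_unresolved(const struct module *mods, int n)
{
    int unresolved = 0;
    for (int m = 0; m < n; m++)
        for (int i = 0; i < MAX_SYMS && mods[m].imports[i]; i++)
            if (!is_exported(mods, n, mods[m].imports[i])) {
                printf("%s: undefined reference to '%s'\n",
                       mods[m].name, mods[m].imports[i]);
                unresolved++;
            }
    return unresolved;
}
```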
Where dynamic linking is also going on, the output of the linker is still incomplete on its own: some references to external libraries are not yet resolved, and they get resolved by the OS when it loads the app (or possibly even later, during the run).
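The "even later during the run" case can be simulated with a function pointer standing in for a table slot that starts out pointing at a resolver stub. This is a hypothetical sketch loosely modeled on the lazy binding ELF systems typically perform through the PLT and GOT; a real dynamic linker searches the loaded libraries, whereas here the "lookup" is hard-coded, and all names are made up.

```c
/* Counts how often the resolver runs, to show it runs only once. */
static int resolver_calls = 0;

/* The "library" function the program actually wants. */
static int real_add(int a, int b) { return a + b; }

static int add_resolver(int a, int b);

/* The table slot: initially points at the resolver stub. */
static int (*add_slot)(int, int) = add_resolver;

/* First call lands here: "look up" the real function, patch the slot so
   later calls go straight to it, then forward this call. */
static int add_resolver(int a, int b)
{
    resolver_calls++;
    add_slot = real_add;  /* a real loader would search the loaded libraries */
    return add_slot(a, b);
}

/* What the program calls: always an indirect call through the slot. */
int call_add(int a, int b) { return add_slot(a, b); }

int resolver_call_count(void) { return resolver_calls; }
```

The first call_add pays for the lookup; every later call goes directly to real_add.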
When the compiler produces an object file, it includes entries for symbols that are defined in that object file, and references to symbols that aren't defined in that object file. The linker takes those and puts them together so (when everything works right) all the external references from each file are satisfied by symbols that are defined in other object files.
It then combines all those object files together and assigns addresses to each of the symbols, and where one object file has an external reference to another object file, it fills in the address of each symbol wherever it's used by another object. In a typical case, it'll also build a table of any absolute addresses used, so the loader can/will "fix up" the addresses when the file is loaded (i.e., it'll add the base load address to each of those addresses so they all refer to the correct memory address).
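The loader's "fix up" step can be sketched as follows, assuming a hypothetical table of byte offsets at which absolute 64-bit little-endian addresses are stored. The paragraph describes adding the base load address to each entry, which is the link_base = 0 case; the extra link_base parameter (the base the linker assumed) is an assumption that generalizes it. The function name rebase is made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Adds (load_base - link_base) to every absolute address listed in the
   fix-up table. `fixups` holds byte offsets into `image` of 8-byte
   little-endian addresses. Unsigned arithmetic wraps, so a load base
   below the link base also works. */
void rebase(uint8_t *image, const size_t *fixups, size_t nfixups,
            uint64_t link_base, uint64_t load_base)
{
    uint64_t delta = load_base - link_base;
    for (size_t f = 0; f < nfixups; f++) {
        uint64_t addr = 0;
        for (int i = 0; i < 8; i++)          /* read the stored address */
            addr |= (uint64_t)image[fixups[f] + i] << (8 * i);
        addr += delta;
        for (int i = 0; i < 8; i++)          /* write the adjusted address */
            image[fixups[f] + i] = (uint8_t)(addr >> (8 * i));
    }
}
```

For example, an image linked at base 0 with addresses 0x10 and 0x20 in its table ends up holding 0x400010 and 0x400020 when loaded at 0x400000.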
Quite a few modern linkers can also carry out some (in a few cases a lot) of other "stuff", such as optimizing the code in ways that are only possible once all of the modules are visible (e.g., removing functions that were included because it was possible that some other module might call them, but once all the modules are put together it's apparent that nothing ever calls them).
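The "nothing ever calls them" analysis can be sketched as reachability over a whole-program call graph: mark everything reachable from the entry point, and whatever stays unmarked is a candidate for removal. This is a simplified model with a hypothetical adjacency matrix; real linkers usually work at section granularity (GNU ld's --gc-sections, combined with the compiler's -ffunction-sections, is one example).

```c
#include <stdbool.h>

#define NFUNCS 4

/* calls[i][j] is true when function i calls function j. */
static void mark(bool calls[][NFUNCS], int f, bool *live)
{
    if (live[f])
        return;              /* already visited, also stops on cycles */
    live[f] = true;
    for (int g = 0; g < NFUNCS; g++)
        if (calls[f][g])
            mark(calls, g, live);
}

/* Marks every function reachable from `entry` and returns how many are
   live; the unmarked ones could be dropped from the final executable. */
int find_live(bool calls[][NFUNCS], int entry, bool *live)
{
    for (int f = 0; f < NFUNCS; f++)
        live[f] = false;
    mark(calls, entry, live);

    int n = 0;
    for (int f = 0; f < NFUNCS; f++)
        n += live[f];
    return n;
}
```

Note that a function can call live code and still be dead itself: what matters is whether the entry point can reach it.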