The linking phase in distcc
Is there any particular reason why the linking phase of a distcc build is done locally rather than being sent to other machines, the way compilation is? Reading the distcc white paper didn't give a clear answer, but my guess is that the time spent linking object files is not very significant compared to compilation. Any thoughts?
Comments (3)
distcc works by preprocessing each input file locally until it becomes a single, self-contained translation unit. That file is then sent over the network and compiled. At that stage the remote distcc server only needs a compiler; it does not even need the project's header files. The compiled output is then sent back to the client and stored locally as an object file. Note that this means that not only linking but also preprocessing is performed locally. That division of work is common to other build tools, like ccache (preprocessing is always performed, then it tries to match the preprocessed input against previously cached results and, if that succeeds, returns the cached binary without recompiling).
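The division of work described above can be sketched as three lists of compiler commands. This is only an illustration, not distcc's actual implementation: the function name `distcc_style_plan` and the file names are hypothetical, and the commands are built but never executed. The only real pieces are the standard gcc options `-E` (preprocess only), `-c` (compile without linking) and `-o`.

```python
from pathlib import Path


def distcc_style_plan(sources, output="app", cc="gcc"):
    """Return (local_preprocess, remote_compile, local_link) command lists.

    Hypothetical helper for illustration; nothing is executed here.
    """
    preprocess, compile_cmds, objects = [], [], []
    for src in sources:
        stem = Path(src).stem
        pre = f"{stem}.i"   # preprocessed C: headers already expanded
        obj = f"{stem}.o"
        # Local step: needs the project's headers.
        preprocess.append([cc, "-E", src, "-o", pre])
        # Remote step: needs only the compiler and the .i file.
        compile_cmds.append([cc, "-c", pre, "-o", obj])
        objects.append(obj)
    # Local step: every object file must be present in one place.
    link = [cc, *objects, "-o", output]
    return preprocess, compile_cmds, link


if __name__ == "__main__":
    pre, comp, link = distcc_style_plan(["main.c", "util.c"])
    for cmd in pre:
        print("local :", " ".join(cmd))
    for cmd in comp:
        print("remote:", " ".join(cmd))
    print("local :", " ".join(link))
```

Only the middle list is a candidate for being shipped to another machine, since each of its commands depends on a single self-contained input file.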
If you were to implement a distributed linker, you would have to either ensure that all hosts in the network have exactly the same configuration, or else send all the inputs required for the operation in one batch. That would mean that distributed compilation produces a set of object files, and all of those object files then have to be pushed over the network so that a remote system can link them and return the linked file. Note that this might also require system libraries that are referenced and present in the linker path but not named on the linker command line, so a 'pre-link' step would have to determine which libraries actually need to be sent. Even if that is possible, it would require the local system to guess or calculate all the real dependencies and send them, with a great impact on network traffic. It might actually slow the process down, since the cost of sending might be greater than the cost of linking, and working out the dependencies might itself be almost as expensive as linking.
The project I am currently working on has a single statically linked executable of over 100 MB. The static libraries vary in size, but if a distributed system decided to link the final executable remotely, it would probably require three to five times as much network traffic as the size of the final executable (templates, inlines... all of these appear in every translation unit that includes them, so there would be multiple copies flying around the network).
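As a rough back-of-envelope check of those figures, the arithmetic can be written out as follows. The 100 MB size and the factor of three to five come from the answer above; the bandwidth of 12.5 MB/s (roughly a 100 Mbit/s link) is purely an assumed value for illustration, and the function names are hypothetical.

```python
def upload_mb(executable_mb, duplication_factor):
    """Object-file data to push to the remote linker, in MB."""
    return executable_mb * duplication_factor


def transfer_seconds(size_mb, bandwidth_mb_per_s):
    """Time to move size_mb at the given bandwidth (assumed, not measured)."""
    return size_mb / bandwidth_mb_per_s


if __name__ == "__main__":
    ASSUMED_BANDWIDTH = 12.5  # MB/s, hypothetical ~100 Mbit/s network
    for factor in (3, 5):
        size = upload_mb(100, factor)
        secs = transfer_seconds(size, ASSUMED_BANDWIDTH)
        print(f"factor {factor}: {size} MB, about {secs:.0f} s to send")
```

Under these assumptions the upload alone is 300 to 500 MB, before counting the linked executable coming back.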
Linking, almost by definition, requires that you have all of the object files in one place. Since distcc drops the object files on the computer that invoked distcc in the first place, the local machine is the best place to perform the link as the objects are already present.
In addition, remote linking would become particularly hairy once you start throwing libraries into the mix. If the remote machine linked your program against its local version of a library, you've opened up potential problems when the linked binary is returned to the local machine.
The reason that compilation can be sent to other machines is that each source file is compiled independently of the others. In simple terms, for each .c input file there is one .o output file from the compilation step. This can be done on as many different machines as you like.
On the other hand, linking collects all the .o files and creates one output binary. There isn't really much to distribute there.
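The shape of the problem, many independent compile jobs followed by one link step that needs every result, can be modelled with a small toy example. It does not invoke a compiler: `compile_unit` and `link` are hypothetical stand-ins that only manipulate file names, and the thread pool plays the role of the remote machines.

```python
from concurrent.futures import ThreadPoolExecutor


def compile_unit(source):
    """Stand-in for compiling one file; depends only on its own input."""
    return source.rsplit(".", 1)[0] + ".o"


def link(objects, output="app"):
    """Stand-in for linking; needs the complete list of objects at once."""
    return {"output": output, "inputs": sorted(objects)}


def build(sources, workers=4):
    # Fan out: each source can go to any worker, in any order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        objects = list(pool.map(compile_unit, sources))
    # Fan in: a single step that cannot start until all objects exist.
    return link(objects)


if __name__ == "__main__":
    print(build(["a.c", "b.c", "c.c"]))
```

Adding workers speeds up the first phase only; the final step is one job with one output, however many machines are available.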