How does bootstrapping work for gcc?
I was looking up the PyPy project (Python in Python), and started pondering the question of what is running the outer layer of Python. Surely, I conjectured, it can't be, as the old saying goes, "turtles all the way down"! After all, Python is not valid x86 assembly!
Soon I remembered the concept of bootstrapping, and looked up compiler bootstrapping. "OK", I thought, "so it can be either written in a different language or hand compiled from assembly". In the interest of performance, I'm sure C compilers are just built up from assembly.
This is all well and good, but the question still remains: how does the computer get that assembly file?!
Say I buy a new CPU with nothing on it. During the first operation I wish to install an OS, which runs C. What runs the C compiler? Is there a miniature C compiler in the BIOS?
Can someone explain this to me?
Comments (3)
I understand what you're asking... what would happen if we had no C compiler and had to start from scratch?
The answer is you'd have to start from assembly or hardware. That is, you can build a compiler either in software or in hardware. If there were no compilers in the whole world, these days you could probably do it faster in assembly; however, back in the day I believe compilers were in fact dedicated pieces of hardware. The Wikipedia article is somewhat short and doesn't back me up on that, so treat that last part as a guess.
The next question I guess is what happens today? Well, those compiler writers have been busy writing portable C for years, so the compiler should be able to compile itself. It's worth discussing on a very high level what compilation is. Basically, you take a set of statements and produce assembly from them. That's it. Well, it's actually more complicated than that - you can do all sorts of things with lexers and parsers and I only understand a small subset of it, but essentially, you're looking to map C to assembly.
Under normal operation, the compiler produces assembly code matching your platform, but it doesn't have to. It can produce assembly code for any platform you like, provided it knows how to. So the first step in making C work on your platform is to create a target in an existing compiler, start adding instructions and get basic code working.
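To make the "one front end, several targets" idea concrete, here is a minimal sketch rather than anything resembling a real compiler. It assumes the source language is a single arithmetic expression, borrows Python's `ast` module in place of a real lexer and parser, and emits code for two made-up targets, `stackvm` and `regvm`, whose mnemonics are invented for illustration.

```python
import ast
import itertools

# Mnemonics shared by both invented targets.
OPS = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul"}


def parse(source):
    """Front end: turn source text into a syntax tree."""
    return ast.parse(source, mode="eval").body


def emit_stack(node, out):
    """Back end for the hypothetical stack machine 'stackvm'."""
    if isinstance(node, ast.Constant):
        out.append(f"push {node.value}")
    elif isinstance(node, ast.BinOp):
        emit_stack(node.left, out)
        emit_stack(node.right, out)
        out.append(OPS[type(node.op)])
    else:
        raise ValueError("unsupported syntax")


def emit_register(node, out, regs):
    """Back end for the hypothetical register machine 'regvm'.

    Returns the name of the register holding the result.
    """
    if isinstance(node, ast.Constant):
        reg = f"r{next(regs)}"
        out.append(f"li {reg}, {node.value}")
        return reg
    if isinstance(node, ast.BinOp):
        left = emit_register(node.left, out, regs)
        right = emit_register(node.right, out, regs)
        out.append(f"{OPS[type(node.op)]} {left}, {left}, {right}")
        return left
    raise ValueError("unsupported syntax")


def compile_expr(source, target):
    """Same front end, different back end depending on the target."""
    tree = parse(source)
    out = []
    if target == "stackvm":
        emit_stack(tree, out)
    elif target == "regvm":
        emit_register(tree, out, itertools.count())
    else:
        raise ValueError(f"unknown target: {target}")
    return out


for target in ("stackvm", "regvm"):
    print(target, compile_expr("1 + 2 * 3", target))
```

Adding a new platform here means writing one more `emit_...` function, which is roughly what "creating a target in an existing compiler" amounts to, at a vastly smaller scale.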
Once this is done, in theory, you can now cross compile from one platform to another. The next stages are: building a kernel, bootloader and some basic userland utilities for that platform.
Then, you can have a go at compiling the compiler for that platform (once you've got a working userland and everything you need to run the build process). If that succeeds, you've got basic utilities, a working kernel, userland and a compiler system. You're now well on your way.
Note that in the process of porting the compiler, you probably needed to write an assembler and linker for that platform too. To keep the description simple, I omitted them.
If this is of interest, Linux from Scratch is an interesting read. It doesn't tell you how to create a new target from scratch (which is decidedly non-trivial); it assumes you're going to build for an existing known target. It does, however, show you how to cross-compile the essentials and begin building up the system.
Python does not actually compile to assembly. For a start, the running Python program keeps track of counts of references to objects, something that a CPU won't do for you. However, the concept of instruction-based code is at the heart of Python too. Have a play with this:
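A minimal sketch of such an experiment, assuming the standard library's `dis` module is the tool meant here. The function `add_squares` is made up for illustration, and the exact opcodes printed vary between Python versions:

```python
import dis


def add_squares(x, y):
    # Any small function will do; this one is purely illustrative.
    return x * x + y * y


# Print the bytecode instructions the interpreter executes for add_squares.
dis.dis(add_squares)
```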
There you can see how Python thinks of the code you entered. This is Python bytecode, i.e. the assembly language of Python. It effectively has its own "instruction set", if you like, for implementing the language. This is the concept of a virtual machine.
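To show what "virtual machine" means in the smallest possible terms, here is a toy sketch. It is not how CPython is implemented; the three opcodes `PUSH`, `ADD` and `MUL` are invented for this example. The point is only that a program, rather than a CPU, is what executes the instructions.

```python
# Hypothetical three-opcode instruction set, invented for this sketch.
def run(program):
    """Execute a list of (opcode, *operands) tuples on a value stack."""
    stack = []
    for opcode, *operands in program:
        if opcode == "PUSH":
            stack.append(operands[0])
        elif opcode == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack.pop()


# "Bytecode" for the expression 2 * 3 + 4.
program = [("PUSH", 2), ("PUSH", 3), ("MUL",), ("PUSH", 4), ("ADD",)]
result = run(program)
print(result)  # prints 10
```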
Java has exactly the same kind of idea. I took a class function and ran
javap -c class
to get this:
I take it you get the idea. These are the assembly languages of the Python and Java worlds, i.e. how the Python interpreter and the Java virtual machine think respectively.
Something else that would be worth reading up on is JonesForth. This is both a working Forth interpreter and a tutorial, and I can't recommend it enough for thinking about "how things get executed" and how you write a simple, lightweight language.
C compilers are, nowadays, (almost?) completely written in C (or higher-level languages - Clang is C++, for instance). Compilers gain little to nothing from including hand-written assembly code. The things that take most time are as slow as they are because they solve very hard problems, where "hard" means "big computational complexity" - rewriting in assembly brings at most a constant speedup, but those don't really matter anymore at that level.
Also, most compilers want high portability, so architecture-specific tricks in the front and middle end are out of the question (and in the back ends, they're not desirable either, because they may break cross-compilation).
When you're installing an OS, there's (usually) no C compiler run. The setup CD is full of ready-compiled binaries for that architecture. If there's a C compiler included (as is the case with many Linux distros), that's an already-compiled executable too. And those distros that make you build your own kernel etc. also have at least one executable included: the compiler. That is, of course, unless you have to compile your own kernel on an existing installation of anything with a C compiler.
If by "new CPU" you mean a new architecture that isn't backwards-compatible with anything that's supported yet, self-hosting compilers can follow the usual porting procedure: first write a back end for that new target, then have the compiler compile itself for it, and suddenly you've got a mature compiler with a battle-hardened (it compiled a whole compiler) native back end on the new platform.
If you buy a new machine with a pre-installed OS, it doesn't even need to include a compiler anywhere, because all the executable code has been compiled on some other machine, by whoever provides the OS - your machine doesn't need to compile anything itself.
How do you get to this point if you have a completely new CPU architecture? In this case, you would probably start by writing a new code generation back-end for your new CPU architecture (the "target") for an existing C compiler that runs on some other platform (the "host") - a cross-compiler.
Once your cross-compiler (running on the host) works well enough to generate a correct compiler (and necessary libraries, etc.) that will run on the target, then you can compile the compiler with itself on the target platform, and end up with a target-native compiler, which runs on the target and generates code which runs on the target.
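The host/target bookkeeping above is easy to lose track of, so here is a small sketch that models it. Nothing is really compiled: a compiler binary is reduced to two labels (where it runs, what it emits code for), and `CompilerBinary`, `build_compiler` and the platform names `"host"` and `"target"` are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompilerBinary:
    runs_on: str    # platform this executable runs on
    emits_for: str  # platform it generates code for


def build_compiler(source_backend, using, on):
    """Model compiling the compiler's source code.

    source_backend: the platform the source's code generator emits code for.
    using: the existing compiler binary doing the compiling.
    on: the machine the build is performed on.
    """
    if using.runs_on != on:
        raise RuntimeError(f"{using} cannot run on {on}")
    # The resulting executable runs wherever `using` emits code for,
    # and generates code for whatever back end the source contains.
    return CompilerBinary(runs_on=using.emits_for, emits_for=source_backend)


# Step 0: an existing native compiler on the host.
host_native = CompilerBinary(runs_on="host", emits_for="host")

# Step 1: add a back end for the target, build on the host: a cross-compiler.
cross = build_compiler("target", using=host_native, on="host")

# Step 2: build the same source with the cross-compiler, still on the host.
# The result is an executable for the target: a target-native compiler.
target_native = build_compiler("target", using=cross, on="host")

# Step 3: on the target, the native compiler can now rebuild itself.
rebuilt = build_compiler("target", using=target_native, on="target")

print(cross)
print(target_native)
print(rebuilt == target_native)
```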
It's the same principle with a new language: you have to write code in an existing language that you do have a toolchain for, which will compile your new language into something that you can work with (let's call this the "bootstrap compiler"). Once you get this working well enough, you can write a compiler in your new language (the "real compiler"), and then compile the real compiler with the bootstrap compiler. At this point you're writing the compiler for your new language in the new language itself, and your language is said to be "self-hosting".