How does a program execute? Where does the operating system come into play?
A program is compiled from some language to assembly and then to machine code, which is directly executable. When people say the result is platform dependent, they mean that the binaries will run correctly only on CPUs with the same instruction set architecture (ISA), such as x86 or x86-64. On other processors they may run incorrectly or not at all because the ISA differs. Right?
Now, the concept of binaries is confusing me. Everything is about the "machine language code" and the "CPU". Where does the OS come into play? I mean, once the compiled binary is loaded into memory it contains direct instructions for the CPU, and the CPU executes one instruction at a time. I can't see a role for the operating system anywhere except in process management. The binary should run on any CPU of the same ISA irrespective of the operating system. Right?
Yet that's not the case. If I build code for x86 on a Windows machine, it won't run on an x86 Mac or an x86 Linux machine.
I'm missing something here. Please clear up my confusion.
Comments (9)
For starters, a modern CPU has (at least) two modes, a mode in which it's running the core of the Operating System itself ("kernel mode") and a mode in which it's running programs ("user mode"). When in user mode, the CPU can't do a whole lot of things.
For instance, a mouse click is typically noticed in the kernel, not user mode. However, the OS dispatches the event to user mode and from there to the correct program. The other way around also requires cooperation: a program can't draw to the screen freely, but needs to go through the OS and kernel mode to draw on its part.
Similarly, the act of starting a program is typically a cooperation. The shell part of the OS is a user-mode program too. It gets your mouse click, and determines that it's a mouse click intended to start a process. The shell then tells the kernel-mode part of the OS to start a new process for that program.
When the kernel needs to start a new process, it first allocates memory for bookkeeping, and then proceeds to load the program. This involves retrieving the instructions from the binary, but also hooking up the program to the OS. This usually requires finding the entry point of the binary (classically int main(int argc, char** argv), which is reached through the language runtime's startup code) and all the points where the program wants to call the OS.
Different operating systems use different ways to hook up programs with the OS. As a result, the loading process differs, and the file formats for binaries can differ too. It's not absolute: the ELF format is used by a number of operating systems, and Microsoft uses its PE format on all its current operating systems. In both cases, the format describes the binary precisely enough that the OS can decide whether the program can be hooked up to it. For instance, a Win32 binary is in the PE format, so Linux won't load it, but Windows 2000 will, as will 64-bit Windows 7. A Win64 binary, on the other hand, is in PE format too, but Windows 2000 will reject it.
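To make the file format point concrete, here is a minimal Python sketch of the very first check a loader can make: looking at the magic bytes at the start of the file. The function name detect_executable_format is made up for this illustration, and a real loader validates far more than this.

```python
import struct


def detect_executable_format(data: bytes) -> str:
    """Guess the container format of an executable from its leading bytes.

    Hypothetical helper for illustration only; real loaders check much more.
    """
    if data[:4] == b"\x7fELF":
        return "ELF"
    # Mach-O magic numbers as they appear on disk in either byte order,
    # plus the "fat" multi-architecture wrapper. (The fat magic is also
    # used by Java class files, so this is only a guess.)
    if data[:4] in (
        b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe",
        b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe",
        b"\xca\xfe\xba\xbe",
    ):
        return "Mach-O"
    if data[:2] == b"MZ" and len(data) >= 0x40:
        # The DOS header stores the offset of the PE signature at 0x3C.
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\x00\x00":
            return "PE"
    return "unknown"


# Build a tiny fake PE header in memory so the sketch runs anywhere.
fake_pe = bytearray(0x80)
fake_pe[0:2] = b"MZ"
struct.pack_into("<I", fake_pe, 0x3C, 0x40)
fake_pe[0x40:0x44] = b"PE\x00\x00"

print(detect_executable_format(bytes(fake_pe)))
print(detect_executable_format(b"\x7fELF" + bytes(12)))
```

A loader that only understands one of these formats stops at this step, long before any machine instruction of the program is executed.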
It will not run on other processors, since a bit pattern such as 01010110011 means one thing on x86 and something else on ARM. x86-64 happens to be backwards compatible with x86, so it can run x86 programs.
The binary is in a specific format that your OS understands (Windows = PE, Linux = ELF, macOS = Mach-O).
With any normal binary, your OS loads it into memory and populates a number of fields with certain values. These "certain values" are addresses of API functions that exist in shared libraries (.dll, .so), such as kernel32 or libc.
The API addresses are needed because the binary itself does not know how to access hard drives, network cards, gamepads, etc. The program uses these addresses to invoke certain functions that exist in your OS or in other libraries.
In essence, the binary is missing some vital parts that need to be filled in by the OS to make everything work. If the OS fills in the wrong parts, the binary won't work, since the two can't communicate with each other. That's what would happen if you replaced user32.dll with another file, or if you tried to run a Linux executable on Mac OS X.
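The "missing parts" idea can be modelled as a toy in Python. Everything here is hypothetical: the function names resolve_imports and can_load, the two export tables, and the addresses are invented for illustration, and a real dynamic linker works on binary tables rather than dictionaries.

```python
def resolve_imports(imports, loaded_libraries):
    """Return {symbol: address} for every (library, symbol) pair a binary imports.

    Toy model: 'loaded_libraries' maps a library name to its export table.
    """
    table = {}
    for library, symbol in imports:
        exports = loaded_libraries.get(library)
        if exports is None:
            raise LookupError(f"library not found: {library}")
        if symbol not in exports:
            raise LookupError(f"{library} does not export {symbol}")
        table[symbol] = exports[symbol]
    return table


def can_load(imports, loaded_libraries):
    """True if every import can be filled in, False otherwise."""
    try:
        resolve_imports(imports, loaded_libraries)
    except LookupError:
        return False
    return True


# Made-up export tables; the addresses are arbitrary placeholders.
windows_like = {"kernel32.dll": {"CreateFileW": 0x7FF01000, "ReadFile": 0x7FF02000}}
linux_like = {"libc.so.6": {"open": 0x7F001000, "read": 0x7F002000}}

# A program built against the Windows-style libraries.
program_imports = [("kernel32.dll", "CreateFileW"), ("kernel32.dll", "ReadFile")]

print(can_load(program_imports, windows_like))
print(can_load(program_imports, linux_like))
```

The machine code of the program is identical in both cases; only the environment that fills in the blanks differs.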
So how does libc know how to open a file?
libc uses syscalls, which give low-level access to the OS core functions. A syscall is sort of like a function call, except that you make it by populating certain CPU registers and then triggering a software interrupt or executing a dedicated instruction such as x86-64's syscall.
So how does the OS then know how to open files?
That's one of the things an OS does. But how does it know how to talk to a hard drive?
I don't know exactly how that stuff works but I imagine the OS does this by writing/reading certain memory locations which happen to be mapped to BIOS functions.
So how does the BIOS know how to talk to a hard drive?
I don't know that either, I've never done any programming at that level. I imagine the BIOS is hardwired to the hard drive connectors and is able to send the correct sequence of 1 and 0 to talk "SATA" with the hard drive. It can probably only say simple things such as "read this sector"
So how does the hard drive know how to read a sector?
I really don't know this at all so I'll let some hardware guy continue.
Two ways:
First and foremost, the answer is "system calls". Whenever you call a function that needs to do any I/O, interact with devices, allocate memory, fork processes, etc., that function needs to make a system call. While the syscall instruction itself is part of x86, the available system calls and their parameters are OS-specific.
Even if your program doesn't make ANY system calls (which I'm not sure is possible, and certainly wouldn't be very useful), the formats that wrap the machine code differ between OSes. The file formats of a Windows .exe (PE) and a Linux executable (usually ELF) are different, which is why an .exe file won't execute on Linux.
EDIT: these are low-level details. The higher-level answer is to say that anything that needs to access files, the console/GUI, allocate memory, etc. is OS-specific.
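As a rough illustration of how OS-specific the system call interface is, the sketch below compares a handful of system call numbers from the Linux kernel's 32-bit x86 and x86-64 tables. The dictionary layout and the helper meaning_of are made up for this example; other operating systems define their own, different tables.

```python
# A few entries from Linux's system call tables. Even on one OS the numbers
# differ between the 32-bit and 64-bit x86 ABIs.
SYSCALL_TABLES = {
    "linux-i386": {1: "exit", 3: "read", 4: "write", 5: "open"},
    "linux-x86_64": {0: "read", 1: "write", 2: "open", 60: "exit"},
}


def meaning_of(abi: str, number: int) -> str:
    """Look up what a raw system call number means under a given ABI."""
    return SYSCALL_TABLES[abi].get(number, "unknown")


# The same number placed in the same kind of register asks for
# completely different services depending on who is listening.
for abi in SYSCALL_TABLES:
    print(abi, "syscall 1 ->", meaning_of(abi, 1))
```

So a binary that hard-codes "system call 1" is only meaningful to the kernel interface it was built for, even though the instruction that triggers the call is plain x86.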
The OS comes into play when you try to access a "service" that it provides as an abstraction over the hardware, e.g. opening a file inside the "database" called the filesystem, or generating a random number (every modern OS has this feature).
Under GNU/Linux on 32-bit x86, for instance, you have to fill in the registers and execute int 80h to access a "service" (actually called a "syscall").
Your program also won't run on another OS because executables come in different file formats: for example, Windows has COFF/PE and Linux has ELF. Like any other file format (HTML or SGML, say), these also contain metadata.
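To show what that metadata looks like, here is a small Python sketch that reads a few fields from the start of an ELF header. The helper name describe_elf and the short machine table are my own choices for illustration; the field offsets and the listed machine codes follow the ELF specification.

```python
import struct

# A few e_machine values from the ELF specification.
ELF_MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def describe_elf(header: bytes) -> dict:
    """Read word size, byte order, and target ISA from an ELF header."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]
    # EI_DATA: 1 means little-endian, 2 means big-endian.
    byte_order = "<" if ei_data == 1 else ">"
    # e_type and e_machine follow the 16-byte e_ident array.
    _e_type, e_machine = struct.unpack_from(byte_order + "HH", header, 16)
    return {
        "bits": {1: 32, 2: 64}.get(ei_class, 0),
        "endianness": "little" if ei_data == 1 else "big",
        "machine": ELF_MACHINES.get(e_machine, f"unknown ({e_machine})"),
    }


# Synthetic header: 64-bit, little-endian, executable, x86-64.
sample = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 2, 62)
print(describe_elf(sample))
```

On a Linux machine you could pass in the first 20 bytes of a real executable instead of the synthetic header, assuming you have one to read.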
The OS provides (a) the environment that your machine code runs in, and (b) standard services. Without (a), your code will never get to execute in the first place, and without (b), you would have to implement absolutely everything yourself and hit the hardware directly.
The machine instructions generated from a high-level language will follow the calling conventions of the libraries that provide the calls you make, including any system calls (although these are usually wrapped in a userspace library somewhere, so you may not need to know the specifics of making a system call).
Additionally, they will be appropriate for the targeted instruction set architecture, with a few exceptions (care must be taken, for example, with assumptions about pointer sizes, primitive types, structure layouts, class implementations in C++, etc.).
The file format dictates the hooks (publicly visible functions and data) that the operating system needs in order to execute your code as a process and to bootstrap the process to the required state. If you're familiar with C/C++ development under Windows, the concept of a subsystem dictates the level of bootstrapping, the resources provided, and the entry point signature (normally main(int, char **) on most systems).
There are some good examples of how the choice of high-level language, instruction set architecture, and executable file format might affect the ability to run a binary on any given system:
Assembly languages must code for a specific ISA. They use instructions that are specific to a family of CPU types. These instructions may work on other families of CPUs if those CPUs support the given instruction set. For instance, x86 code will work to a degree on an amd64 operating system, and will definitely work on an amd64 CPU running an x86 operating system.
C abstracts much of the specifics of an ISA. A few obvious exceptions include pointer sizes and endianness. Various well-known interfaces, such as printf, main, fopen, and others, are provided to an expected level via libc. These include the register and stack states expected when making these calls, enabling C code to work on different operating systems and architectures without change. Other interfaces can be provided either directly or by wrapping platform-specific interfaces in the expected one, to increase the portability of C code.
Python and other similar "virtualized" languages operate at yet another level of abstraction and, again with a few exceptions (for instance, features that don't exist on particular platforms, or character encoding differences), can run without modification on numerous systems. This is achieved by providing a uniform interface for many different ISA and operating system combinations, at the expense of performance and executable size.
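The pointer size and endianness caveats can be observed from Python itself, which reports the traits of the platform its interpreter was built for. This is a small sketch; the helper names platform_traits and encode_u32 are made up for the example.

```python
import struct
import sys


def platform_traits() -> dict:
    """Report a few ISA/OS details of the machine running this interpreter."""
    return {
        "pointer_bytes": struct.calcsize("P"),  # size of a C 'void *'
        "byte_order": sys.byteorder,            # 'little' or 'big'
        "os": sys.platform,                     # e.g. 'linux', 'win32', 'darwin'
    }


def encode_u32(value: int, byte_order: str) -> bytes:
    """Lay out a 32-bit unsigned integer the way a given byte order would."""
    return value.to_bytes(4, byte_order)


print(platform_traits())
# The same number has a different in-memory layout on little-endian and
# big-endian machines, which is why raw data is not automatically portable.
print(encode_u32(1, "little"), encode_u32(1, "big"))
```

The script runs unmodified on different systems because the interpreter underneath it was compiled separately for each ISA and OS combination.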
An analogy:
Say you hire a butler from another country. He doesn't understand a word you say, so you get a star-trek-like translator device. Now he can understand your high level language, because when you speak he hears his own (rather crude) language.
Now suppose you want him to walk from A to B. You wouldn't talk to his legs or feet directly, you'd ask him to his face! He is in control of his own body. If 1) you communicate your request properly and 2) he decides that it falls under his employment duties, he will move from A to B.
Now you get a new servant, from the same country as the last one (because you'd rather not buy a new star-trek-translator). You want him to walk from A to B as well. But this servant requires you to talk louder and say please while asking. You put up with this because he is more flexible: you can ask him to go from A to B via C if you want--the previous butler could do that but dragged his feet and complained.
Another lucky break is you can adjust your translator settings to handle this, so, from your language perspective, nothing changes. But if you were to talk to the old butler with the new settings, he'd be confused and wouldn't understand even though you're speaking his language.
In case it's not clear, the butlers are computers with the same ISA but different operating systems. The translator is your cross-compiler toolchain targeting their ISA.
The OS provides the tools and APIs for accessing certain features and the hardware.
For example, to create a window on Microsoft Windows, you need the OS's DLLs to create the window.
Unless you wish to write the API yourself, you'll use the API that the OS provides. That's where the OS comes into play.
I also want to add that the OS handles the startup of the program.
It prepares the process's address space and initializes it so that the program can begin, loads the program's instructions, and hands control to the program.
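From a program's point of view, that startup work is something you request rather than perform. As a minimal sketch, the Python snippet below asks the OS to start a second Python interpreter as a new process and collects its exit status; the helper name run_child is made up, and the child program is just an inline one-liner.

```python
import subprocess
import sys


def run_child(code: str):
    """Ask the OS to start a new process running the given Python snippet.

    Returns the child's exit status and whatever it wrote to stdout.
    """
    completed = subprocess.run(
        [sys.executable, "-c", code],  # the program the OS should load
        capture_output=True,
        text=True,
        check=False,
    )
    return completed.returncode, completed.stdout


status, output = run_child(
    "import sys; print('hello from a new process'); sys.exit(3)"
)
print(status, output.strip())
```

Creating the process, loading the interpreter binary, and reporting the exit status back are all done by the OS; the calling code never touches the child's memory or instructions directly.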