How are system calls translated into CPU instructions?

Posted on 2024-10-30 23:35:38

Let's say there is a simple program like:

#include <fcntl.h>
#include <stdio.h>

int main(void)
{
    int x;
    printf("Cool");
    int fd = open("/tmp/cool.txt", O_RDONLY);
    return 0;
}

Here open is a system call. I suppose that when the shell runs the program, it makes a hundred or so other system calls to implement it? And what about a declaration like int x: at some point, does it need additional system calls behind the scenes to get memory from the computer?

I am not sure where the boundary lies between a system call and normal code. Everything, in the end, needs the operating system's help, right?

Or is it that the C compiler generates an executable that runs directly on the processor and needs no OS assistance until a system call is reached, at which point it has to do something to load the OS instructions, and so on?

A bit vague :) Please clarify.

逆蝶 2024-11-06 23:35:38

I'm not answering the questions in order, so I'm prefixing my answers with the questions. I've taken the liberty of editing them a bit. You didn't specify the processor architecture, but I'm assuming you want to know about x86, so the processor-level details will pertain to x86. Other architectures can behave differently (memory management, how system calls are made, etc.). I'm also using Linux for examples.

Does the C compiler generate executable code that can be run straight on the processor without the need for OS assistance until a system call is reached, at which point it has to do something to load the OS instructions?

Yes, that is correct. The compiler generates native machine code that can be run straight on the processor. The executable files that you get from the compiler, however, contain both the code and other needed data, for example, instructions on where to load the code in the memory. On Linux the ELF format is typically used for executables.

If the process is completely loaded into memory and has sufficient stack space, it will not need further OS assistance before it wants to make a system call. When you make a system call, it is just an instruction in the machine code that calls the OS. The program itself does not need to "load the OS instructions" in any way. The processor handles transferring execution to the OS code.

With Linux on the x86 architecture, one way for the machine code to make a system call is to use software interrupt vector 128 to transfer execution to the operating system. In x86 assembly (Intel syntax), that is expressed as int 0x80. Linux then performs the task selected by the values that the calling program placed into processor registers before making the system call: the system call number is in the eax register and the system call parameters are in other registers. When the OS is done, it returns a result in the eax register, and it may have modified buffers pointed to by the system call parameters. Note, however, that this is not the only way to make a system call: 32-bit code can also use the sysenter instruction, and 64-bit x86 code normally uses the syscall instruction.

However, if the process is not entirely in memory and execution moves to a part of the code that is not in memory at the moment, the processor raises a page fault, which transfers execution to the operating system. The OS loads the required part of the process into memory and transfers execution back to the process, which then continues normally without even noticing that anything happened.

I'm not entirely sure on the next point, so take it with a grain of salt. The Wikipedia article on stack overflow (the computer error, not this site :) seems to indicate that stacks usually have a fixed maximum size, so int x; should not cause the OS to run, unless that part of the stack is not in memory (see the previous paragraph). A declaration like int x; only reserves room in the current stack frame; the OS gets involved only if the program then touches a stack page that is not mapped yet. On a system where the stack grows on demand (as far as I can tell, Linux does this for the main thread's stack, up to a configurable limit), that access causes a page fault, prompting the operating system to allocate more stack space for the process.

Page faults cause the execution to move to the operating system, but are not system calls in the usual sense of the word. System calls are explicit calls to the OS when you want it to perform some work for you. Page faults and other such events are implicit. Hardware interrupts continuously transfer the execution from your process to the OS so that it can react to them. After that it transfers the execution back to your process, or some other process.

On a multitasking OS, you can run many programs at once even if you have only one processor/core. This is accomplished by running only one program at a time, but switching between programs quickly. The hardware timer interrupt makes sure that control is transferred back to the OS in a timely fashion, so that one process can't hog the CPU all for itself. Whenever control passes to the OS and it has done what it needs to do, it is free to resume a different process from the one that was interrupted. The OS handles all this totally transparently, so you don't have to think about it, and your process won't notice it. From the viewpoint of your process, it is executing continuously.

In short: your program executes system calls only when you explicitly ask it to. The operating system may also swap parts of your process in and out of memory when it wants to, and it generally does things related and unrelated to your process in the background, but you don't normally need to think about that at all. (You can reduce the number of page faults, though, by keeping your program as small as possible, and things like that.)

In this case open() is an explicit system call, but I suppose when the shell runs it, it makes some hundred other system calls to implement it.

No, the shell has nothing to do with an open() call in your C program. Your program makes that one system call, and the shell doesn't come into the picture at all.

The shell only affects your program when it starts it. When you start your program from the shell, the shell makes a fork system call to create a second process, and that child process then makes an execve system call to replace itself with your program. After that, your program is in control. Before control reaches your main() function, though, the process executes some initialization code that was put there by the toolchain (the C runtime startup code). If you want to see what system calls a process makes, on Linux you can use strace to view them. Just say strace ls, for example, to see what system calls ls makes during its execution. If you compile a C program with just a main() function that returns immediately, you can see with strace what system calls the initialization code makes.

How does the process get its memory from the computer? It has to involve some system calls again, right? I am not sure where the boundary lies between a system call and normal code. Everything in the end needs the OS's help, right?

Yep, system calls. When your program is loaded into memory with the execve system call, the kernel takes care of getting enough memory for your process. When you need more memory and call malloc(), the allocator will make a brk system call to grow the data segment of your process if it has run out of internally cached memory to give you. (Allocators may also use mmap for this, typically for large requests.)

Not everything needs explicit help from the OS. If you have enough memory, have all your input in memory, and you write your output data to memory, you won't need the OS at all. That is, as long as you only do calculations on data you already have in memory, don't need more memory, and don't need to communicate with the outside world, you don't need the OS. On the other hand, a program that does not communicate with the outside world at all is a pretty useless one, because it can't get any input, and cannot give any output. Even if you calculate the millionth decimal of pi, it doesn't matter at all if you don't output it to the user.

This answer got quite big, so in case I missed something or didn't explain something clearly enough, please leave me a comment and I'll try to elaborate. If anyone spots any mistakes, be sure to point them out also.
