How does an OS generally go about managing kernel memory and page handling?
I'm working on kernel design, and I've got some questions concerning paging.
The basic idea that I have so far is this: Each program gets its own (or so it thinks) 4G of memory, minus a section somewhere that I reserve for kernel functions that the program can call. So, the OS needs to figure out some way to load the pages in memory that the program needs to use during its operation.
Now, assuming that we had infinite amounts of memory and processor time, I could load/allocate any page the program wrote to or read from as it happened using page faults for pages that didn't exist (or were swapped out) so the OS could quickly allocate them or swap them in. In the real world though, I need to optimize this process, so that we don't have a program constantly consuming all memory that it ever touched.
So I guess my question is, how does an OS generally go about this? My initial thought is to create a function that the program calls to set/free pages, which it can then memory manage on its own, but does a program generally do this, or does the compiler assume it has free rein? Also, how does the compiler handle situations where it needs to allocate a fairly large segment of memory? Do I need to provide a function that tries to give it X pages in order?
This is obviously not a language-specific question, but I'm partial to standard C and good with C++, so I'd like any code examples to be in either that or assembly. (Assembly shouldn't be necessary; I fully intend to make it work with as much C code as possible, and optimize as a last step.)
Another thing that should be easier to answer as well: How does one generally handle kernel functions that a program needs to call? Is it OK just to have a set area of memory (I was thinking toward the end of virtual space) that contains the most basic functions and process-specific memory that the program can call? My thought from there would be to have the kernel functions do something very fancy and swap the pages out (so that programs couldn't see sensitive kernel functions in their own space) when programs needed to do anything major, but I'm not really focusing on security at this point.
So I guess I'm more worried about the general design ideas than the specifics. I'd like to make the kernel completely compatible with GCC (somehow) and I need to make sure that it provides everything that a normal program would need.
Thanks for any advice.
Comments (2)
A good starting point for all these questions is to look at how Unix does it. As a famous quote says, "Those who don't understand UNIX are doomed to reinvent it, poorly."
First, about calling kernel functions. It is not enough to simply have the functions somewhere a program can call, since the program is most probably running in "user mode" (ring 3 on IA-32) and the kernel has to run in "kernel mode" (usually ring 0 on IA-32) to do its privileged operations. You have to somehow do the transition between the two modes, and this is very architecture specific.
On IA-32, the traditional way is to use a gate in the IDT together with a software interrupt (Linux uses int 0x80). Newer processors have other (faster) ways to do it, and which ones are available depends on whether the CPU is from AMD or Intel, and on the specific CPU model. To accommodate this variation, recent Linux kernels use a page of code mapped by the kernel at the top of the address space for every process. So, on recent Linux, to do a system call you call a function on this page, which will in turn do whatever is needed to switch to kernel mode (the kernel has more than one copy of that page, and chooses which copy to use on boot depending on your processor's features).
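To see that indirection from the user side, here is a minimal sketch, assuming Linux with glibc. The libc syscall() wrapper takes a system call number and performs whatever mode switch the platform uses, and getauxval(AT_SYSINFO_EHDR) reports where the kernel-mapped code page sits in the current process. The helper names raw_getpid and code_page_base are made up for this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE       /* for syscall() in <unistd.h> */
#endif
#include <assert.h>
#include <elf.h>          /* AT_SYSINFO_EHDR */
#include <sys/auxv.h>     /* getauxval() */
#include <sys/syscall.h>  /* SYS_getpid */
#include <sys/types.h>    /* pid_t */
#include <unistd.h>       /* syscall(), getpid() */

/* Ask the kernel for our PID by system call number rather than via getpid().
 * The wrapper hides the architecture-specific entry into kernel mode. */
static pid_t raw_getpid(void)
{
    long ret = syscall(SYS_getpid);
    assert(ret > 0);      /* getpid cannot fail */
    return (pid_t)ret;
}

/* Address of the code page the kernel mapped into this process,
 * or 0 if the kernel did not advertise one. */
static unsigned long code_page_base(void)
{
    return getauxval(AT_SYSINFO_EHDR);
}
```

Calling raw_getpid() and getpid() should give the same answer, since both end up in the same kernel service. The value from code_page_base() typically changes from run to run because of address space randomization.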
Now, the memory management. This is a huge subject; you could write a large book about it and not exhaust the subject.
Be sure to keep in mind that there are at least two views of the memory: the physical view (the real order of the pages, visible to the hardware memory subsystem and often to external peripherals) and the logical view (the order of the pages seen by programs running on the CPU). It's quite easy to confuse the two. You will be allocating physical pages and assigning them to logical addresses in the program or kernel address space. A single physical page can have several logical addresses, and can be mapped to different logical addresses in different processes.
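The "several logical addresses for one physical page" point can be observed from user space. The following is only a sketch, assuming a POSIX system: it maps the first page of a temporary file twice with MAP_SHARED, writes through one mapping, and reads the byte back through the other. The function name write_one_read_other is hypothetical.

```c
#include <assert.h>
#include <stdio.h>     /* tmpfile(), fileno(), fclose() */
#include <sys/mman.h>  /* mmap(), munmap() */
#include <unistd.h>    /* ftruncate(), sysconf() */

/* Map the same file page at two logical addresses, store through the first,
 * and return the byte seen through the second. Returns -1 on error. */
static int write_one_read_other(unsigned char value)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    FILE *tmp = tmpfile();
    if (tmp == NULL)
        return -1;
    int fd = fileno(tmp);
    if (ftruncate(fd, (off_t)page) != 0) {
        fclose(tmp);
        return -1;
    }

    unsigned char *a = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    unsigned char *b = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    int result = -1;
    if (a != MAP_FAILED && b != MAP_FAILED) {
        assert(a != b);   /* two distinct logical addresses... */
        a[0] = value;     /* ...store through the first... */
        result = b[0];    /* ...and observe it through the second */
    }
    if (a != MAP_FAILED)
        munmap(a, page);
    if (b != MAP_FAILED)
        munmap(b, page);
    fclose(tmp);
    return result;
}
```

Inside a kernel you would get the same effect by pointing two page table entries at the same frame; the file here is just the handle user space needs to ask for that.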
The kernel memory (reserved for the kernel) is usually mapped at the top of the address space of every process. However, it is set up so it can only be accessed in kernel mode. There is no need for fancy tricks to hide that portion of memory; the hardware does all the work of blocking the access (on IA-32, it is done via page flags or segment limits).
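For the page flag route, a simplified model may help. On IA-32 the low bits of a page table entry include Present (bit 0), Read/Write (bit 1) and User/Supervisor (bit 2). The sketch below models only the check against a single entry (the real MMU also consults the page directory entry), and the names make_pte and user_access_ok are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low flag bits of an IA-32 page table entry. */
#define PTE_PRESENT  0x001u  /* page is mapped */
#define PTE_WRITABLE 0x002u  /* writes allowed */
#define PTE_USER     0x004u  /* ring 3 may access; clear = supervisor only */

/* Build an entry from a 4 KiB-aligned physical frame address plus flags. */
static uint32_t make_pte(uint32_t frame_addr, uint32_t flags)
{
    assert((frame_addr & 0xFFFu) == 0);  /* low 12 bits are reserved for flags */
    return frame_addr | flags;
}

/* Simplified model of the check applied to an access made from ring 3. */
static bool user_access_ok(uint32_t pte, bool is_write)
{
    if (!(pte & PTE_PRESENT))
        return false;                    /* not mapped: page fault */
    if (!(pte & PTE_USER))
        return false;                    /* kernel-only page: page fault */
    if (is_write && !(pte & PTE_WRITABLE))
        return false;                    /* read-only page: page fault */
    return true;
}
```

A kernel page would be created as make_pte(addr, PTE_PRESENT | PTE_WRITABLE), with PTE_USER left clear, so it stays mapped in every process but faults on any ring 3 access.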
The programs allocate memory on the rest of the address space in several ways:
- By moving the upper limit of the data segment (via the brk() system call). This is traditionally used by the heap (the memory allocator in the C library, of which malloc() is one of the interfaces, is responsible for the heap).
- By mapping a file into the address space with mmap(). With an anonymous mmap, you can allocate new areas of the address space which are not backed by any file, but otherwise act the same way. The kernel's program loader will often use mmap to allocate parts of the program code (for instance, the program code can be backed by the executable itself).
Accessing areas of the address space which are not allocated in any way (or are reserved for the kernel) is considered an error, and on Unix will cause a signal to be sent to the program.
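As a rough user-space illustration of the anonymous mmap case and of the signal on an invalid access, here is a sketch assuming Linux, where the signal is SIGSEGV. alloc_pages returns a run of pages that is contiguous in the logical address space, which is the "X pages in order" the question asks about. The helper names are hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS */
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve n contiguous, zero-filled pages that no file backs. */
static void *alloc_pages(size_t n)
{
    size_t len = n * (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Give the pages back; the addresses become unallocated again. */
static void free_pages(void *p, size_t n)
{
    int rc = munmap(p, n * (size_t)sysconf(_SC_PAGESIZE));
    assert(rc == 0);
    (void)rc;
}

static sigjmp_buf fault_env;

static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(fault_env, 1);   /* abandon the faulting access */
}

/* Returns 1 if reading *addr raises SIGSEGV, 0 if the read succeeds. */
static int read_faults(const volatile char *addr)
{
    struct sigaction sa, old;
    sa.sa_handler = on_segv;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    int faulted;
    if (sigsetjmp(fault_env, 1) == 0) {
        (void)*addr;            /* volatile read: really touches memory */
        faulted = 0;
    } else {
        faulted = 1;            /* we got here through the signal handler */
    }
    sigaction(SIGSEGV, &old, NULL);
    return faulted;
}
```

A typical sequence is to allocate a few pages, confirm that read_faults() reports 0 on them, free them, and see it report 1 for the same address. In your own kernel, the equivalent of mmap is the call that records the region as valid, and the page fault handler is where the valid/invalid decision is made.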
The compiler either allocates memory statically (by specifying it in the executable file headers; the kernel's program loader will allocate the memory when loading the program) or dynamically (by calling a function in the language's standard library, which usually then calls a function in the C standard library, which in turn calls the kernel to allocate memory and subdivides it if necessary).
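The "calls the kernel to allocate memory and subdivides it" step can be sketched as a toy bump allocator. This is not how any real malloc() is implemented; it only shows the division of labour: one large request to the kernel (anonymous mmap here, assuming a Unix-like system), then many small allocations carved out in user space. The struct and function names are made up.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Toy allocator: one region obtained from the kernel, carved up by advancing
 * an offset. There is no free(); real allocators track and reuse blocks. */
struct arena {
    unsigned char *base;  /* start of the region the kernel gave us */
    size_t size;          /* total bytes in the region */
    size_t used;          /* bytes already handed out */
};

/* One trip to the kernel for the whole region. Returns 0 on success. */
static int arena_init(struct arena *a, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    a->base = p;
    a->size = size;
    a->used = 0;
    return 0;
}

/* Subdivide in user space: no system call on this path. */
static void *arena_alloc(struct arena *a, size_t n)
{
    assert(a->used <= a->size);
    size_t aligned = (n + 15u) & ~(size_t)15u;   /* round up to 16 bytes */
    if (aligned < n || aligned > a->size - a->used)
        return NULL;                             /* overflow or region full */
    void *p = a->base + a->used;
    a->used += aligned;
    return p;
}

static void arena_destroy(struct arena *a)
{
    munmap(a->base, a->size);
}
```

For a hobby kernel this means the kernel only needs to provide the coarse, page-granular call; a C library ported on top of it supplies malloc().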
The best way to learn the basics of all this is to read one of the several books on operating systems, in particular the ones which use a Unix variant as an example. It will go into way more detail than I could in an answer on StackOverflow.
The answer to this question is highly architecture-dependent. I'm going to assume you're talking about x86. With x86, a kernel generally provides a set of system calls, which are predetermined entry points into the kernel. User code can only enter the kernel at those specific points, so the kernel has careful control over how it interacts with user code.
In x86, there are two ways to implement system calls: with interrupts, and with the sysenter/sysexit instructions. With interrupts, the kernel sets up an interrupt descriptor table (IDT), which defines the possible entry points into the kernel. User code can then use the int instruction to generate a soft interrupt to call into the kernel. Interrupts can also be generated by hardware (so-called hard interrupts); those interrupts should generally be distinct from soft interrupts, but they don't have to be.
The sysenter and sysexit instructions are a faster way of performing system calls, since handling interrupts is slow; I'm not that familiar with using them, so I can't comment on whether or not they're a better choice for your situation.
Whichever you use, you'll have to define the system call interface. You'll probably want to pass system call arguments in registers and not on the stack, since generating an interrupt will cause you to switch stacks to the kernel stack. This means you'll almost certainly have to write some assembly language stubs, both on the user-mode end to make the system call and on the kernel end to gather the system call arguments and save the registers.
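Once the kernel-side stub has saved the registers, the rest can be plain C. Below is a minimal sketch of a dispatch table, assuming a register convention borrowed from Linux on i386 (call number in eax, arguments in ebx, ecx, edx, result returned in eax). The struct layout, the two system calls and the error value are all invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of user registers saved by the assembly entry stub. */
struct trap_regs {
    uint32_t eax;  /* in: system call number, out: return value */
    uint32_t ebx;  /* argument 1 */
    uint32_t ecx;  /* argument 2 */
    uint32_t edx;  /* argument 3 */
};

typedef int32_t (*syscall_fn)(uint32_t, uint32_t, uint32_t);

/* Two made-up system calls, just to populate the table. */
static int32_t sys_add(uint32_t a, uint32_t b, uint32_t c)
{
    (void)c;
    return (int32_t)(a + b);
}

static int32_t sys_mul(uint32_t a, uint32_t b, uint32_t c)
{
    (void)c;
    return (int32_t)(a * b);
}

static const syscall_fn syscall_table[] = { sys_add, sys_mul };
#define NR_SYSCALLS (sizeof syscall_table / sizeof syscall_table[0])
#define ERR_NOSYS   (-38)  /* assumption: same number as Linux's ENOSYS */

/* Called from the interrupt handler with the saved registers. The number
 * comes from user code, so it is range-checked before indexing the table. */
static void syscall_dispatch(struct trap_regs *r)
{
    assert(r != NULL);
    if (r->eax >= NR_SYSCALLS) {
        r->eax = (uint32_t)ERR_NOSYS;
        return;
    }
    r->eax = (uint32_t)syscall_table[r->eax](r->ebx, r->ecx, r->edx);
}
```

The return stub then restores the registers from the struct, so whatever syscall_dispatch() wrote into eax is what the user program sees as the result.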
Once you have all of that in place, you can start thinking about handling page faults. Page faults are effectively just another type of interrupt: when user code tries to access a virtual address that has no present page table entry (or that the entry's permissions forbid), the CPU raises exception 14. The faulting address is left in the CR2 register, and the error code pushed on the stack tells you whether the page was missing or protected, whether the access was a read or a write, and whether it came from user mode. The kernel can take this information and then decide to read in the missing page from disk, add the page table mapping, and return to the faulting instruction in user code.
I highly recommend you take a look at some of the materials from the MIT Operating Systems class. Check out the references section, it's got loads of good stuff.