First, at some level you will always have a component that Cannot Fail™. If this component crashes, no recovery is possible. For example, if the table of running processes gets trashed, there is no way to rebuild it short of rebooting. So even with memory protection confining this component's crashes to itself, BSODs (or the equivalent) can still occur.
But your point is a good one - there are a number of components that often can be reset without a catastrophic failure. Drivers, for example, or the networking stack. Indeed, there are OSes that do protection at this level - they are referred to as microkernel architectures.
The problem with microkernels, however, is performance. On x86 CPUs, memory protection rests on two things: the Current Privilege Level (CPL, or 'ring'), a number between 0 (maximum access) and 3 (user mode), and the page table. The page table maps virtual addresses to physical addresses and sets access restrictions on each page (a 4096-byte block of memory). Each process has its own page table, and each page in it can be restricted with a user/supervisor flag (which decides whether CPL 3 code may touch the page), a read-only flag, a no-execute flag, or by marking it not present so that any access faults.
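To make the per-page restrictions concrete, here is a rough Python model of how a virtual address splits into a page number and an offset, and how the flag bits of a page-table entry gate an access. The bit positions are the ones x86 uses for the present, writable, user and no-execute bits in its 64-bit entry format; the function names are made up for illustration, and the check deliberately ignores finer points such as CR0.WP.

```python
PAGE_SIZE = 4096          # bytes per page, as in the answer
PAGE_SHIFT = 12           # log2(PAGE_SIZE)

# Flag bits of an x86 page-table entry (64-bit entry format).
PTE_PRESENT = 1 << 0      # clear => every access faults ("no-access")
PTE_WRITABLE = 1 << 1     # clear => page is read-only
PTE_USER = 1 << 2         # clear => supervisor only, CPL 3 is refused
PTE_NO_EXECUTE = 1 << 63  # set => instruction fetches fault


def split_address(vaddr):
    """Return (virtual page number, offset within the page)."""
    return vaddr >> PAGE_SHIFT, vaddr & (PAGE_SIZE - 1)


def access_allowed(pte, cpl, write=False, execute=False):
    """Simplified permission check for a single page-table entry.

    Illustrative helper, not real hardware behaviour in every detail.
    """
    if not pte & PTE_PRESENT:
        return False
    if cpl == 3 and not pte & PTE_USER:
        return False
    if write and not pte & PTE_WRITABLE:
        return False
    if execute and pte & PTE_NO_EXECUTE:
        return False
    return True
```

For example, an entry with only `PTE_PRESENT` set describes a kernel-only, read-only page: ring 0 may read it, ring 3 may not touch it at all.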
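Changing your CPL is a relatively fast operation (although there are security restrictions on how and when you're allowed to do so). Switching to a different page table, however, is quite expensive, because it flushes a cache on the CPU called the Translation Lookaside Buffer (TLB), which holds recently used virtual-to-physical translations. After the switch, each translation has to be fetched from the page table in memory again.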
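A toy model shows where the cost comes from. The class below is purely illustrative (the name `ToyTLB` and its methods are invented for this sketch): it caches translations in a dict, counts every lookup that has to fall back to the page table, and throws the cache away whenever a new page table is loaded.

```python
class ToyTLB:
    """Toy translation cache that is flushed on every page-table switch."""

    def __init__(self):
        self.page_table = {}  # virtual page number -> physical frame number
        self.entries = {}     # cached translations
        self.misses = 0       # lookups that had to consult the page table

    def load_page_table(self, page_table):
        """Switch address space: install a new table and flush the cache."""
        self.page_table = page_table
        self.entries.clear()

    def translate(self, vpn):
        """Translate a virtual page number, filling the cache on a miss."""
        if vpn not in self.entries:
            self.misses += 1
            self.entries[vpn] = self.page_table[vpn]
        return self.entries[vpn]
```

Touching the same page twice costs one miss, but touching it again after `load_page_table` costs another, even if the new table is identical to the old one. On real hardware each of those misses means a walk through the page table in memory.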
Typically in a normal OS, the kernel reserves the lowest X GB of each process's virtual address space for the process itself (3GB is the usual choice on 32-bit architectures). The upper (4 - X) GB are directly mapped to the first (4 - X) GB of physical memory, and restricted to CPL 0 ('ring 0') only. Thus, the kernel can put its private data structures in the upper 1GB or so and always access them at the same virtual address, no matter which process is running. If a process makes a syscall that requires half a dozen subsystems to do something, no problem - they can just call each other's functions directly.
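With the usual 3GB/1GB split, the direct mapping is just a constant offset. The sketch below assumes that split (kernel half starting at `0xC0000000`); the constant and function names are mine, chosen for illustration.

```python
GIB = 1 << 30
KERNEL_BASE = 3 * GIB              # 0xC0000000, assuming X = 3
KERNEL_WINDOW = 4 * GIB - KERNEL_BASE  # 1 GiB of direct-mapped memory


def is_kernel_address(vaddr):
    """True if a 32-bit virtual address lies in the kernel's upper region."""
    return KERNEL_BASE <= vaddr < 4 * GIB


def kernel_virt_to_phys(vaddr):
    """Undo the direct mapping: kernel virtual -> physical address."""
    if not is_kernel_address(vaddr):
        raise ValueError("not a kernel virtual address")
    return vaddr - KERNEL_BASE


def phys_to_kernel_virt(paddr):
    """Apply the direct mapping: physical -> kernel virtual address."""
    if not 0 <= paddr < KERNEL_WINDOW:
        raise ValueError("physical address outside the direct-mapped window")
    return paddr + KERNEL_BASE
```

Because the offset is the same in every process's page table, a kernel pointer such as `0xC0100000` refers to physical address `0x00100000` no matter which process happens to be running.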
However, in a microkernel system, each kernel subsystem gets its own page table and its own address mappings. To service a single user call, the CPU might need to switch page tables quite a few times, and this performance hit adds up. Moreover, each subsystem needs to be prepared to deal with failures of its dependencies, which increases the complexity of the system. Because of these problems, microkernels have, by and large, only been used as research and toy OSes (e.g., MINIX, GNU Hurd).
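A back-of-the-envelope way to see the difference is to count how often the address space changes while one request travels down through the subsystems and back. The subsystem names and the `count_switches` helper below are hypothetical, and the model ignores everything except the page-table switches themselves.

```python
def count_switches(call_chain, address_space_of):
    """Count page-table switches for a request that travels down
    call_chain and then returns along the same route."""
    # Down the chain, then back up, without repeating the last hop.
    round_trip = call_chain + call_chain[-2::-1]
    spaces = [address_space_of[name] for name in round_trip]
    return sum(1 for a, b in zip(spaces, spaces[1:]) if a != b)


# Hypothetical path of a file read.
chain = ["app", "vfs", "filesystem", "disk_driver"]

# Monolithic kernel: the kernel is mapped into the process's own
# address space, so every hop uses the same page table.
monolithic = {name: "process" for name in chain}

# Microkernel: every component lives in its own address space.
microkernel = {name: name for name in chain}
```

Under these assumptions `count_switches(chain, monolithic)` is 0, while `count_switches(chain, microkernel)` is 6, and each of those six switches pays the TLB cost described above.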
That said, in recent years, there has been some blurring of the line between monolithic kernels and microkernels. For example, in Windows 7, the graphics driver is in fact isolated from the rest of the kernel; if it crashes, the system can recover. In Linux and OS X, FUSE lets filesystem drivers run in userspace; the NTFS-3G driver on these systems, in fact, uses this mechanism.