Why does Windows reserve 1GB (or 2GB) for its system address space?

Posted on 2024-07-26 06:26:13 · 430 words · 9 views · 0 comments

It's a known fact that Windows applications usually have 2GB of private address space on a 32-bit system. This space can be extended to 3GB with the /3GB switch.

The operating system reserves the remainder of the 4GB for itself.

My question is WHY?

Code running in kernel mode (i.e. device driver code) has its own address space. Why, on top of an exclusive 4GB address space, does the operating system still want to reserve 2GB of each user-mode process?

I thought the reason was the transition between user mode and kernel mode. For example, a call to NtWriteFile needs an address for the kernel dispatch routine (hence the system reserving 2GB in each application). But with SYSENTER, isn't the system service number enough for the kernel-mode code to know which function/service is being called?

I would appreciate it if you could clarify why it's so important for the operating system to take 2GB (or 1GB) of each user-mode process.

Comments (5)

单身狗的梦 2024-08-02 06:26:13

Two different user processes have different virtual address spaces. Because the virtual↔physical address mappings differ, the TLB is invalidated when switching contexts from one user process to another. This is very expensive: with no translations cached in the TLB, every memory access initially misses and requires a walk of the page tables.

Syscalls involve two context switches: user→kernel, and then kernel→user. To speed this up, it is common to reserve the top 1GB or 2GB of virtual address space for kernel use. Because the virtual address space does not change across these context switches, no TLB flushes are necessary. This is enabled by a user/supervisor bit in each PTE, which ensures that kernel memory is only accessible while running in kernel mode; user space has no access even though the page table is the same.
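
The role of the user/supervisor bit can be illustrated with a toy model. This is only a sketch, not how a real MMU or kernel is written: the two flag values follow the x86 page-table entry layout (bit 0 is "present", bit 2 is "user/supervisor"), while the `can_access` helper and the sample page table are hypothetical and ignore everything else (R/W, NX, SMEP/SMAP).

```python
# Toy model of the x86 user/supervisor check on a page-table entry (PTE).
PTE_PRESENT = 1 << 0  # page is mapped
PTE_USER = 1 << 2     # set: usable from user mode; clear: supervisor only


def can_access(pte: int, cpu_in_user_mode: bool) -> bool:
    """Return True if this PTE permits the access in the given CPU mode."""
    if not pte & PTE_PRESENT:
        return False  # unmapped page: fault in either mode
    if cpu_in_user_mode and not pte & PTE_USER:
        return False  # kernel-only page touched from user mode
    return True


# One hypothetical page table shared by both modes: virtual page number -> PTE.
page_table = {
    0x00400: PTE_PRESENT | PTE_USER,  # application page in the lower half
    0x80000: PTE_PRESENT,             # kernel page at 2GB (0x80000 * 4KB)
}

print(can_access(page_table[0x00400], cpu_in_user_mode=True))   # True
print(can_access(page_table[0x80000], cpu_in_user_mode=True))   # False
print(can_access(page_table[0x80000], cpu_in_user_mode=False))  # True
```

The point of the model is that the same table serves both modes, so entering the kernel changes only the privilege level, not the mappings, and nothing cached in the TLB has to be thrown away.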

If there were hardware support for two separate TLBs, with one exclusively for kernel use, then this optimization would no longer be useful. However, if you have enough space to dedicate, it's probably more worthwhile to just make one larger TLB.

Linux on x86 once supported a mode known as "4G/4G split". In this mode, userspace has full access to the entire 4GB virtual address space, and the kernel also has a full 4GB virtual address space. The cost, as mentioned above, is that every syscall requires a TLB flush, along with more complex routines to copy data between user and kernel memory. This has been measured to impose up to a 30% performance penalty.


Times have changed since this question was originally asked and answered: 64-bit operating systems are now much more prevalent. In current OSes on x86-64, virtual addresses from 0 to 2^47-1 (0-128TB) are allowed for user programs, while the kernel permanently resides within virtual addresses from 2^47×(2^17-1) to 2^64-1 (or from -2^47 to -1, if you treat addresses as signed integers).
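
As a quick check of that arithmetic, the following sketch computes the boundaries of the two halves of a 48-bit x86-64 address space. The constant names and the `as_signed64` helper are made up for this illustration.

```python
# Boundaries of the user and kernel halves of a 48-bit x86-64 address space.
USER_TOP = 2**47 - 1               # highest user-mode address
KERNEL_BASE = 2**47 * (2**17 - 1)  # same value as 2**64 - 2**47
KERNEL_TOP = 2**64 - 1             # highest 64-bit address


def as_signed64(addr: int) -> int:
    """Reinterpret an unsigned 64-bit address as a signed 64-bit integer."""
    return addr - 2**64 if addr >= 2**63 else addr


print(hex(USER_TOP))                       # 0x7fffffffffff
print(hex(KERNEL_BASE))                    # 0xffff800000000000
print((USER_TOP + 1) // 2**40)             # 128 (size of the user half in TB)
print(as_signed64(KERNEL_BASE) == -2**47)  # True
print(as_signed64(KERNEL_TOP))             # -1
```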

What happens if you run a 32-bit executable on 64-bit Windows? You would think that all virtual addresses from 0 to 2^32 (0-4GB) would easily be available, but in order to avoid exposing bugs in existing programs, 32-bit executables are still limited to 0-2GB unless they are rebuilt with /LARGEADDRESSAWARE. Those that are get access to 0-4GB. (This is not a new flag; the same applied in 32-bit Windows kernels running with the /3GB switch, which changed the default 2G/2G user/kernel split to 3G/1G, although of course 3-4GB would still be out of range.)

What sorts of bugs might there be? As an example, suppose you are implementing quicksort and have two pointers, a and b, pointing at the start and one past the end of an array. If you choose the middle as the pivot with (a+b)/2, it'll work as long as both addresses are below 2GB, but if they are both above, then the addition will overflow a 32-bit integer and the result will be outside the array. (The correct expression is a+(b-a)/2.)
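
The overflow is easy to reproduce by emulating 32-bit unsigned arithmetic. This is a sketch only: the addresses are arbitrary example values, and `naive_mid` and `safe_mid` are hypothetical names standing in for the two expressions above.

```python
MASK32 = 0xFFFFFFFF  # emulate 32-bit unsigned wrap-around


def naive_mid(a: int, b: int) -> int:
    """(a+b)/2 with the sum truncated to 32 bits, as on a 32-bit machine."""
    return ((a + b) & MASK32) // 2


def safe_mid(a: int, b: int) -> int:
    """a+(b-a)/2: the difference is small, so nothing overflows."""
    return (a + (b - a) // 2) & MASK32


low_a, low_b = 0x10000000, 0x10001000    # array below 2GB
high_a, high_b = 0x90000000, 0x90001000  # array above 2GB

print(hex(naive_mid(low_a, low_b)))    # 0x10000800 (correct)
print(hex(naive_mid(high_a, high_b)))  # 0x10000800 (wrong: far outside the array)
print(hex(safe_mid(high_a, high_b)))   # 0x90000800 (correct)
```

A program containing the naive expression runs correctly for years as long as every address has its top bit clear, which is why the 2GB limit stays in place unless the executable opts in.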

As an aside, 32-bit Linux, with its default 3G/1G user/kernel split, has historically run programs with their stack located in the 2-3GB range, so any such programming errors would likely have been flushed out quickly. 64-bit Linux gives 32-bit programs access to 0-4GB.

紧拥背影 2024-08-02 06:26:13

Windows (like any OS) is a lot more than the kernel + drivers.

Your application relies on a lot of OS services that do not just exist in kernel space.
There are a lot of buffers, handles and all sorts of resources that can get mapped into your process's own address space. Whenever you call a Win32 API function that returns, say, a window handle or a brush, those things have to be allocated somewhere in your process. So part of Windows runs in the kernel, yes; other parts run in their own user-mode processes; and some, the ones your application needs direct access to, are mapped into your address space. Part of this is hard to avoid, but an important additional factor is performance. If every Win32 call required a context switch, it would be a major performance hit. If some of them can be handled in user mode because the data they rely on is already mapped into your address space, the context switch is avoided, and you save quite a few CPU cycles.

So any OS needs some amount of the address space set aside.
I believe Linux by default sets only 1GB for the OS.

The reason why MS settled on 2GB with Windows was explained on Raymond Chen's blog once. I don't have the link, and I can't remember the details, but the decision was made because Windows NT was originally targeted at Alpha processors as well, and on Alpha there was some REALLY good reason to do the 50/50 split. ;)

It was something to do with the Alpha's support for 32-bit as well as 64-bit code. :)

够运 2024-08-02 06:26:13

"Code running in kernel mode (i.e. device driver code) has its own address space."

No, it does not. On x86 processors it has to share that address space with the user-mode portion of a process. That's why the kernel has to reserve enough space for itself within the total, finite address space.

不知在何时 2024-08-02 06:26:13

I believe the best answer is that the OS designers felt that by the time you would have to care, people would be using 64-bit Windows.

But here's a better discussion.

走过海棠暮 2024-08-02 06:26:13

Part of the answer has to do with the history of microprocessor architectures. Here's some of what I know; others can provide more recent details.

The Intel 8086 processor had a segment-offset architecture for memory, giving 20-bit memory addresses, and therefore total addressable physical memory of 1MB.
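
The 20-bit figure follows from how a segment and an offset combine: the 16-bit segment is shifted left by four bits and added to the 16-bit offset. A small sketch (the `physical_address` helper is hypothetical, and the sample segment values are just illustrations):

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address: segment * 16 + offset, kept to 20 address lines."""
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0xB800, 0x0000)))  # 0xb8000 (above the 640K mark)
print(hex(physical_address(0xFFFF, 0x000F)))  # 0xfffff (last byte of the 1MB space)
print(hex(physical_address(0xFFFF, 0x0010)))  # 0x0 (wraps around past 1MB)
print(2**20 // 1024)                          # 1024 (KB addressable with 20 bits)
```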

Unlike competing processors of the era - like the Zilog Z80 - the Intel 8086 had only one address space which had to accommodate not only electronic memory, but all input/output communication with such minor peripherals as keyboard, serial ports, printer ports and video displays. (For comparison, the Zilog Z80 had a separate input/output address space with dedicated assembly opcodes for access)

The need to allow space for an ever-growing range of peripheral expansions led to the original decision to segment the address space into electronic memory from 0-640K, and "other stuff" (input/output, ROMs, video memory etc.) from 640K to 1MB.

As the x86 line grew and evolved, and PCs evolved with it, similar schemes have been used, ending with today's 2G/2G split of the 4G address space.
