How do I mmap the stack for the clone() system call on Linux?
The clone() system call on Linux takes a parameter pointing to the stack for the newly created thread to use. The obvious way to do this is to simply malloc some space and pass that, but then you have to be sure you've malloc'd as much stack space as that thread will ever use (hard to predict).
I remembered that when using pthreads I didn't have to do this, so I was curious what it did instead. I came across this site which explains, "The best solution, used by the Linux pthreads implementation, is to use mmap to allocate memory, with flags specifying a region of memory which is allocated as it is used. This way, memory is allocated for the stack as it is needed, and a segmentation violation will occur if the system is unable to allocate additional memory."
The only context I've ever heard mmap used in is for mapping files into memory, and indeed reading the mmap man page it takes a file descriptor. How can this be used for allocating a stack of dynamic length to give to clone()? Is that site just crazy? ;)
In either case, doesn't the kernel need to know how to find a free bunch of memory for a new stack anyway, since that's something it has to do all the time as the user launches new processes? Why does a stack pointer even need to be specified in the first place if the kernel can already figure this out?
Stacks are not, and never can be, unlimited in their space for growth. Like everything else, they live in the process's virtual address space, and the amount by which they can grow is always limited by the distance to the adjacent mapped memory region.
When people talk about the stack growing dynamically, what they might mean is one of two things:
Trying to rely on the MAP_GROWSDOWN flag is unreliable and dangerous because it cannot protect you against mmap creating a new mapping just adjacent to your stack, which will then get clobbered. (See http://lwn.net/Articles/294001/)
For the main thread, the kernel automatically reserves the stack-size ulimit worth of address space (not memory) below the stack and prevents mmap from allocating it. (But beware! Some broken vendor-patched kernels disable this behavior, leading to random memory corruption!)
For other threads, you simply must mmap the entire range of address space the thread might need for stack when creating it. There is no other way. You could make most of it initially non-writable/non-readable, and change that on faults, but then you'd need signal handlers, and this solution is not acceptable in a POSIX threads implementation because it would interfere with the application's signal handlers. (Note that, as an extension, the kernel could offer special MAP_ flags to deliver a different signal instead of SIGSEGV on illegal access to the mapping, and then the threads implementation could catch and act on this signal. But Linux at present has no such feature.)
Finally, note that the clone syscall does not take a stack pointer argument because it does not need it. The syscall must be performed from assembly code, because the userspace wrapper is required to change the stack pointer in the "child" thread to point to the desired stack, and avoid writing anything to the parent's stack.
Actually, clone does take a stack pointer argument, because it's unsafe to wait to change the stack pointer in the "child" after returning to userspace. Unless signals are all blocked, a signal handler could run immediately on the wrong stack, and on some architectures the stack pointer must be valid and point to an area safe to write at all times.
Not only is modifying the stack pointer impossible from C, but you also couldn't avoid the possibility that the compiler would clobber the parent's stack after the syscall but before the stack pointer was changed.
You'd want the MAP_ANONYMOUS flag for mmap, and MAP_GROWSDOWN too, since you want to use it as a stack.
Something like:
See the mmap man page for more info. And remember, clone is a low-level concept that you're not meant to use unless you really need what it offers. It offers a lot of control, like setting its own stack, just in case you want to do some trickery (like having the stack accessible in all the related processes). Unless you have a very good reason to use clone, stick with fork or pthreads.
Joseph, in answer to your last question:
When a user creates a "normal" new process, that's done by fork(). In this case, the kernel doesn't have to worry about creating a new stack at all, because the new process is a complete duplicate of the old one, right down to the stack.
If the user replaces the currently running process using exec(), then the kernel does need to create a new stack - but in this case that's easy, because it gets to start from a blank slate. exec() wipes out the memory space of the process and reinitialises it, so the kernel gets to say "after exec(), the stack always lives HERE".
If, however, we use clone(), then we can say that the new process will share a memory space with the old process (CLONE_VM). In this situation, the kernel can't leave the stack as it was in the calling process (as fork() does), because then our two processes would be stomping on each other's stack. The kernel also can't just put it in a default location (as exec() does), because that location is already taken in this memory space. The only solution is to allow the calling process to find a place for it, which is what it does.
Here is the code, which mmaps a stack region and instructs the clone system call to use this region as the stack.
mmap is more than just mapping a file into memory. In fact, some malloc implementations will use mmap for large allocations. If you read the fine man page you'll notice the MAP_ANONYMOUS flag, and you'll see that you need not supply a file descriptor at all.
As for why the kernel can't just "find a bunch of free memory", well, if you want someone to do that work for you, either use fork instead, or use pthreads.
Note that the clone system call doesn't take an argument for the stack location. It actually works just like fork. It's just the glibc wrapper which takes that argument.
I think the stack grows downwards until it cannot grow any further, for example when it reaches memory that has already been allocated, at which point a fault may be raised. In other words, the default acts as a minimum available stack size: if there is free space below when the stack is full, it can grow downwards; otherwise, the system may raise a fault.