How to mmap the stack for the clone() system call on Linux?

Posted 2024-07-26 07:53:13

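The clone() system call on Linux takes a parameter pointing to the stack for the newly created thread to use. The obvious way to do this is to simply malloc some space and pass that, but then you have to be sure you've malloc'd as much stack space as that thread will ever use (hard to predict).

I remembered that when using pthreads I didn't have to do this, so I was curious what it did instead. I came across a site which explains: "The best solution, used by the Linux pthreads implementation, is to use mmap to allocate memory, with flags specifying a region of memory which is allocated as it is used. This way, memory is allocated for the stack as it is needed, and a segmentation violation will occur if the system is unable to allocate additional memory."

The only context I've ever heard mmap used in is mapping files into memory, and indeed, reading the mmap man page, it takes a file descriptor. How can this be used to allocate a stack of dynamic length to give to clone()? Is that site just crazy? ;)

In either case, doesn't the kernel need to know how to find a free chunk of memory for a new stack anyway, since that's something it has to do all the time as users launch new processes? Why does a stack pointer even need to be specified in the first place if the kernel can already figure this out?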

陌路终见情 2024-08-02 07:53:13

Stacks are not, and never can be, unlimited in their space for growth. Like everything else, they live in the process's virtual address space, and the amount by which they can grow is always limited by the distance to the adjacent mapped memory region.

When people talk about the stack growing dynamically, what they might mean is one of two things:

Pages of the stack might be copy-on-write zero pages, which do not get private copies made until the first write is performed.
Lower parts of the stack region may not yet be reserved (and thus do not count toward the process's commit charge, i.e. the amount of physical memory/swap the kernel has accounted for as reserved for the process) until a guard page is hit. At that point the kernel commits more and moves the guard page, or kills the process if there is no memory left to commit.
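The first point (pages that only become real memory on first write) can be observed directly. The sketch below is an illustration, not part of the original answer: it maps an anonymous region, uses mincore() to count resident pages, writes one byte at the top of the region (where a downward-growing stack would start), and counts again. The helper names are hypothetical, and MADV_NOHUGEPAGE is requested on a best-effort basis so transparent huge pages don't populate 2 MiB at once. On a stock kernel one would expect the count to go from 0 to 1, though sandboxes or emulators may report residency differently.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count pages of [addr, addr + len) that are resident in RAM right now.
   addr must be page-aligned. Returns -1 on error. */
static long count_resident_pages(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec = malloc(npages);
    long resident = 0;

    if (vec == NULL)
        return -1;
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    for (size_t i = 0; i < npages; i++)
        resident += vec[i] & 1;          /* bit 0 set = page is resident */
    free(vec);
    return resident;
}

/* Map len bytes of anonymous memory and report how many pages are resident
   before and after a single write to its highest byte.
   Returns 0 on success, -1 on error. */
static int demand_paging_demo(size_t len, long *before, long *after)
{
    char *region = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;

    /* Best effort: prefer small pages so one write faults in one page.
       The return value is ignored (e.g. kernels without THP). */
    madvise(region, len, MADV_NOHUGEPAGE);

    *before = count_resident_pages(region, len);
    region[len - 1] = 1;                 /* first write: a private page is allocated */
    *after = count_resident_pages(region, len);

    munmap(region, len);
    return 0;
}
```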

Trying to rely on the MAP_GROWSDOWN flag is unreliable and dangerous, because it cannot protect you against mmap creating a new mapping right next to your stack, which will then get clobbered. (See http://lwn.net/Articles/294001/)

For the main thread, the kernel automatically reserves address space (not memory) equal to the stack-size ulimit below the stack and prevents mmap from allocating it. (But beware! Some broken vendor-patched kernels disable this behavior, leading to random memory corruption!) For other threads, you simply must mmap the entire range of address space the thread might need for its stack when creating it. There is no other way.

You could make most of it initially non-writable/non-readable and change that on faults, but then you'd need signal handlers, and this solution is not acceptable in a POSIX threads implementation because it would interfere with the application's signal handlers. (Note that, as an extension, the kernel could offer special MAP_ flags to deliver a different signal instead of SIGSEGV on illegal access to the mapping, and then the threads implementation could catch and act on this signal. But Linux at present has no such feature.)
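A minimal sketch of "mmap the entire range up front", assuming a downward-growing stack (as on x86 and most other architectures). The names `struct thread_stack`, `alloc_thread_stack` and `free_thread_stack` are hypothetical. Rather than trapping faults with a signal handler, it makes a small region at the low end PROT_NONE, so an overflow faults instead of silently running into a neighbouring mapping. glibc's pthreads does something broadly similar with a guard region, though its actual code differs. Untouched pages still cost no physical memory until first written.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical descriptor for a manually allocated thread stack. */
struct thread_stack {
    char  *base;   /* lowest address of the mapping (start of the guard) */
    size_t total;  /* guard + usable bytes */
    char  *top;    /* highest address: the value to pass as clone()'s stack */
};

/* Reserve the whole range the thread may ever need, with an inaccessible
   guard region below it. Returns 0 on success, -1 on error. */
static int alloc_thread_stack(struct thread_stack *s, size_t size, size_t guard)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size  = (size  + page - 1) / page * page;   /* round up to whole pages */
    guard = (guard + page - 1) / page * page;

    char *base = mmap(NULL, guard + size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    /* The stack grows down, so the guard sits below the usable area. */
    if (guard != 0 && mprotect(base, guard, PROT_NONE) != 0) {
        munmap(base, guard + size);
        return -1;
    }

    s->base = base;
    s->total = guard + size;
    s->top = base + guard + size;
    return 0;
}

static int free_thread_stack(struct thread_stack *s)
{
    return munmap(s->base, s->total);
}
```

The `top` field is what would be handed to clone() as the child's stack.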

Finally, note that the clone syscall does take a stack pointer argument, because it's unsafe to wait until after returning to userspace to change the stack pointer in the "child". Unless all signals are blocked, a signal handler could run immediately on the wrong stack, and on some architectures the stack pointer must be valid and point to an area that is safe to write at all times.

The syscall itself must still be performed from assembly code, because the userspace wrapper has to guarantee that the child never writes anything to the parent's stack. Not only is modifying the stack pointer impossible from C, but you also couldn't avoid the possibility that the compiler would clobber the parent's stack after the syscall but before the stack pointer was changed.
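To see that the child really runs on the memory passed to clone(), here is a small sketch using glibc's clone() wrapper, which performs the assembly-level stack handling for you. The child records the address of one of its own locals, and the parent checks that it lies inside the mmap'd region. The helper names and the 256 KiB size are illustrative assumptions, and a downward-growing stack is assumed, so the top of the mapping is passed.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (256 * 1024)

/* Child body: store the address of one of our own locals in *arg. */
static int record_stack_address(void *arg)
{
    volatile int local = 0;
    *(uintptr_t *)arg = (uintptr_t)&local;
    return 0;
}

/* Returns 1 if the child's local lived inside the stack we mapped,
   0 if it did not, -1 on error. */
static int child_runs_on_given_stack(void)
{
    char *stack = mmap(NULL, CHILD_STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    uintptr_t child_local = 0;   /* written by the child through CLONE_VM */

    /* Pass the highest address; the wrapper and kernel switch stacks. */
    pid_t pid = clone(record_stack_address, stack + CHILD_STACK_SIZE,
                      CLONE_VM | SIGCHLD, &child_local);
    if (pid == -1 || waitpid(pid, NULL, 0) == -1) {
        munmap(stack, CHILD_STACK_SIZE);
        return -1;
    }

    int inside = child_local >= (uintptr_t)stack &&
                 child_local < (uintptr_t)stack + CHILD_STACK_SIZE;
    munmap(stack, CHILD_STACK_SIZE);
    return inside;
}
```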


南七夏 2024-08-02 07:53:13

You'd want the MAP_ANONYMOUS flag for mmap, and MAP_GROWSDOWN since you want to use it as a stack.

Something like:

void *stack = mmap(NULL, initial_stacksize, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_GROWSDOWN | MAP_ANONYMOUS, -1, 0);

See the mmap man page for more info. And remember, clone is a low-level primitive that you're not meant to use unless you really need what it offers. It offers a lot of control (such as setting its own stack) in case you want to do some trickery, like having the stack accessible in all the related processes. Unless you have a very good reason to use clone, stick with fork or pthreads.
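If you do stick with pthreads, you only choose a size and the library does the mmap for you. A rough sketch, assuming glibc or another libc that provides the GNU extension `pthread_getattr_np` (used here only to check where the thread's stack ended up). The function names are made up for illustration.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdint.h>

/* Thread body: ask the library where our stack is and check that one of our
   own locals lies inside it. Returns (void *)1 if so, (void *)0 otherwise. */
static void *check_own_stack(void *unused)
{
    (void)unused;
    int local = 0;
    pthread_attr_t attr;
    void *addr;
    size_t size;
    uintptr_t result = 0;

    if (pthread_getattr_np(pthread_self(), &attr) == 0) {
        if (pthread_attr_getstack(&attr, &addr, &size) == 0) {
            uintptr_t lo = (uintptr_t)addr;      /* lowest stack address */
            uintptr_t p = (uintptr_t)&local;
            result = (p >= lo && p < lo + size);
        }
        pthread_attr_destroy(&attr);
    }
    return (void *)result;
}

/* Start one thread with a caller-chosen stack size; pthreads allocates
   (and later frees) the stack itself. Returns 1 if the thread found its
   local inside its own stack, 0 if not, -1 on error. */
static int run_thread_with_stack_size(size_t stack_size)
{
    pthread_attr_t attr;
    pthread_t tid;
    void *ret;

    if (pthread_attr_init(&attr) != 0)
        return -1;
    /* stack_size must be at least PTHREAD_STACK_MIN. */
    if (pthread_attr_setstacksize(&attr, stack_size) != 0 ||
        pthread_create(&tid, &attr, check_own_stack, NULL) != 0) {
        pthread_attr_destroy(&attr);
        return -1;
    }
    pthread_attr_destroy(&attr);

    if (pthread_join(tid, &ret) != 0)
        return -1;
    return (int)(uintptr_t)ret;
}
```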


不甘平庸 2024-08-02 07:53:13

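Joseph, to answer your last question:

When a user creates a "normal" new process, that's done by fork(). In this case the kernel doesn't have to worry about creating a new stack at all, because the new process is a complete duplicate of the old one, right down to the stack.

If the user replaces the currently running process using exec(), then the kernel does need to create a new stack, but that's easy in this case, because it gets to start from a blank slate. exec() wipes out the process's memory space and reinitializes it, so the kernel gets to say "after exec(), the stack always lives here".

If, however, we use clone(), then we can say that the new process will share a memory space with the old process (CLONE_VM). In this case, the kernel can't leave the stack as it was in the calling process (as fork() does), because then our two processes would be stomping on each other's stacks. The kernel also can't just put it in a default location (as exec() does), because that location is already taken in this memory space. The only solution is to let the calling process find a place for it, which is what clone() does.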
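The fork() versus clone(CLONE_VM) distinction above can be seen in a small sketch. In both cases a child sets a variable to 1: after fork() the parent still sees 0, because the child wrote to its own copy of the stack, whereas after clone(CLONE_VM) the parent sees 1, which is exactly why the two cannot share a single stack. The function names and the 64 KiB child stack size are illustrative assumptions, and a downward-growing stack is assumed.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (64 * 1024)

/* clone() child body: write through the pointer we were given. */
static int set_flag(void *arg)
{
    *(int *)arg = 1;
    return 0;
}

/* Let a child set a variable to 1, first via fork() and then via
   clone(CLONE_VM), and report what the parent sees afterwards.
   Returns 0 on success, -1 on error. */
static int fork_vs_clone_vm(int *after_fork, int *after_clone)
{
    int flag = 0;

    /* fork(): no stack argument needed; the child gets a private copy of
       the whole address space, including this stack frame. */
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        flag = 1;                        /* changes only the child's copy */
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) == -1)
        return -1;
    *after_fork = flag;

    /* clone(CLONE_VM): one shared address space, so the caller must
       provide a separate stack for the child. */
    char *stack = mmap(NULL, CHILD_STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;
    pid = clone(set_flag, stack + CHILD_STACK_SIZE, CLONE_VM | SIGCHLD, &flag);
    if (pid == -1 || waitpid(pid, NULL, 0) == -1) {
        munmap(stack, CHILD_STACK_SIZE);
        return -1;
    }
    *after_clone = flag;

    munmap(stack, CHILD_STACK_SIZE);
    return 0;
}
```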


温柔少女心 2024-08-02 07:53:13

Here is the code, which mmaps a stack region and instructs the clone system call to use this region as the stack.

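#define _GNU_SOURCE          /* clone() is a GNU extension in <sched.h> */
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>

#define STACK_SIZE 0x200000  /* 2 MiB */

static int execute_clone(void *arg)
{
    (void)arg;
    printf("\nclone function executed\n");
    fflush(stdout);
    return 0;
}

int main(void)
{
    /* Let the kernel pick the address: MAP_FIXED at a hard-coded address
       would silently replace anything already mapped there. The fd must be
       -1 for MAP_ANONYMOUS, and the stack must be readable as well as
       writable. */
    char *stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED) {
        perror("mmap failed");
        return 1;
    }

    /* The stack grows down (x86 and most other architectures), so pass the
       top of the region. SIGCHLD makes the child waitable with waitpid(). */
    pid_t pid = clone(execute_clone, stack + STACK_SIZE, CLONE_VM | SIGCHLD, NULL);
    if (pid == -1) {
        perror("clone() failed");
        return 1;
    }

    /* Wait so the parent doesn't exit before the child has run. */
    if (waitpid(pid, NULL, 0) == -1)
        perror("waitpid failed");

    munmap(stack, STACK_SIZE);
    return 0;
}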
