How can Linux ptrace be unsafe or contain a race condition?
I'd like to implement a sandbox by `ptrace()`ing a process I start and every process its children create (including grandchildren etc.). The `ptrace()` parent process, i.e. the supervisor, would be a simple C or Python program. Conceptually it would limit filesystem access (based on the path name and the access direction, read or write) and socket access (e.g. by disallowing socket creation).
What should I pay attention to so that the `ptrace()`d process and its children (recursively) won't be able to bypass the sandbox? Is there anything special the supervisor should do at `fork()` time to avoid race conditions? Is it possible to read the filename arguments of e.g. `rename()` from the child process without a race condition?
Here is what I've already planned to do:
- `PTRACE_O_TRACEFORK | PTRACE_O_TRACEVFORK | PTRACE_O_TRACECLONE` to avoid (some) race conditions when `fork()`ing
- disallow all system calls by default, and compose a whitelist of allowed system calls
- make sure that the `*at()` system call variants (such as `openat`) are properly protected
What else should I pay attention to?
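As a rough illustration of the first item in the plan, here is a minimal sketch of how a supervisor might start a tracee and request those options before the tracee runs any code of its own. The function names `sandbox_ptrace_options` and `spawn_traced` are hypothetical, and adding `PTRACE_O_TRACESYSGOOD` and `PTRACE_O_EXITKILL` is an assumption that goes beyond what the question lists.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Options requested for the tracee. The three TRACE* flags are the ones
 * named in the question; TRACESYSGOOD and EXITKILL are extra assumptions. */
long sandbox_ptrace_options(void)
{
    return PTRACE_O_TRACEFORK | PTRACE_O_TRACEVFORK | PTRACE_O_TRACECLONE |
           PTRACE_O_TRACESYSGOOD | PTRACE_O_EXITKILL;
}

/* Fork a child that asks to be traced, stops itself, and then execs argv.
 * Returns the child's pid once the options are set, or -1 on failure. */
pid_t spawn_traced(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);              /* wait here until the parent is ready */
        execvp(argv[0], argv);
        _exit(127);                  /* exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status)) {
        kill(pid, SIGKILL);
        return -1;
    }
    if (ptrace(PTRACE_SETOPTIONS, pid, NULL,
               (void *)sandbox_ptrace_options()) < 0) {
        kill(pid, SIGKILL);
        return -1;
    }
    return pid;                      /* caller resumes it with PTRACE_SYSCALL */
}
```

Because the options are set while the child is still stopped, any process it later creates through `fork()`, `vfork()` or `clone()` should be attached automatically and start out stopped, so there is no window in which a new child runs untraced.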
Comments (2)
The major problem is that many syscall arguments, like filenames, are passed to the kernel as userspace pointers. Any task that is allowed to run simultaneously and has write access to the memory the pointer refers to can modify these arguments after your supervisor has inspected them and before the kernel acts on them. By the time the kernel follows the pointer, the pointed-to contents may have been deliberately changed by another schedulable task (process or thread) with access to that memory. For example, one thread can call `open()` on a buffer that holds a harmless path, and once the supervisor has approved it, a second thread can overwrite the buffer with a forbidden path before the kernel reads it.
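The interleaving described in the paragraph above can be replayed deterministically in a single thread, which makes the check-then-use gap easier to see. This is only a simulation under stated assumptions: `path_allowed` and `replay_race` are hypothetical names, the policy and both paths are made up for illustration, and no real tracing takes place.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical policy: only /dev/null may be opened. */
static int path_allowed(const char *path)
{
    return strcmp(path, "/dev/null") == 0;
}

/* Replays the race step by step on one shared buffer.
 * Returns 1 if the path the "kernel" finally sees was never approved. */
int replay_race(char *shared, size_t size)
{
    /* Thread 1 prepares a harmless argument and enters open(). */
    snprintf(shared, size, "%s", "/dev/null");

    /* The supervisor inspects the argument at syscall entry. */
    int approved = path_allowed(shared);

    /* Thread 2 shares the memory and rewrites it before the kernel copies it. */
    snprintf(shared, size, "%s", "/etc/shadow");

    /* The kernel now follows the pointer and sees the new contents. */
    return approved && !path_allowed(shared);
}
```

In a real sandbox the second write would come from another thread or from a process sharing the mapping, so the supervisor cannot rely on what it read at inspection time unless the memory is private or every other writer is stopped.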
One way to stop this is to disallow calling `clone()` with the `CLONE_VM` flag, and in addition prevent any creation of writable `MAP_SHARED` memory mappings (or at least keep track of them so that you deny any syscall that tries to directly reference data from such a mapping). You could also copy any such argument into a non-shared bounce buffer before allowing the syscall to proceed. This approach will effectively prevent any threaded application from running in the sandbox.
The alternative is to `SIGSTOP` every other process in the traced group around every potentially dangerous syscall, wait for them to actually stop, and then allow the syscall to proceed. After it returns, you `SIGCONT` them (unless they were already stopped). Needless to say, this may have a significant performance impact.
(There are also analogous problems with syscall arguments that are passed on the stack, and with shared open file tables.)
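The first mitigation could be sketched as a predicate the supervisor evaluates on the raw syscall number and arguments at syscall entry. This is an assumption-laden sketch: `shares_memory` is a hypothetical name, the argument positions follow the x86-64 raw syscall layout (other architectures order the `clone` arguments differently), and it is not exhaustive. For instance, it ignores `mprotect()` adding write permission to a shared mapping later, as well as `shmat()`.

```c
#include <linux/sched.h>   /* CLONE_VM */
#include <sys/mman.h>      /* MAP_SHARED, PROT_WRITE */
#include <sys/syscall.h>   /* SYS_clone, SYS_mmap */

/* Returns 1 if the call would let another task share writable memory
 * with the caller, 0 otherwise. args[] holds the six raw syscall
 * arguments in order (x86-64 layout assumed). */
int shares_memory(long nr, const unsigned long args[6])
{
    switch (nr) {
    case SYS_clone:
        /* On x86-64 the flags are the first raw argument. */
        return (args[0] & CLONE_VM) != 0;
#ifdef SYS_clone3
    case SYS_clone3:
        /* The flags live in a struct in tracee memory, which is itself
         * subject to the race, so this sketch refuses it outright. */
        return 1;
#endif
    case SYS_mmap:
        /* mmap(addr, length, prot, flags, fd, offset) */
        return (args[3] & MAP_SHARED) != 0 && (args[2] & PROT_WRITE) != 0;
    default:
        return 0;
    }
}
```

Refusing `CLONE_VM` also blocks helpers that create short-lived memory-sharing children, which is part of why this mitigation rules out threaded programs.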
Doesn't ptrace only get notifications after the fact? I don't think you have a chance to actually stop the syscall from happening, only to kill the process as fast as you can once you see something "evil".
It seems like you're looking more for something like SELinux or AppArmor, where you can guarantee that not even one illegal call gets through.