Understanding the behavior of unshare(CLONE_NEWNS)
I wrote a small C program that simply calls unshare(CLONE_NEWNS) followed by system("bash").
The man page says that the process should then have its own mount namespace. So, in that shell, I tried to umount /cgroup (cgroup is mounted on the original machine).
When I then run mount in another shell on the machine, /cgroup appears unmounted there too. Am I missing something here? I thought that CLONE_NEWNS would let me unmount a filesystem from within the process without affecting the main system.
(As an aside, you didn't need to write a program; you could just use the unshare(1) utility.)
It is unmounting the filesystem only in the new namespace and leaving it mounted in the original. The problem is that mount uses /etc/mtab to produce the list of currently mounted filesystems, and that is just an ordinary file that can be updated by the mount and umount commands run in the new namespace. This means that /etc/mtab gets out of sync with what's really going on (since there is only one /etc/mtab, but two mount namespaces).
Check /proc/mounts instead to see what's actually mounted in the current namespace.
Almost certainly, this behavior is caused by shared subtrees: the parent mount of /cgroup (i.e., /) is marked as a "shared" mount, which propagates mount and unmount events to its peers (the other instances of /) in other namespaces. You can verify this by looking at the state of the / mount in /proc/self/mountinfo. This behavior has most likely been established by systemd, which changes the kernel's default of "private" mounts to "shared". To get "private" behavior, you'd need to make / private in the new namespace.
See also https://bbs.archlinux.org/viewtopic.php?id=194388 and https://lwn.net/Articles/689856/
I did a test with unshare on Fedora 19, kernel 3.10:
unshare --mount /bin/bash
df -h /boot/
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 485M 238M 222M 52% /boot
umount /boot/
In a second shell:
grep boot /proc/mounts
echo $?
1
Maybe I did something wrong, but the result is what I expected.
unshare works on Fedora but not on Ubuntu. Also, using just CLONE_NEWNS does not work either; it seems that unshare is not quite the same as calling clone directly:
clone(child_main, child_stack+STACK_SIZE,
CLONE_NEWUTS | CLONE_NEWPID | CLONE_NEWNS | SIGCHLD, NULL);
With this call, operations inside the namespace can still be seen from another namespace.