Why does COW mmap fail with ENOMEM on (sparse) files larger than 4GB?

Posted on 2024-09-16 14:21:16

This happens on a 2.6.26-2-amd64 Linux kernel when trying to mmap a 5GB file with copy-on-write semantics (PROT_READ | PROT_WRITE and MAP_PRIVATE). Mapping files smaller than 4GB, or using only PROT_READ, works fine. This is not the soft resource limit issue reported in a related question; the virtual memory size limit is unlimited.

Here is the code that reproduces the problem (the actual code is part of Boost.Interprocess).

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        struct stat b;
        void *base;
        int fd = open("foo.bin", O_RDWR);

        if (fd < 0 || fstat(fd, &b) < 0) {
                perror("foo.bin");
                return 1;
        }
        /* Private (copy-on-write), writable mapping of the whole file. */
        base = mmap(0, b.st_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        if (base == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        return 0;
}

And here is what happens:

dd if=/dev/zero of=foo.bin bs=1M seek=5000 count=1
./test-mmap
mmap: Cannot allocate memory
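For readers reproducing this: the `dd` command above skips 5000 MiB of the output and writes one 1 MiB block of zeros after the gap, so `foo.bin` is 5001 MiB long (5001 × 1048576 = 5243928576 bytes, the `st_size` that shows up in the strace) while, on filesystems that support holes, only about 1 MiB is actually allocated. A minimal C sketch of the same operation, assuming Linux/glibc and 64-bit file offsets; `make_sparse_file` is a hypothetical helper name, not part of the original code:

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t even on 32-bit builds */
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define MIB (1024L * 1024L)

/* Hypothetical helper mimicking `dd if=/dev/zero of=path bs=1M seek=N count=1`:
 * truncate the file, skip N MiB (leaving a hole), then write one MiB of zeros.
 * Returns the resulting file size in bytes, or -1 on error. */
off_t make_sparse_file(const char *path, long seek_mib)
{
    static char block[MIB];    /* static storage is zero-initialised */
    struct stat st;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    /* Writing past EOF leaves the skipped range unallocated (a hole). */
    if (pwrite(fd, block, sizeof block, (off_t)seek_mib * MIB) != (ssize_t)sizeof block
        || fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    close(fd);
    return st.st_size;
}
```

Calling `make_sparse_file("foo.bin", 5000)` should leave a file equivalent to the one the `dd` command creates.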

Here is the relevant strace output (strace 4.5.20, freshly compiled), as requested by nos.

open("foo.bin", O_RDWR)                 = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=5243928576, ...}) = 0
mmap(NULL, 5243928576, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = -1 ENOMEM (Cannot allocate memory)
dup(2)                                  = 4
[...]
write(4, "mmap: Cannot allocate memory\n", 29mmap: Cannot allocate memory
) = 29

Comments (2)

天生の放荡 2024-09-23 14:21:16

Try passing MAP_NORESERVE in the flags field like this:

mmap(NULL, b.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_NORESERVE, fd, 0);

It's likely that your swap and physical memory combined are less than the 5GB requested.
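A minimal sketch of that suggestion, assuming Linux/glibc: try the normal copy-on-write mapping first and retry with `MAP_NORESERVE` only when the kernel answers `ENOMEM`. `map_file_cow` is a hypothetical name, and the retry-on-ENOMEM policy is just one possible design, not something the original code does. The file is opened read-only, which is enough for `MAP_PRIVATE` even with `PROT_WRITE`.

```c
#define _GNU_SOURCE             /* exposes MAP_NORESERVE under strict -std modes */
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file copy-on-write.
 * On success returns the address and stores the length in *len;
 * on failure returns MAP_FAILED with errno set. */
void *map_file_cow(const char *path, size_t *len)
{
    struct stat st;
    void *base;
    int saved;
    int fd = open(path, O_RDONLY);  /* MAP_PRIVATE + PROT_WRITE needs only read access */

    if (fd < 0)
        return MAP_FAILED;
    if (fstat(fd, &st) < 0) {
        saved = errno;
        close(fd);
        errno = saved;
        return MAP_FAILED;
    }
    *len = (size_t)st.st_size;

    base = mmap(NULL, *len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (base == MAP_FAILED && errno == ENOMEM)
        /* The commit check refused the private writable mapping;
         * retry without reserving swap for it. */
        base = mmap(NULL, *len, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_NORESERVE, fd, 0);

    saved = errno;
    close(fd);                      /* the mapping stays valid after close() */
    errno = saved;
    return base;
}
```

The trade-off is the one described in the mmap(2) excerpt: pages of an unreserved mapping are only backed once they are written, so dirtying more of them than RAM + swap can hold may end in SIGSEGV or the OOM killer instead of a clean `ENOMEM` at mmap time.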

Alternatively, for testing purposes, you can switch the kernel to mode 1 ("always overcommit, never check"); if the mapping then succeeds, you can make the code change above:

# echo 1 > /proc/sys/vm/overcommit_memory

Below are the relevant extracts from the manual pages.

mmap(2):

   MAP_NORESERVE
          Do  not reserve swap space for this mapping.  When swap space is
          reserved, one has the guarantee that it is  possible  to  modify
          the  mapping.   When  swap  space  is not reserved one might get
          SIGSEGV upon a write if no physical memory  is  available.   See
          also  the  discussion of the file /proc/sys/vm/overcommit_memory
          in proc(5).  In kernels before 2.6, this flag  only  had  effect
          for private writable mappings.

proc(5):

   /proc/sys/vm/overcommit_memory
          This file contains the kernel virtual  memory  accounting  mode.
          Values are:

                 0: heuristic overcommit (this is the default)
                 1: always overcommit, never check
                 2: always check, never overcommit

          In  mode 0, calls of mmap(2) with MAP_NORESERVE are not checked,
          and the default check is very weak, leading to the risk of
          getting a process "OOM-killed".  Under Linux 2.4 any non-zero value
          implies mode 1.  In mode 2  (available  since  Linux  2.6),  the
          total  virtual  address  space on the system is limited to (SS +
          RAM*(r/100)), where SS is the size of the swap space, and RAM is
          the  size  of  the physical memory, and r is the contents of the
          file /proc/sys/vm/overcommit_ratio.
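As a worked example of the mode 2 formula, using the figures quoted in the next answer (RAM 4063428 kB, swap 514072 kB, overcommit_ratio 50). The function name is hypothetical, and it ignores refinements the kernel applies, such as excluding huge pages from RAM:

```c
/* Mode 2 limit from proc(5): SS + RAM * (r / 100), all sizes in kB.
 * Integer arithmetic; ignores kernel refinements such as huge pages. */
unsigned long long commit_limit_kb(unsigned long long ram_kb,
                                   unsigned long long swap_kb,
                                   unsigned int ratio)
{
    return swap_kb + ram_kb * ratio / 100;
}
```

With those figures the limit is 514072 + 4063428 × 50 / 100 = 2545786 kB (about 2.4 GiB), already smaller than the 5121024 kB this mapping needs, so switching to mode 2 would not make it succeed.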

无可置疑 2024-09-23 14:21:16

Quoting your memory, swap size and overcommit settings from your comment:

MemTotal: 4063428 kB
SwapTotal: 514072 kB
$ cat /proc/sys/vm/overcommit_memory
0
$ cat /proc/sys/vm/overcommit_ratio 
50
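Those two figures come from /proc/meminfo. A small sketch for pulling them out programmatically, assuming the usual `Key:   value kB` line format; `meminfo_kb` and `read_meminfo` are hypothetical helper names:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find "key:" at the start of a line in the text of
 * /proc/meminfo and return its value (in kB), or -1 if the key is absent. */
long long meminfo_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        long long kb;
        if (strncmp(line, key, klen) == 0 && line[klen] == ':'
            && sscanf(line + klen + 1, "%lld", &kb) == 1)
            return kb;
        line = strchr(line, '\n');
        if (line)
            line++;             /* move to the start of the next line */
    }
    return -1;
}

/* Reads /proc/meminfo into buf (size must be >= 1); returns 0 on success. */
int read_meminfo(char *buf, size_t size)
{
    FILE *f = fopen("/proc/meminfo", "r");
    size_t n;

    if (!f)
        return -1;
    n = fread(buf, 1, size - 1, f);
    buf[n] = '\0';
    fclose(f);
    return 0;
}
```

A 4096-byte buffer is typically large enough for the whole file; `meminfo_kb(buf, "MemTotal")` and `meminfo_kb(buf, "SwapTotal")` then give the numbers quoted above.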

With overcommit_memory set to 0 ("heuristic overcommit"), you can't create a private, writable mapping that's larger than your currently available memory plus swap. Since you only have about 4.5GB of RAM + swap in total (4063428 kB + 514072 kB = 4577500 kB) and the 5GB file needs 5121024 kB, that check can never pass.
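The arithmetic behind this, as a sketch: a private writable mapping is charged for its full length, and the mode 0 heuristic never counts more than total RAM + total swap as available, so anything larger than MemTotal + SwapTotal is refused no matter what else is running. The helper names are hypothetical, and staying under this bound does not guarantee success, because the heuristic also looks at how much of that memory is currently free or reclaimable:

```c
/* Round a byte length up to whole kB, matching the units of /proc/meminfo. */
unsigned long long bytes_to_kb(unsigned long long bytes)
{
    return (bytes + 1023) / 1024;
}

/* Nonzero if a private writable mapping of len bytes exceeds RAM + swap,
 * in which case heuristic overcommit (mode 0) can never accept it. */
int cow_mapping_too_big(unsigned long long len, unsigned long long ram_kb,
                        unsigned long long swap_kb)
{
    return bytes_to_kb(len) > ram_kb + swap_kb;
}
```

With the figures above, `cow_mapping_too_big(5243928576ULL, 4063428ULL, 514072ULL)` is true, while a 4 GiB file (4194304 kB) stays under the 4577500 kB bound, which is consistent with the question's observation that files smaller than 4GB map fine.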

Your options are either to use MAP_NORESERVE (as Matt Joiner suggests), if you're sure that you'll never dirty (write to) more pages in the mapping than you have free memory and swap for; or to significantly increase the size of your swap space.
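To illustrate the copy-on-write behaviour this relies on: with `MAP_PRIVATE`, only the pages you actually write to are copied (and need RAM or swap), and the writes never reach the file. A sketch assuming Linux/glibc; `cow_write_is_private` is a hypothetical demo function that dirties one byte of a `MAP_NORESERVE` mapping and then checks the file with `pread`:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo: map fd privately with MAP_NORESERVE, dirty one byte,
 * and report whether the file still holds its original byte.
 * Returns 1 if the write stayed private, 0 if not, -1 on error. */
int cow_write_is_private(int fd, size_t len)
{
    char *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_NORESERVE, fd, 0);
    char before, after;

    if (base == MAP_FAILED)
        return -1;
    before = base[0];
    base[0] = (char)(before + 1);   /* first write copies only this page */
    munmap(base, len);              /* private changes are discarded here */
    if (pread(fd, &after, 1, 0) != 1)
        return -1;
    return after == before;
}
```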
