Linux: memory-mapped file holds on to a large amount of physical memory

Posted 2024-09-24 16:08:40

I have a problem that was described in multiple threads concerning memory mapping and a growing memory consumption under Linux.

When I open a 1GB file under Linux or MacOS X and map it into memory using

me.data_begin = mmap(NULL, capacity(me), prot, MAP_SHARED, me.file.handle, 0);

and sequentially read the mapped memory, my program uses more and more physical memory although I used posix_madvise (even called it multiple times during the read process):

posix_madvise(me.data_begin, capacity(me), MMAP_SEQUENTIAL);

without success. :-(

I tried:

different flags MMAP_RANDOM, MMAP_DONTNEED, MMAP_NORMAL without success
posix_fadvise(me.file.handle, 0, capacity(me), POSIX_FADV_DONTNEED) before and after calling mmap -> no success

It works under Mac OS X !!! when I combine

posix_madvise(.. MMAP_SEQUENTIAL)

and

msync(me.data_begin, capacity(me), MS_INVALIDATE).

The resident memory stays below 16 MB (I called msync periodically, every 16 million steps).

But under Linux nothing works. Does anyone have an idea or a success story for my problem under Linux?

Cheers,
David

彡翼 2024-10-01 16:08:40

Linux memory management is different from other systems. The key principle is that memory that is not being used is memory being wasted. In many ways, Linux tries to maximize memory usage, resulting (most of the time) in better performance.

It is not that "nothing works" in Linux, but that its behavior is a little different from what you expect.

When memory pages are pulled in from the mmapped file, the operating system has to decide which physical memory pages it will release (or swap out) to make room for them. It will look for pages which are easier to evict (they don't require an immediate disk write) and are less likely to be used again.

The madvise() call (posix_madvise() in POSIX) serves to tell the system how your application will use the pages. But as the name says, it is advice, given so that the operating system is better informed when making paging and swapping decisions. It is neither a policy nor an order.
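To make the "advice, not an order" point concrete, here is a minimal sketch of mapping a file and passing the sequential-access hint. The helper name map_sequential is hypothetical and not taken from the question or from any library; the sketch uses the standard POSIX_MADV_SEQUENTIAL constant.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only and hint sequential access.
 * Returns the mapping (NULL on failure) and stores its size in *len_out. */
void *map_sequential(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    /* posix_madvise() returns an error number instead of setting errno.
     * A zero return only means the hint was accepted, not that the kernel
     * will change what it keeps resident. */
    int rc = posix_madvise(p, (size_t)st.st_size, POSIX_MADV_SEQUENTIAL);
    if (rc != 0)
        fprintf(stderr, "posix_madvise: %s\n", strerror(rc));

    *len_out = (size_t)st.st_size;
    return p;
}
```

A caller would read through the returned pointer and release it with munmap(p, len) when done. Nothing in the return value says how much of the file will stay resident; that decision remains with the kernel.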

To demonstrate the effects of madvise() on Linux, I modified one of the exercises I give to my students. See the complete source code here. My system is 64-bit and has 2 GB of RAM, of which about 50% is in use right now. The program mmaps a 2 GB file, reads it sequentially and discards everything. It reports RSS usage after every 200 MB read. The results without madvise():

<juliano@home> ~% ./madvtest file.dat n
     0 :     3 MB
   200 :   202 MB
   400 :   402 MB
   600 :   602 MB
   800 :   802 MB
  1000 :  1002 MB
  1200 :  1066 MB
  1400 :  1068 MB
  1600 :  1078 MB
  1800 :  1113 MB
  2000 :  1113 MB

Linux kept pushing other things out of memory until around 1 GB had been read. After that, it started pressuring the process itself (since the other 50% of memory was in active use by other processes), and the RSS stabilized until the end of the file.

Now, with madvise():
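The kind of measurement shown above can be approximated along these lines. This is a minimal sketch and not the madvtest program itself: the function names rss_bytes and scan_and_report are made up for illustration, and reading /proc/self/statm is a Linux-specific assumption.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Resident set size of the calling process in bytes, or -1 on failure.
 * Linux-specific: the second field of /proc/self/statm is the RSS in pages. */
long rss_bytes(void)
{
    FILE *f = fopen("/proc/self/statm", "r");
    if (f == NULL)
        return -1;

    long total_pages, resident_pages;
    int n = fscanf(f, "%ld %ld", &total_pages, &resident_pages);
    fclose(f);
    if (n != 2)
        return -1;

    return resident_pages * sysconf(_SC_PAGESIZE);
}

/* Read every byte of buf in order and print the RSS each time `step` bytes
 * have been consumed. The checksum is returned so that the compiler cannot
 * optimize the reads away. */
unsigned long scan_and_report(const unsigned char *buf, size_t len, size_t step)
{
    assert(buf != NULL && step > 0);

    unsigned long sum = 0;
    for (size_t i = 0; i < len; i++) {
        if (i % step == 0)
            printf("%6zu : %5ld MB\n", i / (1024 * 1024),
                   rss_bytes() / (1024 * 1024));
        sum += buf[i];
    }
    return sum;
}
```

Called on a mapped file with a step of 200 MB, this prints one line per 200 MB read, in the same two-column layout as the output above. The numbers will depend on how much memory the rest of the system is using at the time.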

<juliano@home> ~% ./madvtest file.dat y
     0 :     3 MB
   200 :   202 MB
   400 :   402 MB
   600 :   494 MB
   800 :   501 MB
  1000 :   518 MB
  1200 :   530 MB
  1400 :   530 MB
  1600 :   530 MB
  1800 :   595 MB
  2000 :   788 MB

Note that Linux kept adding pages to the process only until it reached around 500 MB, much sooner than without madvise(). This is because after that point, the pages already in memory seemed much more valuable than the pages that this process had marked as sequential access. There is a threshold in the VMM that defines when to start dropping old pages from the process.

You may ask why Linux kept allocating pages up to around 500 MB and didn't stop much sooner, since they were marked as sequential access. Either the system had enough free memory pages anyway, or the other resident pages were too old to keep around. Between keeping ancient pages in memory that don't seem to be useful anymore and bringing in more pages to serve a program that is running now, Linux chooses the second option.

Even though the pages were marked as sequential access, that was just advice. The application may still want to go back to those pages and read them again, or another application in the system may want them. The madvise() call only says what the application itself is doing; Linux takes the bigger picture into consideration.
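If the goal is specifically to keep the resident set small while streaming through a read-only mapping, one approach that can be tried on Linux is to release each window of pages explicitly after it has been read. The following is a minimal sketch under stated assumptions: it is Linux-specific, it uses madvise() with MADV_DONTNEED rather than posix_madvise(), the mapping is read-only and file-backed so dropped pages can simply be read in again on the next access, and the function name sum_and_release is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Sum all bytes of a read-only, file-backed mapping, releasing each window
 * of pages once it has been consumed. `base` must be the page-aligned
 * address returned by mmap(). */
unsigned long sum_and_release(const unsigned char *base, size_t len, size_t window)
{
    assert(base != NULL && window > 0);

    /* madvise() needs a page-aligned start address, so round the window
     * up to a whole number of pages. */
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    window = (window + page - 1) / page * page;

    unsigned long sum = 0;
    for (size_t off = 0; off < len; off += window) {
        size_t n = len - off < window ? len - off : window;
        for (size_t i = 0; i < n; i++)
            sum += base[off + i];

        /* Ask the kernel to drop the pages just read from this process.
         * The result is ignored on purpose: if the call fails, the pages
         * merely stay resident. The data may also remain in the page
         * cache; only this process's mapping of it is released. */
        madvise((void *)(base + off), n, MADV_DONTNEED);
    }
    return sum;
}
```

With a window of 16 MB this mirrors the periodic msync() calls from the question. Whether it lowers the reported RSS on a given kernel is something to measure rather than assume.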
