Measuring memory usage with getrusage: "resetting" ru_maxrss?

Posted on 2025-02-11 20:53:26

In a numerical physics project of mine, I'd like to compare the memory usage of different methods for solving the same problem.
I found that I can include <sys/resource.h> and call getrusage() to read the maximum resident set size of the process from ru_maxrss (with some caveats that I don't think I need to care about).

For benchmarking, I essentially run code blocks like these for all the different methods I've implemented:

// Requires <chrono>, <iostream>, and <sys/resource.h>.
int minN = 6;
int maxN = 16;
std::chrono::steady_clock::time_point start;
std::chrono::steady_clock::time_point finish;

std::cout << "Naive:" << std::endl;
for (int N = minN; N <= maxN; N+=2) {
    struct rusage usage{};
    start = std::chrono::steady_clock::now();
    //do work...
    finish = std::chrono::steady_clock::now();
    // getrusage returns 0 on success and -1 on failure.
    if (getrusage(RUSAGE_SELF, &usage) != 0) {
        std::cerr << "getrusage failed" << std::endl;
    }

    long long time_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(finish - start).count();
    long max_rss_kb = usage.ru_maxrss;  // peak RSS of the process so far, in KB on Linux
    std::cout << "N = " << N << ", time = " << time_ns/1e9 << " s, ram = " << max_rss_kb << " KB" << std::endl;
}

Now, the problem is that ru_maxrss contains the maximum amount of used memory for the whole lifetime of the program, i.e. it is not reduced if a "large" object goes out of scope. Thus, the output of the whole program will look something like this:

Naive:
N = 6, time = 0.022541 s, ram = 8028 KB
N = 8, time = 0.0234674 s, ram = 65360 KB
N = 10, time = 0.373676 s, ram = 135284 KB
N = 12, time = 21.7536 s, ram = 631792 KB
Magnetization:
N = 6, time = 0.000166585 s, ram = 631792 KB
N = 8, time = 0.00158378 s, ram = 631792 KB
N = 10, time = 0.022255 s, ram = 631792 KB
N = 12, time = 0.405172 s, ram = 631792 KB
Momentum:
N = 6, time = 0.000175482 s, ram = 631792 KB
N = 8, time = 0.000766058 s, ram = 631792 KB
N = 10, time = 0.00658272 s, ram = 631792 KB
N = 12, time = 0.0728279 s, ram = 631792 KB
Parity:
N = 8, time = 0.000986243 s, ram = 631792 KB
N = 12, time = 0.0528302 s, ram = 631792 KB
Spin Inversion:
N = 8, time = 0.00111167 s, ram = 631792 KB
N = 12, time = 0.050363 s, ram = 631792 KB
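
The plateau in this output can be reproduced in isolation. The following is a minimal sketch, assuming a POSIX system such as Linux; peak_rss_raw and touch_and_release are hypothetical helper names that are not part of the benchmark above. It reads ru_maxrss, allocates and frees a buffer, and lets the caller read ru_maxrss again.

```cpp
#include <sys/resource.h>
#include <cassert>
#include <cstddef>
#include <vector>

// Raw ru_maxrss of the calling process (kilobytes on Linux), or -1 on error.
long peak_rss_raw() {
    struct rusage usage{};
    if (getrusage(RUSAGE_SELF, &usage) != 0) return -1;
    return usage.ru_maxrss;
}

// Allocate `bytes`, write to every byte so the pages become resident,
// then release the buffer when the function returns.
void touch_and_release(std::size_t bytes) {
    assert(bytes > 0);
    std::vector<char> buf(bytes, 1);
    volatile char sink = buf[bytes / 2];  // discourage the optimiser from dropping the buffer
    (void)sink;
}
```

Calling peak_rss_raw(), then touch_and_release(64 * 1024 * 1024), then peak_rss_raw() again should give a second value at least as large as the first, even though the buffer has already been freed.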

Once memory usage has peaked, the reported memory usage of my benchmark is useless. I realize that, in principle, this is how getrusage() is supposed to work. Is there a way to reset this metric? Or can anyone recommend another easy way to measure memory usage from inside the program that does not involve using specific benchmarking libraries?
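
One Linux-specific possibility, sketched here under stated assumptions: proc(5) documents that writing 5 to /proc/self/clear_refs (Linux 4.0 and later) resets the peak resident set size ("high water mark") to the current resident set size. That peak is exposed as VmHWM in /proc/self/status. Whether ru_maxrss from getrusage() follows the reset is not verified here, so the sketch reads VmHWM directly. reset_peak_rss and read_status_kb are hypothetical helper names.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Ask the kernel to reset the peak-RSS high-water mark of this process.
// Linux >= 4.0 only; returns false if the file cannot be opened or written.
bool reset_peak_rss() {
    std::ofstream clear("/proc/self/clear_refs");
    if (!clear) return false;
    clear << "5" << std::flush;
    return static_cast<bool>(clear);
}

// Read a "<key>: <value> kB" entry from /proc/self/status.
// Returns the value in kB, or -1 if the key is missing or unparsable.
long read_status_kb(const std::string& key) {
    assert(!key.empty());
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line)) {
        if (line.compare(0, key.size() + 1, key + ":") == 0) {
            std::istringstream fields(line.substr(key.size() + 1));
            long kb = 0;
            return (fields >> kb) ? kb : -1;
        }
    }
    return -1;
}
```

In the benchmark loop this would mean calling reset_peak_rss() before the work and read_status_kb("VmHWM") after it, so each iteration reports its own peak instead of the peak of the whole run.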

Regards

PS: Does anyone know whether, or in which cases, ru_maxrss is in B or KB? For N = 8, I store a matrix with 65536 double elements, which at 8 bytes per double is about 524288 bytes (512 KiB). This matrix should dominate memory usage. My benchmark reports 65360, and the documentation of getrusage() says the result is in KB. That number is eerily close to the number of matrix elements. So is the result really in KB, and the resemblance purely a coincidence?
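
On the unit: the Linux getrusage(2) man page specifies ru_maxrss in kilobytes, whereas macOS reports it in bytes. A minimal sketch that normalises the value to bytes, assuming only those two platforms matter; peak_rss_bytes is a hypothetical helper name.

```cpp
#include <sys/resource.h>
#include <cassert>

// Peak RSS of the calling process in bytes, normalising the
// platform-specific unit of ru_maxrss. Returns -1 if getrusage fails.
long long peak_rss_bytes() {
    struct rusage usage{};
    if (getrusage(RUSAGE_SELF, &usage) != 0) return -1;
    assert(usage.ru_maxrss >= 0);
#if defined(__APPLE__)
    return usage.ru_maxrss;            // macOS reports bytes
#else
    return usage.ru_maxrss * 1024LL;   // Linux reports kilobytes
#endif
}
```

Other systems may use yet another unit, so the #else branch is an assumption rather than a general rule.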

Update:
I got what I wanted by parsing /proc/self/stat. I'll share my updated code below in case anyone finds this in the future. Note that rss, the 24th entry of stat, is in pages, so one must multiply it by 4096 (the page size) to get an approximation of the used amount of RAM in B.

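// Requires <chrono>, <fstream>, <iostream>, <sstream>, and <string>.
std::cout << "Naive:" << std::endl;
for (int N = minN; N <= maxN; N+=2) {
    start = std::chrono::steady_clock::now();
    // do work...
    finish = std::chrono::steady_clock::now();

    // Field 24 of /proc/self/stat is rss, counted in pages. Splitting on
    // spaces assumes the executable name (field 2) contains no spaces.
    std::ifstream statFile("/proc/self/stat");
    std::string statLine;
    std::getline(statFile, statLine);
    std::istringstream iss(statLine);
    std::string entry;
    long long memUsage = 0;
    for (int i = 1; i <= 24; i++) {
        std::getline(iss, entry, ' ');
    }
    if (!entry.empty()) {
        memUsage = std::stoll(entry);  // stoll matches the long long target
    }

    const long long pageSize = 4096;  // assumed; sysconf(_SC_PAGESIZE) reports the actual value
    long long time_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(finish - start).count();
    std::cout << "N = " << N << ", time = " << time_ns/1e9 << " s, ram = " << pageSize*memUsage/1e9 << " GB" << std::endl;
}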
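
A somewhat more compact variant of the same idea, offered as a sketch that assumes Linux: /proc/self/statm holds only numeric fields, the second of which is the resident set size in pages, so no field counting around the executable name is needed. The page size is queried with sysconf(_SC_PAGESIZE) instead of being hard-coded. current_rss_bytes is a hypothetical helper name.

```cpp
#include <unistd.h>
#include <cassert>
#include <fstream>

// Current resident set size in bytes, or -1 if /proc/self/statm
// cannot be read or parsed.
long long current_rss_bytes() {
    std::ifstream statm("/proc/self/statm");
    long long total_pages = 0;
    long long resident_pages = 0;
    // First field: total program size; second field: resident pages.
    if (!(statm >> total_pages >> resident_pages)) return -1;
    const long page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);
    return resident_pages * page_size;
}
```

Like the /proc/self/stat version, this reports the resident set size at the moment of the call, not a peak, so it has to be sampled while the objects of interest are still alive.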
