Why does realloc consume so much memory?

Posted 2024-10-03 08:22:36

This question is a bit long due to the source code, which I tried to simplify as much as possible. Please bear with me and thanks for reading along.

I have an application with a loop that runs potentially millions of times. Instead of several thousands to millions of malloc/free calls within that loop, I would like to do one malloc up front and then several thousands to millions of realloc calls.

But when I use realloc, my application consumes several GB of memory and eventually crashes. If I use malloc, my memory usage is fine.

If I run on smaller test data sets with valgrind's memcheck tool, it reports no memory leaks with either malloc or realloc.

I have verified that I am matching every malloc-ed (and then realloc-ed) object with a corresponding free.

So, in theory, I am not leaking memory; it is just that using realloc seems to consume all of my available RAM. I'd like to know why, and what I can do to fix this.

What I have initially is something like this, which uses malloc and works properly:

Malloc code

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUFLENGTH 1024  /* placeholder: the real buffer size is not given */

void B(void);
void C(const char *someOtherString, char **firstString, char **secondString);

void A(void) {
    do {
        B();
    } while (someConditionThatIsTrueForMillionInstances);  /* placeholder condition */
}

void B(void) {
    char *firstString = NULL;
    char *secondString = NULL;
    char *someOtherString;

    /* populate someOtherString with data from stream, for example */

    C((const char *)someOtherString, &firstString, &secondString);

    fprintf(stderr, "first: [%s] | second: [%s]\n", firstString, secondString);

    /* free(NULL) is a no-op, so no NULL check is needed */
    free(firstString);
    free(secondString);
}

void C(const char *someOtherString, char **firstString, char **secondString) {
    char firstBuffer[BUFLENGTH];
    char secondBuffer[BUFLENGTH];

    /* populate buffers with some data from tokenizing someOtherString in a special way */

    *firstString = malloc(strlen(firstBuffer)+1);
    if (*firstString == NULL)
        exit(EXIT_FAILURE);                 /* allocation failed */
    strncpy(*firstString, firstBuffer, strlen(firstBuffer)+1);

    *secondString = malloc(strlen(secondBuffer)+1);
    if (*secondString == NULL)
        exit(EXIT_FAILURE);                 /* allocation failed */
    strncpy(*secondString, secondBuffer, strlen(secondBuffer)+1);
}

This works fine. But I want something faster.

Now I test a realloc arrangement, which malloc-s only once:

Realloc code

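#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUFLENGTH 1024  /* placeholder: the real buffer size is not given */

void B(char **firstString, char **secondString);
void C(const char *someOtherString, char **firstString, char **secondString);

void A(void) {
    char *firstString = NULL;
    char *secondString = NULL;

    do {
        B(&firstString, &secondString);
    } while (someConditionThatIsTrueForMillionInstances);  /* placeholder condition */

    /* free(NULL) is a no-op, so no NULL check is needed */
    free(firstString);
    free(secondString);
}

void B(char **firstString, char **secondString) {
    char *someOtherString;

    /* populate someOtherString with data from stream, for example */

    /* &(*firstString) is equivalent to plain firstString */
    C((const char *)someOtherString, &(*firstString), &(*secondString));

    fprintf(stderr, "first: [%s] | second: [%s]\n", *firstString, *secondString);
}

void C(const char *someOtherString, char **firstString, char **secondString) {
    char firstBuffer[BUFLENGTH];
    char secondBuffer[BUFLENGTH];
    char *tmp;

    /* populate buffers with some data from tokenizing someOtherString in a special way */

    /* realloc(NULL, n) acts like malloc(n), so the first pass allocates.
       Assign through tmp: on failure realloc returns NULL and leaves the
       old block allocated, which "p = realloc(p, n)" would leak. */
    tmp = realloc(*firstString, strlen(firstBuffer)+1);
    if (tmp == NULL)
        exit(EXIT_FAILURE);
    *firstString = tmp;
    strncpy(*firstString, firstBuffer, strlen(firstBuffer)+1);

    tmp = realloc(*secondString, strlen(secondBuffer)+1);
    if (tmp == NULL)
        exit(EXIT_FAILURE);
    *secondString = tmp;
    strncpy(*secondString, secondBuffer, strlen(secondBuffer)+1);
}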

If I look at the output of free -m on the command-line while I run this realloc-based test with a large data set that causes the million-loop condition, my memory goes from 4 GB down to 0 and the app crashes.

What am I missing about using realloc that is causing this? Sorry if this is a dumb question, and thanks in advance for your advice.

财迷小姐 2024-10-10 08:22:36

realloc has to copy the contents from the old buffer to the new buffer if the resizing operation cannot be done in place. A malloc/free pair can be better than a realloc if you don't need to keep around the original memory.

That's why realloc can temporarily require more memory than a malloc/free pair. You are also encouraging fragmentation by continuously interleaving reallocs. I.e., you are basically doing:

malloc(A);
malloc(B);

while (...)
{
    malloc(A_temp);
    free(A);
    A= A_temp;
    malloc(B_temp);
    free(B);
    B= B_temp;
}

Whereas the original code does:

while (...)
{
    malloc(A);
    malloc(B);
    free(A);
    free(B);
}

At the end of each iteration of the second loop you have cleaned up all the memory you used; that's more likely to return the global memory heap to a clean state than interleaving memory allocations without completely freeing all of them.
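To make the copy step concrete, here is a minimal sketch of what a realloc that cannot resize in place amounts to. The helper name `moving_realloc` is hypothetical, and so is its `old_size` parameter: a real realloc knows the old block size internally, but portable C gives the caller no way to query it. Note that the new block is allocated before the old one is freed, which is the temporary extra memory the answer refers to.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: models only the "cannot grow in place" case of realloc.
 * The caller passes old_size because standard C cannot query a block's size. */
void *moving_realloc(void *old, size_t old_size, size_t new_size)
{
    void *fresh = malloc(new_size);   /* old and new blocks are both live here */
    if (fresh == NULL)
        return NULL;                  /* like realloc: old block is left untouched */
    if (old != NULL)                  /* copy the preserved prefix: the wasted work
                                         when the contents get overwritten anyway */
        memcpy(fresh, old, old_size < new_size ? old_size : new_size);
    free(old);                        /* only now is the old block released */
    return fresh;
}
```

A real allocator may instead extend or shrink the block in place, in which case nothing is copied; the sketch shows only the moving path.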

长安忆 2024-10-10 08:22:36

Using realloc when you don't want to preserve the existing contents of the memory block is a very very bad idea. If nothing else, you'll waste lots of time duplicating data you're about to overwrite. In practice, the way you're using it, the resized blocks will not fit in the old space, so they get located at progressively higher addresses on the heap, causing the heap to grow ridiculously.
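If the goal is only to avoid an allocator call on every iteration, one alternative (a swapped-in technique, not something proposed in the question or the answers) is a grow-only reusable buffer: track the current capacity, do nothing when the new string fits, and when it does not, free and malloc a larger block instead of calling realloc, so no stale bytes are copied. The names `reserve_discard` and `set_string`, the starting capacity of 64 and the doubling policy are all assumptions of this sketch.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: ensure *buf holds at least `needed` bytes WITHOUT
 * preserving its old contents. Returns 0 on success, -1 on failure. */
int reserve_discard(char **buf, size_t *cap, size_t needed)
{
    size_t new_cap;

    if (needed <= *cap)
        return 0;                       /* existing block is big enough: reuse it */

    new_cap = *cap ? *cap : 64;         /* arbitrary initial capacity */
    while (new_cap < needed) {
        if (new_cap > SIZE_MAX / 2) {   /* doubling would overflow */
            new_cap = needed;
            break;
        }
        new_cap *= 2;                   /* geometric growth keeps reallocations rare */
    }

    free(*buf);                         /* old contents are not needed: no copy */
    *buf = malloc(new_cap);
    if (*buf == NULL) {
        *cap = 0;
        return -1;
    }
    *cap = new_cap;
    return 0;
}

/* Hypothetical helper: copy src into the reusable buffer, growing it only if needed. */
int set_string(char **buf, size_t *cap, const char *src)
{
    size_t len = strlen(src) + 1;       /* include the '\0' terminator */

    if (reserve_discard(buf, cap, len) != 0)
        return -1;
    memcpy(*buf, src, len);
    return 0;
}
```

In the question's layout, A() would own one buffer and one capacity per string, pass them down to C(), and call free once on each buffer after the loop.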

Memory management is not easy. Bad allocation strategies lead to fragmentation, atrocious performance, etc. The best you can do is avoid introducing any more constraints than you absolutely have to (like using realloc when it's not needed), free as much memory as possible when you're done with it, and allocate large blocks of associated data together in a single allocation rather than in small pieces.
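As an illustration of the last suggestion, the two strings from the question could share a single allocation. The `struct string_pair` type and the `pair_init`/`pair_free` helpers below are hypothetical; this is a sketch of the idea under those assumptions, not the asker's code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical type: two related strings stored in one heap block. */
struct string_pair {
    char *first;
    char *second;
};

/* Allocate one block for both strings (and their terminators) and point
 * `first` and `second` into it. Returns 0 on success, -1 on failure. */
int pair_init(struct string_pair *p, const char *a, const char *b)
{
    size_t la = strlen(a) + 1;
    size_t lb = strlen(b) + 1;
    char *block = malloc(la + lb);      /* one allocation instead of two */

    if (block == NULL)
        return -1;
    memcpy(block, a, la);
    memcpy(block + la, b, lb);
    p->first = block;
    p->second = block + la;             /* second string starts right after the first */
    return 0;
}

/* One free releases both strings, because they share a block. */
void pair_free(struct string_pair *p)
{
    free(p->first);
    p->first = p->second = NULL;
}
```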

梦在深巷 2024-10-10 08:22:36

You are expecting &(*firstString) to be the same as firstString, but in fact it is taking the address of the argument to your function rather than passing through the address of the pointers in A. Thus every time you call you make a copy of NULL, realloc new memory, lose the pointer to the new memory, and repeat. You can easily verify this by seeing that at the end of A the original pointers are still null.

EDIT: Well, it's an awesome theory, but I seem to be wrong on the compilers I have available to me to test.
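The edit is consistent with the C standard, which specifies that `&*E` is equivalent to `E` for a pointer expression `E` (C99 §6.5.3.2). A small check, using hypothetical helpers `same_pointer` and `grow_through` that mirror the shape of the question's `B()` and `C()`, shows that `&(*pp)` hands the callee the caller's own pointer, so the reallocated block is not lost.

```c
#include <stdlib.h>

/* Hypothetical helper: &(*pp) dereferences pp and immediately takes the
 * address again, so it yields the same pointer value as pp itself. */
char **same_pointer(char **pp)
{
    return &(*pp);
}

/* Hypothetical helper shaped like the question's C(): (re)allocates
 * through a char ** and updates the caller's pointer on success. */
void grow_through(char **out, size_t n)
{
    char *tmp = realloc(*out, n);       /* realloc(NULL, n) acts like malloc(n) */
    if (tmp != NULL)
        *out = tmp;
}
```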
