Why do I get a C malloc assertion failure?

Posted 2024-09-04 11:05:21

I am implementing a divide and conquer polynomial algorithm so I can benchmark it against an OpenCL implementation, but I can't get malloc to work. When I run the program, it allocates a bunch of stuff, checks some things, then sends the size/2 to the algorithm. Then when I hit the malloc line again it spits out this:

malloc.c:3096: sYSMALLOc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 * (sizeof(size_t))) - 1)) & ~((2 * (sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long)old_end & pagemask) == 0)' failed.
Aborted

The line in question is:

int *mult(int size, int *a, int *b) {
    int *out,i, j, *tmp1, *tmp2, *tmp3, *tmpa1, *tmpa2, *tmpb1, *tmpb2,d, *res1, *res2;
    fprintf(stdout, "size: %d\n", size);

    out = (int *)malloc(sizeof(int) * size * 2);
}

I checked size with a fprintf, and it is a positive integer (usually 50 at that point). I tried calling malloc with a plain number as well and I still get the error. I'm just stumped at what's going on, and nothing from Google I have found so far is helpful.

Any ideas what's going on? I'm trying to figure out how to compile a newer GCC in case it's a compiler error, but I really doubt it.

奶茶白久 2024-09-11 11:05:21

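It is 99.9% likely that you have corrupted memory (overflowed or underflowed a buffer, wrote through a pointer after it was freed, called free twice on the same pointer, etc.).

Run your code under Valgrind to see where your program did something incorrect.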

鯉魚旗 2024-09-11 11:05:21

To give you a better understanding of why this happens, I'd like to expand upon @r-samuel-klatchko's answer a bit.

When you call malloc, what is really happening is a bit more complicated than just giving you a chunk of memory to play with. Under the hood, malloc also keeps some housekeeping information about the memory it has given you (most importantly, its size), so that when you call free, it knows things like how much memory to free. This information is commonly kept right before the memory location returned to you by malloc. More exhaustive information can be found on the internet™, but the (very) basic idea is something like this:

+------+-------------------------------------------------+
+ size |                  malloc'd memory                +
+------+-------------------------------------------------+
       ^-- location in pointer returned by malloc

Building on this (and simplifying things greatly), when you call malloc, it needs to get a pointer to the next part of memory that is available. One very simple way of doing this is to look at the previous bit of memory it gave away, and move size bytes further down (or up) in memory. With this implementation, you end up with your memory looking something like this after allocating p1, p2 and p3:

+------+----------------+------+--------------------+------+----------+
+ size |                | size |                    | size |          +
+------+----------------+------+--------------------+------+----------+
       ^- p1                   ^- p2                       ^- p3

So, what is causing your error?

Well, imagine that your code erroneously writes past the amount of memory you've allocated (either because you allocated less than you needed as was your problem or because you're using the wrong boundary conditions somewhere in your code). Say your code writes so much data to p2 that it starts overwriting what is in p3's size field. When you now next call malloc, it will look at the last memory location it returned, look at its size field, move to p3 + size and then start allocating memory from there. Since your code has overwritten size, however, this memory location is no longer after the previously allocated memory.

Needless to say, this can wreak havoc! The implementors of malloc have therefore put in a number of "assertions", or sanity checks, that try to catch this (and other issues) when they are about to happen. In your particular case, these assertions are violated, and thus malloc aborts, telling you that your code was about to do something it really shouldn't be doing.
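The simplified model above can be sketched as a toy allocator. This is only an illustration of the idea, not glibc's actual implementation: toy_malloc, toy_heap_ok, and TOY_HEAP_SIZE are hypothetical names, the "heap" is a fixed array, and the sanity check simply walks the size headers to see whether they still add up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_HEAP_SIZE 256

/* Hypothetical toy arena. Alignment is ignored for brevity, so headers
   are read and written with memcpy. */
static unsigned char toy_heap[TOY_HEAP_SIZE];
static size_t toy_top = 0;  /* offset of the first unused byte */

/* Store a size header, then hand out the n bytes right after it. */
void *toy_malloc(size_t n) {
    assert(toy_top <= TOY_HEAP_SIZE);
    if (n > TOY_HEAP_SIZE - toy_top ||
        sizeof n > TOY_HEAP_SIZE - toy_top - n) {
        return NULL;  /* not enough room for header plus payload */
    }
    memcpy(&toy_heap[toy_top], &n, sizeof n);
    void *p = &toy_heap[toy_top + sizeof n];
    toy_top += sizeof n + n;
    return p;
}

/* Sanity check: follow the headers from the start of the arena.
   Returns 1 if every block fits below toy_top, 0 if a header is garbage. */
int toy_heap_ok(void) {
    size_t off = 0;
    while (off < toy_top) {
        size_t size;
        if (toy_top - off < sizeof size) return 0;        /* no room for a header */
        memcpy(&size, &toy_heap[off], sizeof size);
        if (size > toy_top - off - sizeof size) return 0; /* block would run past the top */
        off += sizeof size + size;
    }
    return 1;
}
```

After three 16-byte allocations p1, p2 and p3, toy_heap_ok() returns 1. Writing 16 + sizeof(size_t) bytes through p2 (for example with memset) spills into p3's header, and the next toy_heap_ok() call returns 0. A real allocator performs checks in the same spirit, which is why the failure surfaces in a later malloc call rather than at the bad write.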

As previously stated, this is a gross oversimplification, but it is sufficient to illustrate the point. The glibc implementation of malloc is more than 5k lines, and there have been substantial amounts of research into how to build good dynamic memory allocation mechanisms, so covering it all in a SO answer is not possible. Hopefully this has given you a bit of a view of what is really causing the problem though!

合久必婚 2024-09-11 11:05:21

My alternative solution to using Valgrind:

I'm very happy because I just helped my friend debug a program. His program had this exact problem (malloc() causing abort), with the same error message from GDB.

I compiled his program using Address Sanitizer with

gcc -Wall -g3 -fsanitize=address -o new new.c
              ^^^^^^^^^^^^^^^^^^

And then ran gdb new. When the program gets terminated by SIGABRT caused in a subsequent malloc(), a whole lot of useful information is printed:

=================================================================
==407==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6060000000b4 at pc 0x7ffffe49ed1a bp 0x7ffffffedc20 sp 0x7ffffffed3c8
WRITE of size 104 at 0x6060000000b4 thread T0
    #0 0x7ffffe49ed19  (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x5ed19)
    #1 0x8001dab in CreatHT2 /home/wsl/Desktop/hash/new.c:59
    #2 0x80031cf in main /home/wsl/Desktop/hash/new.c:209
    #3 0x7ffffe061b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
    #4 0x8001679 in _start (/mnt/d/Desktop/hash/new+0x1679)

0x6060000000b4 is located 0 bytes to the right of 52-byte region [0x606000000080,0x6060000000b4)
allocated by thread T0 here:
    #0 0x7ffffe51eb50 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb50)
    #1 0x8001d56 in CreatHT2 /home/wsl/Desktop/hash/new.c:55
    #2 0x80031cf in main /home/wsl/Desktop/hash/new.c:209
    #3 0x7ffffe061b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)

Let's take a look at the output, especially the stack trace:

The first part says there's an invalid write operation at new.c:59. That line reads

memset(len,0,sizeof(int*)*p);
             ^^^^^^^^^^^^

The second part says the memory that the bad write happened on is created at new.c:55. That line reads

if(!(len=(int*)malloc(sizeof(int)*p))){
                      ^^^^^^^^^^^

That's it. It took me less than half a minute to locate the bug that had confused my friend for a few hours. He had found where the program failed, but the failure was in a later malloc() call, so he could not spot the actual error in the earlier code.

To sum up: try the -fsanitize=address option of GCC or Clang. It can be very helpful when debugging memory issues.
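One way the mismatch between the two lines could be avoided is to derive both sizes from the pointer itself. The following is a minimal sketch, not the friend's actual code: make_zeroed_ints is a hypothetical helper, and only the names len and p come from the snippet above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate p ints and zero them.
   A zero-length request is treated as a caller bug in this sketch. */
int *make_zeroed_ints(size_t p) {
    assert(p > 0);
    int *len = malloc(sizeof *len * p);  /* element size taken from the pointer */
    if (len == NULL) return NULL;
    memset(len, 0, sizeof *len * p);     /* same expression, so the sizes cannot drift apart */
    return len;
}
```

The numbers in the report are consistent with this reading: the region is 52 bytes (13 ints of 4 bytes each), while the write is 104 bytes (13 pointers of 8 bytes each), which suggests p was 13 on that run. Using calloc(p, sizeof *len) would be another option, since it allocates and zeroes in one call.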

标点 2024-09-11 11:05:21

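You are probably writing beyond the allocated memory somewhere. The underlying software then doesn't notice until you call malloc.

A guard value that malloc checks may have been clobbered.

Edit: added this link for help with bounds checking:

http://www.lrde.epita.fr/~akim/ccmp/doc/bounds-checking.html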
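The guard-value idea can be sketched in a few lines. This is an illustration under assumptions, not how glibc's malloc performs its checks: guarded_alloc, guard_intact, and GUARD are hypothetical names, and the guard is a fixed byte pattern placed right after the usable bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_LEN 8
static const unsigned char GUARD[GUARD_LEN] =
    {0xDE, 0xAD, 0xBE, 0xEF, 0xDE, 0xAD, 0xBE, 0xEF};

/* Allocate n usable bytes followed by a guard pattern. */
void *guarded_alloc(size_t n) {
    if (n > SIZE_MAX - GUARD_LEN) return NULL;  /* n + GUARD_LEN would overflow */
    unsigned char *p = malloc(n + GUARD_LEN);
    if (p == NULL) return NULL;
    memcpy(p + n, GUARD, GUARD_LEN);
    return p;
}

/* Returns 1 if nothing has written past the n usable bytes, 0 otherwise. */
int guard_intact(const void *p, size_t n) {
    assert(p != NULL);
    return memcmp((const unsigned char *)p + n, GUARD, GUARD_LEN) == 0;
}
```

With a 5-byte buffer, filling 5 bytes leaves guard_intact reporting 1, while filling 6 bytes overwrites the first guard byte and it reports 0. The damage is only noticed when the check runs, which mirrors why the overrun goes unnoticed until the next malloc call.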

绅刃 2024-09-11 11:05:21

I got the following message, similar to yours:

    program: malloc.c:2372: sysmalloc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 *(sizeof(size_t))) - 1)) & ~((2 *(sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long) old_end & pagemask) == 0)' failed.

I had made a mistake in an earlier malloc call. When updating the factor after the sizeof() operator while adding a field to an unsigned char array, I erroneously overwrote the multiplication sign '*' with a '+'.

Here is the code responsible for the error in my case:

    UCHAR* b=(UCHAR*)malloc(sizeof(UCHAR)+5);
    b[INTBITS]=(some calculation);
    b[BUFSPC]=(some calculation);
    b[BUFOVR]=(some calculation);
    b[BUFMEM]=(some calculation);
    b[MATCHBITS]=(some calculation);

In another method later, I used malloc again and it produced the error message shown above. The call was (simple enough):

    UCHAR* b=(UCHAR*)malloc(sizeof(UCHAR)*50);

I think that using the '+' sign in the first call led to a miscalculated size, which, combined with initializing the array immediately afterwards (overwriting memory that was not allocated to the array), corrupted malloc's bookkeeping. That is why the second call went wrong.
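A small helper can make this kind of slip harder to write by taking the element count and element size as separate arguments. This is a hedged sketch: alloc_array is a hypothetical name, and the UCHAR typedef is an assumption based on the "unsigned char array" mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef unsigned char UCHAR;  /* assumption: UCHAR is unsigned char */

/* Hypothetical helper: allocate count elements of elem_size bytes each. */
void *alloc_array(size_t count, size_t elem_size) {
    assert(elem_size > 0);
    if (count > SIZE_MAX / elem_size) return NULL;  /* product would overflow */
    return malloc(count * elem_size);               /* multiply, never add */
}
```

A call such as alloc_array(50, sizeof(UCHAR)) then replaces the hand-written size expression. Note that with a 1-byte UCHAR, sizeof(UCHAR)+5 evaluates to 6 bytes, so whether the writes in the first snippet overran the buffer depends on the values of INTBITS, BUFSPC and the other indices, which are not shown.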
