How to troubleshoot a crash in malloc
I have inherited a large body of legacy code. It has worked fine until now. Suddenly, at a customer trial that I cannot reproduce in-house, it crashes in malloc. I think I need to add instrumentation: for example, my own malloc layered on top of malloc that stores some meta information about each allocation, such as who made the call. When it crashes, I can then look up the meta information and see what was happening. I did something similar years ago but cannot recall it now... I am sure people have come up with better ideas. I would be glad to have input.
Thanks
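The instrumentation idea described in the question could look roughly like the sketch below. It is only an illustration under stated assumptions: the names dbg_malloc, dbg_check, dbg_free, DBG_MALLOC and the guard value are hypothetical, not part of any library, and the code assumes a C11 compiler (for max_align_t). Each block records its call site and is bracketed by guard words so that an overrun can be traced back to the allocation it damaged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Arbitrary guard value; any unlikely bit pattern would do (assumption). */
#define DBG_GUARD 0xDEADBEEFu

/* Bookkeeping kept in front of each block. The union with max_align_t (C11)
 * pads the header so the pointer handed to the caller stays aligned. */
typedef union dbg_header {
    struct {
        size_t      size;   /* bytes the caller asked for */
        const char *file;   /* call site: source file */
        int         line;   /* call site: source line */
        unsigned    guard;  /* front guard word */
    } info;
    max_align_t align;
} dbg_header;

/* Allocate size bytes, remembering who asked and planting guard words. */
void *dbg_malloc(size_t size, const char *file, int line)
{
    assert(file != NULL);
    if (size > SIZE_MAX - sizeof(dbg_header) - sizeof(unsigned))
        return NULL;                 /* size arithmetic would overflow */

    dbg_header *h = malloc(sizeof *h + size + sizeof(unsigned));
    if (h == NULL)
        return NULL;

    h->info.size  = size;
    h->info.file  = file;
    h->info.line  = line;
    h->info.guard = DBG_GUARD;

    unsigned char *user = (unsigned char *)(h + 1);
    unsigned guard = DBG_GUARD;
    memcpy(user + size, &guard, sizeof guard);  /* rear guard, may be unaligned */
    return user;
}

/* Return 1 if both guards are intact, 0 (with a report on stderr) if not. */
int dbg_check(const void *ptr)
{
    const dbg_header *h = (const dbg_header *)ptr - 1;
    unsigned rear;

    if (h->info.guard != DBG_GUARD) {
        fprintf(stderr, "heap check: front guard damaged at %p\n", (void *)ptr);
        return 0;                    /* header is untrustworthy, stop here */
    }
    memcpy(&rear, (const unsigned char *)ptr + h->info.size, sizeof rear);
    if (rear != DBG_GUARD) {
        fprintf(stderr, "heap check: overrun of %zu-byte block from %s:%d\n",
                h->info.size, h->info.file, h->info.line);
        return 0;
    }
    return 1;
}

/* Verify the guards, then release the whole block including the header. */
void dbg_free(void *ptr)
{
    if (ptr == NULL)
        return;
    (void)dbg_check(ptr);
    free((dbg_header *)ptr - 1);
}

/* Convenience macro that captures the call site automatically. */
#define DBG_MALLOC(n) dbg_malloc((n), __FILE__, __LINE__)
```

A caller would write something like `char *p = DBG_MALLOC(64);`, call `dbg_check(p)` at suspicious points, and release the block with `dbg_free(p)`. Blocks obtained this way must not be passed to plain free, because the pointer the caller holds is not the one malloc returned.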
Comments (3)
Is memory allocation broken?
Try valgrind.
Malloc is still crashing.
Okay, I'm going to have to assume that you mean SIGSEGV (segmentation fault) is firing in malloc. This is usually caused by heap corruption. Heap corruption, which does not itself cause a segmentation fault, is usually the result of an array access outside of the array's bounds. The bad access is usually nowhere near the point where you call malloc.
malloc stores a small header of information "in front of" the memory block that it returns to you. This information usually contains the size of the block and a pointer to the next block. Needless to say, changing either of these will cause problems. Usually, the next-block pointer is changed to an invalid address, and the next time malloc is called, it eventually dereferences the bad pointer and segfaults. Or it doesn't, and starts interpreting random memory as part of the heap. Eventually its luck runs out.
Note that free can have the same thing happen, if the block being released or the free block list is messed up.
How you catch this kind of error depends entirely on how you access the memory that malloc returns. A malloc of a single struct usually isn't a problem; it's malloc of arrays that usually gets you. Using a negative (-1 or -2) index will usually give you the block header for your current block, and indexing past the array end can give you the header of the next block. Both are valid memory locations, so there will be no segmentation fault.
So the first thing to try is range checking. You mention that this appeared at the customer's site; maybe it's because the data set they are working with is much larger, or the input data is corrupt (e.g. it says to allocate 100 elements and then initializes 101), or they are performing things in a different order (which hides the bug in your in-house testing), or they are doing something you haven't tested. It's hard to say without more specifics. You should consider writing something to sanity check your input data.
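As a rough illustration of the range checking and input sanity checking suggested above, here is a minimal sketch. The type int_array, the functions int_array_init, int_array_set and int_array_free, and the max_count limit are all hypothetical names chosen for this example; the real code base will have its own containers and limits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* An array that remembers its own length so every access can be checked. */
typedef struct int_array {
    int    *data;
    size_t  len;
} int_array;

/* Sanity check a count read from input before trusting it for allocation.
 * Returns 0 on success, -1 if the count is implausible or memory runs out. */
int int_array_init(int_array *a, long declared_count, long max_count)
{
    assert(a != NULL);
    a->data = NULL;
    a->len  = 0;
    if (declared_count <= 0 || declared_count > max_count)
        return -1;                       /* reject corrupt or absurd input */

    a->data = calloc((size_t)declared_count, sizeof *a->data);
    if (a->data == NULL)
        return -1;
    a->len = (size_t)declared_count;
    return 0;
}

/* Range-checked store: refuses negative indexes and indexes past the end,
 * the two accesses that would otherwise land on a heap block header. */
int int_array_set(int_array *a, long index, int value)
{
    assert(a != NULL);
    if (index < 0 || (size_t)index >= a->len)
        return -1;
    a->data[index] = value;
    return 0;
}

/* Release the storage and reset the array to an empty state. */
void int_array_free(int_array *a)
{
    assert(a != NULL);
    free(a->data);
    a->data = NULL;
    a->len  = 0;
}
```

With this in place, the "allocate 100 elements, then initialize 101" case from the answer shows up as a failed int_array_set call for index 100 instead of a silent overwrite of the next block's header.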
Try ASan
AddressSanitizer (aka ASan) is a memory error detector for C/C++. It finds memory errors such as use-after-free and heap, stack, and global buffer overflows.
See these links to learn more and how to use it:
https://github.com/google/sanitizers/wiki/AddressSanitizer and
https://github.com/google/sanitizers/wiki/AddressSanitizerFlags
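To get a feel for what ASan reports, a small deliberately buggy function can help. The sketch below is an assumption-laden illustration, not code from the thread: last_value is a hypothetical function, and demo.c is a made-up file name. The -fsanitize=address flag is the standard way to enable ASan in GCC and Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Build with ASan enabled, for example (file name is hypothetical):
 *     cc -g -fsanitize=address -fno-omit-frame-pointer demo.c
 */

/* Fill a heap array with 0..count-1 and return its last element.
 * When read_after_free is nonzero the function deliberately reads the
 * array after freeing it, which is the kind of bug ASan is meant to flag. */
int last_value(size_t count, int read_after_free)
{
    assert(count > 0);
    int *v = malloc(count * sizeof *v);
    if (v == NULL)
        return -1;

    for (size_t i = 0; i < count; i++)
        v[i] = (int)i;

    int result = v[count - 1];
    free(v);

    if (read_after_free)
        result = v[count - 1];   /* BUG: use after free */
    return result;
}
```

Calling `last_value(10, 1)` from main in an ASan build should abort with a heap-use-after-free report that includes the stack of the bad read as well as the stacks where the block was allocated and freed. That pairing is exactly the "who made the malloc call" information the question asks for.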
I know this is old, but issues like this will continue to exist as long as we have pointers. Although valgrind is the best tool for this purpose, it has a steep learning curve and its output is often too intimidating to understand.
Assuming you are working on some *nix, another tool I can suggest is Electric Fence (electricfence). Quote:
Electric Fence helps you detect two common programming bugs: software that overruns the boundaries of a malloc() memory allocation, and software that touches a memory allocation that has been released by free().
Usage is amazingly simple. Just link your code with an additional library, -lefence.
When you run the application, a core file will be generated at the moment memory is corrupted, instead of later when the corrupted memory is used.
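As a small sketch of how that might look in practice, the function below reproduces an array overrun on demand. The function name sum_initialized and the file name demo.c are hypothetical, and the build line assumes the Electric Fence library is installed under its usual name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Link against Electric Fence, for example (file name is hypothetical):
 *     cc -g demo.c -lefence
 */

/* Allocate `allocated` ints, set the first `initialized` of them to 1, and
 * return the sum of the elements that are both allocated and initialized.
 * Passing initialized > allocated deliberately writes past the end. */
int sum_initialized(size_t allocated, size_t initialized)
{
    assert(allocated > 0);
    int *v = malloc(allocated * sizeof *v);
    if (v == NULL)
        return -1;

    for (size_t i = 0; i < initialized; i++)
        v[i] = 1;                /* BUG when initialized > allocated */

    size_t valid = initialized < allocated ? initialized : allocated;
    int sum = 0;
    for (size_t i = 0; i < valid; i++)
        sum += v[i];

    free(v);
    return sum;
}
```

In a normal build, `sum_initialized(100, 101)` may appear to work while quietly damaging the heap. Linked with Electric Fence, the same call is expected to fault on the out-of-bounds write itself, so the core file or debugger points at the offending loop.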