Detailed memory usage analysis of a Windows crash dump file?

Posted on 2024-10-13 09:57:29

We have received a native (full) crash dump file from a customer. Opening it in the Visual Studio (2005) debugger shows that we had a crash caused by a realloc call that tried to allocate a ~10MB block. The dump file was unusually large (1,5 GB -- normally they are more like 500 MB).

We therefore conclude that we have a memory "leak" or runaway allocations that either fully exhausted the memory of the process or at least fragmented it badly enough for the realloc to fail. (Note that this realloc was for an operation that allocated a logging buffer, and we are not surprised it failed here: apart from some very large, essentially fixed buffers, 10MB in one go is one of the larger allocations we make. The problem itself likely has nothing to do with this specific allocation.)

Edit: After the comment exchange with Lex Li below, I should add: This is not reproducible for us (at the moment). It's just one customer dump clearly showing runaway memory consumption.

Main Question:

Now we have a dump file, but how can we locate what caused the excessive memory usage?

What we've done so far:

We have used the DebugDiag tool to analyze the dump file (with the so-called Memory Pressure Analyzer), and here's what we got:

Report for DumpFM...dmp

Virtual Memory Summary
----------------------
Size of largest free VM block   62,23 MBytes 
Free memory fragmentation       81,30% 
Free Memory                     332,87 MBytes   (16,25% of Total Memory) 
Reserved Memory                 0 Bytes   (0,00% of Total Memory) 
Committed Memory                1,67 GBytes   (83,75% of Total Memory) 
Total Memory                    2,00 GBytes 
Largest free block at           0x00000000`04bc4000 

Loaded Module Summary
---------------------
Number of Modules       114 Modules 
Total reserved memory   0 Bytes 
Total committed memory  3,33 MBytes 

Thread Summary
--------------
Number of Threads       56 Thread(s) 
Total reserved memory   0 Bytes 
Total committed memory  652,00 KBytes

This was just to give a bit of context. What's more interesting, I believe, is:

Heap Summary
------------
Number of heaps         26 Heaps 
Total reserved memory   1,64 GBytes 
Total committed memory  1,61 GBytes 

Top 10 heaps by reserved memory
-------------------------------
0x01040000           1,55 GBytes        
0x00150000           64,06 MBytes        
0x010d0000           15,31 MBytes        
...

Top 10 heaps by committed memory
--------------------------------                              
0x01040000       1,54 GBytes 
0x00150000       55,17 MBytes 
0x010d0000       6,25 MBytes  
...

Now, looking at heap 0x01040000 (1,5 GB) we see:

Heap 5 - 0x01040000 
-------------------
Heap Name          msvcr80!_crtheap 
Heap Description   This heap is used by msvcr80 
Reserved memory      1,55 GBytes 
Committed memory     1,54 GBytes (99,46% of reserved)  
Uncommitted memory   8,61 MBytes (0,54% of reserved)  
Number of heap segments             39 segments 
Number of uncommitted ranges        41 range(s) 
Size of largest uncommitted range   8,33 MBytes 
Calculated heap fragmentation       3,27% 

Segment Information
-------------------
Base Address | Reserved Size   | Committed Size  | Uncommitted Size | Number of uncommitted ranges | Largest uncommitted block | Calculated heap fragmentation
0x01040640   | 64,00 KBytes    | 64,00 KBytes    | 0 Bytes          | 0                            | 0 Bytes                   | 0,00%
0x01350000   | 1.024,00 KBytes | 1.024,00 KBytes | 0 Bytes          | 0                            | 0 Bytes                   | 0,00%
0x02850000   | 2,00 MBytes     | 2,00 MBytes     | 0 Bytes          | 0                            | 0 Bytes                   | 0,00%
...

What is this Segment Information anyway?

Looking at the allocations that are listed:

Top 5 allocations by size
-------------------------
Allocation Size - 336          1,18 GBytes     
Allocation Size - 1120004      121,77 MBytes    
...

Top 5 allocations by count
--------------------------
Allocation Size - 336    3760923 allocation(s) 
Allocation Size - 32     1223794 allocation(s)  
...

We can see that the MSVCR80 heap apparently holds 3.760.923 allocations of 336 bytes each. This makes it pretty clear that we used up our memory with lots of small allocations, but how can we get more information about where these allocations came from?
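
As a quick sanity check on the report, the count and size of the 336-byte allocations can be multiplied out to see whether they account for the "1,18 GBytes" figure. This is only a sketch; it assumes DebugDiag's "GBytes" means 2**30 bytes and it ignores any per-block heap overhead.

```python
# Figures copied from the DebugDiag report above.
ALLOC_SIZE = 336          # bytes per allocation
ALLOC_COUNT = 3_760_923   # number of allocations of that size

total_bytes = ALLOC_SIZE * ALLOC_COUNT
# Assumption: "GBytes" in the report means 2**30 bytes.
total_gib = total_bytes / 2**30

print(f"{total_bytes:,} bytes = {total_gib:.2f} GiB")
```

Under that assumption the product rounds to the reported value, so the 336-byte blocks alone explain most of the 1,54 GBytes committed in this heap.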

If we somehow could sample some of these allocation addresses and then check where in the process image these addresses are in use, then -- assuming that a large portion of these allocations are responsible for our "leak" -- we could maybe find out where these runaway allocations came from.

Unfortunately, I really have no idea how to get more information out of the dump at the moment.

How could I inspect this heap to see some of the "336" allocation addresses?

How can I search the dump for these addresses, and how do I then find out which pointer variable (if any) in the dump holds on to these addresses?
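
To make the second question concrete: "finding who holds an address" amounts to scanning memory for pointer-sized words whose value equals the block address. The sketch below illustrates the idea on a raw byte buffer; the function name, the toy region, and all addresses are hypothetical, and it assumes a 32-bit little-endian process and a 4-byte-aligned `base`. In WinDbg, the `s` (search memory) command with the `-d` type is the tool for the same job.

```python
import struct

def find_pointer_refs(memory: bytes, base: int, target: int) -> list[int]:
    """Return the addresses of every aligned 32-bit word in `memory` equal to `target`.

    `memory` is a raw copy of a region that starts at virtual address `base`.
    """
    needle = struct.pack("<I", target)   # 32-bit little-endian pointer (x86 process)
    hits = []
    pos = memory.find(needle)
    while pos != -1:
        if pos % 4 == 0:                 # pointers are normally 4-byte aligned
            hits.append(base + pos)
        pos = memory.find(needle, pos + 1)
    return hits

# Toy region standing in for a slice of the dump (hypothetical values).
region = struct.pack("<4I", 0x0, 0x02850010, 0xDEADBEEF, 0x02850010)
refs = find_pointer_refs(region, base=0x00150000, target=0x02850010)
print([hex(a) for a in refs])
```

Each hit is the address of a word that points at the block; whether that word lives in another heap block, a global, or a stack then tells you something about the owner.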

Any tips regarding usage of DebugDiag, WinDbg or any other tool could really help! Also, if you disagree with any of my analysis above, let us know! Thanks!

独﹏钓一江月 2024-10-20 09:57:29

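You can:

Look at these 336-byte blocks to see whether their content tells you something about what allocated them. I usually use WinDbg for this. First run the command !heap -stat -h 0x01040000, which gives you the block sizes; then pass a size to !heap -flt s size, which lists all blocks of that size. You can then look at a block with any command that displays memory (such as dc).
You cannot reproduce the issue, but you can look at another dump that allocates blocks of that size. First activate the stack backtrace feature with the gflags.exe utility (gflags -i your.exe +ust). Then run your application, take a dump, and list the blocks with !heap -flt s. The command !heap -p -a blockaddress will then dump the call stack of the function that allocated the block.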
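
One detail that is easy to trip over: WinDbg reads numbers as hexadecimal by default, so the decimal size from the DebugDiag report has to be converted before it is passed to !heap -flt s. A minimal sketch of that conversion follows; the helper name is made up for illustration.

```python
def heap_filter_command(size_in_bytes: int) -> str:
    """Build the `!heap -flt s` command for a block size given in decimal."""
    # WinDbg's default radix is 16, so the size is formatted as hex digits.
    return f"!heap -flt s {size_in_bytes:x}"

# 336 is the allocation size taken from the DebugDiag report.
print(heap_filter_command(336))
```

For the 336-byte blocks in the question, the size to filter on is 150 (hex).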


忆沫 2024-10-20 09:57:29

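In WinDbg you could try !heap -l, which should crawl the heap (this takes a while; there might be a way to limit the search to a specific heap to speed it up) and find all busy blocks that are not referenced anywhere. From there, open a memory window (Alt+5) and look at some of the entries that match the size of the allocations you suspect are leaking. With any luck there will be some common pattern that helps you identify what the data is, or even better, some ASCII string that you can place immediately.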
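
If you save the contents of a few suspect blocks to a file, hunting for embedded ASCII strings can be automated. The following is only a sketch of that idea: the function name and the sample block are invented for illustration, and real block contents will look different.

```python
import re

def ascii_strings(block: bytes, min_len: int = 4) -> list[str]:
    """Return runs of at least `min_len` printable ASCII characters found in `block`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(block)]

# Hypothetical 336-byte block: a binary header, an embedded string, then zero padding.
sample = b"\x01\x00\x00\x00\xa0\x15\x04\x01" + b"LogEntry: connection reset" + b"\x00" * 302
print(ascii_strings(sample))
```

Running this over many blocks of the same size and counting the most frequent strings is a cheap way to spot a shared pattern.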

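Unfortunately, I don't really know any other good way besides trying to reproduce it with user-mode stack traces turned on via gflags and taking memory snapshots with umdh.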
