How to analyze <unclassified> memory usage in WinDbg
This is a .NET v4 Windows service application running on an x64 machine. After days of running steadily, the service's memory consumption suddenly spikes until it crashes. I was able to catch it at 1.2 GB and capture a memory dump. Here is what I get.
If I run !address -summary in WinDbg on my dump file, I get the following result:
!address -summary
--- Usage Summary ------ RgnCount ------- Total Size -------- %ofBusy %ofTotal
Free 821 7ff`7e834000 ( 7.998 Tb) 99.98%
<unclassified> 3696 0`6eece000 ( 1.733 Gb) 85.67% 0.02%
Image 1851 0`0ea6f000 ( 234.434 Mb) 11.32% 0.00%
Stack 1881 0`03968000 ( 57.406 Mb) 2.77% 0.00%
TEB 628 0`004e8000 ( 4.906 Mb) 0.24% 0.00%
NlsTables 1 0`00023000 ( 140.000 kb) 0.01% 0.00%
ActivationContextData 3 0`00006000 ( 24.000 kb) 0.00% 0.00%
CsrSharedMemory 1 0`00005000 ( 20.000 kb) 0.00% 0.00%
PEB 1 0`00001000 ( 4.000 kb) 0.00% 0.00%
--- Type Summary (for busy) -- RgnCount ----- Total Size ----- %ofBusy %ofTotal
MEM_PRIVATE 5837 0`7115a000 ( 1.767 Gb) 87.34% 0.02%
MEM_IMAGE 2185 0`0f131000 (241.191 Mb) 11.64% 0.00%
MEM_MAPPED 40 0`01531000 ( 21.191 Mb) 1.02% 0.00%
--- State Summary ------------ RgnCount ------ Total Size ---- %ofBusy %ofTotal
MEM_FREE 821 7ff`7e834000 ( 7.998 Tb) 99.98%
MEM_COMMIT 6127 0`4fd5e000 ( 1.247 Gb) 61.66% 0.02%
MEM_RESERVE 1935 0`31a5e000 (794.367 Mb) 38.34% 0.01%
--Protect Summary(for commit)- RgnCount ------ Total Size --- %ofBusy %ofTotal
PAGE_READWRITE 3412 0`3e862000 (1000.383 Mb) 48.29% 0.01%
PAGE_EXECUTE_READ 220 0`0b12f000 ( 177.184 Mb) 8.55% 0.00%
PAGE_READONLY 646 0`02fd0000 ( 47.813 Mb) 2.31% 0.00%
PAGE_WRITECOPY 410 0`01781000 ( 23.504 Mb) 1.13% 0.00%
PAGE_READWRITE|PAGE_GUARD 1224 0`012f2000 ( 18.945 Mb) 0.91% 0.00%
PAGE_EXECUTE_READWRITE 144 0`007b9000 ( 7.723 Mb) 0.37% 0.00%
PAGE_EXECUTE_WRITECOPY 70 0`001cd000 ( 1.801 Mb) 0.09% 0.00%
PAGE_EXECUTE 1 0`00004000 ( 16.000 kb) 0.00% 0.00%
--- Largest Region by Usage ----Base Address -------- Region Size ----------
Free 0`8fff0000 7fe`59050000 ( 7.994 Tb)
<unclassified> 0`80d92000 0`0f25e000 ( 242.367 Mb)
Image fe`f6255000 0`0125a000 ( 18.352 Mb)
Stack 0`014d0000 0`000fc000 (1008.000 kb)
TEB 0`7ffde000 0`00002000 ( 8.000 kb)
NlsTables 7ff`fffb0000 0`00023000 ( 140.000 kb)
ActivationContextData 0`00030000 0`00004000 ( 16.000 kb)
CsrSharedMemory 0`7efe0000 0`00005000 ( 20.000 kb)
PEB 7ff`fffdd000 0`00001000 ( 4.000 kb)
First, why would unclassified show up once as 1.73 GB and the other time as 242 MB? (This has been answered. Thank you.)
Second, I understand that unclassified can mean managed code; however, my heap size according to !eeheap is only 248 MB, which roughly matches the 242 MB but is nowhere near the 1.73 GB. The dump file size is 1.2 GB, which is much higher than normal. Where do I go from here to find out what is using all the memory? Everything in the managed heap world is under 248 MB, but I'm using 1.2 GB.
Thanks
EDIT
If I run !heap -s I get the following:
LFH Key : 0x000000171fab7f20
Termination on corruption : ENABLED
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-------------------------------------------------------------------------------------
Virtual block: 00000000017e0000 - 00000000017e0000 (size 0000000000000000)
Virtual block: 0000000045bd0000 - 0000000045bd0000 (size 0000000000000000)
Virtual block: 000000006fff0000 - 000000006fff0000 (size 0000000000000000)
0000000000060000 00000002 113024 102028 113024 27343 1542 11 3 1c LFH
External fragmentation 26 % (1542 free blocks)
0000000000010000 00008000 64 4 64 1 1 1 0 0
0000000000480000 00001002 3136 1380 3136 20 8 3 0 0 LFH
0000000000640000 00041002 512 8 512 3 1 1 0 0
0000000000800000 00001002 3136 1412 3136 15 7 3 0 0 LFH
00000000009d0000 00001002 3136 1380 3136 19 7 3 0 0 LFH
00000000008a0000 00041002 512 16 512 3 1 1 0 0
0000000000630000 00001002 7232 3628 7232 18 53 4 0 0 LFH
0000000000da0000 00041002 1536 856 1536 1 1 2 0 0 LFH
0000000000ef0000 00041002 1536 944 1536 4 12 2 0 0 LFH
00000000034b0000 00001002 1536 1452 1536 6 17 2 0 0 LFH
00000000019c0000 00001002 3136 1396 3136 16 6 3 0 0 LFH
0000000003be0000 00001002 1536 1072 1536 5 7 2 0 3 LFH
0000000003dc0000 00011002 512 220 512 100 60 1 0 2
0000000002520000 00001002 512 8 512 3 2 1 0 0
0000000003b60000 00001002 339712 168996 339712 151494 976 116 0 18 LFH
External fragmentation 89 % (976 free blocks)
Virtual address fragmentation 50 % (116 uncommited ranges)
0000000003f20000 00001002 64 8 64 3 1 1 0 0
0000000003d90000 00001002 64 8 64 3 1 1 0 0
0000000003ee0000 00001002 64 16 64 11 1 1 0 0
-------------------------------------------------------------------------------------
I've recently had a very similar situation and found a couple of techniques useful in the investigation. None is a silver bullet, but each sheds a little more light on the problem.
1) vmmap.exe from SysInternals (http://technet.microsoft.com/en-us/sysinternals/dd535533) does a good job of correlating information on native and managed memory and presenting it in a nice UI. The same information can be gathered using the techniques below, but this is way easier and a nice place to start. Sadly, it doesn't work on dump files; you need a live process.
2) The "!address -summary" output is a rollup of the more detailed "!address" output. I found it useful to drop the detailed output into Excel and run some pivots. Using this technique I discovered that a large number of bytes that were listed as "<unclassified>" were actually MEM_IMAGE pages, likely copies of data pages that were loaded when the DLLs were loaded but then copied when the data was changed. I could also filter to large regions and drill in on specific addresses. Poking around in the memory dump with a toothpick and lots of praying is painful, but can be revealing.
3) Finally, I did a poor man's version of the vmmap.exe technique above. I loaded up the dump file, opened a log, and ran !address, !eeheap, !heap, and !threads. I also targeted the thread environment blocks listed in ~*k with !teb. I closed the log file and loaded it up in my favorite editor. I could then find an unclassified block and search to see if it popped up in the output from one of the more detailed commands. You can pretty quickly correlate native and managed heaps to weed those out of your suspect unclassified regions.
These are all way too manual. I'd love to write a script that would take output similar to what I generated in technique 3 above and produce an mmp file suitable for viewing in vmmap.exe. Some day.
One last note: I correlated vmmap.exe's output with the !address output and noted these types of regions that vmmap could identify from various sources (similar to what !heap and !eeheap use) but that !address didn't know about. That is, these are things that vmmap.exe labeled but !address didn't:
There were still a lot of "private" bytes unaccounted for, but again, I'm able to narrow the problem if I can weed these out.
Hope this gives you some ideas on how to investigate. I'm in the same boat, so I'd appreciate hearing what you find, too. Thanks!
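The manual correlation in technique 3 comes down to interval arithmetic: take the unclassified regions, subtract every range that some other command already explains, and look at what is left. Here is a minimal Python sketch of that step. It assumes you have already pulled (start, end) address pairs out of your log yourself; the function names and the sample addresses are hypothetical and are not taken from the dump in the question.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start, end), end exclusive


def merge_ranges(ranges: List[Range]) -> List[Range]:
    """Sort ranges and merge any that overlap or touch."""
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def unexplained(regions: List[Range], known: List[Range]) -> List[Range]:
    """Return the parts of `regions` that no range in `known` covers."""
    known_merged = merge_ranges(known)
    leftover: List[Range] = []
    for start, end in sorted(regions):
        cursor = start
        for k_start, k_end in known_merged:
            if k_end <= cursor:
                continue  # known range lies entirely before the cursor
            if k_start >= end:
                break  # known range lies entirely after this region
            if k_start > cursor:
                leftover.append((cursor, k_start))
            cursor = max(cursor, k_end)
            if cursor >= end:
                break
        if cursor < end:
            leftover.append((cursor, end))
    return leftover


if __name__ == "__main__":
    # Hypothetical addresses for illustration only.
    unclassified = [(0x10000000, 0x10400000), (0x20000000, 0x20100000)]
    heaps = [(0x10000000, 0x10300000)]  # e.g. ranges seen in !eeheap or !heap
    for start, end in unexplained(unclassified, heaps):
        print(f"{start:#x}-{end:#x}  {(end - start) // 1024} KiB unexplained")
```

Whatever survives the subtraction is the short list worth poking at by hand.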
"Usage Summary" tells you that you have 3696 unclassified regions totalling 1.733 Gb.
"Largest Region" tells you that the largest of the unclassified regions is 242 Mb.
The rest of the unclassified regions (3695 of them) together make up the difference to 1.733 Gb.
Try running !heap -s and summing the Virt column to see the size of the native heaps; I think these also fall into the unclassified bucket.
(NB: earlier versions show the native heap explicitly in !address -summary.)
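Summing the columns by hand gets old quickly, so here is a rough Python sketch that does it for !heap -s text saved from a log. It assumes the row layout shown in the question (heap address, flags, then Reserv, Commit and Virt in KB); other debugger versions may print the table differently, and the function name is made up for this example.

```python
import re
from typing import Dict

# One heap row: address, flags, then Reserv / Commit / Virt as decimal KB.
ROW = re.compile(
    r"^\s*([0-9a-fA-F]{8,16})\s+([0-9a-fA-F]{8})\s+(\d+)\s+(\d+)\s+(\d+)\s"
)


def sum_heap_columns(heap_s_output: str) -> Dict[str, int]:
    """Total the Reserv, Commit and Virt columns (in KB) of !heap -s text."""
    totals = {"reserv_k": 0, "commit_k": 0, "virt_k": 0}
    for line in heap_s_output.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # headers, "Virtual block:" and fragmentation lines
        totals["reserv_k"] += int(match.group(3))
        totals["commit_k"] += int(match.group(4))
        totals["virt_k"] += int(match.group(5))
    return totals


if __name__ == "__main__":
    # A few rows copied from the question's output, not the whole table.
    sample = """\
Virtual block: 00000000017e0000 - 00000000017e0000 (size 0000000000000000)
0000000000060000 00000002 113024 102028 113024 27343 1542 11 3 1c LFH
External fragmentation 26 % (1542 free blocks)
0000000000010000 00008000 64 4 64 1 1 1 0 0
0000000003b60000 00001002 339712 168996 339712 151494 976 116 0 18 LFH
"""
    for name, kb in sum_heap_columns(sample).items():
        print(f"{name}: {kb} KB ({kb / 1024:.1f} MB)")
```

Note that the "Virtual block:" lines are skipped here, so large allocations made outside the heap segments are not part of these totals.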
I keep a copy of Debugging Tools for Windows 6.11.1.404, which seems to be able to display something more meaningful for "unclassified".
With that version, I see a list of TEB addresses and then this:
With my "current" version (6.12.2.633) I get this from the same dump. Two things I note:
The data seems to be the sum of HeapAlloc/RegionUsageHeap and VirtualAlloc/RegionUsageIsVAD.
The lovely EFAIL error, which is no doubt partly responsible for the missing data!
I'm not sure how that will help you with your managed code, but I think it actually answers the original question ;-)
Your best bet would be to use the EEHeap and GCHandles commands in WinDbg (http://msdn.microsoft.com/en-us/library/bb190764.aspx) and try to see if you can find what might be leaking or going wrong that way.
Unfortunately, you probably won't be able to get the exact help you're looking for, because diagnosing these types of issues is almost always very time-intensive and, outside of the simplest cases, requires someone to do a full analysis of the dump. Basically, it's unlikely that someone will be able to point you to a direct answer on Stack Overflow. Mostly, people will be able to point you to commands that might be helpful. You're going to have to do a lot of digging to find out more about what is happening.
I recently spent some time diagnosing a customer's issue where their app was using 70 GB before terminating (likely due to hitting an IIS App Pool recycling limit, but still unconfirmed). They sent me a 35 GB memory dump. Based on my recent experience, here are some observations I can make about what you've provided:
In the !heap -s output, 284 MB of the 1.247 GB is shown in the Commit column. If you were to open this dump in DebugDiag, it would tell you that heap 0x60000 has 1 GB of committed memory. You'll add up the commit size of the 11 segments reported and find that they only add up to about 102 MB, not 1 GB. So annoying.
The "missing" memory isn't missing. It's actually hinted at in the !heap -s output by the "Virtual block:" lines. Unfortunately, !heap -s doesn't show the end address properly and therefore reports the size as 0. Check the output of !address for each of those block addresses (17e0000, 45bd0000 and 6fff0000):
It will report the proper end address and therefore an accurate "Region Size". Even better, it gives a succinct version of the region size. If you add the size of those 3 regions to 102 MB, you should be pretty close to 1 GB.
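If you want to add region sizes up yourself, keep in mind that WinDbg prints 64-bit values as hex with a backtick in the middle. A small Python sketch for converting them follows; the helper names are my own, and the two sample values are the unclassified totals from the question's !address -summary, not the sizes of the three virtual blocks (those aren't shown in this thread).

```python
def parse_windbg_size(text: str) -> int:
    """Convert a WinDbg 64-bit hex value such as 0`0f25e000 to an integer."""
    return int(text.replace("`", ""), 16)


def to_mib(size_in_bytes: int) -> float:
    """Bytes to MiB, the unit WinDbg labels as 'Mb' in its summaries."""
    return size_in_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Values copied from the !address -summary output in the question.
    samples = [
        ("all <unclassified> regions", "0`6eece000"),
        ("largest <unclassified> region", "0`0f25e000"),
    ]
    for label, raw in samples:
        size = parse_windbg_size(raw)
        print(f"{label}: {size} bytes = {to_mib(size):.3f} MiB")
```

Feed it the "Region Size" values that !address reports for the virtual blocks and add the results to the segment total.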
So what's in them? Well, you can look using dq. By spelunking you might find a hint as to why they were allocated. Perhaps your managed code calls some third-party code that has a native side.
You might be able to find references to your heap by using
!heap 6fff0000 -x -v
If there are references, you can see what memory regions they live in by using !address again. In my customer's issue I found a reference that lived in a region with "Usage: Stack". A "More info:" hint referenced the stack's thread, which happened to have some large basic_string append/copy calls at the top.