Unbelievable number of logical threads; WinDbg can't see them?
I've got a process that is showing ~4,294,965,900 "current logical threads" (according to the performance counters) and ~400 physical threads.
I've created a memory dump using ADPlus (-hang), and WinDbg (!threads) only shows me the physical threads.
How do I find out where all these logical threads are coming from?
That looks like a suspiciously high number to me.
The number -1396 represented as an unsigned 32-bit integer is 4,294,965,900, and 1396 looks more reasonable.
A bug somewhere, perhaps?
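The wraparound is easy to check. The following is a minimal C# sketch, assuming the counter is a signed 32-bit value that gets displayed as unsigned (the source does not say how the counter is stored); the class and method names are illustrative only.

```csharp
using System;

public static class CounterWrap
{
    // Reinterpret a signed 32-bit value as unsigned, the way a counter
    // that dropped below zero would look if it were read as a uint.
    public static uint AsUnsigned(int value) => unchecked((uint)value);

    // Reverse direction: recover the signed value from the displayed number.
    public static int AsSigned(uint value) => unchecked((int)value);

    public static void Main()
    {
        Console.WriteLine(AsUnsigned(-1396));      // 4294965900
        Console.WriteLine(AsSigned(4294965900u));  // -1396
    }
}
```

If the counter really is decremented more often than it is incremented, this is the kind of value it would show.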
They aren't coming from anywhere. They don't exist. You simply can't have 4 billion threads of any kind unless you're running on a 64-bit machine with, oh, say, a couple of petabytes of RAM at the very least.
Every thread, whether it is a "physical" OS thread or one provided by some framework, needs at the very least some kind of identifier. If that's a 32-bit number, then just storing these identifiers will take up nearly 16GB of RAM. (And, of course, you'll have only around 1,400 unused identifiers left.) If the identifiers are 64 bits wide, you need 32GB of RAM. On top of that, every thread needs some stack space (a common default is 1MB, which brings us up to about 4 petabytes of memory).
It is a bug. The threads don't exist, and the performance counters are reporting a garbage value to you for some reason or other.
For example, it could be a negative error code which, when converted to an unsigned integer, becomes this huge number.
Or it could be some other error condition.
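The memory estimate above can be reproduced with a few lines of arithmetic. This is a rough sketch in C#, assuming one identifier and one fixed-size stack per thread and nothing else; the class and method names are made up for illustration.

```csharp
using System;

public static class ThreadMemoryEstimate
{
    // Thread count reported by the performance counter in the question.
    public const long ThreadCount = 4_294_965_900L;

    // Bytes needed just to store one identifier per thread.
    public static long IdBytes(int idSizeBytes) => ThreadCount * idSizeBytes;

    // Bytes needed to give every thread its own stack.
    public static long StackBytes(long stackSizeBytes) => ThreadCount * stackSizeBytes;

    public static void Main()
    {
        const double GiB = 1L << 30;
        const double PiB = 1L << 50;

        Console.WriteLine($"32-bit IDs:  {IdBytes(4) / GiB:F2} GiB");          // just under 16 GiB
        Console.WriteLine($"64-bit IDs:  {IdBytes(8) / GiB:F2} GiB");          // just under 32 GiB
        Console.WriteLine($"1 MB stacks: {StackBytes(1L << 20) / PiB:F2} PiB"); // just under 4 PiB
    }
}
```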
Since your process is running managed code, chances are the logical thread count refers to CLR threads. .NET maps CLR logical threads onto physical threads. To investigate this further, you can use the !threads command in WinDbg. This is an example of the output from this command:
Note that the top of the output prints summary statistics. If you find an excessively large number of dead threads, that might indicate a resource leak. Check out one example of this type of resource leak.
In the !threads output, the left column is the debugger thread ID (the same one displayed by the ~ command), the second column is the CLR thread ID, and the third column is the OS thread ID.
I had the same problem this week, with the same symptoms. It was real. Yes, my server is impressive: 128 GB of RAM and 24 cores.
The problem here was indeed logical threads. The CLR doesn't create a real thread if it can avoid it. I had a Timer with periodic reactivation, like
timer.Change(10000, 10000)
and inside the timer callback my code hung on the network, which let the CLR runtime know this "physical thread" could be reused. Ten seconds later the timer fired again and a new logical thread was created, and so on. The next issue was that the rest of my code uses Tasks heavily, and those also pull in logical threads. Combine it all, and you get a ripple effect of billions of logical threads within a week or two.
My fix was easy: make the timer non-recurring, and schedule the next timer trigger only after the previous callback has finished:
timer.Change(10000, Timeout.Infinite)
and make the timer callback cancel the I/O after some reasonable timeout.
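A minimal sketch of that fix might look like the following. It assumes the timer is a System.Threading.Timer (the source only shows the timer.Change calls) and that the network work can be expressed as a cancellable delegate. The NonOverlappingPoller class, its parameters, and the simulated slow I/O are all hypothetical names introduced for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class NonOverlappingPoller : IDisposable
{
    private readonly Timer _timer;
    private readonly int _intervalMs;
    private readonly int _ioTimeoutMs;
    private readonly Func<CancellationToken, Task> _work;
    private int _runs;

    public int CompletedRuns => Volatile.Read(ref _runs);

    public NonOverlappingPoller(int intervalMs, int ioTimeoutMs, Func<CancellationToken, Task> work)
    {
        _intervalMs = intervalMs;
        _ioTimeoutMs = ioTimeoutMs;
        _work = work;
        // Create the timer disarmed, then arm it as a one-shot (period = Infinite).
        _timer = new Timer(OnTick, null, Timeout.Infinite, Timeout.Infinite);
        _timer.Change(_intervalMs, Timeout.Infinite);
    }

    private async void OnTick(object state)
    {
        try
        {
            // Cancel the I/O if it runs longer than the configured timeout.
            using (var cts = new CancellationTokenSource(_ioTimeoutMs))
            {
                await _work(cts.Token).ConfigureAwait(false);
            }
        }
        catch (OperationCanceledException)
        {
            // The I/O timed out and was cancelled; the next tick will retry.
        }
        catch (Exception ex)
        {
            // Never let an exception escape an async void callback.
            Console.Error.WriteLine(ex);
        }
        finally
        {
            Interlocked.Increment(ref _runs);
            // Re-arm only now, so two callbacks can never overlap.
            try { _timer.Change(_intervalMs, Timeout.Infinite); }
            catch (ObjectDisposedException) { /* disposed while a run was in flight */ }
        }
    }

    public void Dispose() => _timer.Dispose();
}

public static class PollerDemo
{
    public static async Task Main()
    {
        // Simulated slow I/O that would hang far longer than the 200 ms timeout.
        using (var poller = new NonOverlappingPoller(100, 200, token => Task.Delay(5000, token)))
        {
            await Task.Delay(1000);
            Console.WriteLine($"Completed runs: {poller.CompletedRuns}");
        }
    }
}
```

Because the next tick is scheduled in the finally block, a slow or hung callback delays the following run instead of stacking new callbacks on top of it.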