Optimizing stack-walking performance
Currently I use the dbghelp library to walk the stack of a thread in some process (using GetThreadContext() and StackWalk64()) and collect only the return address of each frame.
However, the overhead of doing so is too high for the system's demands: a single stack walk of 10-15 frames takes approximately 5 ms in total. This figure includes GetThreadContext() and the loop that calls StackWalk64() to get all the frames.
In any case, I need a way to do this much faster. Does anyone have an idea how I can do that?
Edit:
Is anyone familiar with the ETW (Event Tracing for Windows) mechanism?
If so, how can I trace all the context switches that happened in a certain period of time?
Is there an event provider that publishes an event on each context switch?
Comments (3)
The fastest way that I can think of is to create your own version of GetThreadContext and StackWalk64 by writing a kernel driver that grabs the kernelStack field of the ETHREAD structure of the thread you're trying to monitor. Here is a good article on this subject.
If you're on Windows Vista or higher, you should use ETW, period. You can enable everything you're talking about, including Context Switch and Sample Profile events, and it's pretty efficient. On x86, it basically walks the EBP register chain, which is a linked list of addresses that it needs to iterate over. In 64-bit land, the stack walker has to unwind the stack, so it's a little less efficient, but I can tell you that if you're doing any reasonable amount of work in your application, the cost of stack walking will not show up. It's certainly not in the millisecond range.
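To illustrate what "walking the EBP register chain" means, here is a minimal sketch, not ETW's actual implementation. It assumes 32-bit frames built with frame pointers, where the word at [EBP] holds the caller's saved EBP and the word at [EBP+4] holds the return address. StackSnapshot, walk_ebp_chain and demo_return_addresses are hypothetical names made up for this example, and the stack is simulated in an ordinary buffer so the code runs anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical copy of a thread's stack memory (names are illustrative only).
struct StackSnapshot {
    std::uint32_t base;               // simulated address of bytes[0]
    std::vector<std::uint8_t> bytes;  // raw stack contents, host byte order

    // Bounds-checked 32-bit read; returns false if addr is outside the snapshot.
    bool read32(std::uint32_t addr, std::uint32_t& out) const {
        if (addr < base) return false;
        std::uint64_t off = std::uint64_t(addr) - base;
        if (off + 4 > bytes.size()) return false;
        std::memcpy(&out, bytes.data() + off, 4);
        return true;
    }

    // Helper used only to build the demo stack below.
    void write32(std::uint32_t addr, std::uint32_t value) {
        assert(addr >= base && std::uint64_t(addr) - base + 4 <= bytes.size());
        std::memcpy(bytes.data() + (addr - base), &value, 4);
    }
};

// Follow the saved-EBP linked list and collect one return address per frame.
std::vector<std::uint32_t> walk_ebp_chain(const StackSnapshot& s,
                                          std::uint32_t ebp,
                                          std::size_t max_frames) {
    std::vector<std::uint32_t> rets;
    while (rets.size() < max_frames) {
        std::uint32_t saved_ebp = 0, ret = 0;
        if (!s.read32(ebp, saved_ebp) || !s.read32(ebp + 4, ret)) break;
        if (ret == 0) break;          // treat a null return address as the end
        rets.push_back(ret);
        // Caller frames live at higher addresses; anything else means the
        // chain has ended or is corrupt, so stop instead of looping forever.
        if (saved_ebp <= ebp) break;
        ebp = saved_ebp;
    }
    return rets;
}

// Builds a fake three-frame stack and walks it from the innermost frame.
std::vector<std::uint32_t> demo_return_addresses() {
    StackSnapshot s{0x1000, std::vector<std::uint8_t>(0x100, 0)};
    s.write32(0x1020, 0x1040); s.write32(0x1024, 0x401111);  // innermost frame
    s.write32(0x1040, 0x1080); s.write32(0x1044, 0x402222);  // its caller
    s.write32(0x1080, 0x0000); s.write32(0x1084, 0x403333);  // outermost frame
    return walk_ebp_chain(s, 0x1020, 64);
}
```

For a real thread, the starting EBP would come from the CONTEXT returned by GetThreadContext, and each read32 would become a ReadProcessMemory call (or one bulk read of the stack region). This only works for code that keeps frame pointers; frames compiled with frame-pointer omission break the chain, which is why a general-purpose walker needs more machinery.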
The ETW part is actually an independent question. The Windows Performance Analysis Tools can capture all context switches, as can the Visual Studio Profiler in "Resource Contention Concurrency Profiling" mode. You can also dump all events into a file manually using logman; see the instructions here.