C++: profiling shows a performance problem in ntdll - how do I solve it?
I'm working on a small rigid-body simulation. I use the Irrlicht engine for display and OpenMesh to work with the meshes.
I profiled my app with Very Sleepy and noticed that most of the time is spent in the following functions (exclusive time, not counting time spent in subfunctions):
RtlCompareMemoryUlong 30% within module "ntdll" sourcefile "unknown"
KiFastSystemCallRet 21% within module "ntdll" sourcefile "unknown"
RtlFillMemoryUlong 9% within module "ntdll" sourcefile "unknown"
So about 60% of the time is spent in those functions. I don't call them anywhere in my code, and I don't understand what they do. I doubt it's connected to the graphics, since I'm only displaying very simple meshes.
Can someone give me a hint on how to figure out why those functions are called and how to get rid of those calls?
Thanks!
Comments (6)
ntdll contains the low-level NT system functions (the user-mode layer that sits directly above the kernel). Chances are those functions are called internally by other functions to do low-level operations, which is why you're seeing a lot of time spent in them: they're the building blocks of higher-level functionality. Ignore them and look elsewhere (up the call stack) for performance tweaks; you're not likely to be able to get rid of the OS calls from your application. ;)
The performance problem is probably that these functions are being called a lot, not that the functions themselves are slow. You can guess from the names what they're used for. KiFastSystemCallRet in particular indicates that your app entered kernel mode.
Ignore the ntdll functions in your profile, and focus only on the functions that you wrote/control.
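One low-tech way to focus on the functions you wrote is to time them directly, so that any OS work done underneath is charged to the caller. The following is only a minimal sketch: `TimingTable`, `ScopedTimer`, and `simulateStep` are hypothetical names made up for illustration, not part of Irrlicht, OpenMesh, or Very Sleepy, and the loop body is a stand-in for real simulation work.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper: accumulates wall-clock time and call counts per label.
// The time is inclusive, so OS calls made underneath a labeled function are
// charged to that function.
struct TimingTable {
    std::map<std::string, double> seconds;  // label -> total inclusive time
    std::map<std::string, long> calls;      // label -> number of calls
};

// RAII timer: starts on construction, records into the table on destruction.
class ScopedTimer {
public:
    ScopedTimer(TimingTable& table, std::string label)
        : table_(table),
          label_(std::move(label)),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start_;
        table_.seconds[label_] += elapsed.count();
        table_.calls[label_] += 1;
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    TimingTable& table_;
    std::string label_;
    std::chrono::steady_clock::time_point start_;
};

// Stand-in for one simulation step; replace the loop with real work.
inline double simulateStep(TimingTable& table, int bodies) {
    assert(bodies >= 0);
    ScopedTimer timer(table, "simulateStep");
    double energy = 0.0;
    for (int i = 0; i < bodies; ++i) {
        energy += 0.5 * i;
    }
    return energy;
}
```

Putting one `ScopedTimer` at the top of each of your own top-level functions (physics step, mesh update, render call) and printing the table after a few hundred frames gives a rough picture of which of them is the one dragging in the OS time. A profiler that shows inclusive time per function gives the same information with less effort.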
Use a better profiler. On OS X, the CPU Instruments app that comes with Xcode gives excellent diagnostic information that makes spotting performance problems easy.
What you want to see is the call stack during all this time. That will show you which library and function keeps calling that OS function. Once you know that, it's simply a matter of calling into that library function less often.
RtlCompareMemory / RtlFillMemory sound like they're probably the underlying implementations for memcmp() / memset().
Regardless, you want to change the settings of your profiler to show system call time under the calling app / library function so you can see where the calls are ultimately coming from.
Frank Krueger is right. You need insight into the call stack as your program runs. Here's a simple explanation of why that is so.
It may surprise you that you do not need special tools or a large number of samples.
When you are stuck in system calls all the time, you should take that as more of a symptom than part of the actual problem.
Memory fragmentation and paging are the usual suspects, but it could be any of a myriad of things.
In my experience, performance problems are seldom something obvious, like one specific call you are making. Optimizing in the way commonly suggested is usually useless at a really low level. It catches things that amount to bugs: code that is correct but does something unintended, like allocating something and deleting it over and over. For things like this, though, you often need a deep understanding of everything that is happening to figure out exactly where the issue is (but, like I said, if you are stuck in system calls a lot, it is surprisingly often memory-management related).
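To illustrate the "allocating something and deleting it over and over" pattern, here is a minimal sketch. The names `stepAllocating` and `Stepper` are hypothetical, and the arithmetic is a placeholder for real per-frame work. Whether this pattern is what produces the ntdll time in the question is an assumption; only the call stack can confirm it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Version 1: allocates, zero-fills, and frees a temporary buffer on every call.
inline double stepAllocating(const std::vector<double>& positions) {
    std::vector<double> scratch(positions.size());  // heap allocation each call
    for (std::size_t i = 0; i < positions.size(); ++i) {
        scratch[i] = positions[i] * 2.0;
    }
    double sum = 0.0;
    for (double v : scratch) {
        sum += v;
    }
    return sum;  // scratch is freed here
}

// Version 2: keeps the buffer alive between calls, so the heap is only
// touched when the input grows beyond the current capacity.
class Stepper {
public:
    double step(const std::vector<double>& positions) {
        scratch_.resize(positions.size());  // no reallocation if capacity suffices
        assert(scratch_.size() == positions.size());
        for (std::size_t i = 0; i < positions.size(); ++i) {
            scratch_[i] = positions[i] * 2.0;
        }
        double sum = 0.0;
        for (double v : scratch_) {
            sum += v;
        }
        return sum;
    }

    // Exposed only so the reuse can be observed from outside.
    const double* bufferAddress() const { return scratch_.data(); }

private:
    std::vector<double> scratch_;
};
```

Both versions return the same result; the second simply avoids a round trip to the heap on every frame. In a simulation loop that runs thousands of times per second, temporaries like this (including ones hidden inside library calls) are a common place for allocator time to accumulate.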