Logging the execution of C++ code
Having used gprof and callgrind many times, I have reached the (obvious) conclusion that I cannot use them efficiently when dealing with large programs (as in a CAD program that loads a whole car). I was thinking that maybe I could use some C/C++ macro magic and somehow build a simple (but nice) logging mechanism. For example, one can call a function using the following macro:
#define CALL_FUN(fun_name, ...) \
fun_name (__VA_ARGS__);
We could add some clocking/timing stuff before and after the function call, so that every function called with CALL_FUN gets timed, e.g.:
#define CALL_FUN(fun_name, ...) \
time(&t0); /* record start time; note time() is the function, time_t the type */ \
fun_name (__VA_ARGS__); \
time(&t1); /* record end time */
The variables t0 and t1 could live in a global logging object. That logging object could also hold the call graph of every function called through CALL_FUN. Afterwards, the object could be written to a (specifically formatted) file and parsed by some other program.
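A minimal sketch of what such a global logging object and macro might look like (all names here are illustrative, not from the question; this version handles neither threads nor functions whose return value you need):

#include <cstdio>
#include <ctime>
#include <vector>

// Illustrative global logger: records each call's name, duration and
// nesting depth. Records appear in completion order; the depth field
// lets a post-processing tool rebuild the call tree.
struct CallLogger {
    struct Record { const char* name; double seconds; int depth; };
    std::vector<Record> records;
    int depth;
    CallLogger() : depth(0) {}
    void dump(const char* path) {
        std::FILE* f = std::fopen(path, "w");
        if (!f) return;
        for (std::size_t i = 0; i < records.size(); ++i)
            std::fprintf(f, "%*s%s %.6f\n", 2 * records[i].depth, "",
                         records[i].name, records[i].seconds);
        std::fclose(f);
    }
};
static CallLogger g_logger;

#define CALL_FUN(fun_name, ...)                                  \
    do {                                                         \
        std::clock_t c0_ = std::clock();                         \
        ++g_logger.depth;                                        \
        fun_name(__VA_ARGS__);                                   \
        --g_logger.depth;                                        \
        CallLogger::Record r_ = { #fun_name,                     \
            double(std::clock() - c0_) / CLOCKS_PER_SEC,         \
            g_logger.depth };                                    \
        g_logger.records.push_back(r_);                          \
    } while (0)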
So here comes my (first) question: do you find this approach tractable? If yes, how can it be enhanced, and if not, can you propose a better way to measure time and log call graphs?
A colleague proposed another approach to deal with this problem: annotate each function we care to log with a specific comment. Then, during the make process, a special preprocessor runs, parses each source file, adds logging logic for each annotated function, creates a new source file with the added code, and builds that instead. I guess that reading CALL_FUN... macros (my proposal) all over the place is not the best approach, and his approach would solve this problem. So what is your opinion of this approach?
PS: I am not well versed in the pitfalls of C/C++ macros, so if this can be developed using another approach, please say so.
Thank you.
Comments (6)
Well, you could do some C++ magic to embed a logging object: one that is constructed on function entry and logs (and times) on destruction. In your functions you then simply write a single declaration, along the lines of the sketch below.
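A minimal sketch of what that could look like (FunctionLogger and the function name are purely illustrative):

#include <cstdio>
#include <ctime>

// Illustrative scope logger: logs entry on construction and exit
// (plus elapsed time) on destruction, so even early returns are timed.
class FunctionLogger {
public:
    explicit FunctionLogger(const char* name)
        : name_(name), start_(std::clock())
    { std::printf("enter %s\n", name_); }
    ~FunctionLogger()
    { std::printf("leave %s (%.3f s)\n", name_,
                  double(std::clock() - start_) / CLOCKS_PER_SEC); }
private:
    const char* name_;
    std::clock_t start_;
};

// In your functions, you then simply write:
void loadCarModel()
{
    FunctionLogger log_me("loadCarModel");
    // ... function body; logging happens automatically on scope exit
}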
I am a bit late, but here is what I am doing for this:

On Windows there is a /Gh compiler switch which makes the compiler insert a call to a hidden _penter function at the start of each function. There is also a switch, /GH, for getting a _pexit call at the end of each function.
You can utilize this to get callbacks on each function call. Here is an article with more details and sample source code:
http://www.johnpanzer.com/aci_cuj/index.html
I am using this approach in my custom logging system for storing the last few thousand function calls in a ring buffer. This turned out to be useful for crash debugging (in combination with MiniDumps).
Some notes on this:
All this works surprisingly well for me.
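For illustration, a bare-bones _penter hook might look something like the sketch below. This is a sketch under my own assumptions (x86 only, since naked functions with inline assembly are unavailable on x64; the ring-buffer details are invented, not taken from the linked article):

#include <windows.h>

// Invented ring buffer of recent call sites; a real version would
// also store timestamps and be per-thread or properly synchronized.
static void*         g_ring[8192];
static volatile LONG g_head = 0;

extern "C" void __cdecl RecordCall(void* addr)
{
    ULONG i = (ULONG)InterlockedIncrement(&g_head);
    g_ring[i % 8192] = addr;
}

// Called at the start of every function compiled with /Gh.
// It must preserve all registers, hence the naked function.
extern "C" __declspec(naked) void __cdecl _penter()
{
    __asm {
        pushad                 // save all general-purpose registers (32 bytes)
        mov  eax, [esp + 32]   // return address: a location inside the caller
        push eax
        call RecordCall
        add  esp, 4            // __cdecl: caller pops the argument
        popad
        ret
    }
}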
Many nice industrial libraries have their functions' declarations and definitions wrapped into void macros, just in case. If your code is already like that, go ahead and debug your performance problems with some simple asynchronous trace logger. If not, post-insertion of such macros can be unacceptably time-consuming (see the sketch below).
I can understand the pain of running a 1M×1M matrix solver under valgrind, so I would suggest starting with the so-called "Monte Carlo profiling method": start the process and, in parallel, run pstack repeatedly, say once per second. As a result you will have N stack dumps (N can be quite significant). The mathematical approach is then to count the relative frequency of each stack and draw conclusions about the most frequent ones. In practice you either see the bottleneck immediately or, if not, you switch to bisection, gprof, and finally to valgrind's toolset.
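The "void macro" pattern mentioned above might look like the following sketch (all names are invented; the point is that the macro costs nothing until tracing is switched on centrally):

#include <cstdio>

// Illustrative trace scope, used only when tracing is enabled.
struct TraceScope {
    const char* name;
    explicit TraceScope(const char* n) : name(n)
    { std::printf("-> %s\n", name); }
    ~TraceScope()
    { std::printf("<- %s\n", name); }
};

#ifdef ENABLE_TRACE
    #define FUN_PROLOGUE(name) TraceScope trace_scope_(name)
#else
    #define FUN_PROLOGUE(name) ((void)0)   // compiles away entirely
#endif

void solveMatrix()
{
    FUN_PROLOGUE("solveMatrix");
    // ... heavy numerical work ...
}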
Let me assume the reason you are doing this is that you want to locate performance problems (bottlenecks) so you can fix them to get higher performance, as opposed to measuring speed or getting coverage info.

It seems you're thinking the way to do this is to log the history of function calls and measure how long each call takes. There is a different approach, based on the idea that the program mainly walks a big call tree. If time is being wasted, it is because the call tree is bushier than necessary, and during the time that is being wasted, the code doing the wasting is visible on the stack. It can be terminal instructions, but more likely function calls, at almost any level of the stack.

Simply pausing the program under a debugger a few times will eventually display it. Anything you see it doing on more than one stack sample will, if you can improve it, speed up the program. It works whether the time is being spent in CPU, I/O, or anything else that consumes wall-clock time.

What it doesn't show you is tons of stuff you don't need to know. The only way it can fail to show you bottlenecks is if they are very small, in which case the code is pretty near optimal. Here's more of an explanation.
Although I think it will be hard to do anything better than gprof, you can create a special class, LOG for instance, and instantiate it at the beginning of each function you want to log.

Now you can integrate this approach with the preprocessing one you mentioned. Just add something like this at the beginning of each such function:
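(A minimal sketch, assuming a hypothetical LOG class and a LOG_ME marker comment for the preprocessor to find; neither is from the answer's actual code.)

#include <cstdio>

// Hypothetical minimal LOG class for this answer's idea.
struct LOG {
    const char* name;
    explicit LOG(const char* n) : name(n) { std::printf("enter %s\n", name); }
    ~LOG()                                { std::printf("leave %s\n", name); }
};

void computeIntersections()
{
    LOG log_me("computeIntersections"); // LOG_ME: marker for the preprocessor
    // ... function body ...
}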
and then you replace the string automatically in debug builds (shouldn't be hard).
Maybe you should use a profiler. AQTime is a relatively good one for Visual Studio. (If you have VS2010 Ultimate, you already have a profiler.)