Ways to solve sticky problems
How to determine exactly what a piece of software is doing when it is stuck, unresponsive to user input and not updating its display?
I have tried oprofile, which records what function is executing, but it's not giving me enough clues. It counts everything that happens during the time it's running, when I need to see what's happening only when the specimen program is stuck.
The problem might involve interrupts, waiting on network sockets, timers, a GUI event handler, or who knows what. How to find out as much as possible about what's going on, not just the execution points of each thread?
The software of interest runs on Linux and is built with gcc; it is mostly C++ but may involve other languages, including interpreted ones such as Python.
The particular case of concern now is Firefox, for which I have checked out source. Firefox pauses all input and screen output at random times, frequently, for about 5-10 seconds each time. Even if someone handed me the solution to this particular problem on a silver platter, sure I'll take it but still be asking. If possible, I'd like to learn general techniques that would apply to any software, especially stuff I'm responsible for.
Comments (3)
strace will trace out the system calls. This might give some indication of what is blocking on network sockets and so on.
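As a rough sketch of how that might look for a process that is already running: the helper name `trace_syscalls` and the log file name are made up for illustration, and the `STRACE` variable is only there so the path to the tool can be overridden. Attaching to another process may need root or a relaxed ptrace policy, depending on the system.

```shell
# Hypothetical helper: attach strace to a running process and log its system calls.
trace_syscalls() {
    local pid=$1
    local log=${2:-strace-$pid.log}
    # -f   follow all threads and child processes
    # -tt  prefix each line with a wall-clock timestamp (microsecond resolution)
    # -T   show the time spent inside each system call
    # -o   write the trace to a file instead of stderr
    # -p   attach to an existing process by PID
    "${STRACE:-strace}" -f -tt -T -o "$log" -p "$pid"
}
```

Run it as `trace_syscalls 12345`, wait for a pause to happen, then stop it with Ctrl-C. In the log, a call with a large time in the `<...>` field at the end of the line, right around the moment of the freeze, is a candidate for whatever the program was blocked on.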
This technique should find it.
Basically, while it's spending time like that, there's almost always a hierarchy of function calls on the stack waiting for their work to be completed.
Just sample the stack a few times and you'll see them.
ADDED: As Don Wakefield pointed out, the pstack utility could be perfect for this job.
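Where pstack is not installed, gdb in batch mode can take the same kind of sample. The following is a minimal sketch, not a polished tool: the function name `sample_stacks` and its defaults are invented here, and the `GDB` variable only exists so the binary can be overridden.

```shell
# Hypothetical helper: take COUNT stack samples of PID, INTERVAL seconds apart.
sample_stacks() {
    local pid=$1 count=${2:-5} interval=${3:-1}
    local i=1
    while [ "$i" -le "$count" ]; do
        echo "=== sample $i at $(date +%T) ==="
        # -batch makes gdb run the -ex commands and then exit,
        # which detaches from the process and lets it continue.
        "${GDB:-gdb}" -p "$pid" -batch -ex "thread apply all bt" 2>/dev/null
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Something like `sample_stacks 12345 5 1 > stuck.txt`, started while the program is frozen, gives five backtraces of every thread. Functions that show up in most of the samples are the ones the program is waiting in.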
A stack trace can be obtained from a running program. At a command line, use "ps aux" to find the program's PID. Suppose it's 12345. Then attach gdb to that process, for example with "gdb -p 12345".
When the program is stuck in a pause (or when doing anything suspicious), do a ctrl-C in gdb. The "bt" command in gdb prints the stack, which can be admired now or pasted into a text file for later study. Resume execution of the program with "c" (continue).
The main advantage of this manual technique over oprofile or other profilers is that I can get the exact call sequence at a moment of interest. A few samples during times of trouble, and a few when the program is running normally, should give useful clues.
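Once a few backtraces have been pasted into text files, comparing the "trouble" samples with the "normal" ones can be partly automated. Below is a rough sketch that counts how often each function name appears in saved "bt" output. The helper name `count_frames` and the file names in the usage line are hypothetical, and the parsing is deliberately crude: it assumes gdb's usual "#N  0xADDR in func (...)" or "#N  func (...)" frame lines and will mangle C++ names that contain spaces.

```shell
# Hypothetical helper: count occurrences of each function in saved gdb backtraces.
count_frames() {
    # Frame lines start with "#<number> ". If the third field is "in",
    # the function name is the fourth field; otherwise it is the second.
    awk '/^#[0-9]+ / { f = ($3 == "in") ? $4 : $2; n[f]++ }
         END { for (f in n) print n[f], f }' "$@" | sort -rn
}
```

For example, `count_frames stuck-*.txt | head` against `count_frames normal-*.txt | head`: a function that ranks high only in the stuck samples is a good place to start reading the source.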