Debugging utilities for Linux process hang issues?
I have a daemon process that does configuration management. All the other processes must interact with this daemon in order to function. But when I execute a large action, after a few hours the daemon becomes unresponsive for 2 to 3 hours. After those 2 to 3 hours it works normally again.
Are there debugging utilities for Linux process hang issues? How can I find out at what point the Linux process hangs?
Each of these may give a little bit of information, and together they build up a picture of the issue.
When using gdb, it might be useful to trigger a core dump while the app is blocked. You then have a static snapshot that you can analyze at your leisure using post-mortem debugging. You can have these dumps triggered by a script, so that you quickly build up a set of snapshots which can be used to test your theories.
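
The scripted snapshots mentioned above could look roughly like the following. This is only a sketch: it assumes the `gcore` helper that ships with gdb is installed and that you have permission to attach to the daemon. The function names, the `snap-NNN` file prefix, and the interval are made up for illustration.

```python
import subprocess
import time


def gcore_command(pid, prefix):
    """Build the gcore invocation; gcore writes the dump to '<prefix>.<pid>'."""
    return ["gcore", "-o", prefix, str(pid)]


def take_snapshots(pid, count, interval_s, out_dir="."):
    """Dump `count` core files of process `pid`, `interval_s` seconds apart.

    The target process keeps running after each dump.
    Returns the list of core file paths that gcore was asked to write.
    """
    paths = []
    for i in range(count):
        prefix = f"{out_dir}/snap-{i:03d}"   # hypothetical naming scheme
        subprocess.run(gcore_command(pid, prefix), check=True)
        paths.append(f"{prefix}.{pid}")
        if i + 1 < count:
            time.sleep(interval_s)
    return paths
```

Calling `take_snapshots(daemon_pid, 5, 60)` while the daemon is unresponsive would leave five core files that can each be opened later with `gdb /path/to/daemon snap-000.<pid>`. If the same threads sit in the same frames in every snapshot, that is a strong hint about where the process is stuck.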
One option is to use gdb and its attach command to attach to the running process. You will need to load a file containing the symbols of the executable in question (using the file command).
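
If you would rather not sit in an interactive session, attaching can also be done non-interactively. The sketch below assumes gdb is on the PATH and that the daemon's executable has symbols gdb can find on its own; the function names are illustrative. It attaches to a PID, prints a backtrace of every thread, and lets gdb detach when it exits.

```python
import subprocess


def backtrace_command(pid):
    """Build a gdb batch invocation that prints all thread backtraces of `pid`."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_backtraces(pid):
    """Attach to `pid`, collect the backtraces as text, and detach again."""
    result = subprocess.run(
        backtrace_command(pid), capture_output=True, text=True
    )
    return result.stdout
```

Running `print(dump_backtraces(daemon_pid))` during the unresponsive period shows which function each thread is blocked in, which speaks directly to the "at what point does it hang" part of the question.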
There are a number of different ways to do this:
- Listening on a UNIX domain socket to handle status requests. An external application can then ask whether the application is still OK. If it gets no response within some timeout period, it can assume that the application being queried has deadlocked or is dead.
- Periodically touching a file at a preselected path. An external application can look at the timestamp of the file, and if it is stale, it can assume that the application is dead or deadlocked.
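
As a rough illustration of the file-touching approach, the sketch below has one function for the daemon and one for the external monitor. The function names and the idea of passing the maximum age in seconds are assumptions for the example, not something prescribed above.

```python
import os
import time


def touch_heartbeat(path):
    """Daemon side: create the file if needed and set its mtime to now."""
    with open(path, "a"):
        pass
    os.utime(path, None)


def is_stale(path, max_age_s, now=None):
    """Monitor side: True if the file is missing or older than max_age_s."""
    now = time.time() if now is None else now
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        return True
    return now - mtime > max_age_s
```

The daemon would call `touch_heartbeat` from its main loop, not from a separate helper thread, so that the timestamp stops advancing when the main loop itself is stuck. The monitor can then log the moment `is_stale` first returns true, which narrows down when the hang began.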
You can use the alarm syscall repeatedly, having the signal terminate the process (use sigaction accordingly). As long as you keep calling alarm (i.e. as long as your program keeps making progress), the process will keep running. Once you stop, the signal will fire.
You can seamlessly restart your process when it dies by using fork and waitpid, as described in this answer. It does not cost any significant resources, since the OS will share the memory pages.
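
A minimal sketch of the alarm-plus-restart idea, written in Python for brevity (the same calls exist in C as alarm, fork and waitpid). The function names, the convention that `work()` returns False to stop, and the `max_restarts` limit are assumptions made for the example. SIGALRM is left at its default action, which terminates the process.

```python
import os
import signal
import time


def run_with_watchdog(work, timeout_s):
    """Child side: call work() repeatedly, re-arming the alarm after each call.

    A single call to work() that takes longer than timeout_s seconds lets
    SIGALRM fire, and its default action terminates the process.
    """
    signal.signal(signal.SIGALRM, signal.SIG_DFL)
    signal.alarm(timeout_s)          # arm the watchdog
    while work():                    # work() returns False to stop cleanly
        signal.alarm(timeout_s)      # still alive: push the deadline back
    signal.alarm(0)                  # cancel the pending alarm


def supervise(work, timeout_s, max_restarts):
    """Parent side: fork a child, wait for it, restart it if a signal killed it.

    Returns the number of restarts performed.
    """
    restarts = 0
    while True:
        pid = os.fork()
        if pid == 0:                 # child process
            status = 1
            try:
                run_with_watchdog(work, timeout_s)
                status = 0
            finally:
                os._exit(status)     # never fall back into the parent's code
        _, wait_status = os.waitpid(pid, 0)
        if os.WIFSIGNALED(wait_status) and restarts < max_restarts:
            restarts += 1            # child was killed (e.g. by SIGALRM)
            continue
        return restarts


if __name__ == "__main__":
    def stuck():                     # hypothetical work function that hangs
        time.sleep(60)
        return True

    print("restarts:", supervise(stuck, timeout_s=1, max_restarts=2))
```

Note that this restarts the daemon instead of explaining the hang, so it is best combined with one of the diagnostic approaches above, for example by taking a core dump before the watchdog fires.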