Getting better debug information when a C program crashes on Linux
We have an embedded Linux system running on a MIPS core. The program we have written runs a particular test suite. During one of the stress tests (which runs for about 12 hours) we get a segmentation fault, which in turn generates a core dump.
Unfortunately, the core dump is not very useful. The crash is in some dynamically linked system library (probably pthread or glibc). The backtrace from the core dump does not help because it shows only the crash point and none of the callers (our user-space application is built with -g -O0, but there is still no backtrace information):
Cannot access memory at address 0x2aab1004
(gdb) bt
#0 0x2ab05d18 in ?? ()
warning: GDB can't find the start of the function at 0x2ab05d18.
GDB is unable to find the start of the function at 0x2ab05d18
and thus can't determine the size of that function's stack frame.
This means that GDB may be unable to access that stack frame, or
the frames below it.
This problem is most likely caused by an invalid program counter or
stack pointer.
However, if you think GDB should simply search farther back
from 0x2ab05d18 for code which looks like the beginning of a
function, you can increase the range of the search using the `set
heuristic-fence-post' command.
Another problem is that we cannot run the test under gdb/gdbserver, because it keeps stopping on __nptl_create_event. Since the test creates and destroys threads and timers every 5 seconds, it is practically impossible to sit there hitting continue for that long.
EDIT: Note also that backtrace and backtrace_symbols are not supported by our toolchain.
Hence:
Is there a way of trapping the segmentation fault and generating more backtrace data (stack pointers, call stack, etc.)?
Is there a way of getting more data from a core dump when the crash happened in a .so file?
Thanks.
Comments (2)
What is at that address at the time of the crash?
Do info shared, and find out if there is a library that contains that address.
The most likely cause of your troubles: did you run strip libpthread.so.0 before uploading it to your target? Don't do that: GDB requires libpthread.so.0 to not be stripped. If your toolchain contains libpthread.so.0 with debug symbols (and thus too large for the target), run strip -g on it, not a full strip.
Update:
This means that GDB cannot access the shared library list (which would then explain the missing stack trace). The most usual cause: the binary that actually produced the core does not match the binary you gave to GDB. A less common cause: your core dump was truncated (perhaps due to ulimit -c being set too low).
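The truncated-core case mentioned above can also be ruled out from inside the test program itself, instead of relying on the shell's ulimit -c setting. This is only a minimal sketch: the function name raise_core_limit is hypothetical, and it assumes the hard limit on the target is large enough for a full core. getrlimit, setrlimit and RLIMIT_CORE are the standard POSIX interfaces.

```c
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit.
 * Hypothetical helper name; call it early in main().
 * Returns 0 on success, -1 on failure (errno is set by the failing call). */
int raise_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    /* An unprivileged process may raise its soft limit up to the hard limit. */
    rl.rlim_cur = rl.rlim_max;

    if (setrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    return 0;
}
```

If the hard limit itself is too small (or zero), this changes nothing useful and the limit has to be raised by whatever starts the process.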
If all else fails, run the command under the debugger!
Just put "gdb --args" in front of your normal start command and enter "r"un to start the process (use "c"ontinue to resume after any stop). When the task segfaults, it will return to the interactive gdb prompt rather than dumping core. You should then be able to get more meaningful stack traces etc.
Another option is to use "truss" if it is available (on Linux the equivalent tool is strace). This will tell you which system calls were being made at the time of the abnormal termination.