Getting better debug information when a C program crashes on Linux

Posted on 2024-12-18 06:54:49

We have an embedded version of the Linux kernel running on a MIPS core. The program we have written runs a particular test suite. During one of the stress tests (which runs for about 12 hours) we get a segmentation fault, which in turn generates a core dump.

Unfortunately, the core dump is not very useful. The crash is in some dynamically linked system library (probably pthread or glibc). The backtrace from the core dump does not help, because it shows only the crash point and none of the callers (our user-space application is built with -g -O0, but there is still no backtrace information):

Cannot access memory at address 0x2aab1004
(gdb) bt
#0  0x2ab05d18 in ?? ()
warning: GDB can't find the start of the function at 0x2ab05d18.

    GDB is unable to find the start of the function at 0x2ab05d18
and thus can't determine the size of that function's stack frame.
This means that GDB may be unable to access that stack frame, or
the frames below it.
    This problem is most likely caused by an invalid program counter or
stack pointer.
    However, if you think GDB should simply search farther back
from 0x2ab05d18 for code which looks like the beginning of a
function, you can increase the range of the search using the `set
heuristic-fence-post' command.

Another problem is that we cannot run the test under gdb/gdbserver, because it keeps breaking on __nptl_create_event. Since the test creates and destroys threads and timers every 5 seconds, it is practically impossible to sit there hitting continue for hours.

EDIT:
Another note: backtrace and backtrace_symbols are not supported on our toolchain.

Hence:

Is there a way of trapping the segmentation fault and generating more backtrace data (stack pointers, call stack, etc.)?
Is there a way of getting more data out of a core dump when the crash happened inside a .so file?

Thanks.


我很坚强 2024-12-25 06:54:49

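GDB can't find the start of the function at 0x2ab05d18

What is at that address at the time of the crash?

Run info shared, and find out whether any loaded library contains that address.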
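
If GDB cannot be used on the target, the process can record similar information itself at the moment it crashes. The following is a minimal sketch, not a definitive implementation: it assumes a Linux target with /proc mounted, and the names fmt_hex, dump_maps, and install_segv_reporter are hypothetical helpers that do not come from the question. The handler prints the faulting address and the process memory map (roughly the address ranges that info shared reports) using only async-signal-safe calls, then restores the default action so that the core dump is still written.

```c
/* Sketch: report a SIGSEGV from inside the process, then let the core dump happen.
 * All function names are hypothetical. Assumes Linux with /proc mounted. */
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Format v as "0x<hex>" into out; out needs 2 + 2*sizeof(v) + 1 bytes. */
void fmt_hex(unsigned long v, char *out)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof v];
    size_t n = 0;

    do {                          /* collect digits, least significant first */
        tmp[n++] = digits[v & 0xf];
        v >>= 4;
    } while (v != 0);

    *out++ = '0';
    *out++ = 'x';
    while (n > 0)                 /* emit them most significant first */
        *out++ = tmp[--n];
    *out = '\0';
}

/* Copy /proc/self/maps to fd. Returns the number of bytes copied, or -1 on
 * error. A short write is treated as an error to keep the sketch small. */
long dump_maps(int fd)
{
    char buf[512];
    long total = 0;
    ssize_t n;
    int in = open("/proc/self/maps", O_RDONLY);

    if (in < 0)
        return -1;
    while ((n = read(in, buf, sizeof buf)) > 0) {
        if (write(fd, buf, (size_t)n) != n) {
            total = -1;
            break;
        }
        total += n;
    }
    close(in);
    return total;
}

/* Write a string to stderr without stdio (stdio is not async-signal-safe). */
static void put(const char *s)
{
    ssize_t r = write(STDERR_FILENO, s, strlen(s));
    (void)r;                      /* nothing useful to do if stderr is gone */
}

static void segv_handler(int sig, siginfo_t *info, void *ctx)
{
    char addr[2 + 2 * sizeof(unsigned long) + 1];

    (void)ctx;                    /* saved registers; layout is per-architecture */
    fmt_hex((unsigned long)info->si_addr, addr);
    put("SIGSEGV, faulting address ");
    put(addr);
    put("\nMemory map:\n");
    dump_maps(STDERR_FILENO);

    /* Restore the default action and return: the faulting instruction runs
     * again, faults again, and the kernel writes the core dump as usual. */
    signal(sig, SIG_DFL);
}

/* Install the handler; returns 0 on success, -1 on failure. */
int install_segv_reporter(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Calling install_segv_reporter() once at the start of main() would be enough. Note that si_addr is the address whose access faulted, which is not necessarily the program counter; the saved program counter is available through the third handler argument, but its layout is architecture-specific, so the sketch does not read it. The handler also runs on the normal stack, so it will not help with a stack overflow unless an alternate stack is set up with sigaltstack.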

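The most likely cause of your trouble: did you run strip libpthread.so.0 before uploading it to the target? Don't do that: GDB requires libpthread.so.0 to not be stripped. If your toolchain's libpthread.so.0 carries debug symbols (and is therefore too large for the target), run strip -g on it instead of a full strip.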

Update: