How to read a segfault kernel log message
This may be a very simple question. I am attempting to debug an application which generates the following segfault error in kern.log:
kernel: myapp[15514]: segfault at 794ef0 ip 080513b sp 794ef0 error 6 in myapp[8048000+24000]
Here are my questions:
Is there any documentation on what the different error numbers on a segfault mean? In this instance it is error 6, but I've seen error 4 and 5 as well.
What is the meaning of the information
at bf794ef0 ip 0805130b sp bf794ef0 and myapp[8048000+24000]
?
So far I was able to compile with symbols, and when I do an x 0x8048000+24000
it returns a symbol. Is that the correct way of doing it? My assumptions thus far are the following:
- sp = stack pointer?
- ip = instruction pointer
- at = ????
- myapp[8048000+24000] = address of symbol?
Comments (3)
When the report points to a program, not a shared library
Run addr2line -e myapp 080513b (and repeat for the other instruction pointer values given) to see where the error is happening. Better, get a debug-instrumented build, and reproduce the problem under a debugger such as gdb.
If it's a shared library
In the libfoo.so[NNNNNN+YYYY] part, the NNNNNN is where the library was loaded. Subtract this from the instruction pointer (ip) and you'll get the offset into the .so of the offending instruction. Then you can use objdump -DCgl libfoo.so and search for the instruction at that offset. You should easily be able to figure out which function it is from the asm labels. If the .so doesn't have optimizations you can also try using addr2line -e libfoo.so <offset>.
What the error means
Here's the breakdown of the fields:
- address (the value after "at") - the location in memory the code is trying to access (it's likely that 10 and 11 are offsets from a pointer we expect to be set to a valid value but which is instead pointing to 0)
- ip - instruction pointer, i.e. where the code which is trying to do this lives
- sp - stack pointer
- error - architecture-specific flags; see arch/*/mm/fault.c for your platform.
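The field breakdown and the "subtract the load address from ip" step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the regular expression covers just the line format quoted in the question (other kernel versions may print extra fields), the helper name parse_segfault is hypothetical, and the sample line uses the bf794ef0 / 0805130b values from the question's second rendering of the message.

```python
import re

# Matches only the "segfault at ... ip ... sp ... error ... in obj[base+size]"
# layout quoted in the question; this is an assumption, not a full parser.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<at>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+)"
    r"(?: in (?P<object>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def parse_segfault(line):
    """Return the fields of a segfault log line as a dict, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    info = {"comm": m["comm"], "pid": int(m["pid"]), "object": m["object"]}
    # The kernel prints these values in hexadecimal without a 0x prefix.
    for key in ("at", "ip", "sp", "error", "base", "size"):
        value = m[key]
        info[key] = int(value, 16) if value is not None else None
    if info["base"] is not None:
        # Offset of the faulting instruction inside the mapped object.
        info["offset"] = info["ip"] - info["base"]
    return info


line = ("kernel: myapp[15514]: segfault at bf794ef0 ip 0805130b "
        "sp bf794ef0 error 6 in myapp[8048000+24000]")
info = parse_segfault(line)
print(info["object"], hex(info["offset"]))
```

For a shared library, the computed offset is the value to look up with objdump or addr2line as described above; for a program reported directly, the answer passes the ip value itself to addr2line.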
Based on my limited knowledge, your assumptions are correct.
- sp = stack pointer
- ip = instruction pointer
- myapp[8048000+24000] = address
If I were debugging the problem I would modify the code to produce a core dump or log a stack backtrace on the crash. You might also run the program under (or attach) GDB.
The error code is just the architectural error code for page faults and seems to be architecture specific. The codes are often documented in arch/*/mm/fault.c in the kernel source. My copy of Linux/arch/i386/mm/fault.c has the following definition for error_code:
My copy of Linux/arch/x86_64/mm/fault.c adds the following:
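As a rough illustration of how such an error code can be read on x86, here is a small Python sketch. It is based on the architectural x86 page-fault error code bits as documented in the Intel manuals, not on the fault.c listings this answer refers to, and it does not apply to other architectures. The function name decode_x86_pf_error is hypothetical.

```python
# Low bits of the x86 page-fault error code: (bit, meaning if set, meaning if clear).
X86_PF_BITS = [
    (0, "protection violation", "page not present"),
    (1, "write access", "read access"),
    (2, "user mode", "kernel mode"),
]


def decode_x86_pf_error(code):
    """Describe an x86 page-fault error code as a list of short strings."""
    parts = [
        set_text if code & (1 << bit) else clear_text
        for bit, set_text, clear_text in X86_PF_BITS
    ]
    # Bits 3 and 4 only carry information when they are set.
    if code & (1 << 3):
        parts.append("reserved bit set in a paging structure")
    if code & (1 << 4):
        parts.append("instruction fetch")
    return parts


# The three codes mentioned in the question.
for code in (4, 5, 6):
    print(code, ", ".join(decode_x86_pf_error(code)))
```

Read this way, error 6 in the question would be a user-mode write to an address with no page mapped, while error 4 would be a user-mode read of an unmapped address and error 5 a user-mode read that hit a protection violation.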
Well, there is still a possibility of retrieving the information, not from the binary, but from the object. But you need the base address of the object, and this information is still within the coredump, in the link_map structure.
So first you want to import struct link_map into GDB. Let's compile a program that includes it with debug symbols and add it to GDB.
link.c
get_baseaddr_from_coredump.sh
It will print the whole link_map content as a set of GDB commands.
By itself this might seem unnecessary, but with the base_addr of the shared object in question, you might get more information out of an address by directly debugging the involved shared object in another GDB instance.
Keep the first gdb to have an idea of the symbol.
NOTE: the script is rather incomplete. I suspect you may need to add this value to the second parameter of the printed add-symbol-file command:
where $SO_PATH is the first argument of add-symbol-file.
Hope it helps