How to read a segfault kernel log message

Posted on 2024-08-20 09:12:28

This may be a very simple question. I am attempting to debug an application that generates the following segfault error in kern.log:

kernel: myapp[15514]: segfault at 794ef0 ip 080513b sp 794ef0 error 6 in myapp[8048000+24000]

Here are my questions:

  1. Is there any documentation on what the different segfault error numbers mean? In this instance it is error 6, but I've also seen errors 4 and 5.

  2. What is the meaning of the information at bf794ef0 ip 0805130b sp bf794ef0 and myapp[8048000+24000]?

So far I have been able to compile with symbols, and when I do x 0x8048000+24000 it returns a symbol. Is that the correct way of doing it? My assumptions so far are the following:

  • sp = stack pointer?
  • ip = instruction pointer
  • at = ????
  • myapp[8048000+24000] = address of symbol?

Comments (3)

用心笑 2024-08-27 09:12:28

When the report points to a program, not a shared library

Run addr2line -e myapp 080513b (and repeat for the other instruction pointer values given) to see where the error is happening. Better, get a debug-instrumented build, and reproduce the problem under a debugger such as gdb.

If it's a shared library

In the libfoo.so[NNNNNN+YYYY] part, the NNNNNN is where the library was loaded. Subtract this from the instruction pointer (ip) and you'll get the offset into the .so of the offending instruction. Then you can use objdump -DCgl libfoo.so and search for the instruction at that offset. You should easily be able to figure out which function it is from the asm labels. If the .so doesn't have optimizations you can also try using addr2line -e libfoo.so <offset>.
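The subtraction described above can be sketched in bash. This is only an illustration: `fault_offset` is a hypothetical helper name, and the sample values are the ip and load address quoted in the question, used here as stand-in input rather than as values from a real shared library.

```shell
#!/bin/bash
# Hypothetical helper: offset of a faulting instruction inside a mapped object.
# Both arguments are hex strings without a 0x prefix, as they appear in the kernel log.
fault_offset() {
    local ip=$1 base=$2
    # Bash arithmetic accepts 0x-prefixed hex; print the difference as hex again.
    printf '0x%x\n' $(( 0x$ip - 0x$base ))
}

# Sample input only: ip and load address as quoted in the question.
fault_offset 0805130b 8048000
```

For these sample values the result is 0x930b, which is the offset you would then look up in the objdump output or pass to addr2line.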

What the error means

Here's the breakdown of the fields:

  • address - the value after "at": the location in memory the code is trying to access (small values such as 10 or 11 are likely offsets from a pointer we expect to be set to a valid value but which is instead pointing to 0)
  • ip - instruction pointer, i.e. where the code which is trying to do this lives
  • sp - stack pointer
  • error - Architecture-specific flags; see arch/*/mm/fault.c for your platform.
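As a rough sketch of the breakdown above, the fields can be pulled out of a log line with a bash regular expression. `parse_segfault` and the output labels are hypothetical names chosen for illustration. The sketch assumes the line has exactly the layout shown in the question, and it labels the number after the `+` as the mapping size, which is an assumption not stated in the answer.

```shell
#!/bin/bash
# Hypothetical helper: split a kernel segfault log line into its fields.
parse_segfault() {
    # Groups: address, ip, sp, error, object name, load base, size (all hex except the name).
    local re='segfault at ([0-9a-f]+) ip ([0-9a-f]+) sp ([0-9a-f]+) error ([0-9a-f]+) in ([^[]+)\[([0-9a-f]+)\+([0-9a-f]+)\]'
    if [[ $1 =~ $re ]]; then
        echo "address=${BASH_REMATCH[1]}"
        echo "ip=${BASH_REMATCH[2]}"
        echo "sp=${BASH_REMATCH[3]}"
        echo "error=${BASH_REMATCH[4]}"
        echo "object=${BASH_REMATCH[5]}"
        echo "load_base=${BASH_REMATCH[6]}"
        echo "size=${BASH_REMATCH[7]}"
    else
        echo "not a segfault line" >&2
        return 1
    fi
}

# The log line from the question, used as sample input.
parse_segfault 'kernel: myapp[15514]: segfault at 794ef0 ip 080513b sp 794ef0 error 6 in myapp[8048000+24000]'
```

Keeping the values as plain hex strings makes it easy to hand them to other tools such as addr2line afterwards.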

梅窗月明清似水 2024-08-27 09:12:28

Based on my limited knowledge, your assumptions are correct.

  • sp = stack pointer
  • ip = instruction pointer
  • myapp[8048000+24000] = address

If I were debugging the problem I would modify the code to produce a core dump or log a stack backtrace on the crash. You might also run the program under (or attach) GDB.

The error code is just the architectural error code for page faults and seems to be architecture specific. They are often documented in arch/*/mm/fault.c in the kernel source. My copy of Linux/arch/i386/mm/fault.c has the following definition for error_code:

  • bit 0 == 0 means no page found, 1 means protection fault
  • bit 1 == 0 means read, 1 means write
  • bit 2 == 0 means kernel, 1 means user-mode

My copy of Linux/arch/x86_64/mm/fault.c adds the following:

  • bit 3 == 1 means fault was an instruction fetch
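To make the bit definitions concrete, here is a minimal bash sketch that decodes the three low bits listed for i386. `decode_fault_error` is a hypothetical name. The sketch assumes the value in the log is hexadecimal, and it deliberately ignores any higher bits.

```shell
#!/bin/bash
# Hypothetical helper: decode the low three bits of an x86 page-fault error code.
# The argument is read as hex without a 0x prefix (an assumption about the log format).
decode_fault_error() {
    local code=$(( 0x$1 ))
    local out
    # bit 0: 0 = no page found, 1 = protection fault
    if (( code & 1 )); then out="protection fault"; else out="no page found"; fi
    # bit 1: 0 = read, 1 = write
    if (( code & 2 )); then out+=", write"; else out+=", read"; fi
    # bit 2: 0 = kernel, 1 = user mode
    if (( code & 4 )); then out+=", user mode"; else out+=", kernel mode"; fi
    echo "$out"
}

# The three codes mentioned in the question.
for code in 4 5 6; do
    echo "error $code: $(decode_fault_error "$code")"
done
```

Under these definitions, error 6 from the question reads as a user-mode write to an address with no page mapped, while error 4 is the same situation for a read.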

葬心 2024-08-27 09:12:28

If it's a shared library

"You're hosed, unfortunately; it's not possible to know where the libraries were placed in memory by the dynamic linker after the fact."

Well, there is still a way to retrieve the information, not from the binary but from the shared object. You need the base address of the object, and this information is still in the core dump, in the link_map structure.

So first you want to import struct link_map into GDB. Let's compile a small shared object that uses it, with debug symbols, and add it to GDB.

link.c

#include <link.h>
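/* Dummy function: its only purpose is to pull struct link_map into the debug info. */
void toto(void) { struct link_map *s = (struct link_map *)0x400; (void)s; }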

get_baseaddr_from_coredump.sh

#!/bin/bash

BINARY=$(which myapplication)

# Returns 0 when the ELF type is not EXEC, i.e. the binary is position-independent.
IsBinPIE ()
{
    readelf -h "$1" | grep 'Type' | grep -q "EXEC" || return 0
    return 1
}

# Converts a hex string (with or without a 0x prefix) to decimal, stored in $number.
Hex2Decimal ()
{
    number=$(echo "$1" | sed -e 's:^0[xX]::' | tr 'a-f' 'A-F')
    number=$(echo "ibase=16; $number" | bc)
}

# Prints the summed size of the PT_LOAD segments of a position-independent binary.
GetBinaryLength ()
{
    if [ $# != 1 ]; then
        echo "Error, no argument provided"
        return 1
    fi
    IsBinPIE "$1" || { echo "ET_EXEC file, need a base_address"; return 1; }
    totalsize=0
    # Get PT_LOAD's size segment out of Program Header Table (ELF format)
    sizes="$(readelf -l "$1" | grep LOAD | awk '{print $6}' | tr '\n' ' ')"
    for size in $sizes
    do
        Hex2Decimal "$size"
        # Add each segment size once (the original added it twice).
        totalsize=$(expr $number + $totalsize)
    done
    # A shell return status is limited to 0-255, so print the result instead.
    echo "$totalsize"
}

if [ $# = 1 ]; then
    echo "Using binary $1"
    # Braces, not parentheses: exit inside a ( ) subshell would not stop the script.
    IsBinPIE "$1" && { echo "NOT ET_EXEC, need a base_address..."; exit 1; }
    BINARY=$1
fi

gcc -g3 -fPIC -shared link.c -o link.so

# Address column of .got.plt; assumes a two-digit section index such as "[23]" in the readelf output.
GOTADDR=$(readelf -S "$BINARY" | grep -E '\.got\.plt[[:space:]]' | awk '{print $4}')

echo "First do the following command :"
echo file "$BINARY"
echo add-symbol-file ./link.so 0x0
read
echo "Now copy/paste the following into your gdb session with attached coredump"
# The second GOT entry holds the link_map pointer; the +4 offset assumes 32-bit pointers.
cat <<EOF
set \$linkmapaddr = *(0x$GOTADDR + 4)
set \$mylinkmap = (struct link_map *) \$linkmapaddr
while (\$mylinkmap != 0)
if (\$mylinkmap->l_addr)
printf "add-symbol-file .%s %#.08x\n", \$mylinkmap->l_name, \$mylinkmap->l_addr
end
set \$mylinkmap = \$mylinkmap->l_next
end
EOF

It prints the whole link_map content as a set of GDB commands.

By itself this might seem unnecessary, but with the base address of the shared object in question you may get more information out of an address by debugging that shared object directly in another GDB instance.
Keep the first gdb session open to get an idea of the symbol.

NOTE: the script is rather incomplete. I suspect you may need to add the following value to the second parameter of the printed add-symbol-file command:

readelf -S "$SO_PATH" | grep -E '\.text[[:space:]]' | awk '{print $5}'

where $SO_PATH is the first argument of the add-symbol-file command.

Hope it helps
