How to get a call stack backtrace? (deeply embedded, no library support)

Posted 2024-09-12 20:25:23

I want my exception handlers and debug functions to be able to print call stack backtraces, basically just like the backtrace() library function in glibc. Unfortunately, my C library (Newlib) doesn't provide such a call.

I've got something like this:

#include <stdio.h>
#include <unwind.h> // GCC's internal unwinder, part of libgcc

_Unwind_Reason_Code trace_fcn(struct _Unwind_Context *ctx, void *d)
{
    int *depth = (int*)d;
    // _Unwind_GetIP() does not return an int, so cast it to match %08x
    printf("\t#%d: program counter at %08x\n", *depth, (unsigned)_Unwind_GetIP(ctx));
    (*depth)++;
    return _URC_NO_REASON;
}

void print_backtrace_here(void)
{
    int depth = 0;
    _Unwind_Backtrace(&trace_fcn, &depth);
}

which basically works but the resulting traces aren't always complete. For example, if I do

int func3() { print_backtrace_here(); return 0; }
int func2() { return func3(); }
int func1() { return func2(); }
int main()  { return func1(); }

the backtrace only shows func3() and main(). (This is obviously a toy example, but I have checked the disassembly and confirmed that these functions are all present in full and not optimized out or inlined.)

Update: I tried this backtrace code on the old ARM7 system, with the same (or at least, as equivalent as possible) compiler options and linker script, and it prints a correct, full backtrace (i.e. func1 and func2 aren't missing). It even backtraces up past main into the boot initialization code. So presumably the problem isn't with the linker script or compiler options. (I also confirmed from the disassembly that no frame pointer is used in this ARM7 test either.)

The code is compiled with -fomit-frame-pointer, but my platform (bare metal ARM Cortex M3) defines an ABI that does not use a frame pointer anyway. (A previous version of this system used the old APCS ABI on ARM7 with forced stack frames and frame pointer, and a backtrace like the one here, which worked perfectly.)

The whole system is compiled with -fexceptions, which ensures the necessary metadata that _Unwind uses is included in the ELF file. (_Unwind is designed for exception handling, I think.)

So, my question is:
Is there a "standard", accepted way of getting reliable backtraces in embedded systems using GCC?

I don't mind having to mess around with the linker scripts and crt0 code if necessary, but I don't want to have to make any changes to the toolchain itself.

Thanks!

Comments (6)

月下客 2024-09-19 20:25:23

For this you need -funwind-tables or -fasynchronous-unwind-tables.
On some targets this is required for _Unwind_Backtrace to work properly!
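
As a rough sketch of what those tables enable: the variant below records program counters into a caller-supplied buffer instead of printing them, which tends to be easier to use from an exception handler. The names trace_state, record_frame and capture_backtrace are made up for this illustration, and it assumes every object file in the call chain was built with one of the flags above.

```c
#include <stddef.h>
#include <stdint.h>
#include <unwind.h> /* GCC's unwinder interface, part of libgcc */

/* Hypothetical helper state: where to store program counters and how many fit. */
struct trace_state {
    uintptr_t *pcs;
    size_t     count;
    size_t     capacity;
};

/* Called by the unwinder once per frame it manages to recover. */
static _Unwind_Reason_Code record_frame(struct _Unwind_Context *ctx, void *arg)
{
    struct trace_state *st = (struct trace_state *)arg;

    if (st->count >= st->capacity) {
        /* libgcc stops walking when the callback returns anything
         * other than _URC_NO_REASON. */
        return _URC_END_OF_STACK;
    }
    st->pcs[st->count++] = (uintptr_t)_Unwind_GetIP(ctx);
    return _URC_NO_REASON;
}

/* Fills pcs[] with up to capacity program counters, innermost frame first,
 * and returns how many were stored. */
size_t capture_backtrace(uintptr_t *pcs, size_t capacity)
{
    struct trace_state st = { pcs, 0, capacity };
    _Unwind_Backtrace(record_frame, &st);
    return st.count;
}
```

If the walk stops early, the frame where it stopped is usually a function compiled without unwind information.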

你是暖光i 2024-09-19 20:25:23

Since ARM platforms do not use a frame pointer, you never quite know how big the stack frame is, and you cannot simply unwind the stack beyond the single return address in R14.

When investigating a crash for which we do not have debug symbols, we simply dump the whole stack and look up the closest symbol to each item that falls within the instruction range. It does generate a load of false positives but can still be very useful for investigating crashes.
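
A minimal sketch of that stack scan, assuming 32-bit stack words, a known code address range, and a symbol table sorted by ascending address. The struct symbol layout and both function names are hypothetical; a real table would have to be generated from the linker map or similar.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol table entry; the table must be sorted by address. */
struct symbol {
    uint32_t    addr;
    const char *name;
};

/* Returns the last symbol whose address is <= pc, or NULL if pc lies
 * before the first symbol. */
const struct symbol *nearest_symbol(const struct symbol *syms, size_t nsyms,
                                    uint32_t pc)
{
    const struct symbol *best = NULL;
    for (size_t i = 0; i < nsyms && syms[i].addr <= pc; i++) {
        best = &syms[i];
    }
    return best;
}

/* Copies every stack word that falls inside [code_start, code_end) into
 * hits[]. Many hits will be stale data or plain constants, which is where
 * the false positives come from. */
size_t scan_stack(const uint32_t *stack, size_t nwords,
                  uint32_t code_start, uint32_t code_end,
                  uint32_t *hits, size_t max_hits)
{
    size_t found = 0;
    for (size_t i = 0; i < nwords && found < max_hits; i++) {
        if (stack[i] >= code_start && stack[i] < code_end) {
            hits[found++] = stack[i];
        }
    }
    return found;
}
```

Each hit can then be passed to nearest_symbol() to get a candidate function name for the crash report.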

If you are running pure ELF executables, you can separate debug symbols out of your release executable. gdb can then help you find out what is going on from your standard Unix core dump.

浮生面具三千个 2024-09-19 20:25:23

GCC performs tail-call (sibling-call) optimization here. In func1() and func2() it does not call func2()/func3(); instead it jumps to them, so func3() can return directly to main().

In your case, func1() and func2() do not need to set up a stack frame, but even if they did (e.g. for local variables), GCC could still apply the optimization when the function call is the last operation: it then cleans up the stack before jumping to func3().

Have a look at the generated assembly code to see it.


Edit/Update:

To verify that this is the reason, do something after the function call that the compiler cannot reorder (e.g. use the return value).
Or just try compiling with -O0.
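
One way to run that check, as a sketch: make each intermediate function use the callee's result before returning. The noinline attribute and the volatile counter are additions for this illustration (func3 here only stands in for the function that would call print_backtrace_here()).

```c
/* noinline is a GCC/Clang attribute; it keeps each function as a real call. */
#define NOINLINE __attribute__((noinline))

/* Side effect so the compiler cannot drop the call chain altogether. */
static volatile int trace_requests;

/* Stand-in for the function that would call print_backtrace_here(). */
NOINLINE int func3(void) { trace_requests++; return 0; }

/* The addition happens after the callee returns, so the call cannot be
 * turned into a jump and each function needs its own return address. */
NOINLINE int func2(void) { return func3() + 1; }
NOINLINE int func1(void) { return func2() + 1; }
```

If func1 and func2 show up in the backtrace with this change, the missing frames in the original test were most likely down to the tail-call optimization.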

束缚m 2024-09-19 20:25:23

Some compilers, like GCC, optimize function calls as in your example. For the code fragment to work, there is no need to store the intermediate return addresses in the call chain. It's perfectly OK to return from func3() straight to main(), as the intermediate functions don't do anything extra besides calling another function.

It's not the same as code elimination (actually the intermediate functions could be completely optimized out), and a separate compiler parameter might control this kind of optimisation.

If you use GCC, try -fno-optimize-sibling-calls

Another handy GCC option is -mno-sched-prolog, which prevents instruction reordering in the function prologue. That is vital if you want to parse the code byte-by-byte, as is done here:
http://www.kegel.com/stackcheck/checkstack-pl.txt
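
If changing the flags for the whole build is not practical, GCC's optimize function attribute can apply the same option to individual functions. The sketch below assumes GCC (other compilers get an empty macro), and the macro and function names are invented for the example. GCC's documentation describes this attribute as a debugging aid, which is how it is meant here.

```c
/* GCC-specific: the optimize attribute applies an -f option to one function.
 * Other compilers get an empty macro and keep their default behaviour. */
#if defined(__GNUC__) && !defined(__clang__)
#define KEEP_FRAMES __attribute__((noinline, optimize("no-optimize-sibling-calls")))
#else
#define KEEP_FRAMES
#endif

/* Side effect so the calls cannot be dropped. */
static volatile int leaf_calls;

KEEP_FRAMES int leaf(void)   { leaf_calls++; return 7; }

/* With sibling-call optimization disabled these stay real calls followed
 * by a return, rather than becoming plain jumps. */
KEEP_FRAMES int middle(void) { return leaf(); }
KEEP_FRAMES int outer(void)  { return middle(); }
```

Comparing the disassembly of middle() with and without the attribute should show a branch-with-link in place of a plain branch.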

夜唯美灬不弃 2024-09-19 20:25:23

This is hacky, but I've found it works well enough considering the amount of code/RAM space required:

Assuming you're using ARM THUMB mode, compile with the following options:

-mtpcs-frame -mtpcs-leaf-frame  -fno-omit-frame-pointer

The following function is used to retrieve the callstack. Refer to the comments for more info:

#include <stdint.h> // uint32_t
#include <string.h> // memset

/*
 * This should be compiled with:
 *  -mtpcs-frame -mtpcs-leaf-frame  -fno-omit-frame-pointer
 *
 *  With these options, the Stack pointer is automatically pushed to the stack
 *  at the beginning of each function.
 *
 *  This function basically iterates through the current stack finding the following combination of values:
 *  - <Frame Address>
 *  - <Link Address>
 *
 *  This combination will occur for each function in the call stack
 */
static void backtrace(uint32_t *caller_list, const uint32_t *caller_list_end, const uint32_t *stack_pointer)
{
    uint32_t previous_frame_address = (uint32_t)stack_pointer;
    uint32_t stack_entry_counter = 0;

    // be sure to clear the caller_list buffer
    memset(caller_list, 0, (size_t)(caller_list_end - caller_list) * sizeof(*caller_list)); // memset takes a byte count, not an element count

    // loop until the buffer is full
    while(caller_list < caller_list_end)
    {
        // Attempt to obtain next stack pointer
        // The link address should come immediately after
        const uint32_t possible_frame_address = *stack_pointer;
        const uint32_t possible_link_address = *(stack_pointer+1);

        // Have we searched past the allowable size of a given stack?
        if(stack_entry_counter > PLATFORM_MAX_STACK_SIZE/4)
        {
            // yes, so just quit
            break;
        }
        // Next check that the frame address (i.e. stack pointer for the function)
        // and Link address are within an acceptable range
        else if((possible_frame_address > previous_frame_address) &&
                ((possible_frame_address < previous_frame_address + PLATFORM_MAX_STACK_SIZE)) &&
               ((possible_link_address  & 0x01) != 0) && // in THUMB mode the address will be odd
                (possible_link_address > PLATFORM_CODE_SPACE_START_ADDRESS &&
                 possible_link_address < PLATFORM_CODE_SPACE_END_ADDRESS))
        {
            // We found two acceptable values

            // Store the link address
            *caller_list++ = possible_link_address;

            // Update the book-keeping registers for the next search
            previous_frame_address = possible_frame_address;
            stack_pointer = (uint32_t*)(possible_frame_address + 4);
            stack_entry_counter = 0;
        }
        else
        {
            // Keep iterating through the stack until we find an acceptable combination
            ++stack_pointer;
            ++stack_entry_counter;
        }
    }

}

You'll need to define PLATFORM_MAX_STACK_SIZE, PLATFORM_CODE_SPACE_START_ADDRESS and PLATFORM_CODE_SPACE_END_ADDRESS for your platform.

Then call the following to populate a buffer with the current call stack:

uint32_t callers[8];
uint32_t sp_reg;
__ASM volatile ("mov %0, sp" : "=r" (sp_reg) );
backtrace(callers, &callers[8], (uint32_t*)sp_reg);

Again, this is rather hacky, but I've found it to work quite well.
The buffer will be populated with link addresses of each function call in the call stack.
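
To make the collected addresses usable, something like the following could print them. This is only a sketch: print_callers is a hypothetical helper, and it assumes the unused slots of the buffer are zero so that the first zero entry marks the end of the list.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Prints the collected link addresses and returns how many were printed.
 * Stops at the first zero entry, so unused slots are assumed to be zeroed. */
size_t print_callers(FILE *out, const uint32_t *callers, size_t capacity)
{
    size_t n = 0;
    while (n < capacity && callers[n] != 0) {
        /* Bit 0 only marks Thumb state; clear it to get the code address. */
        fprintf(out, "#%u: return address 0x%08lx\n",
                (unsigned)n, (unsigned long)(callers[n] & ~UINT32_C(1)));
        n++;
    }
    return n;
}
```

The printed addresses can then be matched against the linker map file, or passed to a tool such as addr2line on the host, to recover function names.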

缱倦旧时光 2024-09-19 20:25:23

Does your executable contain debugging information, from compiling with the -g option? I think this is required to get a full stack trace without a frame pointer.

You might need -gdwarf-2 to make sure it uses a format that includes unwind information.
