
How do I debug a segmentation fault when the gdb stack trace is full of "??"?

Posted 2024-08-24 10:26:34

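My executable contains a symbol table, but the stack trace seems to have been overwritten.

How can I get more information out of this core? For example, is there a way to inspect the heap and look at the object instances that populate it for some clues? Any ideas are appreciated.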


瑶笙 2024-08-31 10:26:34

I am a C++ programmer for a living and I have run into this issue more times than I like to admit. Your application is smashing a huge part of the stack. Chances are the function that is corrupting the stack also crashes on return, because its return address has been overwritten; this is why GDB's stack trace is messed up.

This is how I debug this issue:

1) Step through the application until it crashes. (Look for a function that crashes on return.)

2) Once you have identified the function, declare a variable on the VERY FIRST LINE of the function:

int canary=0;

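(It must be the first line because this value should sit at the very top of the stack frame, so that this "canary" is overwritten before the function's return address. The compiler is free to reorder local variables, though, so it is worth confirming in the debugger that &canary really lies above the buffer you suspect.)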

3) Put a watch on canary and step through the function; when canary != 0, you have found your buffer overflow! Alternatively, set a data breakpoint that fires when canary changes (in gdb, a watchpoint: watch canary) and just run the program normally. This is a little easier, but not all IDEs support data breakpoints.
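The canary idea can be tried out without depending on the real stack layout (which the compiler controls) by simulating a frame as a plain byte array: a buffer region followed by a zero-filled canary region, plus a check equivalent to "canary != 0". This is a minimal sketch under that assumption; SimFrame, unchecked_copy and canary_tripped are hypothetical names, and the deliberately missing length check stands in for the real bug.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical layout: 8 bytes of "buffer" followed by a 4-byte canary.
// A single byte array is used so that writing into the canary region is
// well-defined C++ (a genuine out-of-bounds stack write is not).
constexpr std::size_t kBufSize = 8;
constexpr std::size_t kCanarySize = 4;

struct SimFrame {
    // Zero-initialized, so the canary starts out as 0 like "int canary=0;".
    unsigned char bytes[kBufSize + kCanarySize] = {};

    // Simulated bug: copies len bytes into the buffer without checking
    // len against kBufSize, so anything past 8 bytes lands in the canary.
    void unchecked_copy(const void* src, std::size_t len) {
        assert(len <= sizeof bytes);  // keep the simulation itself in bounds
        std::memcpy(bytes, src, len);
    }

    // Mirrors the "canary != 0" test: true once any canary byte changed.
    bool canary_tripped() const {
        for (std::size_t i = kBufSize; i < sizeof bytes; ++i) {
            if (bytes[i] != 0) return true;
        }
        return false;
    }
};
```

Copying 8 bytes leaves the canary intact, while copying 10 bytes overwrites its first two bytes and trips the check. In a real program the same check is done by the debugger watching the actual canary variable.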

EDIT: I talked to a senior programmer at my office: to understand the core dump, you need to resolve the memory addresses it contains. One way to figure out these addresses is to look at the binary's MAP file, which is human-readable. Here is an example of generating a MAP file with gcc:

gcc -o foo -Wl,-Map,foo.map foo.c

This is one piece of the puzzle, but it will still be very difficult to obtain the address of the function that is crashing. If you are running this application on a modern platform, ASLR will probably make the addresses in the core dump useless. Some implementations of ASLR randomize the function addresses of your binary, which makes the core dump absolutely worthless.
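Resolving an address means finding the symbol whose start address is the largest one not exceeding it. The sketch below assumes the (address, name) pairs have already been extracted from the MAP file (or from nm output) and sorted, and that the load bias introduced by ASLR is known (0 for a binary that was not relocated); Symbol and resolve are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <string>
#include <vector>

// One symbol as it might be read out of a linker map file
// (parsing the file itself is left out of this sketch).
struct Symbol {
    std::uint64_t addr;  // link-time start address
    std::string name;
};

// Returns the symbol whose start is the greatest address <= (pc - load_bias),
// or "??" if there is none. load_bias = runtime load address minus link-time
// address; it is 0 without relocation and must be recovered separately when
// ASLR moved the binary. Symbol sizes are ignored, so an address past the
// end of the last function still maps to that function.
std::string resolve(const std::vector<Symbol>& syms, std::uint64_t pc,
                    std::uint64_t load_bias = 0) {
    auto by_addr = [](const Symbol& a, const Symbol& b) { return a.addr < b.addr; };
    assert(std::is_sorted(syms.begin(), syms.end(), by_addr));  // precondition
    if (pc < load_bias) return "??";
    const std::uint64_t link_pc = pc - load_bias;
    // First symbol that starts strictly after link_pc ...
    auto it = std::upper_bound(
        syms.begin(), syms.end(), link_pc,
        [](std::uint64_t v, const Symbol& s) { return v < s.addr; });
    if (it == syms.begin()) return "??";
    // ... so the one before it is the function containing the address.
    return std::prev(it)->name;
}
```

For example, with symbols at 0x1000 (main), 0x1040 (parse) and 0x10a0 (MyMethod), the address 0x1050 resolves to parse, and a runtime address of 0x5555555550a4 with a load bias of 0x555555554000 resolves to MyMethod.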

深海不蓝 2024-08-31 10:26:34

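1. You need a tool to detect this; valgrind works well.
2. When you compile your code, make sure you add the -Wall option so the compiler tells you about likely errors (and make sure your code compiles without any warnings).

For example: gcc -Wall -g -c -o oke.o oke.c
3. Make sure you also pass -g to generate debug information. You can also use some macros to print debugging information. The following macros have been very useful to me: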

__LINE__: the current line number

__FILE__: the current source file

__func__: the current function
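A minimal sketch of how these macros are typically combined into a trace helper; where, DEBUG_TRACE and risky_step are hypothetical names, not part of any library. Because the macros expand at the call site, the message reports the caller's file, line and function.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Builds a "file:line in function" tag. Kept as a plain function so it can
// be tested on its own; the macro below supplies the real values.
std::string where(const char* file, int line, const char* func) {
    assert(file != nullptr && func != nullptr);
    return std::string(file) + ":" + std::to_string(line) + " in " + func;
}

// Hypothetical trace macro: __FILE__, __LINE__ and __func__ expand where
// DEBUG_TRACE is used, so it must be called inside a function body.
#define DEBUG_TRACE(msg) \
    std::fprintf(stderr, "[%s] %s\n", where(__FILE__, __LINE__, __func__).c_str(), (msg))

// Example caller: leaves a breadcrumb on stderr before a risky step.
void risky_step() {
    DEBUG_TRACE("entering risky_step");
}
```

Sprinkling such traces before suspect calls gives a last-known location even when the core's stack trace is unusable.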

In my opinion a debugger alone is not enough; you should get used to making the most of your compiler.

Hope this helps.

昔日梦未散 2024-08-31 10:26:34

TL;DR: extremely large local variables declared in a function are allocated on the stack and, on certain platform and compiler combinations, can overrun and corrupt it.

Just to add another potential cause of this issue: I was recently debugging a very similar problem. Running gdb on the application and core file produced output like this:

Core was generated by `myExecutable myArguments'.
Program terminated with signal 6, Aborted.
#0  0x00002b075174ba45 in ?? ()
(gdb)

That was extremely unhelpful and disappointing. After hours of scouring the internet, I found a forum thread explaining that the particular compiler we were using (the Intel compiler) had a smaller default stack size than other compilers, and that large local variables could overrun and corrupt the stack. Looking at our code, I found the culprit:

void MyClass::MyMethod() {
   ...
   char charBuffer[MAX_BUFFER_SIZE];
   ...

}

Bingo! MAX_BUFFER_SIZE was set to 10000000, so a 10 MB local variable was being allocated on the stack! After changing the implementation to use a shared_ptr and allocate the buffer dynamically, the program suddenly started working perfectly.
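The answer switched to a shared_ptr. Since this buffer is not shared, the sketch below swaps in std::unique_ptr<char[]> created with std::make_unique (C++14), which frees the buffer automatically when the function returns; std::vector<char> would work equally well. The constexpr MAX_BUFFER_SIZE, the MyMethod parameter and return value, and the copy logic are hypothetical, added only so the sketch does something testable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Same value as in the answer: about 10 MB, far too big for the stack.
constexpr std::size_t MAX_BUFFER_SIZE = 10000000;

class MyClass {
public:
    // Hypothetical signature: copies input into the work buffer and
    // returns the resulting string length.
    std::size_t MyMethod(const char* input);
};

std::size_t MyClass::MyMethod(const char* input) {
    assert(input != nullptr);
    // Before: char charBuffer[MAX_BUFFER_SIZE];  (10 MB on the stack)
    // After: heap allocation, zero-filled, released automatically.
    auto charBuffer = std::make_unique<char[]>(MAX_BUFFER_SIZE);
    std::size_t n = std::strlen(input);
    if (n >= MAX_BUFFER_SIZE) n = MAX_BUFFER_SIZE - 1;  // leave room for '\0'
    std::memcpy(charBuffer.get(), input, n);
    charBuffer[n] = '\0';
    return std::strlen(charBuffer.get());
}
```

Only the pointer now lives on the stack, so the frame stays small no matter how large MAX_BUFFER_SIZE grows.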

倾城花音 2024-08-31 10:26:34

Try running it under the Valgrind memory debugger.

杯别 2024-08-31 10:26:34

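Just to confirm: was your executable compiled in release mode, i.e. without debug symbols? That would explain the "??". Try recompiling with the -g switch, which "includes debugging information and embeds it in the executable". Beyond that, I don't know why you are getting "??"...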

如痴如狂 2024-08-31 10:26:34

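Not really. Sure, you can dig around in memory and look at things. But without a stack trace you won't know how you got to where you are or what the parameter values were.

However, the fact that the stack is corrupted tells you to look for code that writes to the stack: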

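- Overwriting a stack array. This can happen in the obvious way, or by calling a function or system call with a wrong-size argument or a pointer of the wrong type.
- Using a pointer or reference to a function's local stack variable after that function has returned.
- Casting a pointer to a stack value to a pointer of the wrong size and using it.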
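A short sketch of the safe counterparts to the first two patterns: passing the array's real capacity (via sizeof) to snprintf so the write cannot run past a stack array, and returning a string by value instead of a pointer to a local array that dies with its frame. format_label and make_tag are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Formats "name-id" into a caller-supplied buffer of capacity cap.
// snprintf never writes more than cap bytes (including the '\0'), and the
// return value is the length it *wanted* to write, so callers can detect
// truncation instead of overflowing.
std::size_t format_label(char* out, std::size_t cap, const char* name, int id) {
    assert(name != nullptr);
    int n = std::snprintf(out, cap, "%s-%d", name, id);
    return n < 0 ? 0 : static_cast<std::size_t>(n);
}

// Dangling-pointer fix. Wrong version, for contrast:
//   const char* make_tag() { char buf[16] = "tag"; return buf; }
// Returning std::string copies the bytes out before buf goes away.
std::string make_tag(int id) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "tag-%d", id);
    return std::string(buf);
}
```

With an 8-byte buffer, format_label(buf, sizeof buf, "worker", 42) reports 9 and stores the truncated "worker-" rather than smashing the neighbouring stack.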

If you are on a Unix system, valgrind is a good tool for finding some of these problems.

榆西 2024-08-31 10:26:34

I assume that, since you say "my executable contains a symbol table", you compiled and linked with -g and your binary has not been stripped.

Just so we can confirm this:
strings -a <your_binary> | grep function_name_you_know_should_exist

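Also try running pstack on the core to see whether it does a better job of recovering the call stack. If it does, it sounds like your gdb is out of date compared with your gcc/g++ version.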

青春有你 2024-08-31 10:26:34

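It sounds like the glibc version on your machine is not identical to the one the core file came from when it crashed in production. Get the files listed by "ldd ./appname", copy them onto your machine, and then tell gdb where to look: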

set solib-absolute-prefix /path/to/libs
