Debugging a core file with no symbols
I have a C application that we have deployed to a customer's site. It was compiled and runs on HP-UX. The user has reported a crash and we have obtained a core dump. So far, I've been unable to reproduce the crash in-house.
As you would suspect, the core file/deployed executable is completely devoid of any sort of symbols. When I load it up in gdb and do a bt, the best I get is this:
(gdb) bt
#0 0xc0199470 in ?? ()
I can do a 'strings core' on the file, but my understanding is that all I get there is all the strings in the executable, so it seems semi-impossible to track down anything there.
I do have a debug version (compiled with -g) of the executable, which is unfortunately a couple of months newer than the released version. If I try to start gdb with that executable, I see this:
warning: exec file is newer than core file.
Core was generated by `program_name'.
Program terminated with signal 11, Segmentation fault.
__dld_list is not valid according to __dld_flags.
#0 0xc0199470 in ?? ()
(gdb) bt
#0 0xc0199470 in ?? ()
While it would be feasible to compile a debug version and deploy it at the customer's site and then wait for another crash, it would be relatively difficult and undesirable for a number of reasons.
I am quite familiar with the code and have a relatively good idea of where in code it is crashing based on the customer's bug report.
Is there ANY way I can glean any more information from this core dump? Via strings or another debugger or anything? Thanks.
Comments (8)
Try running "pmap" against the core file (if HP-UX has this tool). This should report the starting addresses of all modules in the core file. With this info, you should be able to take the address of the failure location and figure out which library crashed. Further comparison between the crash address and the addresses of the known functions in the library (running "nm" against the library should list those) may help you determine which function crashed.
Even if you do manage to identify the function at the top of the stack, it isn't very likely that this function is the source of the problem... hopefully it has actually crashed in your code and not, say, the standard C string library. Rebuilding the stack trace is the next-best thing at that point.
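The address comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the module ranges and symbol offsets below are made up (they are not from the actual core file), and the sketch assumes the symbol values are offsets from the library's load address, which may not hold for every object format.

```python
from bisect import bisect_right

# Hypothetical module map, as a pmap-style tool might report it:
# (name, start address, end address). These values are invented.
MODULES = [
    ("program_name", 0x00001000, 0x00080000),
    ("libexample.sl", 0xC0100000, 0xC0200000),
]

# Hypothetical symbol offsets, as "nm" might list them for the library.
SYMBOLS = {
    "libexample.sl": [
        (0x00098000, "func_a"),
        (0x00099400, "func_b"),
        (0x0009A000, "func_c"),
    ],
}


def find_module(addr, modules):
    """Return (name, start) of the module whose range holds addr, else None."""
    for name, start, end in modules:
        if start <= addr < end:
            return name, start
    return None


def nearest_symbol(offset, symbols):
    """Return (name, delta) for the last symbol at or below offset, else None."""
    symbols = sorted(symbols)
    offsets = [off for off, _ in symbols]
    i = bisect_right(offsets, offset) - 1
    if i < 0:
        return None
    off, name = symbols[i]
    return name, offset - off


def resolve(addr):
    """Map a crash address to (module, (symbol, delta)), or None if unmapped."""
    hit = find_module(addr, MODULES)
    if hit is None:
        return None
    name, start = hit
    return name, nearest_symbol(addr - start, SYMBOLS.get(name, []))


print(resolve(0xC0199470))
```

The nearest preceding symbol is only a guess: in a stripped library, static functions that have no symbol may sit between the reported symbol and the crash address.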
There is not much information here. The binary is stripped. But given the segmentation fault, you should look for places where you might be overwriting a piece of memory.
This is just a suggestion. There can be many problems.
BTW, if you are not able to reproduce it on your local machine, then the volume of data at the customer's site might be a factor.
I don't think the core file is supposed to contain symbols. You need to be able to build a version of your program that is exactly the same as what you shipped to your customer, but with -g. If you strip your debug executable, it should be identical to the shipped version. Only then can gdb give you anything useful.
This type of response from gdb can also happen when the stack was smashed by a buffer overrun: the return address was overwritten in memory, so the program counter gets set to a seemingly random area.
This is one of the ways that even a build with a corresponding symbol database can cause a symbol lookup error (or strange looking backtraces). If you still get this after you have the symbol table, your problem is likely that your customer's data is causing some issues with your code.
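One rough way to test the buffer-overrun theory is to check whether a suspicious address is made up entirely of printable characters, which would suggest that string data was written over a saved return address. The helper below is a heuristic sketch, not something from the answer above; its name is hypothetical, and a negative result does not rule out an overrun with binary data.

```python
def looks_like_ascii(word, size=4):
    """Return True if every byte of a size-byte word is printable ASCII."""
    data = word.to_bytes(size, "big")
    return all(0x20 <= b < 0x7F for b in data)


# 0x41424344 is the text "ABCD", so it is probably string data, not code.
print(looks_like_ascii(0x41424344))

# The address from the backtrace in the question.
print(looks_like_ascii(0xC0199470))
```

Because every byte is checked, the byte order passed to to_bytes does not affect the result.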
For the future:
For this situation:
You know the general area, so to see if you are right, go to the stack trace and find the assembly code -- eyeball it and see if you think it matches your source (this is easier if you have some idea what source generated this assembly). If it looks right, then you have some verification on your hypothesis. You might be able to figure out the values of the local variables by looking at the stack (since you know what you passed in and declared).
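Reading local variables out of raw stack memory comes down to decoding bytes with a layout you guess from the source. Below is a minimal sketch, assuming a hypothetical frame that holds a 4-byte int followed by a char[8] buffer, with big-endian byte order (which HP-UX uses). The bytes are invented, and the real layout depends on the compiler and its options.

```python
import struct

# Hypothetical bytes copied out of a gdb memory dump of the suspected frame.
RAW = bytes.fromhex("0000002a") + b"widget\x00\x00"


def decode_locals(raw):
    """Decode a guessed layout: a 4-byte signed int followed by char[8]."""
    count, name = struct.unpack(">i8s", raw)
    # Cut the char array at its first NUL, as C string functions would.
    text = name.split(b"\x00", 1)[0].decode("ascii", errors="replace")
    return count, text


print(decode_locals(RAW))
```

If the decoded values match what the code should have passed in at that point (a plausible count, a recognizable string), that supports the guess about which function the frame belongs to.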
Under gdb, "info registers" should give you enough of the execution state at the time of the crash to use with a disassembly of the executable and relevant shared libraries. I usually use objdump to disassemble, redirect the output to a file, then bring up the file in my favorite editor; this is useful for keeping notes as things are figured out. gdb's "info target" and "info sharedlib" can also be useful for figuring out where shared libraries are loaded.
With register state, stack contents, and disassembly in hand along with a little luck, it should be straightforward (if tedious) to reconstruct the callstack (unless, of course, the stack has been trashed by a buffer overrun or similar catastrophe... might need an Ouija board or crystal ball in that case.)
You might also be able to correlate a disassembly of the newer version built with -g against the disassembly of the stripped version.
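One way to start the manual reconstruction is to scan the dumped stack for words that fall inside known code ranges, since those are candidate return addresses. The sketch below assumes you have already copied the stack words out of gdb and the code ranges out of "info target" or "info sharedlib"; all values here are invented for illustration.

```python
# Hypothetical code ranges: (name, start address, end address).
TEXT_RANGES = [
    ("program_name", 0x00002000, 0x00060000),
    ("libexample.sl", 0xC0100000, 0xC0200000),
]

# Hypothetical 32-bit words dumped from the stack, lowest address first.
STACK_WORDS = [0x0000002A, 0xC0199470, 0x7F7F0A40, 0x00012F3B, 0x00000000, 0x0004C110]


def candidate_return_addresses(stack_words, text_ranges):
    """Yield (index, word, module) for stack words that point into code."""
    for i, word in enumerate(stack_words):
        for name, start, end in text_ranges:
            if start <= word < end:
                yield i, word, name
                break


hits = list(candidate_return_addresses(STACK_WORDS, TEXT_RANGES))
for i, word, name in hits:
    print(f"word {i}: {word:#010x} -> {name}")
```

Not every hit is a real return address, because plain data can happen to fall in a code range. Each candidate still has to be checked against the disassembly to see whether the instruction just before it is a call.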
Do you have the exact source that you used to compile the old version (e.g., through a tag in the source tree or something like that)? Maybe you could rebuild using that, and possibly get some insight into where the crash occurred?