Linux and Solaris Unix: Core dump at the end of a function
We are observing a core dump quite randomly, under heavy load conditions. When we load the core file and look at the location of the core dump it is always pointing to the last line of the function, precisely the line number of the closing brace.
The function has some legacy goto statements. When we had a similar issue earlier, we moved the creation of all local objects to the top of the function, and that appeared to fix the issue on Solaris Unix 10. (Our suspicion, and some sample tests, showed that when the goto statements were executed, some of these local variables were never constructed but their destructors were still invoked. Moving them all the way to the top ensured that they are always constructed properly.) But the problem still happens on Linux, while we no longer see it on Solaris.
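A minimal C++ sketch of the layout described above, assuming a function whose error paths jump forward to a common label. The names process and Tracked are hypothetical; Tracked only counts constructions and destructions so the two totals can be compared. For reference, standard C++ makes a goto that jumps past the declaration of a local object with a non-trivial constructor (into that object's scope) ill-formed, and g++ reports it as an error by default, so declaring every object before the first goto is also the portable layout.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the real function's local objects; it counts
// constructions and destructions so the two totals can be compared.
struct Tracked {
    static int constructed;
    static int destroyed;
    Tracked() { ++constructed; }
    ~Tracked() { ++destroyed; }
};
int Tracked::constructed = 0;
int Tracked::destroyed = 0;

// "All locals at the top": every object is constructed before the first
// goto, so no jump can skip an initialization.
int process(int input) {
    Tracked guard;            // always constructed
    std::string message;      // always constructed
    std::vector<int> values;  // always constructed
    int rc = 0;

    if (input < 0) {
        rc = -1;
        goto done;            // legal: no declaration lies between here and 'done'
    }
    values.push_back(input);
    message = "ok";

done:
    return rc;                // guard, message and values are destroyed on return
}
```

Calling process(1) and process(-1) constructs and destroys exactly one Tracked object per call, whichever path is taken.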
Updated with the stack trace:
#0 0x008a5206 in raise () from /lib/libc.so.6
#1 0x008a6bd1 in abort () from /lib/libc.so.6
#2 0x008de3bb in __libc_message () from /lib/libc.so.6
#3 0x00966634 in __stack_chk_fail () from /lib/libc.so.6
#4 0x08e9ebf5 in our_function (this=0xd2f2c380)
at sourcefilename.cc:9887
Has anybody encountered a similar issue? Any help or pointers to understand and fix the issue would be greatly appreciated. Thanks a ton.
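I suspect you're overrunning a buffer on a downward-growing stack (most stacks grow downwards; I don't know whether Linux or Solaris use downward-growing stacks on every architecture, but they certainly do on some). The overrun writes past the end of a local buffer into the memory above it, where the saved return address lives, so the damage only takes effect when the function returns; that is why the crash is reported precisely at the closing brace. Without protection, the program counter would then jump to an illegal address. In your updated trace, __stack_chk_fail shows that the stack protector noticed the overwritten guard value on function exit and called abort() first.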
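To make the canary check behind the __stack_chk_fail frame concrete, here is a small C++ model of what a stack protector does. It is a conceptual sketch, not real compiler output: the frame layout, the 16-byte buffer, the canary value and the ModelFrame name are all made up, and the overrun is simulated inside one array so the model itself stays well-defined. The constructor plays the role of the function prologue storing the canary, and canary_intact() plays the role of the epilogue check that, in a real binary, calls __stack_chk_fail and aborts.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Conceptual model only, not how any compiler lays out a real frame:
// a 16-byte "local buffer" followed by an 8-byte canary, mimicking the
// guard value a stack protector places between locals and the return address.
constexpr std::size_t kBufSize = 16;
constexpr std::uint64_t kCanary = 0x5a17c0ffee15bad5ULL; // arbitrary value

struct ModelFrame {
    std::array<unsigned char, kBufSize + sizeof(std::uint64_t)> bytes{};

    // "Prologue": store the canary just past the buffer.
    ModelFrame() { std::memcpy(bytes.data() + kBufSize, &kCanary, sizeof kCanary); }

    // Unchecked copy into the buffer region: a length above kBufSize spills
    // into the canary, as a real overrun spills toward the return address.
    void unchecked_copy(const void* src, std::size_t len) {
        assert(len <= bytes.size()); // keeps the model itself free of UB
        std::memcpy(bytes.data(), src, len);
    }

    // "Epilogue": the comparison that, in real code, ends in __stack_chk_fail.
    bool canary_intact() const {
        std::uint64_t now = 0;
        std::memcpy(&now, bytes.data() + kBufSize, sizeof now);
        return now == kCanary;
    }
};
```

A 5-byte copy leaves the canary intact, while a 24-byte copy overwrites it. Because the check only runs at function exit, the debugger attributes the abort to the closing brace rather than to the line containing the faulty write.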
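Just run it under Valgrind; it will probably tell you what's happening (or rather, where the overrun is). Note, though, that Memcheck, Valgrind's default tool, does not bounds-check stack arrays, so an overrun that stays within the stack may go unreported; building with AddressSanitizer (-fsanitize=address, supported by recent GCC and Clang) reports stack buffer overflows at the offending write.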
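Once the overrun is located, the fix depends on the code, which isn't shown here. Assuming it is a string copy into a fixed-size local char array (only a guess), a typical remedy is to bound every write by the buffer size, or to drop the fixed buffer altogether. A minimal C++ sketch; copy_bounded and build_message are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: copies src into a fixed-size buffer without ever
// writing past it. std::snprintf writes at most `size` bytes, including the
// terminating '\0', and returns the length the full output would have had
// (or a negative value on error).
// Returns true if the whole string fit, false if it was truncated.
bool copy_bounded(char* dst, std::size_t size, const char* src) {
    int needed = std::snprintf(dst, size, "%s", src);
    return needed >= 0 && static_cast<std::size_t>(needed) < size;
}

// Alternative: let std::string own the storage so there is nothing to overrun.
std::string build_message(const std::string& name, int code) {
    return "user=" + name + " code=" + std::to_string(code);
}
```

snprintf truncates instead of overflowing, and its return value lets the caller detect the truncation; the std::string version avoids choosing a size at all, at the cost of a heap allocation.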