Ruby/Glibc core dump (double free or corruption)
I am using a distributed continuous integration tool that I wrote myself in Ruby. It uses a fork of Mike Perham's "politics" to distribute the tasks. The "politics" module uses threads for the mDNS part.
Every now and then I get a core dump that I don't understand:
*** glibc detected *** ruby: double free or corruption (fasttop): 0x086d8600 ***
======= Backtrace: =========
/lib/libc.so.6[0xb7cef494]
/lib/libc.so.6[0xb7cf0b93]
/lib/libc.so.6(cfree+0x6d)[0xb7cf3c7d]
/usr/lib/libruby18.so.1.8[0xb7e8adf8]
/usr/lib/libruby18.so.1.8(ruby_xmalloc+0x85)[0xb7e8b395]
/usr/lib/libruby18.so.1.8[0xb7e5065e]
...
/usr/lib/libruby18.so.1.8[0xb7e717f4]
/usr/lib/libruby18.so.1.8[0xb7e74296]
/usr/lib/libruby18.so.1.8(rb_yield+0x27)[0xb7e7fb57]
======= Memory map: ========
...
I am running on Gentoo and have rebuilt Ruby and Glibc with "-gdbg" and turned off stripping to get a meaningful core:
...
Core was generated by `ruby /home/develop/dcc/bin/dcc-worker'.
Program terminated with signal 6, Aborted.
#0 0xb7f20410 in __kernel_vsyscall ()
(gdb) bt
#0 0xb7f20410 in __kernel_vsyscall ()
#1 0xb7cacb60 in *__GI___open_catalog (cat_name=0x6 <Address 0x6 out of bounds>, nlspath=0xbf9d6f00 " ", env_var=0x0, catalog=0x1) at open_catalog.c:237
#2 0xb7cae498 in __sigdelset (set=0x6) from /lib/libc.so.6
#3 *__GI_sigfillset (set=0x6) at ../signal/sigfillset.c:42
#4 0xb7ce952d in freopen64 (filename=0x2 <Address 0x2 out of bounds>, mode=0xb7db02c8 "\" total=\"%zu\" count=\"%zu\"/>\n", fp=0x9) at freopen64.c:47
#5 0xb7cef494 in _IO_str_init_readonly (sf=0x86d8600, ptr=0xb7eef5a9 "te\213V\b\205\322\017\204\220", size=-1210273804) at strops.c:88
#6 0xb7cf0b93 in mALLINFo (av=0xb) at malloc.c:5865
#7 0xb7cf3c7d in __libc_calloc (n=141395456, elem_size=3214793136) at malloc.c:4019
#8 0xb7e8adf8 in ?? () at gc.c:1390 from /usr/lib/libruby18.so.1.8
#9 0x086d8600 in ?? ()
#10 0xb7e89400 in rb_gc_disable () at gc.c:256
#11 0xb7e8b395 in add_freelist () at gc.c:1087
#12 gc_sweep () at gc.c:1186
#13 garbage_collect () at gc.c:1524
#14 0xb7e5065e in ?? () from /usr/lib/libruby18.so.1.8
#15 0x00000340 in ?? ()
#16 0x00000000 in ?? ()
(gdb)
Hmm? To me this looks like it is entirely internal to Ruby. In other "double free or corruption" questions here on Stack Overflow I have seen that threads may be part of the problem.
Also, the problem does not always occur at exactly the same position. I have another backtrace which is much longer; the crash is also in garbage_collect, but with a slightly different path:
(gdb) bt
#0 0xffffe430 in __kernel_vsyscall ()
#1 0xf7c8b8c0 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2 0xf7c8d1f5 in *__GI_abort () at abort.c:88
#3 0xf7cc7e35 in __libc_message (do_abort=2, fmt=0xf7d8daa8 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:170
#4 0xf7ccdd24 in malloc_printerr (action=2, str=0xf7d8dbec "double free or corruption (fasttop)", ptr=0x911f5d0) at malloc.c:6197
#5 0xf7ccf403 in _int_free (av=0xf7daa380, p=0x911f5c8) at malloc.c:4750
#6 0xf7cd24ad in *__GI___libc_free (mem=0x911f5d0) at malloc.c:3716
#7 0xf7e68768 in obj_free () at gc.c:1366
#8 gc_sweep () at gc.c:1174
#9 garbage_collect () at gc.c:1524
#10 0xf7e68be5 in rb_newobj () at gc.c:436
#11 0xf7eb9840 in str_alloc (klass=0) at string.c:67
... (150 lines of rb_eval/call/yield etc.)
Does anyone have a suggestion for how to isolate and maybe solve this problem?
Comments (4)
Quick, easy, and not as helpful:
export MALLOC_CHECK_=2
This causes glibc to do some extra checking during free() to detect heap corruption. It will abort() and give a core dump as soon as it detects corruption, instead of waiting until there's an actual problem caused by the corruption.
Not quite as quick and easy, but much more helpful (if you get it working): valgrind.
Valgrind makes it easy to find heap corruption issues. There are some spurious errors reported when using Ruby 1.8 under valgrind, but they can be eliminated using this ruby patch (and configuring with --enable-valgrind) or using a valgrind suppression file. To run your ruby program under valgrind, just prefix the command with valgrind. If the crashing process is a child of the process you are running, use valgrind --trace-children=yes. Look in particular for invalid writes, which are a sign of heap corruption.
I got this very same error in a simple 'C' program called rd_test; it would just read a given number of bytes using read(2) from a given input file (could be a device file).
The actual bug turned out to be a buffer overflow of 1 byte (as I did
...
buf[n]='\0';
...
where 'n' is the number of bytes read into the buffer 'buf'; when the read filled the whole buffer, the terminator landed one byte past its end).
Silly me.
But the thing is, I never caught that until I ran it under valgrind!
So IMHO valgrind is definitely worth running in cases like this.
The 'double free or corruption' error went away as soon as I got rid of the offending bug.
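For illustration, here is a minimal sketch of one way to avoid that off-by-one. It is not the original rd_test code: the helper name read_text and its parameters are made up for this example. The idea is simply to ask read(2) for one byte less than the buffer holds, so the terminator always has a slot inside the buffer.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not from the original rd_test program):
 * reads at most cap - 1 bytes from fd into buf and NUL-terminates it.
 * Returns the number of bytes read, or -1 on error or bad arguments. */
ssize_t read_text(int fd, char *buf, size_t cap)
{
    if (buf == NULL || cap == 0)
        return -1;                      /* no room even for the terminator */

    ssize_t n = read(fd, buf, cap - 1); /* leave one byte for '\0' */
    if (n < 0) {
        buf[0] = '\0';                  /* keep buf a valid empty string */
        return -1;
    }

    buf[n] = '\0';                      /* n <= cap - 1, so this is in bounds */
    return n;
}
```

A caller would pass the full buffer size, for example read_text(fd, buf, sizeof buf) with a char array buf. The buggy pattern passes the full size to read() itself and then writes buf[n], which is out of bounds whenever the read fills the buffer.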
I got the same error message, not in Ruby but in a zenity program.
I discovered it had something to do with my closing an open pipe twice!
Check that you don't free the same heap memory two or more times, or close files or pipes that are already closed.
Good luck.
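As a rough sketch of that advice in C: calling fclose() twice on the same stream is undefined behavior, and since glibc allocates the stream on the heap it can plausibly surface as this same abort. One common defensive pattern is to clear the pointer after the first close so that a second call becomes a no-op. The helper name close_once is hypothetical and only for illustration. A stream opened with popen() would need pclose() instead of fclose().

```c
#include <stdio.h>

/* Hypothetical helper: closes *fp at most once.
 * After the first call *fp is NULL, so a repeated call does nothing.
 * Returns the result of fclose(), or 0 if there was nothing to close. */
int close_once(FILE **fp)
{
    if (fp == NULL || *fp == NULL)
        return 0;           /* already closed, or never opened */

    int rc = fclose(*fp);
    *fp = NULL;             /* make any later close_once(fp) harmless */
    return rc;
}
```

This only helps when every copy of the pointer goes through the same variable. If the stream pointer has been copied elsewhere, the other copies still dangle after the close.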