Boehm C++ garbage collector: Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS

Posted 2024-10-12 15:27:26


I am using the Boehm C++ garbage collector in an application. The application uses the Levenshtein deterministic finite automaton Python program to calculate the Levenshtein distance between two strings. I have ported the Python program to C++ with gcc 4.1.2 on CentOS Linux.

Recently, I noticed that after running the application for more than 10 minutes, I get a SIGABRT with the error message: Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS. I was wondering if anyone knew how to fix or work around this problem.

Here is my gdb stack trace. Thank you.

  Program received signal SIGABRT, Aborted.
(gdb) bt
#0  0x002ed402 in __kernel_vsyscall ()
#1  0x00b1bdf0 in raise () from /lib/libc.so.6
#2  0x00b1d701 in abort () from /lib/libc.so.6
#3  0x00e28db4 in GC_abort (msg=0xf36de0 "Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS")
    at ../Source/misc.c:1079
#4  0x00e249a0 in GC_add_to_heap (p=0xb7cb7000, bytes=65536) at ../Source/alloc.c:812
#5  0x00e24e45 in GC_expand_hp_inner (n=16) at ../Source/alloc.c:966
#6  0x00e24fc5 in GC_collect_or_expand (needed_blocks=1, ignore_off_page=0) at ../Source/alloc.c:1032
#7  0x00e2519a in GC_allocobj (sz=6, kind=1) at ../Source/alloc.c:1087
#8  0x00e31e90 in GC_generic_malloc_inner (lb=20, k=1) at ../Source/malloc.c:138
#9  0x00e31fde in GC_generic_malloc (lb=20, k=1) at ../Source/malloc.c:194
#10 0x00e322b8 in GC_malloc (lb=20) at ../Source/malloc.c:319
#11 0x00df5ab5 in gc::operator new (size=20) at ../Include/gc_cpp.h:275
#12 0x00de7cb7 in __automata_combined_test2__::DFA::levenshtein_automata (this=0xb7b49080, term=0xb7cb5d20, k=1) 
at ../Source/automata_combined_test2.cpp:199
#13 0x00e3a085 in cDedupe::AccurateNearCompare (this=0x8052cd8, 
    Str1_=0x81f1a1d "GEMMA     OSTRANDER GEM 10   
DICARLO", ' ' <repeats 13 times>, "01748SUE       WOLFE     SUE 268  POND", ' ' <repeats 16 times>, 
"01748REGINA    SHAKIN    REGI16   JAMIE", ' ' <repeats 15 times>, "01748KATHLEEN  MAZUR     CATH10   JAMIE    "
..., 
    Str2_=0x81f2917 "LINDA     ROBISON   LIN 53   CHESTNUT", ' ' <repeats 12 times>, 
"01748MICHELLE  LITAVIS   MICH15   BLUEBERRY", ' ' <repeats 11 times>, "01748JOAN      TITUS     JO  6    SMITH", 
' ' <repeats 15 times>, "01748MELINDA   MCDOWELL  MEL 24   SMITH    "..., Size_=10, 

Update:

I looked at the Boehm garbage collector source and header files and realized that the "Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS" error message could be fixed by adding -DLARGE_CONFIG to the CFLAGS section in my GNUmakefile.
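
For reference, the change amounts to a one-line edit, assuming the collector's GNUmakefile defines its CFLAGS the way the stock Boehm GC build files do:

    # GNUmakefile: enable the larger compile-time heap limits
    CFLAGS += -DLARGE_CONFIG

LARGE_CONFIG raises MAX_HEAP_SECTS, MAXHINCR, and several related limits at once; the collector has to be rebuilt for the flag to take effect.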

I tested this change to my GNUmakefile and found that the Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS error message no longer occurred. However, I am now getting a segmentation fault (core dump). Using gdb, I found that the segmentation fault occurred in the following function at line 20 (which I have annotated):

set<tuple2<__ss_int, __ss_int> *> *NFA::next_state(set<tuple2<__ss_int, __ss_int> *> *states, str *input) {
    tuple2<__ss_int, __ss_int> *state;
    set<tuple2<__ss_int, __ss_int> *>::for_in_loop __3;
    set<tuple2<__ss_int, __ss_int> *> *__0, *dest_states;
    dict<str *, set<tuple2<__ss_int, __ss_int> *> *> *state_transitions;
    __iter<tuple2<__ss_int, __ss_int> *> *__1;
    __ss_int __2;

    dest_states = (new set<tuple2<__ss_int, __ss_int> *>());

    FOR_IN_NEW(state,states,0,2,3)
        // NOTE: every iteration allocates a fresh empty dict (and, below,
        // two fresh empty sets) purely to serve as get() defaults.
        state_transitions = (this->transitions)->get(state, ((dict<str *, set<tuple2<__ss_int, __ss_int> *> *> *)((new dict<void *, void *>()))));

        dest_states->update(state_transitions->get(input, new set<tuple2<__ss_int, __ss_int> *>()));
        dest_states->update(state_transitions->get(NFA::ANY, new set<tuple2<__ss_int, __ss_int> *>()));
    END_FOR

    return (new set<tuple2<__ss_int, __ss_int> *>(this->_expand(dest_states),1)); // line 20: segmentation fault reported here
}

I was wondering if it was possible to modify this function to fix the segmentation fault. Thank you.
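
One direction is to stop allocating throwaway default objects inside the loop. The sketch below is only an illustration of that idea, not a verified fix: it assumes the Shed Skin get(key, default) behaves like Python's non-mutating dict.get, so a single shared empty dict/set can safely serve as the default on every call. The typedef and variable names (state_set, trans_map, no_transitions, no_states) are mine, added for readability; everything else reuses the generated code above.

    // Sketch: hoist the get() defaults out of the loop and reuse them.
    // Assumes get(key, default) never stores or mutates the default it is handed.
    typedef set<tuple2<__ss_int, __ss_int> *> state_set;
    typedef dict<str *, state_set *> trans_map;

    state_set *NFA::next_state(state_set *states, str *input) {
        static trans_map *no_transitions = new trans_map(); // shared, never-mutated default
        static state_set *no_states = new state_set();      // shared, never-mutated default

        tuple2<__ss_int, __ss_int> *state;
        state_set::for_in_loop __3;
        state_set *__0, *dest_states = new state_set();
        trans_map *state_transitions;
        __iter<tuple2<__ss_int, __ss_int> *> *__1;
        __ss_int __2;

        FOR_IN_NEW(state,states,0,2,3)
            state_transitions = (this->transitions)->get(state, no_transitions);
            dest_states->update(state_transitions->get(input, no_states));
            dest_states->update(state_transitions->get(NFA::ANY, no_states));
        END_FOR

        return (new state_set(this->_expand(dest_states), 1));
    }

This cuts the per-iteration allocations from three to zero and so lowers the pressure on the GC; the answer below achieves the same effect by rewriting the Python source before regenerating the C++.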

Comments (1)

深海不蓝 2024-10-19 15:27:26

I finally figured out how to fix the GC out-of-memory segmentation fault. I replaced the setdefault and get constructs in the Python program, which allocate a fresh default object on every call. For example, I replaced the self.transitions.setdefault(src, {}).setdefault(input, set()).add(dest) Python statement with:

    if src not in self.transitions:
        self.transitions[src] = {}
    result = self.transitions[src]
    if input not in result:
        result[input] = set()
    result[input].add(dest)

Also, I replaced the Python statement:

new_states = self.transitions.get(state, {}).get(NFA.EPSILON, set()).difference(states)

with:

    if state not in self.transitions:
        self.transitions[state] = {}
    result = self.transitions[state]
    if NFA.EPSILON not in result:
        result[NFA.EPSILON] = set()
    cook = result[NFA.EPSILON]
    new_states = cook.difference(states)

Finally, I made sure to put __shedskin__.init() outside of any for or while loop, since __shedskin__.init() calls the GC allocator. The purpose of all of these changes is to reduce the pressure on the GC allocator.
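
A minimal sketch of that driver structure, assuming Shed Skin's usual builtin.hpp runtime header and a no-argument __shedskin__::__init() (the exact spelling and signature vary by Shed Skin version); compare_pair is a hypothetical stand-in for the generated comparison code:

    #include "builtin.hpp"   // Shed Skin runtime header (assumed name)

    // Hypothetical stand-in for the generated Levenshtein automaton code,
    // which allocates through the Boehm GC on every call.
    static void compare_pair(const char *a, const char *b) {
        /* generated comparison code goes here */
    }

    int main() {
        __shedskin__::__init();   // one-time runtime/GC setup: keep it outside the loop

        const char *pairs[][2] = { {"GEMMA", "LINDA"}, {"SUE", "MICHELLE"} };
        for (int i = 0; i < 2; i++)
            compare_pair(pairs[i][0], pairs[i][1]);
        return 0;
    }

Calling the initializer inside the loop instead would re-run the runtime setup once per record pair, millions of times over a long run.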

I have tested these changes with 3 million calls to the GC allocator and I have yet to get a segmentation fault. Thank you.
