Boehm C++ garbage collector: Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS

Posted 2024-10-12 15:27:26


I am using the Boehm C++ garbage collector in an application. The application uses the Levenshtein deterministic finite automaton Python program to calculate the Levenshtein distance between two strings. I have ported the Python program to C++ with gcc 4.1.2 on CentOS Linux.

Recently, I noticed that after running the application for more than 10 minutes, I get a SIGABRT with the error message: Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS. I was wondering if anyone knew how to fix or work around this problem.

Here is my gdb stack trace. Thank you.

  Program received signal SIGABRT, Aborted.
(gdb) bt
#0  0x002ed402 in __kernel_vsyscall ()
#1  0x00b1bdf0 in raise () from /lib/libc.so.6
#2  0x00b1d701 in abort () from /lib/libc.so.6
#3  0x00e28db4 in GC_abort (msg=0xf36de0 "Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS")
    at ../Source/misc.c:1079
#4  0x00e249a0 in GC_add_to_heap (p=0xb7cb7000, bytes=65536) at ../Source/alloc.c:812
#5  0x00e24e45 in GC_expand_hp_inner (n=16) at ../Source/alloc.c:966
#6  0x00e24fc5 in GC_collect_or_expand (needed_blocks=1, ignore_off_page=0) at ../Source/alloc.c:1032
#7  0x00e2519a in GC_allocobj (sz=6, kind=1) at ../Source/alloc.c:1087
#8  0x00e31e90 in GC_generic_malloc_inner (lb=20, k=1) at ../Source/malloc.c:138
#9  0x00e31fde in GC_generic_malloc (lb=20, k=1) at ../Source/malloc.c:194
#10 0x00e322b8 in GC_malloc (lb=20) at ../Source/malloc.c:319
#11 0x00df5ab5 in gc::operator new (size=20) at ../Include/gc_cpp.h:275
#12 0x00de7cb7 in __automata_combined_test2__::DFA::levenshtein_automata (this=0xb7b49080, term=0xb7cb5d20, k=1) 
at ../Source/automata_combined_test2.cpp:199
#13 0x00e3a085 in cDedupe::AccurateNearCompare (this=0x8052cd8, 
    Str1_=0x81f1a1d "GEMMA     OSTRANDER GEM 10   
DICARLO", ' ' <repeats 13 times>, "01748SUE       WOLFE     SUE 268  POND", ' ' <repeats 16 times>, 
"01748REGINA    SHAKIN    REGI16   JAMIE", ' ' <repeats 15 times>, "01748KATHLEEN  MAZUR     CATH10   JAMIE    "
..., 
    Str2_=0x81f2917 "LINDA     ROBISON   LIN 53   CHESTNUT", ' ' <repeats 12 times>, 
"01748MICHELLE  LITAVIS   MICH15   BLUEBERRY", ' ' <repeats 11 times>, "01748JOAN      TITUS     JO  6    SMITH", 
' ' <repeats 15 times>, "01748MELINDA   MCDOWELL  MEL 24   SMITH    "..., Size_=10, 

Update:

I looked at the Boehm garbage collector source and header files and realized that the "Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS" error message could be fixed by adding -DLARGE_CONFIG to the CFLAGS section in my GNUmakefile.
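
For reference, the change amounts to a one-line edit, assuming the collector's GNUmakefile defines its CFLAGS the way the stock Boehm GC build files do:

    # GNUmakefile: enable the larger compile-time heap limits
    CFLAGS += -DLARGE_CONFIG

LARGE_CONFIG raises MAX_HEAP_SECTS, MAXHINCR, and several related limits at once; the collector has to be rebuilt for the flag to take effect.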

I tested this change to my GNUmakefile and found that the Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS error message no longer occurred. However, I am now getting a segmentation fault (core dump). Using gdb, I found that the segmentation fault occurred in the following function at line 20 (which I have annotated):

set<tuple2<__ss_int, __ss_int> *> *NFA::next_state(set<tuple2<__ss_int, __ss_int> *> *states, str *input) {
    tuple2<__ss_int, __ss_int> *state;
    set<tuple2<__ss_int, __ss_int> *>::for_in_loop __3;
    set<tuple2<__ss_int, __ss_int> *> *__0, *dest_states;
    dict<str *, set<tuple2<__ss_int, __ss_int> *> *> *state_transitions;
    __iter<tuple2<__ss_int, __ss_int> *> *__1;
    __ss_int __2;

    dest_states = (new set<tuple2<__ss_int, __ss_int> *>());

    FOR_IN_NEW(state,states,0,2,3)
        // NOTE: every iteration allocates a fresh empty dict (and, below,
        // two fresh empty sets) purely to serve as get() defaults.
        state_transitions = (this->transitions)->get(state, ((dict<str *, set<tuple2<__ss_int, __ss_int> *> *> *)((new dict<void *, void *>()))));

        dest_states->update(state_transitions->get(input, new set<tuple2<__ss_int, __ss_int> *>()));
        dest_states->update(state_transitions->get(NFA::ANY, new set<tuple2<__ss_int, __ss_int> *>()));
    END_FOR

    return (new set<tuple2<__ss_int, __ss_int> *>(this->_expand(dest_states),1)); // line 20: segmentation fault reported here
}

I was wondering if it was possible to modify this function to fix the segmentation fault. Thank you.
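
One direction is to stop allocating throwaway default objects inside the loop. The sketch below is only an illustration of that idea, not a verified fix: it assumes the Shed Skin get(key, default) behaves like Python's non-mutating dict.get, so a single shared empty dict/set can safely serve as the default on every call. The typedef and variable names (state_set, trans_map, no_transitions, no_states) are mine, added for readability; everything else reuses the generated code above.

    // Sketch: hoist the get() defaults out of the loop and reuse them.
    // Assumes get(key, default) never stores or mutates the default it is handed.
    typedef set<tuple2<__ss_int, __ss_int> *> state_set;
    typedef dict<str *, state_set *> trans_map;

    state_set *NFA::next_state(state_set *states, str *input) {
        static trans_map *no_transitions = new trans_map(); // shared, never-mutated default
        static state_set *no_states = new state_set();      // shared, never-mutated default

        tuple2<__ss_int, __ss_int> *state;
        state_set::for_in_loop __3;
        state_set *__0, *dest_states = new state_set();
        trans_map *state_transitions;
        __iter<tuple2<__ss_int, __ss_int> *> *__1;
        __ss_int __2;

        FOR_IN_NEW(state,states,0,2,3)
            state_transitions = (this->transitions)->get(state, no_transitions);
            dest_states->update(state_transitions->get(input, no_states));
            dest_states->update(state_transitions->get(NFA::ANY, no_states));
        END_FOR

        return (new state_set(this->_expand(dest_states), 1));
    }

This cuts the per-iteration allocations from three to zero and so lowers the pressure on the GC; the answer below achieves the same effect by rewriting the Python source before regenerating the C++.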

Comments (1)

深海不蓝 2024-10-19 15:27:26

I finally figured out how to fix the GC out-of-memory segmentation fault. I replaced the setdefault and get constructs in the Python program, which allocate a fresh default object on every call. For example, I replaced the self.transitions.setdefault(src, {}).setdefault(input, set()).add(dest) Python statement with:

    if src not in self.transitions:
        self.transitions[src] = {}
    result = self.transitions[src]
    if input not in result:
        result[input] = set()
    result[input].add(dest)

Also, I replaced the Python statement:

new_states = self.transitions.get(state, {}).get(NFA.EPSILON, set()).difference(states)

with:

    if state not in self.transitions:
        self.transitions[state] = {}
    result = self.transitions[state]
    if NFA.EPSILON not in result:
        result[NFA.EPSILON] = set()
    cook = result[NFA.EPSILON]
    new_states = cook.difference(states)

Finally, I made sure to put __shedskin__.init() outside of any for or while loop, since __shedskin__.init() calls the GC allocator. The purpose of all of these changes is to reduce the pressure on the GC allocator.
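
A minimal sketch of that driver structure, assuming Shed Skin's usual builtin.hpp runtime header and a no-argument __shedskin__::__init() (the exact spelling and signature vary by Shed Skin version); compare_pair is a hypothetical stand-in for the generated comparison code:

    #include "builtin.hpp"   // Shed Skin runtime header (assumed name)

    // Hypothetical stand-in for the generated Levenshtein automaton code,
    // which allocates through the Boehm GC on every call.
    static void compare_pair(const char *a, const char *b) {
        /* generated comparison code goes here */
    }

    int main() {
        __shedskin__::__init();   // one-time runtime/GC setup: keep it outside the loop

        const char *pairs[][2] = { {"GEMMA", "LINDA"}, {"SUE", "MICHELLE"} };
        for (int i = 0; i < 2; i++)
            compare_pair(pairs[i][0], pairs[i][1]);
        return 0;
    }

Calling the initializer inside the loop instead would re-run the runtime setup once per record pair, millions of times over a long run.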

I have tested these changes with 3 million calls to the GC allocator and I have yet to get a segmentation fault. Thank you.
