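How do I debug a coroutine library?
Background: I wrote a coroutine library modeled on Swoole's code, and I use the boost.asm code for context switching. During a context switch, inside jump_fcontext in boost.asm, the program crashes with a segmentation fault.
Logging shows that the crash happens while resuming a coroutine. The faulting assembly instruction uses the sp register, so my guess is that I accidentally freed a coroutine stack somewhere. I don't know where to start with this, so I'd like to ask how to approach this kind of problem. Are there any good tools or techniques?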
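If you need the source to analyze it, the code is at https://github.com/huanghanta...
I found that the problem is triggered by my test file https://github.com/huanghanta... : at line 45 I added a co::sleep call, which switches out of the current context.
I then stress-test the server with ab: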
ab -c 100 -n 10000 127.0.0.1:80/
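It takes roughly 10 to 20 rounds of stress testing to trigger the segmentation fault.
(Without co::sleep the segmentation fault never occurs, even with 1000 connections and 1,000,000 requests, so the context switch itself can probably be ruled out.)
I also checked for memory leaks and found nothing obvious: once the stress test settles, memory usage stays at around 4.6M.
Checking memory with valgrind gives the following result: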
~/codeDir/cppCode/fsw/examples # valgrind --track-origins=yes ./a.out
==85446== Memcheck, a memory error detector
==85446== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==85446== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==85446== Command: ./a.out
==85446==
==85446== Warning: client switching stacks? SP change: 0x1fff0008e0 --> 0x4abb708
==85446== to suppress, use: --max-stackframe=137343816152 or greater
==85446== Warning: client switching stacks? SP change: 0x4abb590 --> 0x1fff0008e0
==85446== to suppress, use: --max-stackframe=137343816528 or greater
==85446== Warning: client switching stacks? SP change: 0x1fff0008c0 --> 0x4abb590
==85446== to suppress, use: --max-stackframe=137343816496 or greater
==85446== further instances of this message will not be shown.
==85446== Invalid write of size 8
==85446== at 0x4E31A66: uv_timer_init (timer.c:64)
==85446== by 0x4CB406D: fsw::Coroutine::sleep(double) (coroutine.cc:82)
==85446== by 0x10933F: main::{lambda(void*)#1}::operator()(void*) const::{lambda(void*)#1}::operator()(void*) const (in /root/codeDir/cppCode/fsw/examples/a.out)
==85446== by 0x109517: main::{lambda(void*)#1}::operator()(void*) const::{lambda(void*)#1}::_FUN(void*) (in /root/codeDir/cppCode/fsw/examples/a.out)
==85446== by 0x4CB3DE4: fsw::Context::context_func(void*) (context.cc:58)
==85446== by 0x4CB6830: make_fcontext (make_x86_64_sysv_elf_gas.S:64)
==85446== Address 0x50540a0 is 32 bytes inside a block of size 152 free'd
==85446== at 0x489DBDF: operator delete(void*) (vg_replace_malloc.c:576)
==85446== by 0x4CB40DD: fsw::Coroutine::sleep(double) (coroutine.cc:86)
==85446== by 0x10933F: main::{lambda(void*)#1}::operator()(void*) const::{lambda(void*)#1}::operator()(void*) const (in /root/codeDir/cppCode/fsw/examples/a.out)
==85446== by 0x109517: main::{lambda(void*)#1}::operator()(void*) const::{lambda(void*)#1}::_FUN(void*) (in /root/codeDir/cppCode/fsw/examples/a.out)
==85446== by 0x4CB3DE4: fsw::Context::context_func(void*) (context.cc:58)
==85446== by 0x4CB6830: make_fcontext (make_x86_64_sysv_elf_gas.S:64)
==85446== Block was alloc'd at
==85446== at 0x489CCF5: operator new(unsigned long) (vg_replace_malloc.c:334)
==85446== by 0x4CB402E: fsw::Coroutine::sleep(double) (coroutine.cc:74)
==85446== by 0x10933F: main::{lambda(void*)#1}::operator()(void*) const::{lambda(void*)#1}::operator()(void*) const (in /root/codeDir/cppCode/fsw/examples/a.out)
==85446== by 0x109517: main::{lambda(void*)#1}::operator()(void*) const::{lambda(void*)#1}::_FUN(void*) (in /root/codeDir/cppCode/fsw/examples/a.out)
==85446== by 0x4CB3DE4: fsw::Context::context_func(void*) (context.cc:58)
==85446== by 0x4CB6830: make_fcontext (make_x86_64_sysv_elf_gas.S:64)
==85446==
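However, I still can't tell from this information how to fix it. It looks like something goes wrong in the timer code. The problematic code should be: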
static void sleep_timeout(uv_timer_t *timer)
{
    fswTrace("coroutine[%ld] sleep timeout", ((Coroutine *) timer->data)->get_cid());
    ((Coroutine *) timer->data)->resume();
}

int Coroutine::sleep(double seconds)
{
    uv_timer_t *timer;
    Coroutine *co = Coroutine::get_current();
    fswTrace("coroutine[%ld] sleep", co->cid);
    try
    {
        timer = new uv_timer_t();
    }
    catch(const std::bad_alloc& e)
    {
        fswError("%s", e.what());
    }
    timer->data = co;
    uv_timer_init(uv_default_loop(), timer);
    uv_timer_start(timer, sleep_timeout, seconds * 1000, 0);
    co->yield();
    delete timer;
    timer = nullptr;
    return 0;
}
I have just written a new test program:
#include <iostream>
#include "fsw/coroutine.h"
#include "fsw/fsw.h"

using namespace fsw;
using namespace std;

int main(int argc, char const *argv[])
{
    fsw_event_init();
    while (true)
    {
        Coroutine::create([](void *arg)
        {
            Coroutine *co = Coroutine::get_current();
            int cid = co->get_cid();
            cout << cid << endl;
            Coroutine::sleep(0.5);
            cout << cid << endl;
        });
        fsw_event_wait();
    }
    return 0;
}
It also reports an invalid write:
~/codeDir/cppCode/fsw/examples # valgrind --track-origins=yes ./a.out
==87209== Memcheck, a memory error detector
==87209== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==87209== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==87209== Command: ./a.out
==87209==
==87209== Warning: client switching stacks? SP change: 0x1fff0008e0 --> 0x4abb708
==87209== to suppress, use: --max-stackframe=137343816152 or greater
1
[2019-09-10 07:40:24] TRACE sleep: coroutine[1] sleep in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 70.
[2019-09-10 07:40:24] TRACE sleep: coroutine[1] new timer[0x4abb850] in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 75.
[2019-09-10 07:40:24] TRACE yield: coroutine[1] yield in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 39.
==87209== Warning: client switching stacks? SP change: 0x4abb600 --> 0x1fff0008e0
==87209== to suppress, use: --max-stackframe=137343816416 or greater
[2019-09-10 07:40:24] TRACE sleep_timeout: coroutine[1] sleep timeout in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 62.
[2019-09-10 07:40:24] TRACE resume: coroutine[1] resume in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 47.
==87209== Warning: client switching stacks? SP change: 0x1fff000880 --> 0x4abb600
==87209== to suppress, use: --max-stackframe=137343816320 or greater
==87209== further instances of this message will not be shown.
1
[2019-09-10 07:40:24] TRACE resume: coroutine[1] end in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 53.
2
[2019-09-10 07:40:24] TRACE sleep: coroutine[2] sleep in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 70.
[2019-09-10 07:40:24] TRACE sleep: coroutine[2] new timer[0x5054080] in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 75.
==87209== Invalid write of size 8
==87209== at 0x4E1DA66: uv_timer_init (timer.c:64)
==87209== by 0x4CB4330: fsw::Coroutine::sleep(double) (coroutine.cc:83)
==87209== by 0x109274: main::{lambda(void*)#1}::operator()(void*) const (in /root/codeDir/cppCode/fsw/examples/a.out)
==87209== by 0x1092BA: main::{lambda(void*)#1}::_FUN(void*) (in /root/codeDir/cppCode/fsw/examples/a.out)
==87209== by 0x4CB3E88: fsw::Context::context_func(void*) (context.cc:49)
==87209== by 0x4CB6C00: make_fcontext (make_x86_64_sysv_elf_gas.S:64)
==87209== Address 0x4abb870 is 32 bytes inside a block of size 152 free'd
==87209== at 0x489DBDF: operator delete(void*) (vg_replace_malloc.c:576)
==87209== by 0x4CB43A0: fsw::Coroutine::sleep(double) (coroutine.cc:87)
==87209== by 0x109274: main::{lambda(void*)#1}::operator()(void*) const (in /root/codeDir/cppCode/fsw/examples/a.out)
==87209== by 0x1092BA: main::{lambda(void*)#1}::_FUN(void*) (in /root/codeDir/cppCode/fsw/examples/a.out)
==87209== by 0x4CB3E88: fsw::Context::context_func(void*) (context.cc:49)
==87209== by 0x4CB6C00: make_fcontext (make_x86_64_sysv_elf_gas.S:64)
==87209== Block was alloc'd at
==87209== at 0x489CCF5: operator new(unsigned long) (vg_replace_malloc.c:334)
==87209== by 0x4CB428F: fsw::Coroutine::sleep(double) (coroutine.cc:74)
==87209== by 0x109274: main::{lambda(void*)#1}::operator()(void*) const (in /root/codeDir/cppCode/fsw/examples/a.out)
==87209== by 0x1092BA: main::{lambda(void*)#1}::_FUN(void*) (in /root/codeDir/cppCode/fsw/examples/a.out)
==87209== by 0x4CB3E88: fsw::Context::context_func(void*) (context.cc:49)
==87209== by 0x4CB6C00: make_fcontext (make_x86_64_sysv_elf_gas.S:64)
==87209==
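It looks like the coroutine is using a timer that has already been deleted. But according to my logs, the timer is only freed after the coroutine's sleep timeout has fired: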
~/codeDir/cppCode/fsw/examples # valgrind --track-origins=yes ./a.out
==87271== Memcheck, a memory error detector
==87271== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==87271== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==87271== Command: ./a.out
==87271==
==87271== Warning: client switching stacks? SP change: 0x1fff0008e0 --> 0x4abb708
==87271== to suppress, use: --max-stackframe=137343816152 or greater
1
[2019-09-10 07:50:29] TRACE sleep: coroutine[1] sleep in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 70.
[2019-09-10 07:50:29] TRACE sleep: coroutine[1] new timer[0x4abb850] in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 75.
[2019-09-10 07:50:29] TRACE yield: coroutine[1] yield in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 39.
==87271== Warning: client switching stacks? SP change: 0x4abb600 --> 0x1fff0008e0
==87271== to suppress, use: --max-stackframe=137343816416 or greater
[2019-09-10 07:50:29] TRACE sleep_timeout: coroutine[1] sleep timeout in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 62.
[2019-09-10 07:50:29] TRACE resume: coroutine[1] resume in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 47.
==87271== Warning: client switching stacks? SP change: 0x1fff000880 --> 0x4abb600
==87271== to suppress, use: --max-stackframe=137343816320 or greater
==87271== further instances of this message will not be shown.
[2019-09-10 07:50:30] TRACE sleep: coroutine[1] free timer[0x4abb850] in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 87.
1
[2019-09-10 07:50:30] TRACE resume: coroutine[1] end in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 53.
2
[2019-09-10 07:50:30] TRACE sleep: coroutine[2] sleep in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 70.
[2019-09-10 07:50:30] TRACE sleep: coroutine[2] new timer[0x5054080] in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 75.
==87271== Invalid write of size 8
==87271== at 0x4E1DA66: uv_timer_init (timer.c:64)
==87271== by 0x4CB4330: fsw::Coroutine::sleep(double) (coroutine.cc:83)
==87271== by 0x109274: main::{lambda(void*)#1}::operator()(void*) const (in /root/codeDir/cppCode/fsw/examples/a.out)
==87271== by 0x1092BA: main::{lambda(void*)#1}::_FUN(void*) (in /root/codeDir/cppCode/fsw/examples/a.out)
==87271== by 0x4CB3E88: fsw::Context::context_func(void*) (context.cc:49)
==87271== by 0x4CB6C60: make_fcontext (make_x86_64_sysv_elf_gas.S:64)
==87271== Address 0x4abb870 is 32 bytes inside a block of size 152 free'd
==87271== at 0x489DBDF: operator delete(void*) (vg_replace_malloc.c:576)
==87271== by 0x4CB4402: fsw::Coroutine::sleep(double) (coroutine.cc:88)
==87271== by 0x109274: main::{lambda(void*)#1}::operator()(void*) const (in /root/codeDir/cppCode/fsw/examples/a.out)
==87271== by 0x1092BA: main::{lambda(void*)#1}::_FUN(void*) (in /root/codeDir/cppCode/fsw/examples/a.out)
==87271== by 0x4CB3E88: fsw::Context::context_func(void*) (context.cc:49)
==87271== by 0x4CB6C60: make_fcontext (make_x86_64_sysv_elf_gas.S:64)
==87271== Block was alloc'd at
==87271== at 0x489CCF5: operator new(unsigned long) (vg_replace_malloc.c:334)
==87271== by 0x4CB428F: fsw::Coroutine::sleep(double) (coroutine.cc:74)
==87271== by 0x109274: main::{lambda(void*)#1}::operator()(void*) const (in /root/codeDir/cppCode/fsw/examples/a.out)
==87271== by 0x1092BA: main::{lambda(void*)#1}::_FUN(void*) (in /root/codeDir/cppCode/fsw/examples/a.out)
==87271== by 0x4CB3E88: fsw::Context::context_func(void*) (context.cc:49)
==87271== by 0x4CB6C60: make_fcontext (make_x86_64_sysv_elf_gas.S:64)
==87271==
[2019-09-10 07:50:30] TRACE yield: coroutine[2] yield in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 39.
[2019-09-10 07:50:30] TRACE sleep_timeout: coroutine[2] sleep timeout in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 62.
[2019-09-10 07:50:30] TRACE resume: coroutine[2] resume in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 47.
[2019-09-10 07:50:30] TRACE sleep: coroutine[2] free timer[0x5054080] in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 87.
2
[2019-09-10 07:50:30] TRACE resume: coroutine[2] end in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 53.
3
[2019-09-10 07:50:30] TRACE sleep: coroutine[3] sleep in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 70.
[2019-09-10 07:50:30] TRACE sleep: coroutine[3] new timer[0x5054260] in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 75.
[2019-09-10 07:50:30] TRACE yield: coroutine[3] yield in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 39.
[2019-09-10 07:50:31] TRACE sleep_timeout: coroutine[3] sleep timeout in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 62.
[2019-09-10 07:50:31] TRACE resume: coroutine[3] resume in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 47.
[2019-09-10 07:50:31] TRACE sleep: coroutine[3] free timer[0x5054260] in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 87.
3
[2019-09-10 07:50:31] TRACE resume: coroutine[3] end in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 53.
4
[2019-09-10 07:50:31] TRACE sleep: coroutine[4] sleep in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 70.
[2019-09-10 07:50:31] TRACE sleep: coroutine[4] new timer[0x5054440] in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 75.
[2019-09-10 07:50:31] TRACE yield: coroutine[4] yield in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 39.
^C==87271==
==87271== Process terminating with default action of signal 2 (SIGINT)
==87271== at 0x40213D0: epoll_pwait (in /lib/ld-musl-x86_64.so.1)
==87271== by 0x10930F: main (in /root/codeDir/cppCode/fsw/examples/a.out)
==87271==
==87271== HEAP SUMMARY:
==87271== in use at exit: 2,172,436 bytes in 15 blocks
==87271== total heap usage: 33 allocs, 18 frees, 8,465,308 bytes allocated
==87271==
==87271== LEAK SUMMARY:
==87271== definitely lost: 0 bytes in 0 blocks
==87271== indirectly lost: 0 bytes in 0 blocks
==87271== possibly lost: 0 bytes in 0 blocks
==87271== still reachable: 2,172,436 bytes in 15 blocks
==87271== suppressed: 0 bytes in 0 blocks
==87271== Rerun with --leak-check=full to see details of leaked memory
==87271==
==87271== For counts of detected and suppressed errors, rerun with: -v
==87271== ERROR SUMMARY: 3 errors from 1 contexts (suppressed: 0 from 0)
~/codeDir/cppCode/fsw/examples #
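One possible reading of the report above: the invalid write hits 0x4abb870, which is 32 bytes into coroutine[1]'s already freed timer at 0x4abb850, yet it happens while coroutine[2] is initializing its own timer at 0x5054080. That is the pattern you would expect if the event loop keeps every handle in an intrusive doubly linked list and the freed timer was never unlinked from it: appending the next handle writes a pointer into the old tail. As far as I understand, libuv tracks handles per loop in this way, but treat that as an assumption. The sketch below is a self-contained model with hypothetical names (Link, Loop, Handle, handle_init, handle_close, would_dangle); it is not libuv's code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of a loop that keeps its handles in an intrusive,
// circular doubly linked list. Names are illustrative, not libuv's API.
struct Link {
    Link *next;
    Link *prev;
};

struct Loop {
    Link handles;  // sentinel node; an empty list points at itself
    Loop() { handles.next = handles.prev = &handles; }
};

struct Handle {
    void *data = nullptr;
    Link link{nullptr, nullptr};
};

// Models an "init" call: append the handle to the loop's list.
// Note that this stores a pointer into the *previous tail*.
void handle_init(Loop &loop, Handle &h) {
    h.link.next = &loop.handles;
    h.link.prev = loop.handles.prev;
    h.link.prev->next = &h.link;  // write into the old tail's memory
    loop.handles.prev = &h.link;
}

// Models the unlink step of a "close": make the loop forget the handle.
void handle_close(Handle &h) {
    assert(h.link.next != nullptr && "handle is not linked");
    h.link.prev->next = h.link.next;
    h.link.next->prev = h.link.prev;
    h.link.next = h.link.prev = nullptr;
}

// Number of handles the loop currently knows about.
std::size_t handle_count(const Loop &loop) {
    std::size_t n = 0;
    for (const Link *p = loop.handles.next; p != &loop.handles; p = p->next) {
        ++n;
    }
    return n;
}

// True if the loop still points at `h`, i.e. freeing `h` right now would
// leave a dangling pointer that the next handle_init() writes through.
bool would_dangle(const Loop &loop, const Handle &h) {
    for (const Link *p = loop.handles.next; p != &loop.handles; p = p->next) {
        if (p == &h.link) {
            return true;
        }
    }
    return false;
}
```

In this model, freeing a handle is only safe once would_dangle() is false, that is, after handle_close() has run. Calling delete directly after the yield, as Coroutine::sleep does, corresponds to freeing a handle that is still linked.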
Comments (1)
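Through debugging I found that the problem is in how I use libuv's timer: I free the timer incorrectly. I looked for the correct way to release it and concluded that, in my setup, a libuv timer cannot be released properly. To release a libuv timer correctly, its memory has to be freed in the uv_close callback, and that callback is only executed by libuv's own event loop, that is, when uv_run is called. But I have my own event loop, so releasing the libuv timer this way is not possible. The only solution, therefore, is to implement a timer myself.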
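For anyone in the same situation, here is a minimal sketch of what a self-made timer could look like: a min-heap ordered by deadline that a custom event loop polls. It is only one possible design, not the code fsw or Swoole actually uses, and every name in it (Timer, TimerQueue, add, next_timeout_ms, run_expired) is hypothetical. The caller passes in the current time in milliseconds, so the sketch does not depend on any particular clock.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical names throughout; not part of fsw, Swoole, or libuv.
struct Timer {
    std::uint64_t deadline_ms;       // absolute expiry time
    std::uint64_t seq;               // tie-breaker: FIFO order for equal deadlines
    std::function<void()> callback;  // e.g. a lambda that resumes a coroutine
};

// Orders the heap so that the earliest deadline ends up on top.
struct Later {
    bool operator()(const Timer &a, const Timer &b) const {
        if (a.deadline_ms != b.deadline_ms) {
            return a.deadline_ms > b.deadline_ms;
        }
        return a.seq > b.seq;
    }
};

class TimerQueue {
public:
    void add(std::uint64_t now_ms, std::uint64_t delay_ms, std::function<void()> cb) {
        assert(cb && "callback must be callable");
        heap_.push(Timer{now_ms + delay_ms, next_seq_++, std::move(cb)});
    }

    // Timeout to hand to a blocking poll call such as epoll_wait():
    // -1 means block indefinitely, 0 means return immediately.
    int next_timeout_ms(std::uint64_t now_ms) const {
        if (heap_.empty()) {
            return -1;
        }
        const std::uint64_t deadline = heap_.top().deadline_ms;
        if (deadline <= now_ms) {
            return 0;
        }
        const std::uint64_t diff = deadline - now_ms;
        const std::uint64_t cap =
            static_cast<std::uint64_t>(std::numeric_limits<int>::max());
        return static_cast<int>(diff > cap ? cap : diff);
    }

    // Runs every timer whose deadline has passed; returns how many fired.
    std::size_t run_expired(std::uint64_t now_ms) {
        std::size_t fired = 0;
        while (!heap_.empty() && heap_.top().deadline_ms <= now_ms) {
            // Copy the callback out and pop first, so the callback itself
            // may safely add new timers to the queue.
            std::function<void()> cb = heap_.top().callback;
            heap_.pop();
            cb();
            ++fired;
        }
        return fired;
    }

private:
    std::priority_queue<Timer, std::vector<Timer>, Later> heap_;
    std::uint64_t next_seq_ = 0;
};
```

With something like this, Coroutine::sleep would add an entry whose callback resumes the coroutine and then yield, while the event loop calls next_timeout_ms() before blocking and run_expired() after waking up. The queue owns each entry and drops it before running the callback, so there is no separately allocated handle that has to be freed at the right moment.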