Segmentation fault using a shared library
I have a shared library (namely libXXX.so) with an associated .cpp/.h file pair.
They contain a number of function pointers (pointing to the .so function entry points) and a class that wraps these functions as methods.
That is, the .h file:
typedef void* handle;
/* wrapper functions */
handle okUsbFrontPanel_Construct();
void okUsbFrontPanel_Destruct(handle hnd);
/* wrapper class */
class okCUsbFrontPanel
{
public:
    handle h;
public:
    okCUsbFrontPanel();
    ~okCUsbFrontPanel();
};
.cpp file:
#include <dlfcn.h>   /* dlopen, dlsym */
/* class methods */
okCUsbFrontPanel::okCUsbFrontPanel()
{ h = okUsbFrontPanel_Construct(); }
okCUsbFrontPanel::~okCUsbFrontPanel()
{ okUsbFrontPanel_Destruct(h); }
/* function pointers */
typedef handle (*OKUSBFRONTPANEL_CONSTRUCT_FN) (void);
typedef void (*OKUSBFRONTPANEL_DESTRUCT_FN) (handle);
OKUSBFRONTPANEL_CONSTRUCT_FN _okUsbFrontPanel_Construct = NULL;
OKUSBFRONTPANEL_DESTRUCT_FN _okUsbFrontPanel_Destruct = NULL;
/* load lib function */
Bool LoadLib(const char *libname){
    void *hLib = dlopen(libname, RTLD_NOW);
    if (!hLib)
        return 0;   /* dlopen failed; dlerror() describes why */
    _okUsbFrontPanel_Construct = (OKUSBFRONTPANEL_CONSTRUCT_FN) dlsym(hLib, "okUsbFrontPanel_Construct");
    _okUsbFrontPanel_Destruct = (OKUSBFRONTPANEL_DESTRUCT_FN) dlsym(hLib, "okUsbFrontPanel_Destruct");
    /* report success only if both entry points were found */
    return _okUsbFrontPanel_Construct && _okUsbFrontPanel_Destruct;
}
/* construct function */
handle okUsbFrontPanel_Construct(){
    if (_okUsbFrontPanel_Construct){
        handle h = (*_okUsbFrontPanel_Construct)(); /* calls function pointer */
        return h;
    }
    return NULL;
}
void okUsbFrontPanel_Destruct(handle hnd)
{
    if (_okUsbFrontPanel_Destruct)
        (*_okUsbFrontPanel_Destruct)(hnd);
}
Then I have another shared library (made by myself) which calls:
    LoadLib("libXXX.so");
    okCUsbFrontPanel *device = new okCUsbFrontPanel();
resulting in a segmentation fault.
The segmentation fault seems to happen at
    handle h = (*_okUsbFrontPanel_Construct)();
but with a strange behaviour: once I reach
    (*_okUsbFrontPanel_Construct)();
I get a recursion into okUsbFrontPanel_Construct().
Does anyone have any idea?
EDIT: here is a backtrace obtained by a run with gdb.
#0 0x007590b0 in _IO_new_do_write () from /lib/tls/libc.so.6
#1 0x00759bb8 in _IO_new_file_overflow () from /lib/tls/libc.so.6
#2 0x0075a83d in _IO_new_file_xsputn () from /lib/tls/libc.so.6
#3 0x00736db7 in vfprintf () from /lib/tls/libc.so.6
#4 0x0073ecd0 in printf () from /lib/tls/libc.so.6
#5 0x02cb68ca in okCUsbFrontPanel (this=0x9d0ae28) at okFrontPanelDLL.cpp:167
#6 0x03cac343 in okUsbFrontPanel_Construct () from /opt/atlas/tdaq/tdaq-02-00-00/installed/i686-slc4-gcc34-dbg/lib/libokFrontPanel.so
#7 0x02cb8f36 in okUsbFrontPanel_Construct () at okFrontPanelDLL.cpp:1107
#8 0x02cb68db in okCUsbFrontPanel (this=0x9d0ade8) at okFrontPanelDLL.cpp:169
#9 0x03cac343 in okUsbFrontPanel_Construct () from /opt/atlas/tdaq/tdaq-02-00-00/installed/i686-slc4-gcc34-dbg/lib/libokFrontPanel.so
#10 0x02cb8f36 in okUsbFrontPanel_Construct () at okFrontPanelDLL.cpp:1107
#11 0x02cb68db in okCUsbFrontPanel (this=0x9d0ada8) at okFrontPanelDLL.cpp:169
#12 0x03cac343 in okUsbFrontPanel_Construct () from /opt/atlas/tdaq/tdaq-02-00-00/installed/i686-slc4-gcc34-dbg/lib/libokFrontPanel.so
#13 0x02cb8f36 in okUsbFrontPanel_Construct () at okFrontPanelDLL.cpp:1107
and so on...
IMHO I get a segfault because of a sort of stack overflow: there are too many recursive calls and something goes wrong.
By the way, I'm on a Scientific Linux 4 distro (based on RH4).
EDIT2:
an objdump of libokFrontPanel.so for function okUsbFrontPanel_Construct outputs:
00009316 <okUsbFrontPanel_Construct>:
9316: 55 push ebp
9317: 89 e5 mov ebp,esp
9319: 56 push esi
931a: 53 push ebx
931b: 83 ec 30 sub esp,0x30
931e: e8 44 f4 ff ff call 8767 <__i686.get_pc_thunk.bx>
9323: 81 c3 dd bd 00 00 add ebx,0xbddd
9329: c7 04 24 38 00 00 00 mov DWORD PTR [esp],0x38
9330: e8 93 ec ff ff call 7fc8 <_Znwj@plt>
9335: 89 45 e4 mov DWORD PTR [ebp-28],eax
9338: 8b 45 e4 mov eax,DWORD PTR [ebp-28]
933b: 89 04 24 mov DWORD PTR [esp],eax
933e: e8 65 ed ff ff call 80a8 <_ZN16okCUsbFrontPanelC1Ev@plt>
9343: 8b 45 e4 mov eax,DWORD PTR [ebp-28]
9346: 89 45 f4 mov DWORD PTR [ebp-12],eax
9349: 8b 45 f4 mov eax,DWORD PTR [ebp-12]
934c: 89 45 e0 mov DWORD PTR [ebp-32],eax
934f: eb 1f jmp 9370 <okUsbFrontPanel_Construct+0x5a>
9351: 89 45 dc mov DWORD PTR [ebp-36],eax
9354: 8b 75 dc mov esi,DWORD PTR [ebp-36]
9357: 8b 45 e4 mov eax,DWORD PTR [ebp-28]
935a: 89 04 24 mov DWORD PTR [esp],eax
935d: e8 d6 f2 ff ff call 8638 <_ZdlPv@plt>
9362: 89 75 dc mov DWORD PTR [ebp-36],esi
9365: 8b 45 dc mov eax,DWORD PTR [ebp-36]
9368: 89 04 24 mov DWORD PTR [esp],eax
936b: e8 a8 f0 ff ff call 8418 <_Unwind_Resume@plt>
9370: 8b 45 e0 mov eax,DWORD PTR [ebp-32]
9373: 83 c4 30 add esp,0x30
9376: 5b pop ebx
9377: 5e pop esi
9378: 5d pop ebp
9379: c3 ret
At 933e there is indeed a call to <_ZN16okCUsbFrontPanelC1Ev@plt>. Is this the call that gets confused with the one inside my .cpp?
Comments (3)
Now that you've posted GDB output, it's clear exactly what your problem is.
You are defining the same symbols in libokFrontPanel.so and in the libLoadLibrary.so (for lack of a better name -- it is so much easier to explain things when they are named properly), and that is causing the infinite recursion.
By default on UNIX (unlike on Windows) all global symbols from all shared libraries (and the main executable) go into a single "loader symbol name space".
Among other things, this means that if you define malloc in the main executable, that malloc will be called by all shared libraries, including libc (even though libc has its own malloc definition).
So, here is what's happening: in libLoadLibrary.so you defined the okCUsbFrontPanel constructor. I assert that there is also a definition of that exact symbol in libokFrontPanel.so. All calls to this constructor (by default) go to the first definition (the one that the dynamic loader first observed), even though the creators of libokFrontPanel.so did not intend for this to happen. The loop is (in the same order GDB printed them -- innermost frame on top):
The call to the constructor from #3 was intended to go to symbol #4 -- the okCUsbFrontPanel constructor inside libokFrontPanel.so. Instead it went to the previously seen definition inside libLoadLibrary.so: you "preempted" symbol #4, and thus formed an infinite recursion loop.
Moral: do not define the same symbols in multiple libraries, unless you understand the rules by which the runtime loader decides which symbol references are bound to which definitions.
EDIT: To answer 'EDIT2' of the question:
Yes, the call to _ZN16okCUsbFrontPanelC1Ev from okUsbFrontPanel_Construct is going to the definition of that method inside your okFrontPanelDLL.cpp.
It might be illuminating to examine objdump -d okFrontPanelDLL.o
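One way to keep the wrapper's constructor from preempting the library's own is to give the wrapper class a different mangled name, for example by moving it into a namespace. The following is only a single-file sketch of that idea, not the vendor's code: the global-namespace class and the two extern "C" functions are stand-ins for what libokFrontPanel.so presumably contains, the namespace name okwrap is made up, and the function pointers are assigned directly where real code would use dlsym().

```cpp
#include <cassert>

typedef void* handle;

// ---- Stand-in for the vendor library (hypothetical, for illustration) ----
static int g_vendor_ctor_calls = 0;  // counts vendor constructor runs

// Global-namespace class; its constructor mangles to _ZN16okCUsbFrontPanelC1Ev.
class okCUsbFrontPanel {
public:
    okCUsbFrontPanel() { ++g_vendor_ctor_calls; }
};

extern "C" handle okUsbFrontPanel_Construct() {
    return new okCUsbFrontPanel();  // binds to the global-namespace class
}

extern "C" void okUsbFrontPanel_Destruct(handle h) {
    delete static_cast<okCUsbFrontPanel*>(h);
}

// ---- Wrapper side: same class name, different namespace ----
namespace okwrap {  // hypothetical namespace name

typedef handle (*construct_fn)();
typedef void (*destruct_fn)(handle);

// In real code these would be filled in by dlsym(); here they point
// straight at the stand-in functions above.
static construct_fn p_construct = &::okUsbFrontPanel_Construct;
static destruct_fn p_destruct = &::okUsbFrontPanel_Destruct;

// Constructor mangles to _ZN6okwrap16okCUsbFrontPanelC1Ev, a different
// symbol, so it cannot be picked up in place of the vendor constructor.
class okCUsbFrontPanel {
public:
    handle h;
    okCUsbFrontPanel() : h(p_construct ? p_construct() : nullptr) {}
    ~okCUsbFrontPanel() { if (p_destruct && h) p_destruct(h); }
    okCUsbFrontPanel(const okCUsbFrontPanel&) = delete;
    okCUsbFrontPanel& operator=(const okCUsbFrontPanel&) = delete;
};

}  // namespace okwrap

// Builds one wrapper object and reports how many vendor constructors ran.
int vendor_ctor_calls_after_one_wrapper() {
    g_vendor_ctor_calls = 0;
    okwrap::okCUsbFrontPanel device;
    assert(device.h != nullptr);
    return g_vendor_ctor_calls;
}
```

With distinct symbols, building one wrapper object runs the vendor constructor once and does not re-enter the wrapper. Other approaches (hidden symbol visibility, or different linker and loader options) may also apply, depending on how the two libraries are built.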
Contrary to what Norman Ramsey says, the tool of choice for diagnosing segfaults is GDB, not valgrind.
The latter is only useful for certain kinds of segfaults (mostly those related to heap corruption, which doesn't appear to be the case here).
My crystal ball says that your dlopen() fails (you should print dlerror() if/when that happens!), and that your _okUsbFrontPanel_Construct remains NULL. In GDB you will immediately be able to tell whether that guess is correct.
My guess contradicts your statement that you "get a recursion to okUsbFrontPanel_Construct()". But just how can you know that you get such recursion, if you didn't look with GDB?
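As a rough illustration of the advice to report dlerror() when dlopen() fails, a loader helper could look like the sketch below. The helper name load_symbol and its error-string parameter are made up for this example; only dlopen(), dlsym(), dlerror() and dlclose() from <dlfcn.h> are real APIs. On older glibc versions the program may need to be linked with -ldl.

```cpp
#include <dlfcn.h>
#include <cassert>
#include <string>

// Hypothetical helper: opens `libname`, looks up `symbol`, and returns its
// address. On failure it returns nullptr and stores the loader's message
// in `err` so the caller can print it.
void* load_symbol(const char* libname, const char* symbol, std::string& err) {
    assert(libname != nullptr && symbol != nullptr);
    err.clear();

    void* lib = dlopen(libname, RTLD_NOW);
    if (!lib) {
        const char* msg = dlerror();  // explains why dlopen failed
        err = msg ? msg : "dlopen failed (no message available)";
        return nullptr;
    }

    dlerror();  // clear any stale error before calling dlsym
    void* sym = dlsym(lib, symbol);
    const char* msg = dlerror();  // non-null means the lookup failed
    if (msg) {
        err = msg;
        dlclose(lib);
        return nullptr;
    }
    return sym;  // the library is intentionally left loaded
}
```

A caller would check the returned pointer before casting it to a function pointer type, and print err when the pointer is null, which makes a failed load visible immediately.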
The tool of choice for diagnosing segfaults is valgrind. If you are misusing pointers or memory, valgrind will find the problem and give you a stack trace well before the segfault occurs. On the FAQ, valgrind claims to handle shared libraries OK as long as you don't call dlclose().
If you have never used valgrind before, I think you will be astonished at how easy and powerful it is. You just use 'valgrind' as the first word of your command line, and it finds your memory errors. Great stuff! There's a short example session on Vladislav Vyshemirsky's blog.