Segmentation fault using a shared library

Posted on 2024-07-29 04:57:46

I have a shared library (namely libXXX.so) with an associated pair of cpp/h files.
They contain a number of function pointers (pointing to the .so function entry points) and a class that wraps these functions as its methods.

For example, the .h file:

typedef void* handle;
/* wrapper functions */
handle okUsbFrontPanel_Construct();
void okUsbFrontPanel_Destruct(handle hnd);

/* wrapper class */
class okCUsbFrontPanel
{
public:
  handle h;
public:
  okCUsbFrontPanel();
  ~okCUsbFrontPanel();
};

The .cpp file:

/* class methods */
okCUsbFrontPanel::okCUsbFrontPanel()
  { h=okUsbFrontPanel_Construct(); }
okCUsbFrontPanel::~okCUsbFrontPanel()
  { okUsbFrontPanel_Destruct(h); }
/* function pointers */
typedef handle  (*OKUSBFRONTPANEL_CONSTRUCT_FN) (void);
typedef void    (*OKUSBFRONTPANEL_DESTRUCT_FN)  (handle);
OKUSBFRONTPANEL_CONSTRUCT_FN    _okUsbFrontPanel_Construct = NULL;
OKUSBFRONTPANEL_DESTRUCT_FN _okUsbFrontPanel_Destruct = NULL;
/* load lib function */
Bool LoadLib(char *libname){
  void *hLib = dlopen(libname, RTLD_NOW);
  if(hLib){
    _okUsbFrontPanel_Construct = ( OKUSBFRONTPANEL_CONSTRUCT_FN ) dlsym(hLib, "okUsbFrontPanel_Construct");
    _okUsbFrontPanel_Destruct = ( OKUSBFRONTPANEL_DESTRUCT_FN ) dlsym( hLib, "okUsbFrontPanel_Destruct" );
  }
}
/* construct function */
handle okUsbFrontPanel_Construct(){
  if (_okUsbFrontPanel_Construct){
    handle h = (*_okUsbFrontPanel_Construct)(); //calls function pointer
    return h;
  }
  return(NULL);
}

void okUsbFrontPanel_Destruct(handle hnd)
{
  if (_okUsbFrontPanel_Destruct)
    (*_okUsbFrontPanel_Destruct)(hnd);
}

Then I have another shared library (which I wrote myself) that calls:

LoadLib("libXXX.so");
okCUsbFrontPanel *device = new okCUsbFrontPanel();

This results in a segmentation fault.
The segmentation fault seems to happen at

handle h = (*_okUsbFrontPanel_Construct)();

but with a strange behaviour: once I reach

(*_okUsbFrontPanel_Construct)(); 

I get a recursion back into okUsbFrontPanel_Construct().

Does anyone have any idea?

EDIT: here is a backtrace obtained by a run with gdb.

#0  0x007590b0 in _IO_new_do_write () from /lib/tls/libc.so.6
#1  0x00759bb8 in _IO_new_file_overflow () from /lib/tls/libc.so.6
#2  0x0075a83d in _IO_new_file_xsputn () from /lib/tls/libc.so.6
#3  0x00736db7 in vfprintf () from /lib/tls/libc.so.6
#4  0x0073ecd0 in printf () from /lib/tls/libc.so.6
#5  0x02cb68ca in okCUsbFrontPanel (this=0x9d0ae28) at okFrontPanelDLL.cpp:167
#6  0x03cac343 in okUsbFrontPanel_Construct () from /opt/atlas/tdaq/tdaq-02-00-00/installed/i686-slc4-gcc34-dbg/lib/libokFrontPanel.so
#7  0x02cb8f36 in okUsbFrontPanel_Construct () at okFrontPanelDLL.cpp:1107
#8  0x02cb68db in okCUsbFrontPanel (this=0x9d0ade8) at okFrontPanelDLL.cpp:169
#9  0x03cac343 in okUsbFrontPanel_Construct () from /opt/atlas/tdaq/tdaq-02-00-00/installed/i686-slc4-gcc34-dbg/lib/libokFrontPanel.so
#10 0x02cb8f36 in okUsbFrontPanel_Construct () at okFrontPanelDLL.cpp:1107
#11 0x02cb68db in okCUsbFrontPanel (this=0x9d0ada8) at okFrontPanelDLL.cpp:169
#12 0x03cac343 in okUsbFrontPanel_Construct () from /opt/atlas/tdaq/tdaq-02-00-00/installed/i686-slc4-gcc34-dbg/lib/libokFrontPanel.so
#13 0x02cb8f36 in okUsbFrontPanel_Construct () at okFrontPanelDLL.cpp:1107

and so on.
IMHO I get a segfault because of a kind of stack overflow: there are too many recursive calls and something goes wrong.

By the way I'm on a Scientific Linux 4 distro (based on RH4).

EDIT2:

An objdump of libokFrontPanel.so for the function okUsbFrontPanel_Construct outputs:

00009316 <okUsbFrontPanel_Construct>:
9316:   55                      push   ebp  
9317:   89 e5                   mov    ebp,esp
9319:   56                      push   esi
931a:   53                      push   ebx
931b:   83 ec 30                sub    esp,0x30
931e:   e8 44 f4 ff ff          call   8767 <__i686.get_pc_thunk.bx>
9323:   81 c3 dd bd 00 00       add    ebx,0xbddd
9329:   c7 04 24 38 00 00 00    mov    DWORD PTR [esp],0x38
9330:   e8 93 ec ff ff          call   7fc8 <_Znwj@plt>
9335:   89 45 e4                mov    DWORD PTR [ebp-28],eax
9338:   8b 45 e4                mov    eax,DWORD PTR [ebp-28]
933b:   89 04 24                mov    DWORD PTR [esp],eax
933e:   e8 65 ed ff ff          call   80a8 <_ZN16okCUsbFrontPanelC1Ev@plt>
9343:   8b 45 e4                mov    eax,DWORD PTR [ebp-28]
9346:   89 45 f4                mov    DWORD PTR [ebp-12],eax
9349:   8b 45 f4                mov    eax,DWORD PTR [ebp-12]
934c:   89 45 e0                mov    DWORD PTR [ebp-32],eax
934f:   eb 1f                   jmp    9370 <okUsbFrontPanel_Construct+0x5a>
9351:   89 45 dc                mov    DWORD PTR [ebp-36],eax
9354:   8b 75 dc                mov    esi,DWORD PTR [ebp-36]
9357:   8b 45 e4                mov    eax,DWORD PTR [ebp-28]
935a:   89 04 24                mov    DWORD PTR [esp],eax
935d:   e8 d6 f2 ff ff          call   8638 <_ZdlPv@plt>
9362:   89 75 dc                mov    DWORD PTR [ebp-36],esi
9365:   8b 45 dc                mov    eax,DWORD PTR [ebp-36]
9368:   89 04 24                mov    DWORD PTR [esp],eax
936b:   e8 a8 f0 ff ff          call   8418 <_Unwind_Resume@plt>
9370:   8b 45 e0                mov    eax,DWORD PTR [ebp-32]
9373:   83 c4 30                add    esp,0x30
9376:   5b                      pop    ebx
9377:   5e                      pop    esi
9378:   5d                      pop    ebp
9379:   c3                      ret    

At 933e there is indeed a call to <_ZN16okCUsbFrontPanelC1Ev@plt>. Is this the call that gets confused with the one inside my .cpp?

Comments (3)

诗化ㄋ丶相逢 2024-08-05 04:57:46

Now that you've posted GDB output, it's clear exactly what your problem is.

You are defining the same symbols in libokFrontPanel.so and in libLoadLibrary.so (for lack of a better name -- it is so much easier to explain things when they are named properly), and that is causing the infinite recursion.

By default on UNIX (unlike on Windows), all global symbols from all shared libraries (and the main executable) go into a single "loader symbol name space".

Among other things, this means that if you define malloc in the main executable, that malloc will be called by all shared libraries, including libc (even though libc has its own malloc definition).

So, here is what's happening: in libLoadLibrary.so you defined the okCUsbFrontPanel constructor. I assert that there is also a definition of that exact symbol in libokFrontPanel.so. All calls to this constructor (by default) go to the first definition (the one that the dynamic loader observed first), even though the creators of libokFrontPanel.so did not intend for this to happen. The loop is (in the same order GDB printed them -- innermost frame on top):

 #1 okCUsbFrontPanel () at okFrontPanelDLL.cpp:169
 #3 okUsbFrontPanel_Construct () from libokFrontPanel.so
 #2 okUsbFrontPanel_Construct () at okFrontPanelDLL.cpp:1107
 #1 okCUsbFrontPanel () at okFrontPanelDLL.cpp:169

The call to the constructor from #3 was intended to go to a fourth symbol, #4 (not shown in the loop above): the okCUsbFrontPanel constructor inside libokFrontPanel.so. Instead it went to the previously seen definition inside libLoadLibrary.so, that is, to #1: you "preempted" symbol #4, and thus formed an infinite recursion loop.

Moral: do not define the same symbols in multiple libraries, unless you understand the rules by which the runtime loader decides which symbol references are bound to which definitions.
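One way to follow this advice is to give the wrapper class a name that cannot collide with the vendor's symbol. The following is only a minimal sketch: the namespace name okwrap and the variable g_construct are hypothetical and stand in for the wrapper code and for the pointer that LoadLib() fills in. It shows the class only; the wrapper functions that share names with the library's exports would need the same treatment.

```cpp
#include <cstddef>

typedef void* handle;
typedef handle (*construct_fn)(void);

// Hypothetical stand-in for the pointer that LoadLib() sets via dlsym().
static construct_fn g_construct = NULL;

namespace okwrap {  // hypothetical namespace, not part of the vendor API

// Inside the namespace the constructor mangles to
// _ZN6okwrap16okCUsbFrontPanelC1Ev, so it no longer shares a name with
// the library's own _ZN16okCUsbFrontPanelC1Ev and cannot preempt it.
class okCUsbFrontPanel {
public:
    handle h;
    // Call through the pointer only if it has been resolved.
    okCUsbFrontPanel() : h(g_construct ? g_construct() : NULL) {}
};

}  // namespace okwrap
```

With this layout, the constructor call inside libokFrontPanel.so can only bind to the library's own definition, because no other loaded object exports that name.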

EDIT: To answer 'EDIT2' of the question:
Yes, the call to _ZN16okCUsbFrontPanelC1Ev from okUsbFrontPanel_Construct is going to the definition of that method inside your okFrontPanelDLL.cpp.
It might be illuminating to examine the output of objdump -d okFrontPanelDLL.o.
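As a runtime complement to objdump, the binding of a symbol can also be checked from inside the process. This is a sketch that assumes glibc's dladdr() extension is available; the helper name ObjectContaining is made up for illustration.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // dladdr() is a GNU extension
#endif
#include <dlfcn.h>
#include <cstddef>
#include <string>

// Returns the path of the loaded object (executable or shared library)
// that contains addr, or an empty string if the loader cannot tell.
std::string ObjectContaining(const void* addr) {
    Dl_info info;
    if (dladdr(addr, &info) != 0 && info.dli_fname != NULL) {
        return std::string(info.dli_fname);
    }
    return std::string();
}
```

Printing ObjectContaining() for the address stored in _okUsbFrontPanel_Construct after LoadLib(), and for the address of the wrapper function itself, would show which library each one actually lives in.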

不忘初心 2024-08-05 04:57:46

Contrary to what Norman Ramsey says, the tool of choice for diagnosing segfaults is GDB, not valgrind.
The latter is only useful for certain kinds of segfaults (mostly those related to heap corruption, which doesn't appear to be the case here).

My crystal ball says that your dlopen() fails (you should print dlerror() if/when that happens!), and that your _okUsbFrontPanel_Construct remains NULL. In GDB you will immediately be able to tell whether that guess is correct.

My guess contradicts your statement that you "get a recursion to okUsbFrontPanel_Construct()". But just how can you know that you get such recursion, if you didn't look with GDB?
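The dlerror() check suggested above could look roughly like the following. It is a sketch, not the asker's actual code: the names LoadLibChecked and g_construct are hypothetical, and only the construct entry point is resolved.

```cpp
#include <dlfcn.h>
#include <cstddef>
#include <cstdio>

typedef void* handle;
typedef handle (*construct_fn)(void);

// Hypothetical counterpart of _okUsbFrontPanel_Construct in the question.
static construct_fn g_construct = NULL;

// Returns true only if the library opened and the symbol resolved.
bool LoadLibChecked(const char* libname) {
    void* lib = dlopen(libname, RTLD_NOW);
    if (!lib) {
        // dlerror() describes why the most recent dl* call failed.
        std::fprintf(stderr, "dlopen(%s) failed: %s\n", libname, dlerror());
        return false;
    }
    dlerror();  // clear any stale error before calling dlsym()
    void* sym = dlsym(lib, "okUsbFrontPanel_Construct");
    const char* err = dlerror();
    if (err != NULL || sym == NULL) {
        std::fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is NULL");
        dlclose(lib);
        return false;
    }
    g_construct = reinterpret_cast<construct_fn>(sym);
    return true;
}
```

Unlike the LoadLib() in the question, this version returns a value on every path, so the caller can refuse to construct the device when loading failed. On older glibc versions the program has to be linked with -ldl.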

落在眉间の轻吻 2024-08-05 04:57:46

The tool of choice for diagnosing segfaults is valgrind. If you are misusing pointers or memory, valgrind will find the problem and give you a stack trace well before the segfault occurs. In its FAQ, valgrind claims to handle shared libraries fine as long as you don't call dlclose().

If you have never used valgrind before I think you will be astonished at how easy and powerful it is. You just use 'valgrind' as the first word of your command line, and it finds your memory errors. Great stuff! There's a short example session on Vladislav Vyshemirsky's blog.
