Ld magically overrides statically linked symbols
For a few days now we have been dealing with a very strange problem.
I can't understand how it even happens: when a third-party program (MATLAB) uses our shared library, it somehow overrides some of our symbols (Boost, to be precise) with its own. Those symbols are statically linked and (!!) local.
Here is the deal: we use Boost 1.47, MATLAB has Boost 1.40. Currently, a library call segfaults on a call from OUR library into their Boost (regex).
So, here is the magic:
- Our library has no dependencies beyond the system libraries (no Boost); ldd:
    linux-vdso.so.1 => (0x00007fff4abff000)
    libpthread.so.0 => /lib/libpthread.so.0 (0x00007f1a3fd65000)
    libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f1a3fa51000)
    libm.so.6 => /lib/libm.so.6 (0x00007f1a3f7cd000)
    libgomp.so.1 => /usr/lib/libgomp.so.1 (0x00007f1a3f5bf000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f1a3f3a8000)
    libc.so.6 => /lib/libc.so.6 (0x00007f1a3f024000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f1a414f9000)
    librt.so.1 => /lib/librt.so.1 (0x00007f1a3ee1c000)
- No C++ symbols are exported from our library (our public symbols are plain old C for binary compatibility); nm:
    nm -g --defined-only libmysharedlib.so
    addr1 T OurCSymbol1
    addr2 T OurCSymbol2
    addr3 T OurCSymbol3
    ...
- Still, it uses their Boost. HOW? Stack trace (paths cut):
    [ 0] 0x00007f21fddbb0a9 bin/libmwfl.so+00454825 fl::sysdep::linux::unwind_stack(void const**, unsigned long, unsigned long, fl::diag::thread_context const&)+000009
    [ 1] 0x00007f21fdd74111 bin/glnxa64/libmwfl.so+00164113 fl::diag::stacktrace_base::capture(fl::diag::thread_context const&, unsigned long)+000161
    [ 2] 0x00007f21fdd7d42d bin/glnxa64/libmwfl.so+00201773
    [ 3] 0x00007f21fdd7d6b4 bin/glnxa64/libmwfl.so+00202420 fl::diag::terminate_log(char const*, fl::diag::thread_context const&, bool)+000100
    [ 4] 0x00007f21fce525a7 bin/glnxa64/libmwmcr.so+00365991
    [ 5] 0x00007f21fb9eb8f0 lib/libpthread.so.0+00063728
    [ 6] 0x00007f21f3e939a9 libboost_regex.so.1.40.0+00342441 boost::re_detail::perl_matcher, std::allocator > >, boost::regex_traits > >::match_all_states()+000073
    [ 7] 0x00007f21f3eb6546 bin/glnxa64/libboost_regex.so.1.40.0+00484678 boost::re_detail::perl_matcher, std::allocator > >, boost::regex_traits > >::match_imp()+000758
    [ 8] 0x00007f21c04ad595 lib/libmysharedlib.so+04855189 bool boost::regex_match, std::allocator > >, char, boost::regex_traits > >(__gnu_cxx::__normal_iterator, __gnu_cxx::__normal_iterator, boost::match_results, std::allocator > > >&, boost::basic_regex > > const&, boost::regex_constants::_match_flags)+000245
    [ 9] 0x00007f21c04a71c7 lib/libmysharedlib.so+04829639 myfunc2()+000183
    [ 10] 0x00007f21c01b41e3 lib/libmysharedlib.so+01737187 myfunc1()+000307
It is known that MATLAB calls dlopen with the RTLD_NOW flag only.
People, think with me please.
At this point I'm desperate not even to fix this, but simply to understand the ld and ELF behavior.
Edit:
Small additional question: as I understand it, without special linker options, symbols in Linux .so libraries are never bound by address at link time? So even statically linked local symbols are resolved at run time?
Comments (2)
Check out the -Bsymbolic option for ld. If -Bsymbolic is specified, then when creating a shared object ld will attempt to bind references to global symbols to definitions within the shared library. The default is to defer binding to run time.
This may be clearer with an example. Say example.o contains a reference to a global function, globalfn, defined in global.o, and two shared objects, normal.so and symbolic.so, are built from them, with only symbolic.so linked using -Bsymbolic.
Disassembling the code for normal.so shows that the call to globalfn actually goes through the procedure linkage table, so the final destination of the call is determined at run time. In symbolic.so, by contrast, the call always invokes the definition of globalfn within the shared object.
You are invoking undefined behavior, which is a "Doctor, it hurts when I do this" kind of situation. The MATLAB executable already contains external functions for class boost::re_detail::perl_matcher< elided >. When MATLAB loads your shared library, the dynamic linker sees that your shared library defines those exact same symbols in a way that conflicts with the existing definitions. Undefined behavior.
The solution is to build a version of your library for use with MATLAB that uses the same version of Boost as MATLAB does.
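
One way to check which Boost symbols the dynamic linker can actually see is to list the library's dynamic symbol table, including undefined (U) and weak (W) entries, rather than only the defined globals. The sketch below assumes GNU nm; the helper name list_dynamic_symbols is made up for illustration, and the library path is whatever you pass on the command line.

```shell
#!/bin/sh
# Sketch: show dynamic symbols whose demangled name contains a given substring.
# nm -D reads the dynamic symbol table (the one the dynamic linker uses);
# nm -C demangles C++ names so that "boost::" is searchable.
list_dynamic_symbols() {
    # $1: path to a shared library, $2: substring to search for
    nm -D -C "$1" | grep -- "$2" || true
}

# Usage: sh thisscript.sh lib/libmysharedlib.so
if [ "$#" -gt 0 ]; then
    list_dynamic_symbols "$1" boost
fi
```

Any boost:: line in that output, whether defined, weak, or undefined, is a symbol that takes part in run-time symbol resolution and can therefore end up bound to a definition in another loaded library.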