ld magically overrides statically linked symbols

Posted on 2024-12-01 15:51:25

For a few days now we have been dealing with a very strange problem.

I can't understand how it even happens: when a third-party program (MATLAB) uses our shared library, it somehow overrides some of our symbols (Boost, to be precise) with its own. Those symbols are statically linked and (!!) local.

Here is the deal: we use Boost 1.47, MATLAB ships Boost 1.40. Currently, a call into our library segfaults when OUR library ends up calling THEIR Boost (regex).

So, here is the magic:

  • We have no library dependencies other than system libraries, ldd:
    linux-vdso.so.1 =>  (0x00007fff4abff000)
    libpthread.so.0 => /lib/libpthread.so.0 (0x00007f1a3fd65000)
    libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f1a3fa51000)
    libm.so.6 => /lib/libm.so.6 (0x00007f1a3f7cd000)
    libgomp.so.1 => /usr/lib/libgomp.so.1 (0x00007f1a3f5bf000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f1a3f3a8000)
    libc.so.6 => /lib/libc.so.6 (0x00007f1a3f024000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f1a414f9000)
    librt.so.1 => /lib/librt.so.1 (0x00007f1a3ee1c000)
  • No C++ symbols are exported from our library (our public symbols are plain old C for binary compatibility), nm:
nm -g --defined-only libmysharedlib.so

addr1 T OurCSymbol1
addr2 T OurCSymbol2
addr3 T OurCSymbol3
...
  • Still, it uses their Boost. HOW? Stack trace (paths trimmed):
[  0] 0x00007f21fddbb0a9 bin/libmwfl.so+00454825 fl::sysdep::linux::unwind_stack(void const**, unsigned long, unsigned long, fl::diag::thread_context const&)+000009
[  1] 0x00007f21fdd74111 bin/glnxa64/libmwfl.so+00164113 fl::diag::stacktrace_base::capture(fl::diag::thread_context const&, unsigned long)+000161
[  2] 0x00007f21fdd7d42d bin/glnxa64/libmwfl.so+00201773
[  3] 0x00007f21fdd7d6b4 bin/glnxa64/libmwfl.so+00202420 fl::diag::terminate_log(char const*, fl::diag::thread_context const&, bool)+000100
[  4] 0x00007f21fce525a7 bin/glnxa64/libmwmcr.so+00365991
[  5] 0x00007f21fb9eb8f0 lib/libpthread.so.0+00063728
[  6] 0x00007f21f3e939a9 libboost_regex.so.1.40.0+00342441 boost::re_detail::perl_matcher, std::allocator > >, boost::regex_traits > >::match_all_states()+000073
[  7] 0x00007f21f3eb6546 bin/glnxa64/libboost_regex.so.1.40.0+00484678 boost::re_detail::perl_matcher, std::allocator > >, boost::regex_traits > >::match_imp()+000758
[  8] 0x00007f21c04ad595 lib/libmysharedlib.so+04855189 bool boost::regex_match, std::allocator > >, char, boost::regex_traits > >(__gnu_cxx::__normal_iterator, __gnu_cxx::__normal_iterator, boost::match_results, std::allocator > > >&, boost::basic_regex > > const&, boost::regex_constants::_match_flags)+000245
[  9] 0x00007f21c04a71c7 lib/libmysharedlib.so+04829639 myfunc2()+000183
[ 10] 0x00007f21c01b41e3 lib/libmysharedlib.so+01737187 myfunc1()+000307

It is known that MATLAB calls dlopen with the RTLD_NOW flag only.
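For reference, a load of that kind can be sketched in C as below. This is only an illustration under assumptions: load_plugin and find_entry are hypothetical helper names, and nothing here is taken from MATLAB itself. With RTLD_NOW alone, glibc defaults to RTLD_LOCAL, and any reference in the loaded library that is resolved dynamically is looked up in the global scope (the main program and the libraries it was started with) before the library's own dependencies. An already loaded libboost_regex.so.1.40.0 that exports a matching name would therefore be found first.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: loads a library with RTLD_NOW and no other flag,
   as the question says MATLAB does. RTLD_LOCAL is the default, so the
   library's symbols are not added to the global lookup scope. */
void *load_plugin(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW);
    if (handle == NULL)
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return handle;
}

/* Hypothetical helper: looks up one exported C entry point by name. */
void *find_entry(void *handle, const char *name)
{
    dlerror();                  /* clear any stale error state */
    return dlsym(handle, name); /* NULL if the symbol is not found */
}
```

On older glibc versions this needs -ldl at link time. A caller would pass the path to libmysharedlib.so and a name such as OurCSymbol1.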

People, please think this through with me.
At this point I'm desperate not even to fix this, but simply to understand the ld and ELF behavior.

Edit:
A small additional question: as I understand it, without special linker options, symbols in Linux .so libraries are never bound by address at link time? So even statically linked local symbols are resolved at run time?
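To make the distinction in this question concrete, here is a minimal C sketch. All function names are hypothetical, and the visibility attribute is a GCC/Clang extension. A static function has internal linkage, never appears in the dynamic symbol table, and is bound at build time. A global function with default visibility in -fPIC code is typically called through the PLT, so its final target is chosen by the dynamic linker. A hidden function is global inside the library but is bound within it.

```c
/* Internal linkage: truly local, bound at build time. */
static int helper_local(void)
{
    return 1;
}

/* Default visibility: exported; calls from inside the library normally go
   through the PLT and may be preempted by an earlier definition. */
int helper_global(void)
{
    return 2;
}

/* Hidden visibility: usable across object files of this library,
   but not exported and not preemptible. */
__attribute__((visibility("hidden"))) int helper_hidden(void)
{
    return 3;
}

/* Hypothetical public entry point that uses all three helpers. */
int api_sum(void)
{
    return helper_local() + helper_global() + helper_hidden();
}
```

Code taken from a static archive behaves like the second case unless it was compiled with hidden visibility or the library is linked with an option that binds global references locally.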


Comments (2)

岁月打碎记忆 2024-12-08 15:51:25

Check out the -Bsymbolic option for ld.

If -Bsymbolic is specified, then at the time of creating a shared
object ld will attempt to bind references to global symbols to definitions
within the shared library. The default is to defer binding to runtime.

This may be clearer with an example.

Say example.o contains a reference to a global function defined in
global.o,

$ nm example.o | grep ' U'
     U _GLOBAL_OFFSET_TABLE_
     U globalfn
$ nm global.o | grep ' T'
00000000 T globalfn

and two shared objects, normal.so and symbolic.so, are built as
follows:

$ cc -fPIC -c example.c
$ cc -c global.c
$ rm -f archive.a; ar cr archive.a global.o
$ ld -shared -o normal.so example.o archive.a
$ ld -Bsymbolic -shared -o symbolic.so example.o archive.a
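
The two source files are not shown in this answer. A minimal guess at their contents, given here as a single C block for brevity, might look like the following. The function bodies and the name example are assumptions; only globalfn and the file names come from the transcript.

```c
/* global.c (assumed contents): defines the global function. */
int globalfn(int x)
{
    return x + 1;
}

/* example.c (assumed contents): only declares globalfn, so example.o
   carries an undefined reference to it, the "U globalfn" line from nm. */
extern int globalfn(int x);

int example(int x)
{
    return globalfn(x) * 2;
}
```

Split into two files along the comments, these can be fed to the build commands above.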

Disassembling the code for normal.so shows that the call to
globalfn is actually going through the procedure linkage table, and
thus the final destination of the call is determined at runtime.

$ objdump --disassemble normal.so
...snip...
00000194 <example>:
...snip...
 1a6:   e8 d9 ff ff ff          call   184 <globalfn@plt>
...snip...
$ readelf -r normal.so

Relocation section '.rel.plt' at offset 0x16c contains 1 entries:
Offset     Info    Type            Sym.Value  Sym. Name
00001244  00000207 R_386_JUMP_SLOT   000001b8   globalfn

Whereas in symbolic.so, the call always invokes the definition of
globalfn within the shared object.

$ objdump --disassemble symbolic.so
...snip...
0000016c <shared>:
...snip...
 17e:   e8 0d 00 00 00          call   190 <globalfn>
...snip...
$ readelf -r symbolic.so

There are no relocations in this file.
感情洁癖 2024-12-08 15:51:25

Quoting the question: "Here is the deal - we use boost 1.47, MATLAB has boost 1.40. Currently, library call segfaults on a call from OUR library to their boost (regex)."

You are invoking undefined behavior, which is a "Doctor, it hurts when I do this" kind of situation. The MATLAB executable already contains external functions for class boost::re_detail::perl_matcher< elided >. When MATLAB loads your shared library, the dynamic linker sees that your shared library defines those exact same symbols in a way that conflicts with the existing definitions. That is undefined behavior.

The solution is to build a version of your library for use with MATLAB that uses the same version of Boost as MATLAB does.
