Mixing PIC and non-PIC objects in a shared library

Posted on 2024-12-19 08:43:36


This question is related to this one as well as its answer.

I just discovered some ugliness in a build I'm working on. The situation looks somewhat like the following (written in gmake format); note, this specifically applies to a 32-bit memory model on sparc and x86 hardware:

OBJ_SET1  := some objects
OBJ_SET2  := some objects

# note: OBJ_SET2 doesn't get this flag
${OBJ_SET1} : CCFLAGS += -PIC

${OBJ_SET1} ${OBJ_SET2} : %.o : %.cc
  ${CCC} ${CCFLAGS} -m32 -o ${@} -c ${<}

obj1.o       : ${OBJ_SET1}
obj2.o       : ${OBJ_SET2}
sharedlib.so : obj1.o obj2.o
obj1.o obj2.o sharedlib.so :
  ${LINK} ${LDFLAGS} -m32 -PIC -o ${@} ${^}

Clearly it can work to mix objects compiled with and without PIC in a shared object (this build has been in use for years). I don't know enough about PIC to judge whether doing so is a good idea; my guess is that in this case the mixing isn't needed, and that it happened because someone tacked new material onto the build without caring enough to find out the right way to do it.

My question is:

  1. Is this safe?
  2. Is it a good idea?
  3. What potential problems can occur as a result?
  4. If I switch everything to PIC, are there any non-obvious gotchas that I might want to watch out for?
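For what it's worth, switching everything to PIC (question 4) would collapse the rules above to something like the following sketch. The variables are carried over from the fragment above; the exact flag spelling (`-PIC` for this Sun-style compiler, `-fPIC`/`-KPIC` elsewhere) depends on the toolchain:

```make
# Hypothetical all-PIC variant of the rules above: every object gets the
# flag globally, so the per-target assignment for OBJ_SET1 goes away.
CCFLAGS += -PIC

${OBJ_SET1} ${OBJ_SET2} : %.o : %.cc
	${CCC} ${CCFLAGS} -m32 -o ${@} -c ${<}

obj1.o       : ${OBJ_SET1}
obj2.o       : ${OBJ_SET2}
sharedlib.so : obj1.o obj2.o
obj1.o obj2.o sharedlib.so :
	${LINK} ${LDFLAGS} -m32 -PIC -o ${@} ${^}
```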


Answer by 梦罢, 2024-12-26 08:43:36


Forgot I even wrote this question.

Some explanations are in order first:

  • Non-PIC code may be loaded by the OS into any position in memory on [most?] modern OSs. After everything is loaded, there is a fix-up pass over the text segment (where the executable code ends up) so that it addresses global variables correctly; to pull this off, the text segment must be writable.
  • PIC executable data can be loaded once by the OS and shared across multiple users/processes. For the OS to do this, however, the text segment must be read-only -- which means no fix-ups. The code is compiled to use a Global Offset Table (GOT) so it can address globals relative to the GOT, alleviating the need for fix-ups.
  • If a shared object is built without PIC, it appears that, although PIC is strongly encouraged, it isn't strictly necessary; but if the OS must fix up the text segment, it is forced to load it into memory marked read-write ... which prevents sharing across processes/users.
  • If an executable binary is built /with/ PIC, I don't know what goes wrong under the hood but I've witnessed a few tools become unstable (mysterious crashes & the like).

The answers:

  • Mixing PIC/non-PIC, or using PIC in executables can cause hard to predict and track down instabilities. I don't have a technical explanation for why.
    • ... including segfaults, bus errors, stack corruption, and probably more besides.
  • Non-PIC in shared objects is probably not going to cause any serious problems, though it can result in more RAM used if the library is used many times across processes and/or users.

update (4/17)

I've since discovered the cause of some of the crashes I had seen previously. To illustrate:

/*header.h*/
#include <map>
#include <string>  // needed for std::string; <map> alone isn't guaranteed to provide it
typedef std::map<std::string,std::string> StringMap;
StringMap asdf;    // note: a *definition* in a header -- this is the bug

/*file1.cc*/
#include "header.h"

/*file2.cc*/
#include "header.h"

int main( int argc, char** argv ) {
  for( int ii = 0; ii < argc; ++ii ) {
    asdf[argv[ii]] = argv[ii];
  }

  return 0;
}

... then:

$ g++ file1.cc -shared -fPIC -o libblah1.so
$ g++ file1.cc -shared -fPIC -o libblah2.so
$ g++ file1.cc -shared -fPIC -o libblah3.so
$ g++ file1.cc -shared -fPIC -o libblah4.so
$ g++ file1.cc -shared -fPIC -o libblah5.so

$ g++ -zmuldefs file2.cc -Wl,-{L,R}$(pwd) -lblah{1..5} -o fdsa
#     ^^^^^^^^^
#     This is the evil that made it possible
$ args=(this is the song that never ends);
$ eval ./fdsa $(for i in {1..100}; do echo -n ${args[*]}; done)

That particular example may not end up crashing, but it's basically the situation that had existed in that group's code. If it does crash it'll likely be in the destructor, usually a double-free error.

Many years earlier, they had added -zmuldefs to their build to get rid of multiply-defined-symbol errors. The compiler emits code to run constructors/destructors on global objects. -zmuldefs forces the duplicate definitions to live at the same location in memory, but the constructors/destructors still run once for the exe and once for each library that included the offending header -- hence the double free.
