Why does this dynamic library loading code work with gcc?
Background:
I've found myself with the unenviable task of porting a C++ GNU/Linux application over to Windows. One of the things this application does is search for shared libraries on specific paths and then load classes out of them dynamically using the POSIX dlopen() and dlsym() calls. We have a very good reason for loading this way that I will not go into here.
The Problem:
For dlsym() or GetProcAddress() to discover symbols generated by a C++ compiler at run time, the symbols must not be name-mangled. That means declaring them inside an extern "C" linkage block. For example:
#include <list>
#include <string>
using std::list;
using std::string;
extern "C" {
list<string> get_list()
{
list<string> myList;
myList.push_back("list object");
return myList;
}
}
This code is perfectly valid C++ and compiles and runs with numerous compilers on both Linux and Windows. However, it does not compile with MSVC, because "the return type is not valid C". The workaround we've come up with is to change the function to return a pointer to the list instead of the list object:
#include <list>
#include <string>
using std::list;
using std::string;
extern "C" {
list<string>* get_list()
{
list<string>* myList = new list<string>();
myList->push_back("ptr to list");
return myList;
}
}
I've been trying to find an optimal solution for the GNU/Linux loader that will either work with both the new function and the old legacy function prototype, or at least detect when the deprecated function is encountered and issue a warning. It would be unseemly for our users if the code just segfaulted when they tried to use an old library. My original idea was to install a SIGSEGV signal handler during the call to get_list (I know this is icky; I'm open to better ideas). So, just to confirm that loading an old library would segfault where I thought it would, I ran a library built with the old function prototype (returning a list object) through the new loading code (which expects a pointer to a list). To my surprise, it just worked. My question is: why?
The loading code below works with both function prototypes listed above. I've confirmed that it works on Fedora 12, RedHat 5.5, and RedHawk 5.1 using gcc versions 4.1.2 and 4.4.4. Compile the libraries with g++ using -shared and -fPIC, and link the executable against libdl (-ldl).
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <list>
#include <string>
using std::list;
using std::string;
int main(int argc, char **argv)
{
void *handle;
list<string>* (*getList)(void);
char *error;
handle = dlopen("library path", RTLD_LAZY);
if (!handle)
{
fprintf(stderr, "%s\n", dlerror());
exit(EXIT_FAILURE);
}
dlerror();
*(void **) (&getList) = dlsym(handle, "get_list");
if ((error = dlerror()) != NULL)
{
printf("%s\n", error);
exit(EXIT_FAILURE);
}
list<string>* libList = (*getList)();
for(list<string>::iterator iter = libList->begin();
iter != libList->end(); iter++)
{
printf("\t%s\n", iter->c_str());
}
dlclose(handle);
exit(EXIT_SUCCESS);
}
As aschepler says, it's because you got lucky.
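As it turns out, the ABI used by gcc (and most other compilers) on both x86 and x64 returns 'large' structs (too big to fit in registers) by passing an extra 'hidden' pointer argument to the function. The function uses that pointer as space to store the return value and then returns the pointer itself. Under the Itanium C++ ABI that g++ follows, a class type with a non-trivial copy constructor or destructor, such as std::list<std::string>, is always returned this way, whatever its size. So it turns out that a function of the form
struct foo function(args...);
is roughly equivalent to
struct foo *function(struct foo *ret, args...);
where the caller is expected to allocate space for a 'foo' (probably on the stack) and pass in a pointer to it.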
So if you have a function that expects to be called this way (because it returns a struct) and you instead call it through a function pointer that returns a pointer, it MAY appear to work. If the garbage bits it gets for the extra argument (random register contents left there by the caller) happen to point somewhere writable, the called function will happily write its return value there and then return that pointer. The calling code therefore gets back something that looks like a valid pointer to the object it expects. The code may superficially appear to work, but it's probably clobbering a random bit of memory that may be important later.