How do you create a heap in a mixed-language application?
We have a front end written in Visual Basic 6.0 that calls several back end DLLs written in mixed C/C++. The problem is that each DLL appears to have its own heap and one of them isn’t big enough. The heap collides with the program stack when we’ve allocated enough memory.
Each DLL is written entirely in C, except for the basic DLL wrapper, which is written in C++. Each DLL has a handful of entry points. Each entry point immediately calls a C routine. We would like to increase the size of the heap in the DLL, but haven’t been able to figure out how to do that. I searched for guidance and found these MSDN articles:
http://msdn.microsoft.com/en-us/library/hh405351(v=VS.85).aspx
These articles are interesting but provide conflicting information. In our problem it appears that each DLL has its own heap. This matches the “Heaps: Pleasures and Pains” article, which says that the C Run-Time (C RT) library creates its own heap on startup. The “Managing Heap Memory” article says that the C RT library allocates out of the default process heap. The “Memory management options in Win32” article says the behavior depends on the version of the C RT library being used.
We’ve temporarily solved the problem by allocating memory from a private heap. However, in order to improve the structure of this very large, complex program, we want to switch from C with a thin C++ wrapper to real C++ with classes. We’re pretty certain that the new and delete operators won’t allocate memory from our private heap, and we’re wondering how to control the size of the heap C++ uses to allocate objects in each DLL. The application needs to run on all versions of desktop Windows NT, from 2000 through 7.
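For what it's worth, one way to make C++ objects come out of a private heap is to give the classes their own `operator new` and `operator delete`. The following is only a minimal sketch under assumptions: `PrivateHeap` and `Matrix` are hypothetical names that do not come from the application described above, the 1 MB initial size is arbitrary, and the `malloc` branch exists only so the sketch also builds outside Windows. On Windows it uses `HeapCreate`, `HeapAlloc` and `HeapFree`; passing 0 as the maximum size to `HeapCreate` asks for a growable heap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical wrapper around one private heap per DLL.
class PrivateHeap {
public:
    static void* allocate(std::size_t size) {
#ifdef _WIN32
        void* p = HeapAlloc(handle(), 0, size);
#else
        void* p = std::malloc(size);  // stand-in so the sketch builds off Windows
#endif
        if (p == NULL) throw std::bad_alloc();
        ++live_blocks();
        return p;
    }

    static void release(void* p) {
        if (p == NULL) return;  // releasing a null pointer is a no-op
        assert(live_blocks() > 0);
#ifdef _WIN32
        HeapFree(handle(), 0, p);
#else
        std::free(p);
#endif
        --live_blocks();
    }

    // Bookkeeping for illustration only; not thread-safe.
    static long& live_blocks() {
        static long count = 0;
        return count;
    }

private:
#ifdef _WIN32
    static HANDLE handle() {
        // 1 MB initial size (arbitrary); a maximum size of 0 makes the heap growable.
        // A real DLL should check the returned handle for NULL.
        static HANDLE h = HeapCreate(0, 1 << 20, 0);
        return h;
    }
#endif
};

// Hypothetical class whose instances are allocated from the private heap.
struct Matrix {
    double cells[16];

    static void* operator new(std::size_t size) { return PrivateHeap::allocate(size); }
    static void operator delete(void* p) { PrivateHeap::release(p); }
};
```

With this in place, `new Matrix` and `delete` on a `Matrix*` go through `PrivateHeap` rather than the CRT heap. Classes without these operators still use the CRT's global `operator new`.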
The Question
Can anyone point us to definitive and correct documentation that
explains how to control the size of the heap C++ uses to allocate
objects?
Several people have asserted that stack corruption due to heap allocations overwriting the stack is impossible. Here is what we observed. The VB front end uses four DLLs that it loads dynamically. Each DLL is independent of the others and provides a handful of methods called by the front end. All the DLLs communicate via data structures written to files on disk. These data structures are all statically structured. They contain no pointers, just value types and fixed-size arrays of value types. The problem DLL is invoked by a single call where a file name is passed. It is designed to allocate about 20 MB of data structures required to complete its processing. It does a lot of calculation, writes the results to disk, releases the 20 MB of data structures, and returns an error code. The front end then unloads the DLL.
While debugging the problem under discussion, we set a breakpoint at the beginning of the data structure allocation code, watched the memory values returned from the calloc calls, and compared them with the current stack pointer. We watched as the allocated blocks approached the stack. After the allocation was complete, the stack began to grow until it overlapped the heap. Eventually the calculations wrote into the heap and corrupted the stack. As the stack unwound, it tried to return to an invalid address and crashed with a segmentation fault.
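The comparison between calloc results and the stack pointer can also be done in code rather than by hand in the debugger. This is only a rough sketch: `approximate_stack_address` and `block_contains` are hypothetical helpers, and the address of a local variable is used as an approximation of the current stack pointer, not the exact register value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Approximates the current stack position by taking the address of a local.
std::uintptr_t approximate_stack_address() {
    volatile char marker = 0;
    return reinterpret_cast<std::uintptr_t>(&marker);
}

// Returns true if `address` falls inside the block [block, block + size).
bool block_contains(const void* block, std::size_t size, std::uintptr_t address) {
    assert(block != NULL);
    const std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(block);
    return address >= begin && address - begin < size;
}
```

Calling `block_contains` right after each calloc, with the value from `approximate_stack_address`, would flag a block that genuinely overlaps the current stack position.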
Each of our DLLs is statically linked to the CRT, so each DLL has its own CRT heap and heap manager. Microsoft says in http://msdn.microsoft.com/en-us/library/ms235460(v=vs.80).aspx:
Each copy of the CRT library has a separate and distinct state.
As such, CRT objects such as file handles, environment variables, and
locales are only valid for the copy of the CRT where these objects are
allocated or set. When a DLL and its users use different copies of the
CRT library, you cannot pass these CRT objects across the DLL boundary
and expect them to be picked up correctly on the other side.
Also, because each copy of the CRT library has its own heap manager,
allocating memory in one CRT library and passing the pointer across a
DLL boundary to be freed by a different copy of the CRT library is a
potential cause for heap corruption.
We don't pass pointers between DLLs. We aren't experiencing heap corruption; we are experiencing stack corruption.
OK, the question is:
I am going to answer my own question. I got the answer from reading Raymond Chen's blog, The Old New Thing, specifically the article "There's also a large object heap for unmanaged code, but it's inside the regular heap". In that article Raymond recommends "Advanced Windows Debugging" by Mario Hewardt and Daniel Pravat. This book has very specific information on both stack and heap corruption, which is what I wanted to know. As a plus, it provides all sorts of information about how to debug these problems.
Could you please elaborate on this statement of yours:
If we're talking about Windows (or any other mature platform), this should not be happening: the OS makes sure that stacks, heaps, mapped files and other objects never intersect.
Also:
The heap size is not fixed on Windows: it grows as the application uses more and more memory. It will grow until all available virtual memory space for the process is used. It is pretty easy to confirm this: just write a simple test app which keeps allocating memory and counts how much has been allocated. On a default 32-bit Windows you'll reach almost 2 GB. Surely, initially the heap doesn't occupy all available space, therefore it must grow in the process.
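A test app along those lines could look roughly like the sketch below. The function name `probe_heap_capacity` and the cap parameter are my own additions for illustration: the cap keeps the experiment bounded, which matters on 64-bit systems where an uncapped run could request far more than 2 GB.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Allocates chunk_bytes at a time until malloc fails or cap_bytes is reached.
// Returns the total number of bytes obtained; everything is freed before returning.
std::size_t probe_heap_capacity(std::size_t chunk_bytes, std::size_t cap_bytes) {
    assert(chunk_bytes > 0);
    std::vector<void*> blocks;
    blocks.reserve(cap_bytes / chunk_bytes);  // avoid growing the vector mid-probe

    std::size_t total = 0;
    while (total + chunk_bytes <= cap_bytes) {
        void* p = std::malloc(chunk_bytes);
        if (p == NULL) break;  // the heap could not grow any further
        blocks.push_back(p);
        total += chunk_bytes;
    }

    for (std::size_t i = 0; i < blocks.size(); ++i) {
        std::free(blocks[i]);
    }
    return total;
}
```

On a 32-bit build, calling it with 1 MB chunks and a cap above 2 GB shows where allocation actually stops. The result depends on the OS, the CRT and address-space fragmentation.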
Without many details about the "collision" it's hard to tell what's happening in your case. However, looking at the tags to this question prompts me to one possibility. It is possible (and happens quite often, unfortunately) that ownership of allocated memory areas is being passed between modules (DLLs in your case). Here's the scenario:
If the heaps are different, most heap managers would not check whether the memory region being deallocated actually belongs to them (mostly for performance reasons). So they would deallocate something which doesn't belong to them. By doing that they corrupt the other module's heap. This may (and often does) lead to a crash, but not always. Depending on your luck (and the particular heap manager implementation), this operation may change one of the heaps in such a way that the next allocation happens outside of the area where the heap is located.
This often happens when one module is managed code while the other is native. Since you have the VB6 tag in the question, I'd check if this is the case.
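The usual way to rule this scenario out is to have the module that allocates a block also export the function that frees it, so no other heap manager ever sees the pointer. A minimal sketch, assuming a hypothetical `ResultBuffer` type and `backend_*` function names (the export decoration a real DLL would need, such as `__declspec(dllexport)`, is left out):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

extern "C" {

// Hypothetical buffer type owned by the back-end module.
struct ResultBuffer {
    std::size_t length;
    double* values;
};

// Allocates from this module's CRT heap; returns NULL on failure.
ResultBuffer* backend_create_buffer(std::size_t length) {
    ResultBuffer* buffer = static_cast<ResultBuffer*>(std::malloc(sizeof(ResultBuffer)));
    if (buffer == NULL) return NULL;
    buffer->length = length;
    buffer->values = static_cast<double*>(std::calloc(length, sizeof(double)));
    if (buffer->values == NULL && length != 0) {
        std::free(buffer);
        return NULL;
    }
    return buffer;
}

// Frees with the same CRT that allocated, so heaps never get mixed.
void backend_destroy_buffer(ResultBuffer* buffer) {
    if (buffer == NULL) return;
    assert(buffer->values != NULL || buffer->length == 0);
    std::free(buffer->values);
    std::free(buffer);
}

}  // extern "C"
```

Callers in other modules only ever hand the pointer back to `backend_destroy_buffer` and never call `free` or `delete` on it themselves.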
If the stack grows large enough to hit the heap, a prematurely aborted stack overflow may be the problem: invalid data is passed that never satisfies the exit condition of some recursion in the problem DLL (loop detection is not working or does not exist), so an infinite recursion consumes a ridiculously large amount of stack space. One would expect such a DLL to terminate with a stack overflow exception, but perhaps because of compiler or linker optimizations, or large foreign heap sizes, it crashes elsewhere.
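To illustrate the kind of guard I mean, here is a small sketch of a recursion that refuses to run forever on bad input. The data layout is an assumption made up for the example: `next[i]` holds the index of the record that follows record `i`, and a negative value ends the chain. `chain_depth` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the number of records in the chain starting at `node`,
// or -1 if the data is invalid (index out of range or a cycle).
int chain_depth(const std::vector<int>& next, int node, int depth = 0) {
    assert(depth >= 0);
    const int size = static_cast<int>(next.size());
    if (node < 0) return depth;     // normal exit: end of chain
    if (node >= size) return -1;    // invalid data: index out of range
    if (depth >= size) return -1;   // more steps than records: must be a cycle
    return chain_depth(next, next[node], depth + 1);
}
```

Without the depth check, a cyclic input would recurse until the stack is exhausted.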
Heaps are created by the CRT. That is to say, the malloc heap is created by the CRT, and is unrelated to HeapCreate(). It's not used for large allocations, though, which are handed off to the OS directly. With multiple DLLs, you might have multiple heaps (newer VC versions are better at sharing, but even VC6 had no problem if you used MSVCRT.DLL, since that's shared).
The stack, on the other hand, is managed by the OS. Here you see why multiple heaps don't matter: The OS allocation for the different heaps will never collide with the OS allocation for the stack.
Mind you, the OS may allocate heap space close to the stack. The rule is just that they never overlap; there's no guaranteed "unused separation zone". If you then have a buffer overflow, it could very well overflow into the stack space.
So, any solutions? Yes: move to VC2010. It has buffer security checks, implemented in quite an efficient way. They're the default even in release mode.