DLLs, memory mapping, base addresses, memory usage, and .NET?
Before I start with the real question, let me just say that I might get some of the details here wrong. If so, please correct me on those as well as, or even instead of, answering my question.
My question is about DLLs and .NET, basically. We have an application that is using quite a bit of memory and we're trying to figure out how to measure this correctly, especially when the problem mainly occurs on clients' computers.
One thing that hit me is that we have some rather large .NET assemblies with generated ORM-code.
If I were using an unmanaged (Win32) DLL that had a unique base-address, multiple simultaneous processes on the same machine would load the DLL once into physical memory, and just map it into virtual memory for all the applications. Thus, physical memory would be used once for this DLL.
The question is what happens with a .NET assembly. This DLL contains IL, and though this portion of it might be shared between the applications, what about the JITted code that results from this IL? Is it shared? If not, how do I measure to figure out if this is actually contributing to the problem or not? (Yes, I know, it will contribute, but I'm not going to spend much time on this until it is the biggest problem).
Also, I know that we haven't looked at the base addresses of all the .NET assemblies in our solution. Is it necessary to do so for .NET assemblies? And if so, are there some guidelines available on how to determine these addresses?
Any insight into this area would be most welcome, even if it turns out that this is not a big problem, or not even a problem at all.
Edit: Just found this question: .NET assemblies and DLL rebasing which partially answers my question, but I'd still like to know how JITted code factors into all of this.
It appears from that question and its accepted answer that the JITted code is placed on a heap, which means that each process will load up the shared binary assembly image, and produce a private JITted copy of the code inside its own memory space.
Is there any way for us to measure this? If this turns out to produce a lot of code, we'd have to look at the generated code more to figure out if we need to adjust it.
Edit: Added a shorter list of questions here:
- Is there any point in making sure the base addresses of .NET assemblies are unique and non-overlapping, to avoid rebasing a DLL that will mostly be used just to read IL code for JITting?
- How can I measure how much memory is used for JITted code to figure out if this is really a problem or not?
The answer by @Brian Rasmussen here indicates that JITting will produce per-process copies of JITted code, as I expected, but that rebasing the assemblies will actually reduce memory usage. I will have to dig into the WinDbg+SoS tools he mentions, something I've had on my list for a while but now I suspect I can't put it off any longer :)
Edit: Some links I've found on the subject:
Comments (3)
This is for question 1)
The jitted code is placed on a special heap. You can inspect this heap using the !eeheap command in WinDbg + SoS. Thus every process will have its own copy of the jitted code. The command will also show you the total size of the code heap. Let me know if you want additional details on getting this information from WinDbg.
This is for question 2)
According to the book Expert .NET 2.0 IL Assembly, the .reloc part of a pure-IL PE file contains only one fixup entry for the CLR startup stub. So the amount of fixups needed for a managed DLL during rebasing is fairly limited. However, if you list the modules of any given managed process, you'll notice that Microsoft has rebased the bulk (or maybe all) of their managed DLLs. Whether that should be viewed as a reason for rebasing or not is up to you.
I'm not sure how accurate the following information is with newer versions of .NET and/or Windows. MS may have addressed some of the DLL loading/sharing issues since the early days of .NET. But I believe that much of the following still does apply.
With .NET assemblies a lot of the benefit of page sharing between processes (and between Terminal Server sessions) disappears because the JIT needs to write the native code on the fly - there's no image file to back up the native code. So each process gets its own, separate memory pages for the jitted code.
This is similar to the issues that are caused by having DLLs improperly based - if the OS needs to perform fixups on a standard Win32 DLL when it's loaded, the memory pages for the fixed up portions cannot be shared.
However, even if the jitted code cannot be shared, there is a benefit to rebasing .NET DLLs because the DLL is still loaded for the metadata (and IL) - and that stuff can be shared if no fixups are required.
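To make the fixup argument concrete, here is a rough sketch of the page arithmetic. It is a simplified model, not how the Windows loader is actually implemented: it assumes 4 KB pages and 4-byte fixups, and treats every page that contains a fixup as becoming private to the process when the image loads away from its preferred base. The function name `private_pages` and all the addresses and fixup offsets are made up for illustration.

```python
PAGE_SIZE = 4096    # assumed page size (typical for x86/x64)
FIXUP_WIDTH = 4     # assumed width in bytes of each patched address


def private_pages(fixup_rvas, preferred_base, actual_base,
                  page_size=PAGE_SIZE, fixup_width=FIXUP_WIDTH):
    """Return the sorted image page indices the loader would have to modify.

    In this simplified model a modified page can no longer be shared with
    other processes that map the same image file.
    """
    if actual_base == preferred_base:
        return []  # loaded at the preferred base: nothing to patch
    pages = set()
    for rva in fixup_rvas:
        first = rva // page_size
        last = (rva + fixup_width - 1) // page_size  # a fixup may straddle two pages
        pages.update(range(first, last + 1))
    return sorted(pages)


# Hypothetical native DLL: one fixup every 64 bytes across eight pages of code.
native_fixups = list(range(0x1000, 0x9000, 0x40))
# Hypothetical pure-IL assembly: a single fixup for the startup stub.
managed_fixups = [0x2004]

PREFERRED = 0x10000000   # hypothetical preferred base
ACTUAL = 0x03A00000      # hypothetical address the loader had to pick instead

native_private = private_pages(native_fixups, PREFERRED, ACTUAL)
managed_private = private_pages(managed_fixups, PREFERRED, ACTUAL)
unmoved_private = private_pages(native_fixups, PREFERRED, PREFERRED)

print(len(native_private))   # 8
print(len(managed_private))  # 1
print(len(unmoved_private))  # 0
```

Under these assumptions the rebased native DLL loses sharing on every code page that holds a fixup, while the pure-IL assembly loses a single page, which is why the gain from rebasing managed DLLs is real but likely modest.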
It's possible to help share memory pages with a .NET assembly by using ngen, but that brings along its own set of issues.
See this old blog post by Jason Zander for some details:
http://blogs.msdn.com/jasonz/archive/2003/09/24/53574.aspx
Larry Osterman has a decent blog article on DLL page sharing and the effect of fixups:
http://blogs.msdn.com/larryosterman/archive/2004/07/06/174516.aspx
I think you're getting confused about shared assemblies and dlls and the process memory space.
Both .NET and standard Win32 DLLs share code among the different processes using them. In the case of .NET this is only true for DLLs with the same version signature, so that two different versions of the same DLL can be loaded in memory at the same time.
The thing is, it looks like you're expecting the memory allocated by the library calls to be shared as well, and that (almost) never happens. When a function inside your library allocates memory, which I guess happens a lot for an ORM DLL, that memory is allocated inside the memory space of the calling process, so each process has its own instances of the data.
So yes, the DLL code is in fact loaded once and shared among the callers, but the instructions execute (and therefore the allocations take place) separately in each calling process's space.
Edit:
OK, let's see how JIT works with .NET assemblies.
When we talk about JITting code, the process is relatively simple. Internally there is a structure called the Virtual Method Table (VMT), which basically contains the address that will be invoked for each call. In .NET, the JIT works by initially setting up that table so that every entry redirects to the JIT compiler. That way, the first time we call a method, the JIT steps in and compiles the code to actual machine instructions (hence "Just In Time"). Once that is done, the JIT goes back to the VMT and replaces the entry that invoked it so that it points to the generated native code. All subsequent calls then go straight to the compiled code, so each method is compiled only once. For DLLs the process is likely to be the same (although I can't completely assure you it is).
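The patch-on-first-call idea can be sketched in a few lines of Python. This is only a toy model of the mechanism as described above, not the CLR's actual data structures: the `MethodTable` class, its `slots` dictionary and the use of `compile`/`eval` as a stand-in for the JIT are all invented for illustration.

```python
from typing import Callable, Dict


class MethodTable:
    """Toy method table whose slots start out pointing at a compile stub."""

    def __init__(self, sources: Dict[str, str]) -> None:
        self.compile_count = 0
        self.slots: Dict[str, Callable[..., object]] = {}
        for name, source in sources.items():
            # Every slot initially redirects to a stub that triggers the "JIT".
            self.slots[name] = self._make_stub(name, source)

    def _make_stub(self, name: str, source: str) -> Callable[..., object]:
        def stub(*args: object) -> object:
            compiled = self._jit(source)
            self.slots[name] = compiled   # patch the slot: later calls skip the stub
            return compiled(*args)        # finish the call that triggered compilation
        return stub

    def _jit(self, source: str) -> Callable[..., object]:
        # Stand-in for real code generation: turn source text into a callable.
        self.compile_count += 1
        return eval(compile(source, "<jit>", "eval"))

    def call(self, name: str, *args: object) -> object:
        return self.slots[name](*args)


table = MethodTable({"add": "lambda a, b: a + b"})
first = table.call("add", 2, 3)      # goes through the stub and compiles
second = table.call("add", 10, 20)   # goes straight to the compiled callable

print(first, second, table.compile_count)  # 5 30 1
```

Note that the compiled callables live inside the `table` object of the running process, which matches the point made in the other answers: each process ends up with its own private copy of the generated code.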