How does the .NET CLR distinguish between managed and unmanaged pointers?
Everything is ultimately JITed into native machine code, so in the end we have a native stack in .NET that the GC needs to scan for object pointers whenever it does a garbage collection.
Now, the question is: How does the .NET garbage collector figure out if a pointer to an object inside the GC heap is actually a managed pointer or a random integer that happens to have a value that corresponds to a valid address?
Obviously, if it can't distinguish the two, then there can be memory leaks, so I'm wondering how it works. Or -- dare I say it -- does .NET have the potential to leak memory? :O
As others have pointed out, the GC knows precisely which fields of every block on the stack and the heap are managed references, because the GC and the jitter know the type of everything.
However, your point is well-taken. Imagine an entirely hypothetical world in which there are two kinds of memory management going on in the same process. For example, suppose you have an entirely hypothetical program called "InterMothra Chro-Nagava-Sploranator" written in C++ that uses traditional COM-style reference-counted memory management where everything is just a pointer to process memory, and objects are released by invoking a Release method the correct number of times. Suppose Sploranator hypothetically has a scripting language, JabbaScript, that maintains a garbage-collected pool of objects.
Trouble arises when a JabbaScript object has a reference to a non-managed Sploranator object, and that same Sploranator object has a reference right back. That's a circular reference that cannot be broken by the JabbaScript garbage collector, because it doesn't know about the memory layout of the Sploranator object. So there is the potential here for memory leaks.
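To make the cycle concrete, here is a toy model in Python. Everything in it is hypothetical (`NativeObject`, `ScriptObject`, `collect`); it only assumes that the script GC cannot see inside native objects, so any script object held by a still-alive native object has to be treated as a root.

```python
class NativeObject:
    """Hypothetical Sploranator-style object with COM-like refcounting."""
    def __init__(self):
        self.refcount = 1        # the creator holds the initial reference
        self.script_ref = None   # invisible to the script GC

    def add_ref(self):
        self.refcount += 1

    def release(self):
        self.refcount -= 1


class ScriptObject:
    """Hypothetical JabbaScript-style garbage-collected object."""
    def __init__(self):
        self.native_ref = None   # holds one AddRef on a NativeObject


def collect(script_heap, native_objects):
    # The script GC cannot look inside native objects, so any script object
    # held by a native object that is still alive must be treated as a root.
    roots = {id(n.script_ref) for n in native_objects
             if n.refcount > 0 and n.script_ref is not None}
    survivors = []
    for obj in script_heap:
        if id(obj) in roots:
            survivors.append(obj)
        elif obj.native_ref is not None:
            obj.native_ref.release()   # sweeping drops the native reference
            obj.native_ref = None
    return survivors


native, script = NativeObject(), ScriptObject()
script.native_ref = native
native.add_ref()                 # script -> native
native.script_ref = script       # native -> script
native.release()                 # the creator is done with the native object

heap = collect([script], [native])
print(native.refcount, len(heap))  # 1 1: the cycle keeps both alive forever
```

Neither side can make progress: the native object's refcount stays above zero because the script object holds it, and the script object stays a root because the native object holds it.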
One way to solve this problem is to rewrite the Sploranator memory manager so that it allocates its objects out of the managed GC pool.
Another way is to use a heuristic; the GC can dedicate a thread of a processor to scan all of memory looking for integers that happen to be pointers to its objects. That sounds like a lot, but it can omit pages that are uncommitted, pages in its own managed heap, pages that are known to contain only code, and so on. The GC can make a guess that if it thinks an object might be dead, and it cannot find any pointer to that object in any memory outside of its control, then the object is almost certainly dead.
The downside of this heuristic is of course that it can be wrong. You might have an integer that accidentally matches a pointer (though that is less likely in 64-bit land). That would extend the lifetime of the object. But who cares? We are already in the situation where circular references can extend the lifetimes of objects. We're trying to make that situation better, and this heuristic does so. That it is not perfect is irrelevant; it's better than nothing.
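The heuristic can be sketched as a toy scan. `conservative_scan` and all the addresses are made up for illustration; a real scanner would walk memory page by page, skipping the pages it knows it can ignore. Note how an ordinary integer that happens to equal an object's address keeps that object alive.

```python
def conservative_scan(words, heap_objects):
    """Return the heap object addresses that appear verbatim in `words`.

    `words` stands for the machine words in memory the GC does not control.
    Any word equal to an object's address is treated as a possible pointer.
    """
    return {w for w in words if w in heap_objects}


heap_objects = {0x1000, 0x2000, 0x3000}   # hypothetical object addresses
foreign_memory = [
    0x2000,   # a genuine pointer held by native code
    4096,     # an integer counter that happens to equal 0x1000
    42,       # an ordinary integer
]
maybe_alive = conservative_scan(foreign_memory, heap_objects)
presumed_dead = heap_objects - maybe_alive
print(sorted(hex(a) for a in maybe_alive))    # ['0x1000', '0x2000']
print(sorted(hex(a) for a in presumed_dead))  # ['0x3000']
```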
The other way it can be wrong is that Sploranator could have encoded the pointer, by, say, flipping all of its bits when storing the value and only flipping it back right before the call. If Sploranator is actively hostile to this GC heuristic strategy then it doesn't work.
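A tiny illustration of why such an encoding defeats the scan. The bit-flipping `encode`/`decode` pair and the address are hypothetical, and 64-bit words are assumed.

```python
MASK = (1 << 64) - 1   # assume 64-bit words


def encode(ptr):
    # Hypothetical hostile encoding: store the bitwise complement of the pointer.
    return ~ptr & MASK


def decode(word):
    return ~word & MASK


heap_objects = {0x7F00_0000_1000}            # made-up object address
native_memory = [encode(0x7F00_0000_1000)]   # all the scanner gets to see
found = {w for w in native_memory if w in heap_objects}
print(found)                                 # set(): no pointer is recognized
print(hex(decode(native_memory[0])))         # 0x7f0000001000: still usable by native code
```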
Resemblance between the garbage collection strategy outlined here and the actual GC strategy of any product is almost entirely coincidental. Eric's musings about implementation details of garbage collectors of hypothetical non-existing products are for entertainment purposes only.
The garbage collector doesn't need to infer whether a particular byte pattern (whether 4 or 8 bytes) is a pointer or not - it already knows.
In the CLR everything is strongly typed, so the garbage collector knows whether the bytes are an int, a long, an object reference, an untyped pointer, etc. The layout of an object in memory is derived from its type: metadata stored in the assembly gives the type of every member of the instance, and the runtime uses it to work out where each member lives.
The layout of stack frames is similar - the JITter lays out the stack frame when the method is compiled, and keeps track of what kinds of data are stored where. (It's done by the JITter to allow for different optimizations depending on the capabilities of your processor).
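The same idea applied to a stack frame, as a hypothetical sketch: `FrameLayout` stands in for whatever record the JIT emits for a method, listing which stack slots (byte offsets from the frame base) hold references. The actual encoding the CLR uses is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameLayout:
    """Hypothetical record emitted when a method is JIT-compiled: which stack
    slots (byte offsets from the frame base) hold object references."""
    method: str
    ref_slots: frozenset


def roots_in_frame(layout, frame_memory):
    # Only the declared reference slots are reported; other slots are
    # ignored whatever their values are.
    return [frame_memory[slot] for slot in sorted(layout.ref_slots)]


layout = FrameLayout("Main", frozenset({-8, -24}))
frame = {-8: 0x1000, -16: 0x2000, -24: 0x3000}   # -16 is an int local
print([hex(a) for a in roots_in_frame(layout, frame)])  # ['0x3000', '0x1000']
```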
When the garbage collector runs, it has access to all this metadata, so it never needs to guess whether a specific bit pattern might be a reference or not.
Eric Lippert's blog is a good place to find out more - References are not addresses would be a place to start.
Well, when JITing the code, the compiler knows where it puts references to objects. Whenever you use a field in a method that holds a reference, it knows that there's a reference in that place. This information can be preserved when the code is JITed.
Now, a reference points to the object. Each object has a pointer to its type (the type that GetType() returns). Basically, the GC can take a pointer, follow it, and read the type of the object. The type tells it whether there are other fields that contain references to other objects. This way the GC can walk the entire stack and heap.
Of course this is a bit oversimplified, but that's the basic principle. In the end it's an implementation detail, and there are certainly other ways and all kinds of tricks to do this efficiently.
Update after comment: The pointer on the stack points to an object on the heap. Every object has a header, which also contains a pointer to its type info. So you can dereference the pointer on the stack, then dereference the pointer to the type info to find out what kind of object it is.
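The walk described above is essentially the mark phase of a tracing collector. Below is a toy sketch with a made-up heap (address mapped to a type name standing in for the header's type pointer, plus field values) and a made-up type table. An integer field whose value equals an object's address is not followed, because the type says it is not a reference.

```python
# Hypothetical type info: which field indices hold references.
types = {
    "Node":   {"ref_fields": [1]},      # fields: [Count: int, Next: ref]
    "String": {"ref_fields": []},
}
# Hypothetical heap: address -> (type from the object header, field values).
heap = {
    0x1000: ("Node",   [7, 0x2000]),
    0x2000: ("Node",   [0x3000, 0]),    # Count happens to equal 0x3000
    0x3000: ("String", []),
}


def mark(roots):
    marked, worklist = set(), list(roots)
    while worklist:
        addr = worklist.pop()
        if addr == 0 or addr in marked:
            continue                              # null reference or already visited
        marked.add(addr)
        type_name, fields = heap[addr]            # dereference: read the header
        for i in types[type_name]["ref_fields"]:  # type info says which fields are refs
            worklist.append(fields[i])
    return marked


live = mark([0x1000])
print(sorted(hex(a) for a in live))   # ['0x1000', '0x2000']
```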
Remember that all managed memory is managed by the CLR. Any actual managed reference was created by the CLR. It knows what it created and what it didn't.
If you really feel you must know the details of the implementation, then you should read CLR via C# by Jeffrey Richter. The answer is not simple; it's quite a bit more than can be answered on SO.
When you create a new reference type object in .NET you are automatically "registering" it with the CLR and its GC. There is no way to inject random value types into this process. In other words:
The CLR does not maintain some large, disorganized heap of pointers mixed with value types. It just tracks CLR-created objects (for garbage collection purposes, anyway). Any value type will be short-lived on the stack or be a member of a class instance. There is no potential for confusing the GC.
Have a look at Garbage Collection: Automatic Memory Management in the Microsoft .NET Framework
(Some of the technical details might be a bit dated but the structure described is valid.)
Some brief points from the article....
...
...
On the question...
If the object is not reachable the GC will destroy it regardless.
According to the book "CLR via C#", the runtime knows exactly where to find the references/pointers by inspecting the method's internal table.
What this internal table holds in Microsoft's implementation is unknown, but it can accurately identify call frames on the stack, local variables, and even what kind of value each register holds at each EIP address.
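One way to picture such a table, as a hypothetical sketch only: for each method, a sorted list of code offsets, each mapped to the set of locations (stack slots or registers) that hold live references from that offset until the next entry. The offsets and location names below are invented; the real CLR encoding is not shown.

```python
import bisect

# Hypothetical GC info for one method: from each code offset onward (until the
# next entry), the locations holding live references.
gc_info_offsets = [0x00, 0x10, 0x2A, 0x40]
gc_info_live = [
    set(),               # prologue: nothing live yet
    {"rbp-8"},           # after the first allocation
    {"rbp-8", "rsi"},    # a second reference held in a register
    {"rsi"},             # rbp-8 is dead after its last use
]


def live_refs_at(ip_offset):
    # Find the last entry whose start offset is <= ip_offset.
    i = bisect.bisect_right(gc_info_offsets, ip_offset) - 1
    return gc_info_live[i]


print(sorted(live_refs_at(0x30)))   # ['rbp-8', 'rsi']
print(sorted(live_refs_at(0x45)))   # ['rsi']
```

Keying the table by instruction address is what lets a slot or register be a reference at one point in the method and dead (or reused for an integer) at another.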
The Mono implementation used conservative scanning, which means it treated every value on the stack as a potential pointer. That not only translates into memory leaks, but also means (since it cannot update those values) that the objects identified this way are treated as pinned (unmovable by the GC compactor), which leads to memory fragmentation.
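A toy sliding compactor shows how pinning causes fragmentation. `compact`, the addresses, and the sizes are all made up; the only assumption is that pinned objects (the ones found by a conservative scan) must stay where they are, so the free space in front of them cannot be reclaimed by sliding.

```python
def compact(objects, pinned):
    """Sliding compaction over live (address, size) pairs sorted by address.
    Returns the new layout and the holes left in front of pinned objects."""
    new_layout, free_gaps, cursor = [], [], 0
    for addr, size in objects:
        if addr in pinned:
            if addr > cursor:
                free_gaps.append((cursor, addr - cursor))  # hole that cannot be closed
            new_layout.append((addr, size))                # pinned: stays put
            cursor = addr + size
        else:
            new_layout.append((cursor, size))              # slide down to the cursor
            cursor += size
    return new_layout, free_gaps


live_objects = [(0, 16), (64, 32), (128, 16), (256, 32)]  # dead objects already removed
layout, gaps = compact(live_objects, pinned={128})
print(layout)  # [(0, 16), (16, 32), (128, 16), (144, 32)]
print(gaps)    # [(48, 80)]
print(compact(live_objects, pinned=set())[1])  # []
```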
Mono now has the option of "Precise Stack Marking", which uses GCMaps. You can read more about it here: http://www.mono-project.com/Generational_GC#Precise_Stack_Marking
Note that this implementation is not as precise as the MS one, since it continues to treat the current frame conservatively.
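A hypothetical sketch of that hybrid: older frames are scanned precisely from a per-frame map of reference slots, while the currently executing frame is scanned conservatively and anything it seems to reference gets pinned. The frame dictionaries and `scan_stack` are invented for illustration and are not Mono's actual data structures.

```python
def scan_stack(frames, heap_objects):
    """frames[0] is the currently executing frame; every frame is a dict with
    'slots' (offset -> word) and 'ref_slots' (offsets listed in its GC map)."""
    precise, pinned = set(), set()
    for depth, frame in enumerate(frames):
        if depth == 0:
            # Current frame: treated conservatively, so any word that looks like
            # an object address counts as a reference and the object is pinned.
            pinned |= {w for w in frame["slots"].values() if w in heap_objects}
        else:
            # Older frames: the map says exactly which slots are references,
            # so those objects can be moved and the slots updated.
            precise |= {frame["slots"][off] for off in frame["ref_slots"]
                        if frame["slots"][off] != 0}   # skip null references
    return precise, pinned


heap_objects = {0x1000, 0x2000, 0x3000}
frames = [
    {"slots": {-8: 0x1000, -16: 12288}, "ref_slots": set()},  # 12288 == 0x3000, just an int
    {"slots": {-8: 0x2000, -16: 5}, "ref_slots": {-8}},       # caller; -16 is an int
]
precise, pinned = scan_stack(frames, heap_objects)
print(sorted(map(hex, precise)), sorted(map(hex, pinned)))
# ['0x2000'] ['0x1000', '0x3000']
```

The integer 12288 in the current frame pins the object at 0x3000 even though nothing really refers to it, which is exactly the imprecision the answer describes.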
References have headers, so it's not just a random integer.