.NET unique object identifier
Is there a way of getting a unique identifier of an instance?
GetHashCode() is the same for two references pointing to the same instance. However, two different instances can (quite easily) get the same hash code:
using System;
using System.Collections;
using System.Collections.Generic;

Hashtable hashCodesSeen = new Hashtable();
LinkedList<object> l = new LinkedList<object>();
int n = 0;
while (true)
{
    object o = new object();
    // Remember objects so that they don't get collected.
    // This does not make any difference though :(
    l.AddFirst(o);
    int hashCode = o.GetHashCode();
    n++;
    if (hashCodesSeen.ContainsKey(hashCode))
    {
        // Same hashCode seen twice for DIFFERENT objects (n is as low as 5322).
        Console.WriteLine("Hashcode seen twice: " + n + " (" + hashCode + ")");
        break;
    }
    hashCodesSeen.Add(hashCode, null);
}
I'm writing a debugging addin, and I need to get some kind of ID for a reference which is unique during the run of the program.
I already managed to get internal ADDRESS of the instance, which is unique until the garbage collector (GC) compacts the heap (= moves the objects = changes the addresses).
The Stack Overflow question "Default implementation for Object.GetHashCode()" might be related.
The objects are not under my control, as I am accessing objects in a program being debugged using the debugger API. If I were in control of the objects, adding my own unique identifiers would be trivial.
I wanted the unique ID for building a hashtable ID -> object, to be able to look up already seen objects. For now I solved it like this:
Build a hashtable: 'hashCode' -> (list of objects with hash code == 'hashCode')
Find if object seen(o) {
    candidates = hashtable[o.GetHashCode()] // Objects with the same hashCode.
    If no candidates, the object is new
    If some candidates, compare their addresses to o.Address
        If no address is equal (the hash code was just a coincidence) -> o is new
        If some address equal, o already seen
}
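The lookup described above could be sketched roughly as follows. DebuggeeObject is a hypothetical stand-in for whatever the debugger API returns (it only carries a hash code and an address), and SeenObjects and CheckAndAdd are made-up names. A real implementation would have to re-read the addresses at lookup time, since the GC can move objects between two lookups.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a value obtained through a debugger API:
// it exposes the debuggee object's hash code and its current address.
public sealed class DebuggeeObject
{
    public int HashCode { get; }
    public ulong Address { get; }

    public DebuggeeObject(int hashCode, ulong address)
    {
        HashCode = hashCode;
        Address = address;
    }
}

public sealed class SeenObjects
{
    // hash code -> all objects seen so far that reported this hash code
    private readonly Dictionary<int, List<DebuggeeObject>> buckets =
        new Dictionary<int, List<DebuggeeObject>>();

    // Returns true if o was already seen; otherwise records it and returns false.
    public bool CheckAndAdd(DebuggeeObject o)
    {
        if (!buckets.TryGetValue(o.HashCode, out List<DebuggeeObject> candidates))
        {
            candidates = new List<DebuggeeObject>();
            buckets.Add(o.HashCode, candidates);
        }

        foreach (DebuggeeObject candidate in candidates)
        {
            // Same hash code AND same address: the same instance.
            if (candidate.Address == o.Address)
            {
                return true;
            }
        }

        // Either no candidates, or the equal hash code was just a coincidence.
        candidates.Add(o);
        return false;
    }
}
```

The hash code only narrows the search down to a bucket; the address comparison is what actually decides identity.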
Comments (11)
.NET 4 and later only
Good news, everyone!
The perfect tool for this job is built into .NET 4 and it's called ConditionalWeakTable<TKey, TValue>. This class:
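As a minimal sketch of how ConditionalWeakTable<TKey, TValue> could be used to hand out IDs: the table is keyed by the object itself and stores a counter value that is created on first use. The ObjectIds and IdHolder names are made up for illustration and are not part of the framework. Keys are compared by reference, and the table does not keep them alive.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

public static class ObjectIds
{
    // ConditionalWeakTable requires a reference type as its value,
    // so the numeric ID is wrapped in a small holder class.
    private sealed class IdHolder
    {
        public readonly long Id;
        public IdHolder(long id) { Id = id; }
    }

    private static long lastId;

    private static readonly ConditionalWeakTable<object, IdHolder> table =
        new ConditionalWeakTable<object, IdHolder>();

    // Returns the same ID for the same instance, and a fresh ID for a new instance.
    public static long GetId(object o)
    {
        // GetValue adds an entry on first use. Under contention the callback may
        // run more than once, which can skip numbers but never hands out duplicates.
        return table.GetValue(o, _ => new IdHolder(Interlocked.Increment(ref lastId))).Id;
    }
}
```

Usage would be along the lines of `long id = ObjectIds.GetId(someObject);`. Two distinct strings with equal contents would still get different IDs, because the table ignores overridden Equals and GetHashCode.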
Checked out the ObjectIDGenerator class? This does what you're attempting to do, and what Marc Gravell describes.
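For reference, a small sketch of what using ObjectIDGenerator looks like (the demo class name is made up). Two caveats, to the best of my knowledge: the generator keeps strong references to every object it has seen, so it keeps them alive, and on recent .NET versions the type may be flagged as obsolete together with the other formatter-based serialization types.

```csharp
using System;
using System.Runtime.Serialization;

public static class ObjectIdGeneratorDemo
{
    public static void Main()
    {
        var generator = new ObjectIDGenerator();
        object a = new object();
        object b = new object();

        long idA = generator.GetId(a, out bool firstTimeA);        // first sighting of a
        long idB = generator.GetId(b, out bool firstTimeB);        // first sighting of b
        long idAAgain = generator.GetId(a, out bool firstTimeA2);  // a has been seen before

        Console.WriteLine(idA != idB);      // True: distinct instances get distinct IDs
        Console.WriteLine(idA == idAAgain); // True: same instance, same ID
        Console.WriteLine(firstTimeA2);     // False: a was already known
    }
}
```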
The reference is the unique identifier for the object. I don't know of any way of converting this into anything like a string etc. The value of the reference will change during compaction (as you've seen), but every previous value A will be changed to value B, so as far as safe code is concerned it's still a unique ID.
If the objects involved are under your control, you could create a mapping using weak references (to avoid preventing garbage collection) from a reference to an ID of your choosing (GUID, integer, whatever). That would add a certain amount of overhead and complexity, however.
RuntimeHelpers.GetHashCode() may help (MSDN).
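A short sketch of what RuntimeHelpers.GetHashCode() buys you: it bypasses any GetHashCode() override and returns the runtime's own hash for the instance. The AlwaysZero class is a made-up example type. Note that this value is still only a hash code, so two different instances can collide, just as in the question.

```csharp
using System;
using System.Runtime.CompilerServices;

// Made-up type whose override makes the normal hash code useless.
public sealed class AlwaysZero
{
    public override int GetHashCode() => 0;
}

public static class RuntimeHashDemo
{
    public static void Main()
    {
        var a = new AlwaysZero();

        // Calls the override.
        Console.WriteLine(a.GetHashCode()); // 0

        // Ignores the override; the value stays the same for this instance.
        int first = RuntimeHelpers.GetHashCode(a);
        int second = RuntimeHelpers.GetHashCode(a);
        Console.WriteLine(first == second); // True
    }
}
```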
You can develop your own thing in a second. For instance:
You can choose whatever you like as the unique ID, for instance System.Guid.NewGuid(), or simply an integer for fastest access.
How about this method:
Set a field in the first object to a new value. If the same field in the second object has the same value, it's probably the same instance. Otherwise, exit as different.
Now set the field in the first object to a different new value. If the same field in the second object has changed to the different value, it's definitely the same instance.
Don't forget to set the field in the first object back to its original value on exit.
Problems?
It is possible to make a unique object identifier in Visual Studio: In the watch window, right-click the object variable and choose Make Object ID from the context menu.
Unfortunately, this is a manual step, and I don't believe the identifier can be accessed via code.
You would have to assign such an identifier yourself, manually, either inside the instance or externally.
For records related to a database, the primary key may be useful (but you can still get duplicates). Alternatively, either use a Guid, or keep your own counter, allocating using Interlocked.Increment (and make it large enough that it isn't likely to overflow).
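For the "inside the instance" variant, a counter allocated with Interlocked.Increment could look like the sketch below. The Tracked class name is made up, and this only works for types you control, which is not the case in the question.

```csharp
using System.Threading;

public class Tracked
{
    // Shared counter; a 64-bit value is large enough that overflow is not a practical concern.
    private static long lastId;

    // Assigned once at construction and never changed afterwards.
    public long Id { get; }

    public Tracked()
    {
        // Interlocked.Increment makes the allocation safe across threads.
        Id = Interlocked.Increment(ref lastId);
    }
}
```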
I know that this has been answered, but it's at least useful to note that you can use:
http://msdn.microsoft.com/en-us/library/system.object.referenceequals.aspx
Which will not give you a "unique id" directly, but combined with WeakReferences (and a hashset?) could give you a pretty easy way of tracking various instances.
If you are writing a module in your own code for a specific usage, majkinetor's method MIGHT have worked. But there are some problems.
First, the official documentation does NOT guarantee that GetHashCode() returns a unique identifier (see Object.GetHashCode Method ()):
Second, even assuming you have a very small number of objects, so that GetHashCode() will work in most cases, this method can be overridden by some types.
For example, you are using some class C and it overrides GetHashCode() to always return 0. Then every object of C will get the same hash code.
Unfortunately, Dictionary, HashTable and some other associative containers will make use of this method:
So, this approach has great limitations.
And even more, what if you want to build a general-purpose library?
Not only are you unable to modify the source code of the classes used, but their behavior is also unpredictable.
I appreciate that Jon and Simon have posted their answers, and I will post a code example and a suggestion on performance below.
In my test, the ObjectIDGenerator will throw an exception complaining that there are too many objects when creating 10,000,000 objects (10x more than in the code above) in the for loop.
Also, the benchmark result is that the ConditionalWeakTable implementation is 1.8x faster than the ObjectIDGenerator implementation.
The information I give here is not new, I just added this for completeness.
The idea of this code is quite simple:
RuntimeHelpers.GetHashCode to get us a sort-of unique ID
object.ReferenceEquals
GUID, which is by definition unique.
ConditionalWeakTable.
Combined, that will give you the following code:
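A minimal sketch of such a mapper, assembled from the ingredients listed above rather than taken from the answer's own listing. The GetUniqueId method name and the ReferenceComparer helper are assumptions. Unlike ConditionalWeakTable, this dictionary holds strong references, so mapped objects stay alive as long as the mapper does, and the sketch is not thread safe.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public sealed class UniqueIdMapper
{
    // Compares keys by identity, ignoring any Equals/GetHashCode overrides.
    private sealed class ReferenceComparer : IEqualityComparer<object>
    {
        bool IEqualityComparer<object>.Equals(object x, object y)
        {
            return object.ReferenceEquals(x, y);
        }

        int IEqualityComparer<object>.GetHashCode(object obj)
        {
            return RuntimeHelpers.GetHashCode(obj);
        }
    }

    private readonly Dictionary<object, Guid> ids =
        new Dictionary<object, Guid>(new ReferenceComparer());

    // Returns the GUID assigned to this instance, creating one on first use.
    public Guid GetUniqueId(object o)
    {
        if (o == null) throw new ArgumentNullException(nameof(o));

        if (!ids.TryGetValue(o, out Guid id))
        {
            id = Guid.NewGuid();
            ids.Add(o, id);
        }
        return id;
    }
}
```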
To use it, create an instance of the UniqueIdMapper and use the GUIDs it returns for the objects.
Addendum
So, there's a bit more going on here; let me write down a bit about ConditionalWeakTable.
ConditionalWeakTable does a couple of things. The most important thing is that it doesn't get in the way of the garbage collector, that is: the objects that you reference in this table will be collected regardless. If you look up an object, it basically works the same as the dictionary above.
Curious, no? After all, when an object is about to be collected by the GC, it checks if there are references to the object, and only if there are none does it collect the object. So if there's an object in the ConditionalWeakTable, why will the referenced object be collected then?
ConditionalWeakTable uses a small trick, which some other .NET structures also use: instead of storing a reference to the object, it actually stores an IntPtr. Because that's not a real reference, the object can be collected.
So, at this point there are 2 problems to address. First, objects can be moved on the heap, so what will we use as IntPtr? And second, how do we know that objects have an active reference?
DependentHandle - but I believe it's slightly more sophisticated.
DependentHandle.
This last solution does require that the runtime doesn't re-use the list buckets until they are explicitly freed, and it also requires that all objects are retrieved by a call to the runtime.
If we assume they use this solution, we can also address the second problem. The Mark & Sweep algorithm keeps track of which objects have been collected; as soon as an object has been collected, we know about it. Once a check finds that the object is no longer there, 'Free' is called, which removes the pointer and the list entry. The object is really gone.
One important thing to note at this point is that things go horribly wrong if ConditionalWeakTable is updated from multiple threads and isn't thread safe. The result would be a memory leak. This is why all calls in ConditionalWeakTable do a simple 'lock', which ensures this doesn't happen.
Another thing to note is that cleaning up entries has to happen once in a while. While the actual objects will be cleaned up by the GC, the entries are not. This is why ConditionalWeakTable only grows in size. Once it hits a certain limit (determined by collision chance in the hash), it triggers a Resize, which checks if objects have to be cleaned up -- if they do, free is called in the GC process, removing the IntPtr handle.
I believe this is also why DependentHandle is not exposed directly - you don't want to mess with things and get a memory leak as a result. The next best thing for that is a WeakReference (which also stores an IntPtr instead of an object) - but unfortunately doesn't include the 'dependency' aspect.
What remains is for you to toy around with the mechanics, so that you can see the dependency in action. Be sure to start it multiple times and watch the results:
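One way to toy with it, as a rough sketch (WeakTableDemo, Payload and AddEntry are made-up names): add an entry whose key is only reachable through the table, force a collection, and see whether the key and its attached payload go away. The outcome depends on the runtime, build configuration and GC timing, so no particular output is promised here.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class WeakTableDemo
{
    // The value attached to the key; its finalizer reports when it is reclaimed.
    private sealed class Payload
    {
        ~Payload() { Console.WriteLine("payload finalized"); }
    }

    private static readonly ConditionalWeakTable<object, Payload> table =
        new ConditionalWeakTable<object, Payload>();

    // Kept in a separate, non-inlined method so that no local variable
    // in Main keeps the key alive after this call returns.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference AddEntry()
    {
        var key = new object();
        table.Add(key, new Payload());
        return new WeakReference(key);
    }

    public static void Main()
    {
        WeakReference weakKey = AddEntry();

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        // If the table does not keep its key alive, this is expected to report False.
        Console.WriteLine("key alive after GC: " + weakKey.IsAlive);
    }
}
```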