
C# hashcode

What is hashCode used for? Is it unique?

Posted on 2024-12-04 10:04:37

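I've noticed that every control and item in WP7 has a GetHashCode() method, which returns a sequence of numbers. Can I use this hash code to uniquely identify an item?

For example, I want to identify a picture or a song on the device and check its location. This could be done if the hash code given for a particular item were unique.

Could you explain what a hash code is and what GetHashCode() is used for?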


谈下烟灰 2024-12-11 10:04:37


Simple Explanation via Analogy

After learning what it's all about (the MSDN documentation was a little too complex for me), I thought I'd simplify it with a "story" to (hopefully) make it easier to understand.

Summary: What is a hashcode?

It's a fingerprint.
What's it useful for? We can use this fingerprint to identify people of interest.

You can think of a hash code as our attempt to uniquely identify someone

I am a detective on the lookout for a criminal. Let us call him Mr Cruel. (He was a notorious kidnapper when I was a kid: he broke into a house, kidnapped and murdered a poor girl, then dumped her body. He's still on the loose, but that's a separate matter.) Mr Cruel has certain peculiar characteristics that I can use to uniquely identify him amongst a sea of people. We have 25 million people in Australia. One of them is Mr Cruel. How can we find him?

Bad ways of Identifying Mr Cruel

Apparently Mr Cruel has blue eyes. That's not much help because almost half the population in Australia also has blue eyes.

Good ways of Identifying Mr Cruel

What else can I use? I know: I will use a fingerprint!

Advantages:

It is really, really hard for two people to have the same fingerprint (not impossible, but extremely unlikely).
Mr Cruel's fingerprint will never change.
Every single part of Mr Cruel's entire being (his looks, hair colour, personality, eating habits, etc.) must ideally be reflected in his fingerprint, so that if he has a brother who is very similar but not identical, the two of them should still have different fingerprints. I say "should" because we cannot guarantee 100% that two people in this world will have different fingerprints.
But we can always guarantee that Mr Cruel will always have the same fingerprint, and that his fingerprint will NEVER change.

Hashcodes and Fingerprints

The above characteristics generally make for good hash functions: for a given input, we want an output that is as close to unique as possible, and the same output every time; if we change the input a tiny bit, we ought to get a completely different output. This output is the 'hashcode'.

hashFunction(string input) { /* etc. */ }

hashFunction("1234") => "ABCD" output
hashFunction("1235") => "KDSL" output // completely different, even though only the last digit of the input changed
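To make the pseudocode concrete, here is a minimal C# sketch using 32-bit FNV-1a, a simple, well-known non-cryptographic hash chosen here only for illustration (it is not what .NET's GetHashCode() uses). `Fnv1a` is a hypothetical helper name, and this variant hashes the string's UTF-16 code units, which gives the same result as byte-wise FNV-1a for ASCII input. It shows that the same input always produces the same output and that a one-character change produces a different output, although FNV-1a mixes its input far more weakly than the ideal "completely different" output described above.

```csharp
using System;

// Hypothetical helper: 32-bit FNV-1a over the string's UTF-16 code units.
// Deterministic: the same input always produces the same hash.
uint Fnv1a(string input)
{
    const uint OffsetBasis = 2166136261; // FNV-1a 32-bit offset basis
    const uint Prime = 16777619;         // FNV-1a 32-bit prime

    uint hash = OffsetBasis;
    foreach (char c in input)
    {
        hash ^= c;                      // mix in the next character
        hash = unchecked(hash * Prime); // multiply, wrapping modulo 2^32
    }
    return hash;
}

Console.WriteLine(Fnv1a("1234").ToString("X8"));
Console.WriteLine(Fnv1a("1235").ToString("X8"));   // differs from the line above
Console.WriteLine(Fnv1a("1234") == Fnv1a("1234")); // True
```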

So then what's a 'Collision'?

So imagine if I get a lead and I find someone matching Mr Cruel's fingerprints. Does this mean I have found Mr Cruel?

Perhaps! I must take a closer look. If I am using SHA256 (a hashing function) and I am looking in a small town with only 5 people, then there is a very good chance I found him! But if I am using MD5 (another famous hashing function) and checking fingerprints in a town of more than 2^1000 people, then some entirely different people are certain to share the same fingerprint: MD5 produces only 2^128 distinct values, so there are far more people than possible fingerprints.
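In .NET the fingerprint is even smaller than MD5's: GetHashCode() returns an `int`, so there are only 2^32 possible values, and different objects sharing a hash code is unavoidable. The sketch below finds such a pair of strings by brute force; the `"item"` strings are arbitrary, and `FindCollision` is a hypothetical helper. Because of the birthday paradox it typically needs on the order of 100,000 strings, not billions. On .NET Core and later, string hash codes are randomized per process, so the colliding pair will usually differ from run to run.

```csharp
using System;
using System.Collections.Generic;

// Hash strings until two different ones share a hash code.
// An int has only 2^32 values, so different strings must eventually share one;
// in practice (birthday paradox) a collision turns up quickly.
(string First, string Second) FindCollision()
{
    var seen = new Dictionary<int, string>();
    for (int i = 0; ; i++)
    {
        string candidate = "item" + i;
        int hash = candidate.GetHashCode();
        if (seen.TryGetValue(hash, out var earlier))
            return (earlier, candidate);
        seen[hash] = candidate;
    }
}

var (a, b) = FindCollision();
Console.WriteLine($"\"{a}\" and \"{b}\" are different strings");
Console.WriteLine($"but both have hash code {a.GetHashCode()}");
```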

So what is the benefit of all this anyway?

The only real benefit of hashcodes is when you want to put something in a hash table, where you want to find objects quickly; that's where the hash code comes in. It lets you find things in a hash table really quickly. It's a trick that massively improves performance at a small cost in accuracy: a matching hash code is a strong hint, not proof, so a match still has to be confirmed.

So let's imagine we have a hash table filled with people: 25 million suspects in Australia. Mr Cruel is somewhere in there. How can we find him really quickly? We need to sort through them all, to find a potential match or to otherwise acquit potential suspects. You don't want to consider each person's unique characteristics, because that would take too much time. What would you use instead? You'd use a hashcode! A hashcode can tell you if two people are different, i.e. whether Joe Bloggs is NOT Mr Cruel. If the prints don't match, then you know it's definitely NOT Mr Cruel. But if the fingerprints do match, then, depending on the hash function you used, chances are already fairly good that you've found your man. But it's not 100%. The only way you can be certain is to investigate further: (i) did he/she have an opportunity or motive, (ii) are there witnesses, etc.
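The "narrow down by fingerprint, then investigate" idea is what a hash table does internally. Below is a deliberately tiny sketch of a hypothetical 16-bucket set of names (the real Dictionary<TKey,TValue> and HashSet<T> are considerably more sophisticated): the hash code picks one bucket, so only the few names in that bucket are compared with `==`, instead of every name in the table.

```csharp
using System;
using System.Collections.Generic;

const int BucketCount = 16;
var buckets = new List<string>[BucketCount];
for (int i = 0; i < BucketCount; i++)
    buckets[i] = new List<string>();

// The hash code ("fingerprint") decides which bucket a name lives in.
// Masking with int.MaxValue clears the sign bit so the index is never negative.
int BucketOf(string name) => (name.GetHashCode() & int.MaxValue) % BucketCount;

void Add(string name)
{
    List<string> bucket = buckets[BucketOf(name)];
    if (!bucket.Contains(name))
        bucket.Add(name);
}

// Only the names in one bucket are examined; the == check is the "closer look".
bool Lookup(string name, out int comparisons)
{
    comparisons = 0;
    foreach (string candidate in buckets[BucketOf(name)])
    {
        comparisons++;
        if (candidate == name)
            return true;
    }
    return false;
}

foreach (string suspect in new[] { "Joe Bloggs", "Jane Citizen", "John Smith", "Mr Cruel" })
    Add(suspect);

bool found = Lookup("Mr Cruel", out int checkedCount);
Console.WriteLine($"Found: {found} after {checkedCount} comparison(s)");
```

With 25 million entries spread over enough buckets, each lookup still only compares against a handful of candidates, which is where the speed comes from.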

When you are using computers, if two objects have the same hash code value, you again need to investigate further whether they are truly equal. For example, you'd check whether the objects have the same height, the same weight, etc., whether the integers are the same, or whether the customer_id matches, and only then conclude whether they are the same. In .NET this is typically done by overriding Equals (ideally also implementing IEquatable<T>) together with GetHashCode, or by supplying an IEqualityComparer<T>.
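A minimal sketch of that "closer look" in C#, using a hypothetical `Person` type: GetHashCode() combines the same fields that Equals compares (via `HashCode.Combine`, available in .NET Core 2.1 and later), so equal people always get equal fingerprints. HashSet<T> uses the hash code to find candidates and Equals to confirm them.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var people = new HashSet<Person>
{
    new Person("Joe Bloggs", 180, 80),
    new Person("Jane Citizen", 165, 60),
};

// A different instance with the same data: same hash code, and Equals confirms the match.
Console.WriteLine(people.Contains(new Person("Joe Bloggs", 180, 80))); // True
// One field differs, so Equals says "not the same person".
Console.WriteLine(people.Contains(new Person("Joe Bloggs", 181, 80))); // False

// Hypothetical type: equality is defined by Name, HeightCm and WeightKg.
sealed class Person : IEquatable<Person>
{
    public string Name { get; }
    public int HeightCm { get; }
    public int WeightKg { get; }

    public Person(string name, int heightCm, int weightKg)
    {
        Name = name;
        HeightCm = heightCm;
        WeightKg = weightKg;
    }

    // The "closer investigation": every identifying field must match.
    public bool Equals(Person? other) =>
        other is not null &&
        Name == other.Name &&
        HeightCm == other.HeightCm &&
        WeightKg == other.WeightKg;

    public override bool Equals(object? obj) => Equals(obj as Person);

    // The "fingerprint": built from the same fields Equals compares,
    // so two equal Person objects always get the same hash code.
    public override int GetHashCode() => HashCode.Combine(Name, HeightCm, WeightKg);
}
```

The key design rule is that GetHashCode must never use a field that Equals ignores; otherwise two "equal" objects could get different fingerprints and the set would fail to find them.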

Key Summary

So basically a hashcode is a finger print.

Two different people/objects can theoretically still have the same fingerprint. In other words, if you have two fingerprints that are the same, they need not both come from the same person/object.
But the same person/object will always return the same fingerprint (for GetHashCode, this holds within a single run of the program, as long as the data that defines the object's equality doesn't change).
Which means that if two objects return different hash codes, you know with 100% certainty that those objects are different (provided the type's GetHashCode is consistent with its Equals).
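Applied to the original question (recognizing the same picture or song later), GetHashCode() is the wrong fingerprint: it is only 32 bits, and for strings on .NET Core and later it changes between runs of the program. If identifying a file by its content is acceptable, a common alternative is a cryptographic hash of its bytes such as SHA-256. A minimal sketch, assuming .NET 5 or later for `SHA256.HashData` and `Convert.ToHexString`; `ContentFingerprint` is a hypothetical helper, and for a real file you would pass `File.ReadAllBytes(path)`. Strictly speaking a SHA-256 value is still a fingerprint rather than proof, but no two different inputs with the same SHA-256 value are publicly known.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: a stable content fingerprint. The same bytes give the
// same 64-character hex string on every run, machine and .NET version.
string ContentFingerprint(byte[] content) =>
    Convert.ToHexString(SHA256.HashData(content)).ToLowerInvariant();

// Stand-in for a file's bytes; for a real file use File.ReadAllBytes(path).
byte[] songBytes = Encoding.ASCII.GetBytes("abc");
Console.WriteLine(ContentFingerprint(songBytes));
// ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```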

Footnotes:

The prelude to the hashcode explanation: https://en.wikipedia.org/wiki/Murder_of_Karmein_Chan


永不分离 2024-12-11 10:04:37

MSDN says:

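A hash code is a numeric value that is used to identify an object during equality testing. It can also serve as an index for an object in a collection.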

The GetHashCode method is suitable for use in hashing algorithms and data structures such as a hash table.

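The default implementation of the GetHashCode method does not guarantee unique return values for different objects. Furthermore, the .NET Framework does not guarantee that the default implementation of the GetHashCode method, and the value it returns, will be the same between different versions of the .NET Framework. Consequently, the default implementation of this method must not be used as a unique object identifier for hashing purposes.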

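The GetHashCode method can be overridden by a derived type. Value types must override this method to provide a hash function that is appropriate for that type and to provide a useful distribution in a hash table. For uniqueness, the hash code must be based on the value of an instance field or property instead of a static field or property.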

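Objects used as a key in a Hashtable object must also override the GetHashCode method because those objects must generate their own hash code. If an object used as a key does not provide a useful implementation of GetHashCode, you can specify a hash code provider when the Hashtable object is constructed. Prior to the .NET Framework version 2.0, the hash code provider was based on the System.Collections.IHashCodeProvider interface. Starting with version 2.0, the hash code provider is based on the System.Collections.IEqualityComparer interface.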
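A minimal sketch of supplying a hash code provider when the Hashtable is constructed: an IEqualityComparer is passed to the constructor, so keys are hashed and compared by the comparer rather than by their own GetHashCode()/Equals. Here `StringComparer.OrdinalIgnoreCase` (which implements IEqualityComparer) makes file-name keys case-insensitive; the paths and locations are made-up examples. In new code, Dictionary<TKey,TValue> accepts an IEqualityComparer<TKey> in the same way.

```csharp
using System;
using System.Collections;

// The comparer supplies both the hash code and the equality check for every key.
var locations = new Hashtable(StringComparer.OrdinalIgnoreCase);
locations["Music/Song.mp3"] = "SD card";          // hypothetical file and location
locations["Pictures/Beach.jpg"] = "Phone memory"; // hypothetical file and location

// Differently-cased keys produce the same hash code and compare as equal.
Console.WriteLine(locations["music/SONG.MP3"]);                 // SD card
Console.WriteLine(locations.ContainsKey("PICTURES/beach.JPG")); // True
Console.WriteLine(locations.Count);                             // 2
```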