What is hashCode used for? Is it unique?

Posted on 2024-12-04 10:04:37

I noticed that every control and item in WP7 has a getHashCode() method, which returns a sequence of numbers. Can I use this hash code to uniquely identify an item?

For example, I want to identify a picture or a song on the device and check its whereabouts. This could be done if the hash code given for a specific item is unique.

Can you help explain to me what a hashCode is, and what getHashCode() is used for?

Comments (5)

谈下烟灰 2024-12-11 10:04:37

Simple Explanation via Analogy

After learning what it is all about (the MSDN documentation was a little too complex for me), I thought I would simplify it with a "story" to (hopefully) make it easier to understand.

Summary: What is a hashcode?

[Image: digital fingerprint. Picture attributed to Pixabay, freely available for use at: https://pixabay.com/en/finger-fingerprint-security-digital-2081169/]

  • It's a fingerprint.

  • What's it useful for? We can use this fingerprint to identify people of interest.

You can think of a hashcode as us trying to uniquely identify someone

I am a detective, on the lookout for a criminal. Let us call him Mr Cruel. (He was a notorious kidnapper when I was a kid: he broke into a house, kidnapped and murdered a poor girl, then dumped her body. He's still on the loose, but that's a separate matter.) Mr Cruel has certain peculiar characteristics that I can use to uniquely identify him amongst a sea of people. We have 25 million people in Australia. One of them is Mr Cruel. How can we find him?

Bad ways of Identifying Mr Cruel

Apparently Mr Cruel has blue eyes. That's not much help because almost half the population in Australia also has blue eyes.

Good ways of Identifying Mr Cruel

What else can I use? I know: I will use a fingerprint!

Advantages:

  • It is really, really hard for two people to have the same fingerprint (not impossible, but extremely unlikely).
  • Mr Cruel's fingerprint will never change.
  • Every single part of Mr Cruel's entire being (his looks, hair colour, personality, eating habits etc.) must ideally be reflected in his fingerprint, such that if he has a brother who is very similar but not the same, the two should have different fingerprints. I say "should" because we cannot guarantee 100% that two people in this world will have different fingerprints.
  • But we can always guarantee that Mr Cruel will always have the same fingerprint, and that his fingerprint will NEVER change.

Hashcodes and Fingerprints

The above characteristics generally make for good hash functions: for a given input we want an output that is, ideally, unique, and that is the same every time; if we change the input a tiny bit, then we ought to get a completely different output. This output is the 'hashcode'.

hashFunction(string input) { // etc. }

hashFunction("1234") => "ABCD" output
hashFunction("1235") => "KDSL" output //completely different, even though the input changed only the last digit
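The two properties above (same input gives the same output, a tiny change gives a very different output) can be seen with a real hash function. The following is only a sketch: `FingerprintDemo` and `Fingerprint` are hypothetical names chosen for illustration, and SHA-256 is used here as an example of a hash function, not as what `GetHashCode()` does internally.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class FingerprintDemo
{
    // Hypothetical helper: returns the SHA-256 digest of a string as hex text.
    public static string Fingerprint(string input)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(input));
            // BitConverter.ToString yields "AB-CD-..."; strip the dashes.
            return BitConverter.ToString(digest).Replace("-", "");
        }
    }
}
```

Calling `FingerprintDemo.Fingerprint("1234")` twice returns the same 64-character string, while `FingerprintDemo.Fingerprint("1235")` returns a different one. Note that `GetHashCode()` is not a cryptographic hash and makes no promise that similar inputs produce very different values.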

So then what's a 'Collision'?

So imagine if I get a lead and I find someone matching Mr Cruel's fingerprints. Does this mean I have found Mr Cruel?

........perhaps! I must take a closer look. If I am using SHA256 (a hashing function) and I am looking in a small town with only 5 people, then there is a very good chance I have found him! But if I am using MD5 (another famous hashing function, with only 2^128 possible outputs) and checking fingerprints in a town with more than 2^1000 people, then entirely different people are guaranteed to share the same fingerprint: there are simply more people than fingerprints.
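The "more people than fingerprints" situation can be reproduced on a small scale. This is a sketch under stated assumptions: `TinyHash` is a deliberately weak, made-up hash with only 16 possible outputs, so any 17 distinct inputs must contain at least one collision.

```csharp
using System;
using System.Collections.Generic;

public static class CollisionDemo
{
    // Deliberately tiny hash (illustration only): 16 possible outputs, 0..15.
    public static int TinyHash(string input)
    {
        int sum = 0;
        foreach (char c in input) sum += c;
        return sum % 16;
    }

    // Returns the first two distinct inputs that share a TinyHash value,
    // or null if no such pair exists.
    public static string[] FindCollision(IEnumerable<string> inputs)
    {
        var seen = new Dictionary<int, string>();
        foreach (string input in inputs)
        {
            int h = TinyHash(input);
            string earlier;
            if (seen.TryGetValue(h, out earlier))
            {
                if (earlier != input) return new[] { earlier, input };
            }
            else
            {
                seen[h] = input;
            }
        }
        return null;
    }
}
```

Feeding it 17 distinct names (for example "person0" through "person16") always yields a colliding pair, whichever names are used. Real hash functions have far more outputs, which makes collisions rarer but does not remove them.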

So what is the benefit of all this anyways?

The main benefit of hash codes shows up when you put something in a hash table. With hash tables you want to find objects quickly, and that is where the hash code comes in: it lets the table jump straight to a small set of candidates instead of scanning everything. It's a shortcut that massively improves performance. The trade-off is that a matching hash code is only a strong hint, not proof, so the table still confirms each candidate with an equality check.

So let's imagine we have a hash table filled with people: 25 million suspects in Australia. Mr Cruel is somewhere in there..... How can we find him really quickly? We need to sort through them all, to find a potential match or otherwise to acquit potential suspects. You don't want to consider each person's unique characteristics because that would take too much time. What would you use instead? You'd use a hashcode! A hashcode can tell you if two people are different, for example whether Joe Bloggs is NOT Mr Cruel. If the prints don't match then you know it's definitely NOT Mr Cruel. But if the fingerprints do match then, depending on the hash function you used, chances are already fairly good that you have found your man. But it's not 100%. The only way you can be certain is to investigate further: (i) did he/she have an opportunity/motive, (ii) witnesses, etc.

When you are using computers, if two objects have the same hash code value, you again need to investigate further whether they are truly equal. For example, you'd have to check whether the objects have the same height, the same weight, the same integer values, or a matching customer_id, and only then conclude whether they are the same. In .NET this is typically done by overriding Equals, or by implementing the IEquatable<T> or IEqualityComparer<T> interface.
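A minimal sketch of that "investigate further" step in C#, assuming a hypothetical `Customer` type whose identity is its id. `Equals` does the real comparison, and `GetHashCode()` is kept consistent with it so that equal customers always report equal hash codes.

```csharp
using System;

// Hypothetical type: two customers count as the same when their ids match.
public sealed class Customer : IEquatable<Customer>
{
    public int CustomerId { get; private set; }
    public string Name { get; private set; }

    public Customer(int customerId, string name)
    {
        CustomerId = customerId;
        Name = name;
    }

    // The "closer look": the actual equality check.
    public bool Equals(Customer other)
    {
        return other != null && CustomerId == other.CustomerId;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Customer);
    }

    // Must agree with Equals: it is built only from the field Equals compares.
    public override int GetHashCode()
    {
        return CustomerId.GetHashCode();
    }
}
```

With this in place, `new Customer(42, "Joe Bloggs")` and `new Customer(42, "J. Bloggs")` compare equal and return the same hash code, so either one finds the other in a `Dictionary` or `HashSet`. Basing the hash on a field that `Equals` ignores (such as `Name` here) would break that guarantee.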

Key Summary

So basically a hashcode is a fingerprint.

  1. Two different people/objects can theoretically still have the same fingerprint. In other words, if you have two fingerprints that are the same, they need not both come from the same person/object.
  2. But the same person/object will always return the same fingerprint.
  3. Which means that if two objects return different hash codes, then you know with 100% certainty that those objects are different.

永不分离 2024-12-11 10:04:37

MSDN says:

A hash code is a numeric value that is used to identify an object
during equality testing. It can also serve as an index for an object
in a collection.

The GetHashCode method is suitable for use in hashing algorithms and
data structures such as a hash table.

The default implementation of the GetHashCode method does not
guarantee unique return values for different objects. Furthermore, the
.NET Framework does not guarantee the default implementation of the
GetHashCode method, and the value it returns will be the same between
different versions of the .NET Framework. Consequently, the default
implementation of this method must not be used as a unique object
identifier for hashing purposes.

The GetHashCode method can be overridden by a derived type. Value
types must override this method to provide a hash function that is
appropriate for that type and to provide a useful distribution in a
hash table. For uniqueness, the hash code must be based on the value
of an instance field or property instead of a static field or
property.

Objects used as a key in a Hashtable object must also override the
GetHashCode method because those objects must generate their own hash
code. If an object used as a key does not provide a useful
implementation of GetHashCode, you can specify a hash code provider
when the Hashtable object is constructed. Prior to the .NET Framework
version 2.0, the hash code provider was based on the
System.Collections.IHashCodeProvider interface. Starting with version
2.0, the hash code provider is based on the
System.Collections.IEqualityComparer interface.

Basically, hash codes exist to make hashtables possible.
Two equal objects are guaranteed to have equal hashcodes.
Two unequal objects are not guaranteed to have unequal hashcodes (that's called a collision).
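Both statements can be checked with a small example. This is a sketch, assuming a hypothetical `GridPoint` type whose hash is deliberately weak so that a collision is easy to produce; the `Dictionary` still keeps the two keys apart because it falls back to `Equals` when hash codes match.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type with a deliberately weak hash function.
public sealed class GridPoint
{
    public readonly int X;
    public readonly int Y;

    public GridPoint(int x, int y) { X = x; Y = y; }

    public override bool Equals(object obj)
    {
        GridPoint other = obj as GridPoint;
        return other != null && X == other.X && Y == other.Y;
    }

    // XOR is symmetric, so (1, 2) and (2, 1) produce the same hash code.
    public override int GetHashCode()
    {
        return X ^ Y;
    }
}

public static class GridPointDemo
{
    // Stores two unequal keys that share one hash code.
    public static Dictionary<GridPoint, string> Build()
    {
        var labels = new Dictionary<GridPoint, string>();
        labels[new GridPoint(1, 2)] = "first";
        labels[new GridPoint(2, 1)] = "second"; // collides with the key above
        return labels;
    }
}
```

The dictionary returned by `GridPointDemo.Build()` holds two entries, and looking up `new GridPoint(1, 2)` returns "first". The collision costs a little extra comparison work but never mixes up the keys, which is why a hash code works as a lookup aid and not as an identifier.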

无所谓啦 2024-12-11 10:04:37

GetHashCode() is used to help support using the object as a key for hash tables. (Java and other languages have a similar mechanism.) The goal is for every object to return a distinct hash code, but this often can't be absolutely guaranteed. It is required, though, that two logically equal objects return the same hash code.

A typical hash table implementation starts with the hashCode value, takes a modulus (thus constraining the value within a range) and uses it as an index to an array of "buckets".
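The modulus step might look like the following. It is a simplified sketch, not the actual code of any .NET collection; `BucketDemo` and `BucketIndex` are hypothetical names.

```csharp
using System;

public static class BucketDemo
{
    // Maps any hash code, including a negative one, to an index in [0, bucketCount).
    public static int BucketIndex(int hashCode, int bucketCount)
    {
        // Clear the sign bit first: in C# the % operator keeps the sign of
        // the left operand, so a negative hash would give a negative index.
        return (hashCode & 0x7FFFFFFF) % bucketCount;
    }
}
```

With 10 buckets, hash codes 13 and 23 both land in bucket 3. Sharing a bucket is normal: the table then compares the keys inside that bucket with Equals to find the right one.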

十级心震 2024-12-11 10:04:37

It's not unique to WP7--it's present on all .NET objects. It sort of does what you describe, but I would not recommend it as a unique identifier in your apps, as it is not guaranteed to be unique.

Object.GetHashCode Method

阪姬 2024-12-11 10:04:37

This is from the MSDN blog post here:

https://blogs.msdn.microsoft.com/tomarcher/2006/05/10/are-hash-codes-unique/

"While you will hear people state that hash codes generate a unique value for a given input, the fact is that, while difficult to accomplish, it is technically feasible to find two different data inputs that hash to the same value. However, the true determining factors regarding the effectiveness of a hash algorithm lie in the length of the generated hash code and the complexity of the data being hashed."

So use a hash algorithm whose output is long enough for the amount of data you have, and collisions become very unlikely, although they are never ruled out entirely.
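How the length of the hash code and the number of items interact can be estimated with the standard birthday approximation, p ≈ 1 − e^(−n(n−1)/(2·2^bits)). The sketch below assumes hash codes are spread uniformly, which real hash functions only approximate; `BirthdayBound` and `CollisionProbability` are hypothetical names.

```csharp
using System;

public static class BirthdayBound
{
    // Approximate probability that at least two of n items share a hash code
    // when hash codes are spread uniformly over 2^bits possible values.
    public static double CollisionProbability(double n, int bits)
    {
        double space = Math.Pow(2.0, bits);
        return 1.0 - Math.Exp(-n * (n - 1.0) / (2.0 * space));
    }
}
```

For the 32-bit int that GetHashCode() returns, `BirthdayBound.CollisionProbability(77163, 32)` comes out at roughly one half, so a collection of under a hundred thousand items is already more likely than not to contain a collision. For very long hashes the result can round to 0 in double precision, which means "too small to represent", not "impossible".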
