java中的哈希函数是什么?
我已经查看了 this 维基百科页面,但我仍然不明白。有人可以帮助我愚蠢的头脑理解哈希、哈希表/哈希图和哈希函数的概念吗?一些例子确实会有帮助。
I have check out this Wikipedia page on it, but I still don't understand it. Can someone please help my dim-witted mind to understand the concepts of hashing, hashtable/hashmap, and hash functions? Some examples would really help.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
维基百科文章将包含大量技术信息,但哈希的简单视图如下所示。
想象一下,有一个神奇的函数可以为任何对象赋予一个数字。给定相同的对象,它总是返回相同的数字。
现在您可以快速地测试两个对象是否相同:向此函数询问它们的编号并进行比较。如果它们不同,那么它们就不一样。
但如果他们有相同的号码怎么办?两个不同的物体可以有相同的数字吗?
是的,这在大多数情况下都是可能的。例如,假设该函数只能给出 1..10 之间的数字,并且有 100 个不同的对象。那么当然一些不同的对象必须具有相同的编号。这就是所谓的“碰撞”。 “冲突”使我们的快速相等测试不再那么有用,因此我们希望尽可能减少它的发生。一个好的神奇功能就是尽量减少“碰撞”的次数。
那么你还能用这个数字做什么呢?好吧,您可以使用它来索引数组。给定一个对象,您可以将其放置在这个神奇函数中的数字给出的索引处。这个数组本质上就是哈希表;这个神奇的函数就是哈希函数。
The Wikipedia article will have a lot of technical information, but a simplistic view of hashing is something like the following.
Imagine that there's a magical function that can give a number to any object. Given the same object, it always return the same number.
Immediately now you have a quick way to test if two objects are the same: ask this function for their numbers and compare. If they're different, then they're not the same.
But what if they have the same number? Could two different objects have the same number?
Yes, this is possible in most scenario. Let's say that the function can only give numbers between 1..10, for example, and there are 100 different objects. Then of course some different objects must have the same number. This is what is called a "collision". A "collision" makes our quick equality test not as useful, so as much as possible we want to minimize its happening. A good magical function is one that would try to minimize the number of "collision".
So what else can you do with this number? Well, you can use it to index an array. Given an object, you can put it at the index given by the number from this magical function. This array is essentially what a hashtable is; this magical function is a hash function.
哈希函数是一种创建任意大量数据的紧凑表示的方法。在java中,使用hashcode方法,这意味着以某种方式用int(4字节)来描述对象的状态(无论有多大)。并且通常被写得相当快,如下所述。
为了简化哈希表/哈希映射,哈希码充当一种廉价的等于。拿两个 Foo 类型的对象 a 和 b 来计算 a.equals(b) 是否需要 500 毫秒,而计算一个(有效的)哈希码只需要 10 毫秒。因此,如果我们想知道 a.equals(b) 是否等于,而不是首先直接这样做,我们将查看哈希码并询问 a.hashCode() == b.hashCode() 是否。请注意,在我们的示例中,这只需要 20 毫秒。
由于哈希码的 API 定义,我们知道,如果a 的哈希码不等于 b,则 a.equals(b) 永远不会为 true。因此,在上面的测试中,如果我们看到哈希码是如果不相等,那么我们就不需要进行更长的 .equals() 测试,这就是为什么您应该始终一起重写 hashCode 和 equals。
您还可能会看到有关编写“良好”或“分布良好”哈希码的参考资料。这与之前有关 hashcode 和 equals 的语句的反面不成立有关。更具体地说 a.hashCode() == b.hashCode() 并不一定意味着 a.equals(b) 因此,好的哈希码的想法是减少 a.hashCode() = 的可能性= b.hashCode() 当 a.equals(b) 为 false 时。您可能已经看到这被称为哈希函数的冲突。
回到哈希图/表。这些基于键/值对。因此,当您添加或检索值时,您将提供一个键。因此,地图要做的第一件事就是查找密钥,这意味着找到 .equals() 您提供的密钥的内容。但正如我们上面所讨论的,.equals() 可能会非常慢,这意味着通过首先检查哈希码可以大大加快比较速度。因为当哈希码分布良好时,您应该很快知道 x 何时肯定 != y。
现在,除了比较之外,哈希图/表实际上还使用哈希码来组织数据的内部存储,但是我认为这超出了您目前想要理解的范围。
A hash function is a way to create a compact representation of an arbitrarily large amount of data. In java with the hashcode method this means somehow describing the state of your object (no matter how large) in an int (4 bytes). And is usually written to be a fairly fast as explained below.
To simplify in hashtables/hashmaps the hashcode serves as sort of a cheap equals. Take two objects a and b of type Foo lets says to figure out if a.equals(b) takes 500 ms where as calculating a (efficient) hashcode only take 10ms. So if we want to know if a.equals(b) instead of doing that directly first we will look at the hashcodes and ask does a.hashCode() == b.hashCode(). Note that this will take only 20ms in our example.
Because of the API definition of hashcode we know that if the hashcode of a is not equal to b then a.equals(b) should never be true. So in our above test if we see the hashcodes are unequal then we never need to do the longer .equals() test, this is why you should always override hashCode and equals together.
You may also see references about writing "good" or "well distributed" hashcodes. This has to do with the fact that the inverse of the previous statements about hashcode and equals is not true. More specifically a.hashCode() == b.hashCode() does not necessarily imply a.equals(b) So the idea of a good hashcode is you reduce the likelyhood of a.hashCode() == b.hashCode() when a.equals(b) is false. You may have seen this referred to as a collision of a hash function.
Back to hashmaps/tables. These are based on key/value pairs. So when you add or retrieve a value you will supply a key. So the first thing the map has to do is look for the key, which means finding something that .equals() the key you provide. But as we discussed above .equals() can be incredibly slow which means comparisons can be greatly sped up by checking hashcodes first. Since when the hashcodes are well distributed you should know quickly when x is definitely != y.
Now in addition to the comparison hashmaps/tables actually use the hashcodes to organize their internal storage of the data, however I think that is beyond the scope of what you are looking to understand at this point.
哈希函数:哈希函数采用一组字符(称为键)并将其映射为一定长度的值(称为哈希值或散列)。哈希值代表原始字符串,但通常小于原始字符串。
哈希是为了索引和定位数据库中的项目而进行的,因为与较长的字符串相比,更容易找到较短的哈希值。散列法也用于加密。该术语也称为散列算法或消息摘要函数。
HASH MAP:HashMap 是一个集合类,旨在将元素存储为键值对。地图提供了一种根据一件事的价值来查找另一件事的方法。
一个查找表,旨在有效存储非连续键(帐号、零件号等)。 )的字母或数字序列可能有很大的差距。
哈希表:哈希表是使用一种算法创建的,该算法将键存储到包含键值对的哈希桶中。由于不同的键可能会散列到同一个桶中,因此哈希表设计的目标是将键值对均匀分布,每个桶包含尽可能少的键值对。当查找某个项目时,对其键进行哈希处理以找到适当的存储桶,然后比较该存储桶以找到正确的键值对。
HASH FUNCTION:- A hash function takes a group of characters (called a key) and maps it to a value of a certain length (called a hash value or hash). The hash value is representative of the original string of characters, but is normally smaller than the original.
Hashing is done for indexing and locating items in databases because it is easier to find the shorter hash value than the longer string. Hashing is also used in encryption.This term is also known as a hashing algorithm or message digest function.
HASH MAP:- HashMap is a collection class that is designed to store elements as key-value pairs. Maps provide a way of looking up one thing based on the value of another.
A lookup table that is designed to efficiently store non-contiguous keys (account numbers, part numbers, etc.) that may have wide gaps in their alphabetic or numeric sequences.
HASH TABLE:- Hash tables are created with an algorithm that stores the keys into hash buckets, which contain key-value pairs. Since different keys may hash to the same bucket, the goal of hash table design is to spread out the key-value pairs evenly with each bucket containing as few key-value pairs as possible. When an item is looked up, its key is hashed to find the appropriate bucket, and the bucket is then compared to find the right key-value pair.
这本书(以及支持视频讲座)提供了一些关于算法和数据结构的精彩解释。有一些关于哈希函数的讲座(1, 2)。我建议这样做。
(来源:mit.edu)
此外,仅供参考,事实并非如此,正如 polygenelubricants< 所指出的那样/a> 在评论中。hashCode()
,在Object
类返回内存中此特定实例的地址。This book (and supporting video lectures) provide some great explanation of algorithms and data structures. There are some lectures about hash functions (1, 2). I'd recommend that.
(source: mit.edu)
Also, just FYI,Not really true, as pointed by polygenelubricants in comments.hashCode()
, called on an instance ofObject
class returns an address of this particular instance in memory.哈希表基本上是一种在数组中存储任何内容并检索它的速度,几乎与通过索引在数组中查找内容一样快,而不会浪费太多空间。
哈希函数的作用是(在这种情况下)根据对象的内容计算存储对象的数组索引。这意味着,它必须始终为同一对象返回相同的结果,并且应尽可能为不同的对象返回不同的结果。当两个不同的对象具有相同的哈希值时,称为“冲突”,您必须特殊对待这些情况,这会使整个过程变慢。
A hashtable is basically a way to store anything in an array and retrieve it almost as fast as looking up something in an array via an index, without wasting too much space.
The job of a hash function is (in this context) to compute the array index at which an object will be stored, based on the contents of the object. That means, it must always return the same result for the same object, and should return different results for different objects as much as possible. When two different objects have the same hash, it's called a "collision", and you have to treat those cases specially, which makes the whole thing slower.
哈希表中键到索引的映射称为哈希函数。
哈希函数包含两部分
哈希码映射:它将键转换为任意范围的整数。
压缩映射:它将这些整数转换(带来)到哈希表的键范围。
摘自http://coder2design.com/hashing/
The mapping of keys to indices of a hash table is called hash function.
Hash function contains two parts
Hash Code Map : It converts keys to integer of any range.
Compression Map : It converts(brings) these integers to the range of keys hashtable has.
Taken from http://coder2design.com/hashing/
哈希函数:如果将同一对象多次传递给该函数,无论是文本、二进制还是数字,您总是会得到相同的输出。出于哈希表的目的,使用返回整数的哈希函数。
上述功能称为散列。
哈希表:计算机科学中神奇的数据结构,可在恒定时间或 O(1) 内返回搜索结果。它基于上述哈希的概念。
因此,它比链表、二叉搜索树等具有更好的访问时间。
为什么接近 O(1):它在内部使用数组作为其基本结构来存储对象,并且由于数组具有恒定的访问时间,因此哈希表也是如此。
【内部基本】:
因此,它内部使用固定大小的数组,当您插入(Key,Value)对时,它会计算键的哈希值,并使用该哈希值作为索引将(Key,Value)对存储在数组中。接下来,当您使用相同的键搜索对象时,它会再次使用该键的哈希值作为索引来在数组中搜索该键。
现在,两个对象可以具有相同的哈希值,因此,在将这些对象插入哈希表时会发生冲突。解决冲突的方法有两种。您可以参考此链接 有关此主题的足够详细的讨论。
Hash function: If you pass the same object to this function any number of times, be it text, binary or number, you always get the same output. For the hash table purposes an integer returning hash function is used.
Above functionality is calling hashing.
Hash table: Miracle data structure of computer science that returns search result in constant time or O(1). It is based on the above concept of hashing.
So, it has better access time than linkedlist, binary search trees etc.
Why nearly O(1): It uses an array as its base structure internally to store the objects and as arrays have constant access time hence, the Hash table does too.
[Basic internal]:
So, it uses an array of fixed size internally and when you insert an (Key, Value) pair, it calculates key's hash and uses this hash value as index to store the (Key,Value) pair in the array. Next, when you search for the object using the same key, it uses the hash of the key again as index to search for the key in the array.
Now, two objects can have same hash value and hence, while inserting these objects in the hash table there will be collision. There are two ways for collision resolution. You can refer this link for a sufficiently detailed discussion on this topic.
HashCode()
函数返回一个整数值,HashMap
使用它来查找正确的存储桶。HashCode()
function, which returns an integer value, is used byHashMap
to find a correct bucket.