加密哈希函数有哪些要点?
我正在阅读关于MD5哈希值和接受的这个问题答案让我困惑。 据我了解,加密哈希函数的主要属性之一是不可能找到具有相同哈希值的两个不同消息(输入)。
然而,这个问题的共识答案为什么 MD5 哈希值不可逆?是因为无限数量的输入字符串将生成相同的输出。这对我来说似乎完全矛盾。
另外,让我有些困惑的是,算法是公开的,但哈希值仍然是不可逆的。 这是因为哈希函数中总是会丢失数据,因此无法判断哪些数据被丢弃了?
当输入数据大小小于固定输出数据大小时会发生什么(例如,对密码“abc”进行哈希处理)?
编辑:
好吧,让我看看我是否明白了:
- 从哈希推断输入真的非常困难因为有无限数量的输入字符串将生成相同的输出(不可逆属性)。
- 然而,找到甚至生成相同输出的多个输入字符串的单个实例也非常非常困难(抗冲突属性)。
I was reading this question on MD5 hash values and the accepted answer confuses me. One of the main properties, as I understand it, of a cryptopgraphic hash function is that it is infeasible to find two different messages (inputs) with the same hash value.
Yet the consensus answer to the question Why aren't MD5 hash values reversible? is Because an infinite number of input strings will generate the same output. This seems completely contradictory to me.
Also, what perplexes me somewhat is the fact that the algorithms are public, yet the hash values are still irreversible. Is this because there is always data loss in a hash function so there's no way to tell which data was thrown away?
What happens when the input data size is smaller than the fixed output data size (e.g., hashing a password "abc")?
EDIT:
OK, let me see if I have this straight:
- It is really, really hard to infer the input from the hash because there are an infinite amount of input strings that will generate the same output (irreversible property).
- However, finding even a single instance of multiple input strings that generate the same output is also really, really hard (collision resistant property).
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
1:哈希的主要目的是将一个非常非常大的空间映射到一个更小但仍然非常大的空间(例如,MD5,它将采用“任何东西”并将其转换为大小为 2^128 的空间 -- big ,但远没有 aleph-0 那么大。)
除了其他功能之外,好的哈希值均匀地填充了目标空间。 坏哈希以块状方式填充空间,为许多常见输入提供相同的哈希。
想象一下愚蠢的哈希函数 sum(),它只是将输入数字的所有数字相加:它成功向下映射,但在低位存在一堆冲突(输入具有相同的输出,例如 3、12 和 21)输出空间的末端和空间的上端几乎是空的。 因此,它对空间的利用非常差,很容易破解等。因此,
即使使用目标空间的良好哈希也会很难找到具有相同输出的两个输入,只是概率:如果MD5 是完美的,两个输入具有相同输出的几率是 2^-128。 这是相当不错的几率:在不求助于更大输出空间的情况下可以做到的最好的结果。 (事实上,MD5 并不完美,这也是它容易受到攻击的原因之一。)
但是,大量输入将映射到任何给定的哈希值仍然是事实,因为输入空间是“无限的”,并且无穷大除以 2^128 仍然得到无穷大。
2:是的,哈希总是会导致数据丢失,除非您的输出空间与输入空间相同或大于输入空间 - 在这种情况下您可能不需要哈希!
3:对于较小的输入,最佳做法是对输入加盐。 实际上,这对于任何加密哈希都是很好的做法,因为否则攻击者可以为您提供特定的输入并尝试找出您正在使用的哈希。 “盐”只是您附加(或前置)到输入中的一组附加信息; 然后对结果进行哈希处理。
编辑:在密码学中,哈希函数抵抗原像攻击也很重要,直观上来说,即使知道许多其他输入/输出对,也很难猜测给定输出的输入。 “sum”函数可能很容易被猜到(但由于它破坏了数据仍然可能不容易逆转)。
1: The primary purpose of a hash is to map a very, very large space to a smaller but still very large space (e.g., MD5, which will take 'anything' and convert it into a space of size 2^128 -- big, but not nearly as big as aleph-0.)
In addition to other features, good hashes fill the destination space homogeneously. Bad hashes fill the space in a clumpy fashion, coming up with the same hash for many common inputs.
Imagine the idiotic hash function sum(), which just adds all the digits of the input number: it succeeds in mapping down, but there are a bunch of collisions (inputs with the same output, like 3 and 12 and 21) at the low end of the output space and the upper end of the space is nearly empty. As a result it makes very poor use of the space, is easy to crack, etc.
So a good hash that makes even use of the destination space will make it difficult to find two inputs with the same output, just by the odds: if MD5 were perfect, the odds that two inputs would have the same output would be 2^-128. That's pretty decent odds: the best you can do without resorting to a larger output space. (In truth MD5 isn't perfect, which is one of the things that makes it vulnerable.)
But it will still be true that a huge number of inputs will map to any given hash, because the input space is 'infinite', and dividing infinity by 2^128 still gives you infinity.
2: Yes, hashes always cause data loss, except in the case where your output space is the same as, or larger than, your input space -- and in that case you probably didn't need to hash!
3: For smaller inputs, best practice is to salt the input. Actually, that's good practice for any cryptographic hashing, because otherwise an attacker can feed you specific inputs and try to figure out which hash you are using. 'Salt' is just a set of additional information that you append (or prepend) to your input; you then hash the result.
edit: In cryptography, it is also important that the hash function is resistant to preimage attacks, intuitively, that is hard to guess the input for a given output even knowing many other input/output pairs. The "sum" function could probably be guessed rather easily (but since it destroys data still might not be easy to reverse).
您可能会感到困惑,因为您引用的问题的答案令人困惑。
加密哈希函数的要求之一是它应该具有抗原像性。 也就是说,如果您知道 MD5(x) 但不知道消息 x,则很难找到任何 x'(等于 x 或不同于 x)使得 MD5(x') = MD5(x)。
原像抗性与可逆性是不同的性质。 如果给定 y = f(x) ,恰好有一个 x 适合(无论这是否容易),则函数是可逆的。 例如定义 f(x) = x mod 10。
那么f是不可逆的。 从 f(x) = 7 你无法确定 x 是 17、27 还是其他。 但 f 不具有原像抗性,因为很容易找到 f(x) = 7 的 x' 值。 x' = 17, 27, 12341237 等都可以。
在进行加密时,您通常需要具有原像抗性(以及其他属性,例如抗碰撞性)的函数,而不仅仅是不可逆的函数。
You may be confused, because the answer to the question you cite is confusing.
One of the requirements for a cryptographic hash function is that it should be preimage resistant. That is, if you know MD5(x) but not the message x, then it is difficult to find any x' (either equal x or different from x) such that MD5(x') = MD5(x).
Being preimage resistant is a different property than being reversible. A function is reversible if given y = f(x) there is exactly one x which fits (whether this is easy or not). For example define f(x) = x mod 10.
Then f is not reversible. From f(x) = 7 you can't determine whether x was 17, 27 or something else. But f is not preimage resistant, since values x' such that f(x) = 7 are easy to find. x' = 17, 27, 12341237 etc all work.
When doing crypto you usually need functions that are preimage resistant (and other properties such as collision resistance), not just something that is not reversible.
这些是哈希函数的一般属性。
但请注意,MD5 不应再使用,因为其中已发现漏洞。 检查“漏洞”部分和详细说明这些攻击的外部链接。 http://en.wikipedia.org/wiki/Md5 您可以通过以下方式进行 MD5 冲突:仅更改消息中的 128 位。
SHA-1 对于简单散列来说是安全的,尽管存在一些攻击会使其在对抗资金雄厚的实体(政府、大公司)方面变得较弱。SHA
-256 是未来几十年针对技术的安全起点。
These are the properties of hash functions in general.
A word of caution though, MD5 shouldn't be used anymore because of vulnerabilities that have been found in it. Check the 'Vulnerabilities' section and external links detailing these attacks. http://en.wikipedia.org/wiki/Md5 You can make an MD5 collision by changing only 128 bits in a message.
SHA-1 is safe for simple hashing although there are some attacks that would make it weaker against well-funded entities (Governments, large corporations)
SHA-256 is a safe starting point against technology for the next couple decades.
对于任何哈希函数都是如此,但这不是加密哈希函数的本质。
对于诸如密码之类的短输入字符串,理论上可以反转加密哈希函数,但在计算上应该是不可行的。 也就是说,您的计算将运行太长时间而无用。
这种不可行的原因是输入在哈希值中是如此彻底地“混合在一起”,以至于不可能用比计算所有输入的哈希值的暴力攻击更少的努力来解开它。
This is true for any hash function, but it is not the essence of a cryptographic hash function.
For short input strings such as passwords it is theoretically possible to reverse a cryptographic hash function, but it ought to be computationally infeasible. I.e. your computation would run too long to be useful.
The reason for this infeasibility is that the input is so thoroughly "mixed together" in the hash value that it becomes impossible to disentangle it with any less effort than the brute force attack of computing the hash value for all inputs
这就是不可能反转哈希函数(获得相同的输入)的原因。
加密哈希函数是抗冲突的,这意味着也很难找到映射到相同输出的另一个输入值(如果您的哈希函数是 mod 2 : 134 mod 2 = 0;现在您无法从结果,但我们仍然可以找到具有相同输出值的数字 2(134 和 2 碰撞))。
当输入小于块大小时,使用填充来适应它块大小。
this is the reason that it isn't possible to reverse the hash function (get the same input).
cryptographic hash functions are collision resistant, that means that it's also hard to find another input value that maps to the same output (if your hash function was mod 2 : 134 mod 2 = 0; now you can't get the 134 back from the result, but we can stil find number 2 with the same output value (134 and 2 collide)).
When the input is smaller than the block size, padding is used to fit it to the block size.
警告:长答案
我认为所有这些答案都缺少加密哈希函数的一个非常重要的属性:不仅不可能计算经过哈希处理的原始消息以获得给定的哈希值,而且不可能计算将散列到给定散列值的任何消息。 这称为原像抵抗。
(所谓“不可能”——我的意思是没有人知道如何在比猜测每一条可能的消息更短的时间内做到这一点,直到你猜出被散列到你的散列中的消息。)
(尽管普遍认为 MD5 不安全, MD5 仍然具有原像抗性。任何不相信我的人都可以向我提供任何哈希到
2aaddf751bff2121cc51dc709e866f19
的东西。碰撞阻力,这完全是另一回事。)现在,如果你不能在加密哈希函数是因为哈希函数会丢弃数据来创建哈希,所以它不能保证原像抵抗:您仍然可以“向后工作”,只需在哈希函数丢弃数据的地方插入随机数据,而您不会来即使有了原始消息,您仍然会得到一条散列到所需散列值的消息。 但你不能。
所以问题就变成了:为什么不呢? (或者,换句话说,如何使函数具有抗原像性?)
答案是加密哈希函数模拟混沌系统。 他们获取你的消息,将其分解成块,混合这些块,让一些块相互交互,混合这些块,然后重复很多次(好吧,一个加密哈希函数可以做到这一点;其他的则有自己的方法)自己的方法)。 由于块彼此交互,块 C 不仅必须与块 D 交互以生成块 A,而且还必须与块 E 交互以生成块 B。现在,当然,您可以找到块 C、D 的值, E 会在你的哈希值中产生区块 A 和 B,但是当你进一步回溯时,突然你需要一个区块 F 与 C 交互来生成 D,并与 E 交互来生成 B,并且没有这样的区块可以在同一时间! 您一定猜到了 C、D 和 E 的错误值。
虽然并非所有加密哈希函数都与上述块交互完全相同,但它们具有相同的想法:如果您尝试“向后工作”,您将最终会出现很多死胡同,并且尝试足够的值来生成原像所需的时间约为数百到数百万年(取决于哈希函数),并不比它的时间好多少只需尝试消息,直到找到有效的消息。
Warning: Long answer
I think all of these answers are missing a very important property of cryptographic hash functions: Not only is it impossible to compute the original message that was hashed to get a given hash, it's impossible to compute any message that would hash to a given hash value. This is called preimage resistance.
(By "impossible" - I mean that no one knows how to do it in less time than it takes to guess every possible message until you guess the one that was hashed into your hash.)
(Despite popular belief in the insecurity of MD5, MD5 is still preimage resistant. Anyone who doesn't believe me is free to give me anything that hashes to
2aaddf751bff2121cc51dc709e866f19
. What MD5 doesn't have is collision resistance, which is something else entirely.)Now, if the only reason you can't "work backwards" in a cryptographic hash function was because the hash function discards data to create the hash, then it would not guarantee preimage resistance: You can still "work backwards", and just insert random data wherever the hash function discards data, and while you wouldn't come up with the original message, you'd still come up with a message that hashes to the desired hash value. But you can't.
So the question becomes: Why not? (Or, in other words, how do you make a function preimage resistant?)
The answer is that cryptographic hash functions simulate chaotic systems. They take your message, break it into blocks, mix those blocks around, have some of the blocks interact with each other, mix those blocks around, and repeat that a lot of times (well, one cryptographic hash function does that; others have their own methods). Since the blocks interact with each other, block C not only has to interact with block D to produce block A, but it has to interact with block E to produce block B. Now, sure, you can find values of blocks C, D, E that would produce the blocks A and B in your hash value, but as you go further back, suddenly you need a block F that interacts with C to make D, and with E to make B, and no such block can do both at the same time! You must have guessed wrong values for C, D, and E.
While not all cryptographic hash functions are exactly as described above with block interaction, they have the same idea: That if you try to "work backwards", you're going to end up with a whole lot of dead ends, and the time it takes for you to try enough values to generate a preimage is on the order of hundreds to millions of years (depending on the hash function), not much better than the time it would take just to try messages until you find one that works.