为什么无法逆转加密哈希?
为什么不能像反转数学函数那样反转算法呢?怎么可能创建一个不可逆的算法呢?
如果你使用彩虹桌,为什么用盐就无法破解它呢?如果您使用暴力生成彩虹表,那么它会发明每个可能的明文值(达到一定长度),最终将包括每个可能的密码的盐和每个可能的盐(盐和密码/文本将只是作为一个单独的文本组合在一起)。
Why can't you just reverse the algorithm like you could reverse a math function? How is it possible to make an algorithm that isn't reversible?
And if you use a rainbow table, what makes using a salt impossible to crack it? If you are making a rainbow table with brute force to generate it, then it invents each plaintext value possible (to a length), which would end up including the salt for each possible password and each possible salt (the salt and password/text would just come together as a single piece of text).
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
我不认为 md5 会给你完整的结果 - 所以你不能向后工作来找到 md5 编辑的原始内容
I don't think the md5 gives you the whole result - so you can't work backwards to find the original things that was md5-ed
md5是128位,即3.4*10^38组合。
八个字符长度密码的总数:
您必须为密码存储 8 个字节,16对于 md5 值,每个条目总共 24 个字节。
因此,您的彩虹表需要大约 67000G 或 5200000G 的存储空间。
实际上可以破解密码的唯一原因是人们使用明显的密码。
md5 is 128bit, that's 3.4*10^38 combinations.
the total number of eight character length passwords:
You have to store 8 bytes for the password, 16 for the md5 value, that's 24 bytes total per entry.
So you need approx 67000G or 5200000G storage for your rainbow table.
The only reason that it's actually possible to figure out passwords is because people use obvious ones.
MD5 被设计为加密不可逆。在这种情况下,最重要的属性是找到哈希的逆序在计算上是不可行的,但是找到任何数据的哈希很容易。例如,让我们考虑仅对数字进行操作(毕竟二进制文件可以被解释为一个非常长的数字)。
假设我们有数字“7”,我们想要获取它的哈希值。也许我们作为哈希函数尝试的第一件事是“乘以二”。正如我们将看到的,这不是一个非常好的哈希函数,但我们将尝试它来说明一点。在这种情况下,数字的哈希值将为“14”。这很容易计算。但现在,如果我们看看扭转它是多么困难,我们会发现它也同样容易!给定任何哈希值,我们只需将其除以二即可得到原始数字!这不是一个好的哈希,因为哈希的全部要点是计算逆比计算哈希要困难(至少在某些上下文中这是最重要的属性)。
现在,让我们尝试另一个哈希。对于这一点,我将不得不介绍时钟算术的概念。时钟上的数字并不是无限的。事实上,它只是从 0 到 11(记住,时钟上的 0 和 12 是相同的)。因此,如果您将 11“加一”,则只会得到零。您可以将乘法、加法和求幂的思想扩展到时钟。例如,8+7=15,但时钟上的 15 实际上只是 3!所以在时钟上,你会说 8+7=3! 6*6=36,但是在时钟上,36=0!所以6*6=0!现在,对于权力的概念,您可以做同样的事情。 2^4=16,但16只是4。所以2^4=4!现在,以下是它与散列的关系。我们尝试使用时钟算术的哈希函数 f(x)=5^x 怎么样?正如您将看到的,这会产生一些有趣的结果。让我们尝试像以前一样获取 7 的哈希值。
我们看到 5^7=78125,但在时钟上,这只是 5(如果你计算一下,你会发现我们已经绕时钟旋转了 6510 次)。所以我们得到f(7)=5。现在的问题是,如果我告诉你我的号码的哈希值是 5,你能算出我的号码是 7 吗?嗯,在一般情况下,实际上非常很难计算这个函数的逆函数。比我聪明得多的人已经证明,在某些情况下,反转这个函数比向前计算它要困难得多。 (编辑:尼莫指出这实际上还没有被“证明”;事实上,你得到的唯一保证是很多聪明人已经尝试了很长时间来找到一种简单的方法来做到这一点,但没有一个成功。) 逆向此操作的问题称为“离散对数问题"。查找更深入的报道。这至少是一个好的哈希函数的开始。
对于现实世界的哈希函数,其想法基本相同:您会发现一些难以逆转的函数。比我聪明得多的人已经设计了 MD5 和其他哈希值,使它们难以逆转。
现在,也许您更早地想到了:“计算逆数会很容易!我只需对每个数字进行哈希,直到找到匹配的数字!”现在,对于数字都小于十二的情况,这是可行的。但对于现实世界的哈希函数的模拟,想象一下涉及的所有数字都是巨大的。这个想法是,计算这些大数的哈希函数仍然相对容易,但搜索所有可能的输入变得更快更困难。但您偶然发现的仍然是一个非常重要的想法:在输入空间中搜索将给出匹配输出的输入。彩虹表是这个想法的更复杂的变体,它以智能的方式使用输入输出对的预先计算表,以便能够快速搜索大量可能的输入。
现在假设您正在使用哈希函数在计算机上存储密码。这个想法是这样的:计算机只存储正确密码的哈希值。当用户尝试登录时,您将输入密码的哈希值与正确密码的哈希值进行比较。如果它们匹配,您就假设用户拥有正确的密码。这样做的好处是,如果有人窃取了您的计算机,他们仍然无法访问您的密码,只能访问它的哈希值。由于哈希函数是由聪明人设计的,很难逆向操作,因此他们无法轻松地从中检索您的密码。
攻击者最好的选择是暴力攻击,他们会尝试一堆密码。就像您在上一个问题中可能会尝试小于 12 的数字一样,攻击者可能会尝试由长度小于 7 个字符的数字和字母组成的所有密码,或者字典中出现的所有单词。这里重要的是,他无法尝试所有可能的密码,因为可能的 16 字符密码太多,无法永远进行测试。因此,关键是攻击者必须限制他可能测试的密码,否则他永远不会检查其中的一小部分。
现在,对于盐,想法是这样的:如果两个用户具有相同的密码怎么办?他们将具有相同的哈希值。如果您考虑一下,攻击者实际上不必单独破解每个用户的密码。他只是简单地检查每个可能的输入密码,并将哈希值与所有哈希值进行比较。如果它与其中之一匹配,那么他就找到了一个新密码。我们真正想要强迫他做的是为他想要检查的每个用户+密码组合计算一个新的哈希值。这就是盐的想法,就是让每个用户的哈希函数略有不同,这样他就不能为所有用户重用一组预先计算的值。最直接的方法是在获取哈希值之前在每个用户的密码中添加一些随机字符串,其中每个用户的随机字符串都不同。因此,例如,如果我的密码是“shittypassword”,我的哈希可能会显示为 MD5(“6n93nshittypassword”),如果您的密码是“shittypassword”,那么您的哈希可能会显示为 MD5(“fa9elshittypassword”)。这一点“fa9el”被称为“盐”,对于每个用户来说都是不同的。例如,我的盐是“6n93n”。现在,附加到您的密码上的这一点也存储在您的计算机上。当您尝试使用密码X登录时,计算机可以计算MD5(“fa9el”+X)并查看它是否与存储的哈希值匹配。
因此,登录的基本机制保持不变,但对于攻击者来说,他们现在面临着更艰巨的挑战:他们面临的不是 MD5 哈希值列表,而是 MD5 和和盐的列表。他们本质上有两个选择:
他们可以忽略哈希值被加盐的事实,并尝试按原样使用其查找表破解密码。然而,他们实际破解密码的可能性大大降低了。例如,即使“shittypassword”位于要检查的输入列表中,但“fa9elshittypassword”很可能不在其中。为了获得破解之前密码的概率的一小部分,他们需要测试更多数量级的可能密码。
他们可以根据每个用户重新计算哈希值。因此,对于每个用户 X,他们不是计算 MD5(密码猜测),而是计算 MD5(Salt_of_user_X + 密码猜测)。这不仅迫使他们为每个想要破解的用户计算一个新的哈希值,而且最重要的是,它阻止他们使用预先计算的表(例如彩虹表),因为他们不知道什么Salt_of_user_X 是事先准备好的,因此他们无法预先计算要测试的哈希值。
所以基本上,如果他们尝试使用预先计算的表,使用盐有效会大大增加他们为了破解密码而必须测试的可能输入,即使他们没有使用预先计算的表,它仍然会使速度减慢 N 倍,其中 N 是您存储的密码数量。
希望这能回答您所有的问题。
MD5 is designed to be cryptographically irreversible. In this case, the most important property is that it is computationally unfeasible to find the reverse of a hash, but it is easy to find the hash of any data. For example, let's think about just operating on numbers (binary files after all, could be interpreted as just a very long number).
Let's say we have the number "7", and we want to take the hash of it. Perhaps the first thing we try as our hash function is "multiply by two". As we'll see, this is not a very good hash function, but we'll try it, to illustrate a point. In this case, the hash of the number will be "14". That was pretty easy to calculate. But now, if we look at how hard it is to reverse it, we find that it is also just as easy! Given any hash, we can just divide it by two to get the original number! This is not a good hash, because the whole point of a hash is that it is much harder to calculate the inverse than it is to calculate the hash (this is the most important property in at least some contexts).
Now, let's try another hash. For this one, I'm going to have to introduce the idea of clock arithmetic. On a clock, there aren't an infinite amount of number. In fact, it just goes from 0 to 11 (remember, 0 and 12 are the same on a clock). So if you "add one" to 11, you just get zero. You can extend the ideas of multiplication, addition, and exponentiation to a clock. For example, 8+7=15, but 15 on a clock is really just 3! So on a clock, you would say 8+7=3! 6*6=36, but on a clock, 36=0! so 6*6=0! Now, for the concept of powers, you can do the same thing. 2^4=16, but 16 is just 4. So 2^4=4! Now, here's how it ties into hashing. How about we try the hash function f(x)=5^x, but with clock arithmetic. As you'll see, this leads to some interesting results. Let's try taking the hash of 7 as before.
We see that 5^7=78125 but on a clock, that's just 5 (if you do the math, you see that we've wrapped around the clock 6510 times). So we get f(7)=5. Now, the question is, if I told you that the hash of my number was 5, would you be able to figure out that my number was 7? Well, it's actually very hard to calculate the reverse of this function in the general case. People much smarter than me have proved that in certain cases, reversing this function is way harder than calculating it forward. (EDIT: Nemo has pointed out that this in fact has not been "proven"; in fact, the only guarantee you get is that a lot of smart people have tried a long time to find an easy way to do so, and none of them have succeeded.) The problem of reversing this operation is called the "Discrete Logarithm Problem". Look it up for more in depth coverage. This is at least the beginning of a good hash function.
With real world hash functions, the idea is basically the same: You find some function that is hard to reverse. People much smarter than me have engineered MD5 and other hashes to make them provably hard to reverse.
Now, perhaps earlier the thought has occurred to you: "it would be easy to calculate the inverse! I'd just take the hash of every number until I found the one that matched!" Now, for the case where the numbers are all less than twelve, this would be feasible. But for the analog of a real-world hash function, imagine all the numbers involved are huge. The idea is that it is still relatively easy to calculate the hash function for these large numbers, but to search through all possible inputs becomes harder much quicker. But what you've stumbled upon is the still a very important idea though: searching through the input space for an input which will give a matching output. Rainbow tables are a more complex variation on the idea, which use precomputed tables of input-output pairs in smart ways in order to make it possible to quickly search through a large number of possible inputs.
Now let's say that you are using a hash function to store passwords on your computer. The idea is this: The computer just stores the hash of the correct password. When a user tries to login, you compare the hash of the input password to the hash of the correct password. If they match, you assume the user has the correct password. The reason this is advantageous is because if someone steals your computer, they still don't have access to your password, just the hash of it. Because the hash function was designed by smart people to be hard to take the reverse of, they can't easily retrieve your password from it.
An attacker's best bet is a bruteforce attack, where they try a bunch of passwords. Just like you might try the numbers less that 12 in the previous problem, an attacker might try all the passwords just composed of numbers and letters less than 7 characters long, or all words which show up in the dictionary. The important thing here is that he can't try all possible passwords, because there are way too many possible 16 character passwords, for example, to ever test. So the point is that an attacker has to restrict the possible passwords he tests, otherwise he will never even check a small percentage of them.
Now, as for a salt, the idea is this: What if two users had the same password? They would have the same hash. If you think about it, the attacker doesn't really have to crack every users password individually. He simply goes through every possible input password, and compares the hash to all the hashes. If it matches one of them, then he has found a new password. What we'd really like to force him to do is calculate a new hash for every user+password combination he wants to check. That's the idea of a salt, is that you make the hash function be slightly different for every user, so he can't reuse a single set of precomputed values for all users. The most straightforward way to do this is to tack on some random string to each user's password before you take the hash, where the random string is different for each user. So, for example, if my password is "shittypassword", my hash might show up as MD5("6n93nshittypassword") and if your password is "shittypassword", your hash might show up as MD5("fa9elshittypassword"). This little bit "fa9el" is called the "salt", and it's different for every user. For example, my salt is "6n93n". Now, this little bit which is tacked on to your password is just stored on your computer as well. When you try to login with the password X, the computer can just calculate MD5("fa9el"+X) and see if it matches the stored hash.
So the basic mechanics of logging in remain unchanged, but for an attacker, they are now faced with a more daunting challenge: rather than a list of MD5 hashes, they are faced with a list of MD5 sums and salts. They essentially have two options:
They can ignore the fact that the hashes are salted, and try to crack the passwords with their lookup table as is. However, the chances that they'll actually crack a password are much reduced. For example, even if "shittypassword" is on their list of inputs to check, most likely "fa9elshittypassword" isn't. In order to get even a small percentage of the probability of cracking a password that they had before, they'll need to test orders of magnitude more possible passwords.
They can recalculate the hashes on a per-user basis. So rather than calculating MD5(passwordguess), for each user X, they calculate MD5( Salt_of_user_X + passwordguess). Not only does this force them to calculate a new hash for each user they want to crack, but also most importantly, it prevents them from being able to use precalculated tables (like rainbow table, for example), because they can't know what Salt_of_user_X is before hand, so they can't precalculate the hashes to test.
So basically, if they are trying to use precalculated tables, using a salt effectively greatly increases the possible inputs they have to test in order to crack the password, and even if they aren't using precalculated tables, it still slows them down by a factor of N, where N is the number of passwords you are storing.
Hopefully this answers all your questions.
想出 1 到 9999 之间的 2 个数字。将它们相加。现在告诉我最后的数字。
我无法从这些信息中推断出您最初想到的是哪些数字。这是单向哈希的一个非常简单的示例。
现在,我可以想到给出相同结果的两个数字,这就是这个简单示例与 MD5 或 SHA1 等“正确”加密哈希的不同之处。使用这些算法,在计算上应该很难得出产生特定哈希值的输入。
Think of 2 numbers from 1 to 9999. Add them. Now tell me the final digit.
I can't, from that information, deduce which numbers you originally thought of. That is a very simple example of a one-way hash.
Now, I can think of two numbers which give the same result, and this is where this simple example differs from a 'proper' cryptographic hash like MD5 or SHA1. With those algorithms, it should be computationally difficult to come up with an input which produces a specific hash.
无法反转哈希函数的一大原因是数据丢失。
考虑一个简单的示例函数:“OR”。如果将其应用到输入数据 1 和 0,则会产生 1。但是现在,如果您知道答案是“1”,如何返回原始数据呢?你不能。它可能是 1,1,也可能是 0,1,或者可能是 1,0。
至于盐渍和彩虹表。是的,理论上,您可以拥有一个包含所有可能的盐和密码的彩虹表,但实际上,这太大了。如果您尝试了小写字母、大写字母、数字和 12 个标点符号的所有可能组合(最多 50 个字符),则有 (26+26+10+12)^50 = 2.9 x 10^93 种不同的可能性。这比可见宇宙中的原子数量还要多。
彩虹表背后的想法是提前计算一堆可能的密码的哈希值,并且密码比 50 个字符短得多,因此可以这样做。这就是为什么要在前面添加盐:如果您在密码前面添加“57sjflk43380h4ljs9flj4ay”。虽然有人可能已经计算出“pa55w0rd”的哈希值,但没有人已经计算出“57sjflk43380h4ljs9flj4aypa55w0rd”的哈希值。
One big reason you can't reverse the hash function is because data is lost.
Consider a simple example function: 'OR'. If you apply that to your input data of 1 and 0, it yields 1. But now, if you know the answer is '1', how do you back out the original data? You can't. It could have been 1,1 or maybe 0,1, or maybe 1,0.
As for salting and rainbow tables. Yes, theoretically, you could have a rainbow table which would encompass all possible salts and passwords, but practically, that's just too big. If you tried every possible combination of lower case letters, upper case, numbers, and twelve punctuation symbols, up to 50 characters long, that's (26+26+10+12)^50 = 2.9 x 10^93 different possibilities. That's more than the number of atoms in the visible universe.
The idea behind rainbow tables is to calculate the hash for a bunch of possible passwords in advance, and passwords are much shorter than 50 characters, so it's possible to do so. That's why you want to add a salt in front: if you add on '57sjflk43380h4ljs9flj4ay' to the front of the password. While someone may have already computed the hash for "pa55w0rd", no one will have already calculated the has for '57sjflk43380h4ljs9flj4aypa55w0rd'.