Why does crypt/blowfish generate the same hash with two different salts?
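This question has to do with PHP's implementation of crypt(). For this question, the first 7 characters of the salt are not counted, so a salt '$2a$07$a' would be said to have a length of 1, as it is only 1 character of salt and seven characters of metadata.
When using salt strings longer than 22 characters, there is no change in the hash generated (i.e., the salt is truncated), and when using strings shorter than 21 characters the salt is automatically padded (with '$' characters, apparently); this is fairly straightforward. However, given a 20-character salt and a 21-character salt that are identical except for the final character of the 21-character salt, both hashed strings will be identical. With a 22-character salt that is identical to the 21-character salt except for its final character, the hash is different again.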
Example in code:
$foo = 'bar';
$salt_xx = '$2a$07$';
$salt_19 = $salt_xx . 'b1b2ee48991281a439d';
$salt_20 = $salt_19 . 'a';
$salt_21 = $salt_20 . '2';
$salt_22 = $salt_21 . 'b';
var_dump(
crypt($foo, $salt_19),
crypt($foo, $salt_20),
crypt($foo, $salt_21),
crypt($foo, $salt_22)
);
Will produce:
string(60) "$2a$07$b1b2ee48991281a439d$$.dEUdhUoQXVqUieLTCp0cFVolhFcbuNi"
string(60) "$2a$07$b1b2ee48991281a439da$.UxGYN739wLkV5PGoR1XA4EvNVPjwylG"
string(60) "$2a$07$b1b2ee48991281a439da2.UxGYN739wLkV5PGoR1XA4EvNVPjwylG"
string(60) "$2a$07$b1b2ee48991281a439da2O4AH0.y/AsOuzMpI.f4sBs8E2hQjPUQq"
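Why is this?
EDIT:
Some users are noting that there is a difference in the overall string, which is true. In salt_20, offset (28, 4) is da$., while in salt_21, offset (28, 4) is da2.; however, it is important to note that the generated string includes the hash, the salt, and the instructions to generate the salt (i.e. $2a$07$); the part in which the difference occurs is, in fact, still the salt. The actual hash is unchanged: UxGYN739wLkV5PGoR1XA4EvNVPjwylG.
Thus, this is not a difference in the hash produced, but a difference in the salt stored with the hash, which is precisely the problem at hand: two salts are generating the same hash.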
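To make the comparison concrete, here is a small sketch that cuts the salt_20 and salt_21 results at the offsets used in this question: 7 characters of prefix, the salt up to offset 28, and everything after that. The helper name splitCryptOutput and its field names are made up for illustration; the two strings are copied from the output above.

```php
<?php
// Hypothetical helper: split a 60-character crypt() result at the offsets
// used in this question (prefix 0-6, salt 7-27, remainder from 28).
function splitCryptOutput(string $out): array
{
    return [
        'prefix' => substr($out, 0, 7),   // e.g. '$2a$07$'
        'salt'   => substr($out, 7, 21),  // salt characters as echoed back
        'rest'   => substr($out, 28),     // everything from offset 28 on
    ];
}

// Single quotes keep PHP from treating '$2a' etc. as variables.
$out20 = splitCryptOutput('$2a$07$b1b2ee48991281a439da$.UxGYN739wLkV5PGoR1XA4EvNVPjwylG');
$out21 = splitCryptOutput('$2a$07$b1b2ee48991281a439da2.UxGYN739wLkV5PGoR1XA4EvNVPjwylG');

var_dump($out20['salt'] === $out21['salt']); // bool(false): the salt field differs
var_dump($out20['rest'] === $out21['rest']); // bool(true): the remainder is identical
```

Comparing the fields separately, rather than the whole 60-character strings, is what shows that only the echoed salt changed.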
Remember: the output will be in the following format:
"$2a$##$saltsaltsaltsaltsaltsaHASHhashHASHhashHASHhashHASHhash"
// ^ Hash Starts Here, offset 28,32
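where ## is the base-2 logarithm of the number of iterations the algorithm runs for.
Edit 2:
In the comments, it was requested that I post some additional info, as a user could not reproduce my output. Execution of the following code: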
var_dump(
PHP_VERSION,
PHP_OS,
CRYPT_SALT_LENGTH,
CRYPT_STD_DES,
CRYPT_EXT_DES,
CRYPT_MD5,
CRYPT_BLOWFISH
);
Produces the following output:
string(5) "5.3.0"
string(5) "WINNT"
int(60)
int(1)
int(1)
int(1)
int(1)
Hope this helps.
Comments (4)
After some experimentation, I have come to the conclusion that this is due to the way the salt is treated. The salt is not considered to be literal text, but rather a base64-encoded string, such that 22 characters of salt data actually represent a 16-byte string (floor(22 * 24 / 32) == 16) of salt. The "Gotcha!" with this implementation, though, is that, like Unix crypt, it uses a "non-standard" base64 alphabet. To be exact, it uses an alphabet of 64 characters plus a 65th character, '$', which is the padding character.
Now, the crypt() function appears to be capable of taking a salt of any length less than or equal to its maximum, and silently handling any inconsistencies in the base64 by discarding any data that doesn't make up another full byte. The crypt function will fail completely if you pass it characters in the salt that are not part of its base64 alphabet, which just confirms this theory of its operation.
Take an imaginary salt '1234'. This is perfectly base64 consistent in that it represents 24 bits of data, so 3 bytes, and does not carry any data that needs to be discarded. This is a salt whose Len Mod 4 is zero. Append any character to that salt, and it becomes a 5-character salt, and Len Mod 4 is now 1. However, this additional character represents only six bits of data, and therefore cannot be transformed into another full byte, so it is discarded.
Thus, for any two salts A and B, where the length of A is a multiple of 4 and B is A with one additional character appended, the actual salt used by crypt() to calculate the hash will, in fact, be identical. As proof, I wrote some example PHP code to show this. The salt constantly rotates in a semi-non-random way (based on a randomish segment of the whirlpool hash of the current time to the microsecond), and the data to be hashed (herein called $seed) is simply the current Unix-epoch time. In its output, each such pair of salts yields the same hash.
The conclusion? Twofold. First, it's working as intended, and second, know your own salt or don't roll your own salt.
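The byte arithmetic described above can be sketched without calling crypt() at all. The following is a minimal model, not PHP's actual source: decodeSaltBytes is a hypothetical helper, and the alphabet is the one bcrypt implementations use ('./', then A-Z, a-z, 0-9). It reads six bits per salt character and keeps only complete bytes, which is the behavior this answer hypothesizes.

```php
<?php
// Sketch only: models the "drop leftover bits" theory, it does not call crypt().
function decodeSaltBytes(string $salt): string
{
    // bcrypt's base64 alphabet; a character's position is its 6-bit value.
    $alphabet = './ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
    $acc = 0;      // bit accumulator
    $nbits = 0;    // number of bits currently held in $acc
    $bytes = '';
    for ($i = 0; $i < strlen($salt); $i++) {
        $value = strpos($alphabet, $salt[$i]);
        if ($value === false) {
            throw new InvalidArgumentException('Not a salt character: ' . $salt[$i]);
        }
        $acc = ($acc << 6) | $value;
        $nbits += 6;
        if ($nbits >= 8) {
            $nbits -= 8;
            $bytes .= chr(($acc >> $nbits) & 0xFF); // emit one complete byte
            $acc &= (1 << $nbits) - 1;              // keep only the leftover bits
        }
    }
    return $bytes; // any leftover bits (fewer than 8) are silently dropped
}

$salt20 = 'b1b2ee48991281a439da';
var_dump(strlen(decodeSaltBytes($salt20)));                            // int(15)
var_dump(decodeSaltBytes($salt20) === decodeSaltBytes($salt20 . '2')); // bool(true)
var_dump(strlen(decodeSaltBytes($salt20 . '2b')));                     // int(16)
```

Under this model the 20- and 21-character salts from the question decode to the same 15 bytes, while the 22-character salt yields a 16th byte, which matches the pattern of hashes the question reports.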
Great answer, and clear explanation. But it seems to me there is either a bug in the implementation or some further explanation of the intent is needed {the comments to the post explain why there is not a bug}. The current PHP documentation's description of the Blowfish salt is consistent with what's been stated and demonstrated here. Unfortunately, the documentation doesn't describe the return value very usefully.
But as shown in the reply by Dereleased, if the input salt string is valid, the output consists of the input salt padded out to a fixed length with '$' characters, with the 32-character computed hash value appended to it. Unfortunately, the salt in the result is padded out to only 21 base64 digits, not 22! This is shown by the last three lines in that reply, where we see one '$' for 20 digits, no '$' for 21, and when there are 22 base64 digits in the salt, the first character of the hash result replaces the 22nd digit of the input salt. The function is still usable, because the complete value it computes is available to the caller as substr(crypt($pw,$salt), 28, 32), and the caller already knows the complete salt value because it passed that string as an argument. But it's very difficult to understand why the return value is designed so that it can only give you 126 bits of the 128-bit salt value. In fact, it's hard to understand why it includes the input salt at all; but omitting 2 bits of it is really unfathomable.
A little test snippet shows that the 22nd base64 digit contributes just two more bits to the salt actually used in the computation (there are only 4 distinct hashes produced).
The grouping of the identical hash values also shows that the mapping of the alphabet actually used is most likely as written here, rather than in the order shown in the other reply.
Perhaps the interface was designed this way for some kind of compatibility, and perhaps because it has already shipped this way it can't be changed {the first comment to the post explains why the interface is this way}. But certainly the documentation ought to explain what's going on. Just in case the bug might get fixed some day, perhaps it would be safest to obtain the hash value with:
As a final note, while the explanation of why the hash value repeats when the number of base64 digits specified mod 4 == 1 makes sense in terms of why code might behave that way, it doesn't explain why writing the code that way was a good idea. The code could and arguably should include the bits from a base64 digit that makes up a partial byte when computing the hash, instead of just discarding them. If the code had been written that way, then it seems likely the problem with losing the 22nd digit of the salt in the output would not have appeared, either. {As the comments to the post explain, even though the 22nd digit is overwritten, the digit of the hash that overwrites it will be only one of the four possible values [.Oeu], and these are the only significant values for the 22nd digit. If the 22nd digit is not one of those four values, it will be replaced by the one of those four that produces the same hash.}
In light of the comments, it seems clear there is no bug, just incredibly taciturn documentation :-) Since I'm not a cryptographer, I can't say this with any authority, but it seems to me that it's a weakness of the algorithm that a 21-digit salt apparently can produce all possible hash values, while a 22-digit salt limits the first digit of the hash to only one of four values.
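The four values [.Oeu] follow from simple bit arithmetic, which the sketch below models. It does not call crypt(); the function names are hypothetical, and the alphabet order ('./', A-Z, a-z, 0-9) is the one bcrypt implementations use. Since 21 digits already carry 126 bits, a 128-bit salt can take only the top two bits of the 22nd digit's 6-bit value.

```php
<?php
// Sketch only: models which bits of the 22nd salt digit can matter.
function bcryptAlphabet(): string
{
    return './ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
}

function significant22ndDigit(string $digit): string
{
    if (strlen($digit) !== 1) {
        throw new InvalidArgumentException('Expected exactly one character');
    }
    $value = strpos(bcryptAlphabet(), $digit);
    if ($value === false) {
        throw new InvalidArgumentException('Not a salt character: ' . $digit);
    }
    // Keep the top two of six bits (mask 0x30) and clear the low four.
    return bcryptAlphabet()[$value & 0x30];
}

// Collect the distinct results over the whole alphabet, in alphabet order.
$classes = [];
foreach (str_split(bcryptAlphabet()) as $digit) {
    $classes[significant22ndDigit($digit)] = true;
}
echo implode('', array_keys($classes)), "\n"; // .Oeu
echo significant22ndDigit('b'), "\n";          // O
```

The second line of output lines up with the question's data: the 22-character salt ended in 'b', and the returned string shows 'O' in that position.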
Looks like the outputs are actually different (da$. vs da2.) for the results of salt_20 and salt_21.
From my investigation, it seems that the salt is always 22 characters and the hash offset is 29, not 28, making the hash 31 characters long, not 32.
The results of my test code suggest that the salt portion of the returned hash stores only significant bits, so it may not always match your input salt. The benefit is that the hash can be used unaltered as the salt when verifying. Thus, you're better off storing only the complete hash returned by crypt(), and never the input salt you used initially.
Rolling your own salts is not a problem, and knowing them (by this, I assume you meant storing them separately from the hash) is not necessary if you're storing crypt()'s output as-is.
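A minimal sketch of that store-and-verify pattern follows. The name verifyPassword is hypothetical, and the salt is the 22-character one from the question; it assumes a PHP build where crypt() accepts the $2a$ prefix with a full 22-character salt. hash_equals() is used only so the comparison takes constant time.

```php
<?php
// Sketch only: store crypt()'s full output and reuse it as the salt argument.
function verifyPassword(string $candidate, string $stored): bool
{
    // crypt() reads the algorithm, cost and salt back out of $stored.
    return hash_equals($stored, crypt($candidate, $stored));
}

// Keep this whole string; the input salt does not need to be stored separately.
$stored = crypt('bar', '$2a$07$b1b2ee48991281a439da2b');

var_dump(verifyPassword('bar', $stored)); // bool(true)
var_dump(verifyPassword('baz', $stored)); // bool(false)
```

Because the stored string is passed back as the salt, any rewriting of the salt's last digit by crypt() is harmless: the rewritten digit produces the same hash.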