CRC32 冲突
我试图找到两条消息之间的冲突,从而导致相同的 CRC 哈希值。 考虑到我正在使用 CRC32,有什么方法可以缩短在进行暴力攻击时必须尝试的可能消息列表吗?
任何带有这方面提示的网站链接都会有所帮助。我已经有一个强力算法可以做到这一点,但它只是增加整数并查看它是否与其他哈希匹配。
I am trying to find a collision between two messages that will lead to the same CRC hash.
Considering I am using CRC32, is there any way I can shorten the list of possible messages I have to try when doing a brute force attack?
Any links to websites with hints on this will be helpful. I already have a brute force algorithm that will do this but it simply increment integers and sees if it will match other hashes.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
这完全取决于你所说的“消息”是什么意思。如果您可以将四个字节的乱码附加到其中一条消息中。 (即,四个字节在消息的上下文中没有任何意义。)然后,从这个词的真正意义上来说,它就变得微不足道了。
从 CRC32 状态机中移动的位角度进行思考。
CRC32 基于伽罗瓦反馈移位寄存器,其状态中的每一位都将被从有效负载数据中归纳出的 32 位所替换。在引入每一位时,多项式指示的位置将与从移位寄存器末尾观察到的序列进行异或运算。在移位寄存器被填满之前,该序列不受输入数据的影响。
举个例子,假设我们有一个移位寄存器,其中填充了初始状态 10101110、多项式 10000011,并填充了未知位 X。
直到 SR 被填充后,反馈才以 X 表示!
因此,为了生成具有预定校验和的消息,您需要获取新消息,生成它的 CRC 并计算出接下来的 32 位反馈。您可以通过 CRC 函数的 32 个步骤来完成此操作。然后,您需要计算该反馈对移位寄存器内容的影响。
执行此操作的快捷方式是用四个零字节填充消息,然后查看校验和。 (校验和是 SR 末尾的状态,如果用四个零字节填充,则表示反馈和空字节的影响。)
将该影响与所需的校验和值进行异或,将四字节尾部替换为计算出的值值并重新生成校验和。您可以使用任何生成 CRC32 的程序、十六进制编辑器和可以处理十六进制的计算器来完成此操作。
如果您想生成两条完全有意义且不包含尾随垃圾的消息,事情会变得有点困难。确定一些可以编写合理替代方案的部分,并且长度完全相同。
以英语散文为例。
“我认为这可行”
和
“我相信这种方法”
具有大致相似的含义,并且长度完全相同。
识别消息中足够的示例是一个棘手的问题(除非您想用空格作弊!)只要数据在消息中具有正确的偏移量,CRC 32 就是线性的。所以 CRC([messagea][padding])^CRC([padding][messageb])=CRC([messagea][messageb])
您需要处理一些有关单词对齐的警告,作为一般提示,您希望将段落扩展到消息的“固定”部分。作为一般规则,您希望有 n*1.5 段的替代方案,其中 n 是 CRC 的大小。
现在,您可以计算骨架消息的 CRC、每个替代段落对其的印象,然后绘制一个表格,比较每个段落的每个替代选项的影响。然后,您需要选择修改骨架 CRC 以匹配您想要的 CRC 的替代方案。这个问题实际上解决起来很有趣,首先找到任何唯一修改一点的替代方案,如果您的 CRC 需要更改该位,请选择该替代方案并将其影响折叠到 CRC 中,然后再次循环。这应该会减少您需要搜索的解决方案空间。
编写代码是一件相当困难的事情,但它会在很短的时间内产生碰撞。
It depends entirely on what you mean by "message". If you can append four bytes of gibberish to one of the messages. (I.E. four bytes that have no meaning within the context of the message.) Then it becomes trivial in the truest sense of the word.
Thinking in terms of bits moving through the CRC32 state machine.
CRC32 is based on a galois feedback shift register, each bit in its state will be replaced with the induction of 32 bits from the payload data. At the induction of each bit, the positions indicated by the polynomial will be exclusive ored with the sequence observed from the end of the Shift register. This sequence is not influenced by the input data until the shift register has been filled.
As an example, imagine we have a shift register filled with initial state 10101110, polynomial 10000011, and filling with unknown bits, X.
The feedback isn't in terms of X until the SR has been filled!
So in order to generate a message with a predetermined checksum, you take your new message, generate it's CRC and work out it's next 32 bits of feedback. This you can do in 32 steps of the CRC function. You then need to calculate the effect this feedback has on the contents of the shift register.
A shortcut for doing this is to pad your message with four zero bytes and then look at the checksum. (Checksum is the state of the SR at the end, which if padded with four zero bytes is the influence of the feedback and the empty bytes.)
Exclusive OR that influence with the checksum value you want, replace the four byte trailer with that computed value and regenerate the checksum. You could do this with any program that generates CRC32, a hex editor, and a calculator that can handle hex.
If you want to generate two messages that both make complete sense and don't contain trailing garbage, things get a little harder. Identify a number of sections that you can write plausible alternatives, with exactly the same length.
Using english prose as an example.
"I think that this can work"
and
"I believe in this approach"
Have broadly similar meanings, and exactly the same length.
Identifying enough examples in your message is the tricky bit (Unless you want to cheat with whitespace!) CRC 32 is linear, provided the data has the correct offset within the message. So CRC([messagea][padding])^CRC([padding][messageb])=CRC([messagea][messageb])
There are some caveats with word alignment that you'll need to cope with, as a general hint, you want to extend the passages out into the "fixed" parts of the message. As a general rule you want to have alternatives for n*1.5 passages, where n is the size of the CRC.
You can now calculate the CRC that the skeletal message has, the impression that each alternative passage would have on it, and then draw up a table comparing the influence that each alternative for each passage would have. You then need to select alternatives that will modify the skeletal CRC to match the CRC you want. That problem is actually quite fun to solve, First off find any alternatives that uniquely modify a bit, if that bit needs to change for your CRC, select that alternative and fold it's influence into the CRC, then go round again. That should reduce the solution space that you then need to search.
That's quite a tough thing to code up, but it would generate your collisions in a very short time span.
我的微积分没有缺陷,在 N 次试验后未发现一次碰撞的概率大致如下表所示:
换句话说,必须计算超过 200,000 个 CRC32 值的概率 在找到重复项之前小于1%,或者在102,000次尝试之前找到重复项的概率为70.2%
顺便说一句,这是值得注意的,因为在第 200,000 次尝试中发现一次碰撞的概率仍然约为 1% 的 1/1000 ((4M - 200,0000) / 4M) ,但是在第 200,000 次尝试之前发现一次碰撞的可能性是准确定性的(嗯,无论如何,超过 99%)。 这表明保留迄今为止计算的 CRC 数据库的兴趣。
我们当然可以花一些时间研究 CRC32 算法及其基础数学,试图找到更有可能产生 CRC32 冲突的消息,但至少需要相对少量的真正随机尝试才能找到与准确定性的一次碰撞使得这种密码分析方法几乎不值得付出努力。例如,假设我们可以找到一种方法来选择相互冲突的可能性高出 10 倍的消息,那么我们仍然需要尝试 63,000 次,才能达到 99% 的至少发生一次冲突的可能性(优于 200,000,但仍需要大致相同类型的应用程序。)
在这方面,我们可能要考虑的唯一一件事是避免长度短于 4 个字节的消息(我在某处读到 CRC32 在该消息空间中是双射的),并且避免太相似的消息(即仅相差一两个字符),因为 CRC32 的最初目的是检测(并可能自动更正)消息中如此小的差异。
因此,任务的难度似乎并不在于找到以极快的速度计算 CRC32 的方法(尽管我们也不应该太慢),而是管理一个可快速搜索的数据库最多 200,000 条消息(或消息“密钥”,更多信息见下文)及其关联的 CRC32 值。
实现这一切的一些想法
Short of a flaw with my calculus, the probability of not having found one collision after N trials is approximated in the following table:
In other words, the probability of having to compute more than 200,000 CRC32 values before finding a duplicate is less than 1%, or, the probability of finding a duplicate before 102,000 attempts is 70.2%
BTW this is remarkable because the probability of finding one collision on, say, the very 200,000th attempt is still in the order of 1/1000th of 1% ((4M - 200,0000) / 4M), but the probably to have found one collision before the 200,000th attempt is a quasi certainty (well, above 99% anyway). This shows the interest of keeping a database of CRCs computed so far.
We could certainly spend some time studying the CRC32 algorithm and its underlying mathematics, in an attempt to find messages more likely to produce a CRC32 collisions, but the relatively small number of truly random attempts required to find at least one collision with quasi certainty, makes this kind of cryptanalysis approach hardly worth the effort. For example assuming that we could discover a way to select messages that are 10 times more likely to collide with one another, we'd still have to try in the order of 63,000 times before reaching the 99% chances of having at least one collision (better than 200,000 but, still requiring roughly the same type of application.)
The only thing we may want to consider, in this area, is to avoid messages shorter than 4 bytes in length (I read somewhere that CRC32 was bijective in this message space), and to avoid messages that are too similar (i.e. only differing by one or two characters), since after CRC32's original purpose is to detect (and possibly auto-correct) such small differences in messages.
Therefore, it appears that the difficulty of the assignment is not so much that of finding ways of computing CRC32s at sizzling speed (though we shouldn't be too slow with this either), but rather to manage a quickly-searcheable database of up to 200,000 Messages (or message "key", more on this below) and their associated CRC32 value.
A few ideas to implement all this
就在昨天,这个问题在这里,其中提到的一些指示可能会对您有所帮助。
Just yesterday there was this question here on SO, a couple of the pointers mentioned there may help you.
对于大小为 N 的哈希,暴力破解需要大约 sqrt(6N) 条随机长度消息才能获得 95% 的碰撞概率。例如 CRC32 , N = 2^32 ,您需要大约 160 000 条消息
Brute force you need about sqrt(6N) random length messages for a hash of size N to get a 95% probability for collision. E.g. CRC32 , N = 2^32 , you need about 160 000 messages
spoof 正是这样做的。它不需要蛮力。
spoof does exactly that. It does not require brute force.
我假设你的意思是“消息”而不是“密钥”。
如果允许您选择两个“密钥”,那么由于生日悖论,暴力破解会相当快。选择随机消息,计算它们的 CRC,记住所有消息以及相关的 CRC,随着它们的累积,每条新消息都有越来越多的机会与现有消息发生冲突。坦率地说,我希望这种方法在现代计算机上比查找已知的使 CRC32 冲突的方法更快。
I will assume that you mean "message" instead of "key".
If you are allowed to choose both "keys", then brute-force would be rather quick anyway because of the birthday paradox. Pick random messages, compute their CRC, remember all of them and the associated CRC, and each new one has more and more chances to collide with an existing one as they accumulate. Frankly, I expect this approach to be faster on a modern computer than looking up known approaches to make CRC32 collide.
我相信 CRC 是线性的,因此如果您修改(就地,不改变长度)文件的两个不同部分,
CRC 中的差异应该异或在一起。-- 更正:事情似乎并不那么简单。然而,这仍然是我在尝试构建碰撞时会采取的策略——你需要比我今晚倾向于做的更详细地遵循数学......
I believe CRC's are linear, so if you modify (in-place, without changing the length) two different parts of your file,
the differences in the CRC should be xor'ed together.-- correction: it does not seem to be quite that simple. However, this is still the kind of tack I would take in trying to construct a collision -- you need to follow the math in more detail than I am inclined to do tonight...