当前位置：文江博客话题详情

CRC32 冲突

发布于 2024-08-06 12:20:09 字数 154 浏览 16 评论 0原文

我试图找到两条消息之间的冲突，从而导致相同的 CRC 哈希值。考虑到我正在使用 CRC32，有什么方法可以缩短在进行暴力攻击时必须尝试的可能消息列表吗？

任何带有这方面提示的网站链接都会有所帮助。我已经有一个强力算法可以做到这一点，但它只是增加整数并查看它是否与其他哈希匹配。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

鸠魁 2024-08-13 12:20:09

这完全取决于你所说的“消息”是什么意思。如果您可以将四个字节的乱码附加到其中一条消息中。（即，四个字节在消息的上下文中没有任何意义。）然后，从这个词的真正意义上来说，它就变得微不足道了。

从 CRC32 状态机中移动的位角度进行思考。

CRC32 基于伽罗瓦反馈移位寄存器，其状态中的每一位都将被从有效负载数据中归纳出的 32 位所替换。在引入每一位时，多项式指示的位置将与从移位寄存器末尾观察到的序列进行异或运算。在移位寄存器被填满之前，该序列不受输入数据的影响。

举个例子，假设我们有一个移位寄存器，其中填充了初始状态 10101110、多项式 10000011，并填充了未知位 X。

Polynomial *     **  |feedback (End of SR.)
State      10101110     0
State      X1010111     1
State      XX101000     0
State      XXX10100     0
State      XXXX1010     0
State      XXXXX101     1
State      XXXXXX01     1
State      XXXXXXX1     1
State      XXXXXXXX     0

直到 SR 被填充后，反馈才以 X 表示！
因此，为了生成具有预定校验和的消息，您需要获取新消息，生成它的 CRC 并计算出接下来的 32 位反馈。您可以通过 CRC 函数的 32 个步骤来完成此操作。然后，您需要计算该反馈对移位寄存器内容的影响。

执行此操作的快捷方式是用四个零字节填充消息，然后查看校验和。（校验和是 SR 末尾的状态，如果用四个零字节填充，则表示反馈和空字节的影响。）

将该影响与所需的校验和值进行异或，将四字节尾部替换为计算出的值值并重新生成校验和。您可以使用任何生成 CRC32 的程序、十六进制编辑器和可以处理十六进制的计算器来完成此操作。

如果您想生成两条完全有意义且不包含尾随垃圾的消息，事情会变得有点困难。确定一些可以编写合理替代方案的部分，并且长度完全相同。

以英语散文为例。
“我认为这可行”
和
“我相信这种方法”
具有大致相似的含义，并且长度完全相同。

识别消息中足够的示例是一个棘手的问题（除非您想用空格作弊！）只要数据在消息中具有正确的偏移量，CRC 32 就是线性的。所以 CRC([messagea][padding])^CRC([padding][messageb])=CRC([messagea][messageb])
您需要处理一些有关单词对齐的警告，作为一般提示，您希望将段落扩展到消息的“固定”部分。作为一般规则，您希望有 n*1.5 段的替代方案，其中 n 是 CRC 的大小。

现在，您可以计算骨架消息的 CRC、每个替代段落对其的印象，然后绘制一个表格，比较每个段落的每个替代选项的影响。然后，您需要选择修改骨架 CRC 以匹配您想要的 CRC 的替代方案。这个问题实际上解决起来很有趣，首先找到任何唯一修改一点的替代方案，如果您的 CRC 需要更改该位，请选择该替代方案并将其影响折叠到 CRC 中，然后再次循环。这应该会减少您需要搜索的解决方案空间。

编写代码是一件相当困难的事情，但它会在很短的时间内产生碰撞。

It depends entirely on what you mean by "message". If you can append four bytes of gibberish to one of the messages. (I.E. four bytes that have no meaning within the context of the message.) Then it becomes trivial in the truest sense of the word.

Thinking in terms of bits moving through the CRC32 state machine.

CRC32 is based on a galois feedback shift register, each bit in its state will be replaced with the induction of 32 bits from the payload data. At the induction of each bit, the positions indicated by the polynomial will be exclusive ored with the sequence observed from the end of the Shift register. This sequence is not influenced by the input data until the shift register has been filled.

As an example, imagine we have a shift register filled with initial state 10101110, polynomial 10000011, and filling with unknown bits, X.

Polynomial *     **  |feedback (End of SR.)
State      10101110     0
State      X1010111     1
State      XX101000     0
State      XXX10100     0
State      XXXX1010     0
State      XXXXX101     1
State      XXXXXX01     1
State      XXXXXXX1     1
State      XXXXXXXX     0

The feedback isn't in terms of X until the SR has been filled!
So in order to generate a message with a predetermined checksum, you take your new message, generate it's CRC and work out it's next 32 bits of feedback. This you can do in 32 steps of the CRC function. You then need to calculate the effect this feedback has on the contents of the shift register.

A shortcut for doing this is to pad your message with four zero bytes and then look at the checksum. (Checksum is the state of the SR at the end, which if padded with four zero bytes is the influence of the feedback and the empty bytes.)

Exclusive OR that influence with the checksum value you want, replace the four byte trailer with that computed value and regenerate the checksum. You could do this with any program that generates CRC32, a hex editor, and a calculator that can handle hex.

If you want to generate two messages that both make complete sense and don't contain trailing garbage, things get a little harder. Identify a number of sections that you can write plausible alternatives, with exactly the same length.

Using english prose as an example.
"I think that this can work"
and
"I believe in this approach"
Have broadly similar meanings, and exactly the same length.

Identifying enough examples in your message is the tricky bit (Unless you want to cheat with whitespace!) CRC 32 is linear, provided the data has the correct offset within the message. So CRC([messagea][padding])^CRC([padding][messageb])=CRC([messagea][messageb])
There are some caveats with word alignment that you'll need to cope with, as a general hint, you want to extend the passages out into the "fixed" parts of the message. As a general rule you want to have alternatives for n*1.5 passages, where n is the size of the CRC.

You can now calculate the CRC that the skeletal message has, the impression that each alternative passage would have on it, and then draw up a table comparing the influence that each alternative for each passage would have. You then need to select alternatives that will modify the skeletal CRC to match the CRC you want. That problem is actually quite fun to solve, First off find any alternatives that uniquely modify a bit, if that bit needs to change for your CRC, select that alternative and fold it's influence into the CRC, then go round again. That should reduce the solution space that you then need to search.

That's quite a tough thing to code up, but it would generate your collisions in a very short time span.

回复收藏 0 原文

近箐 2024-08-13 12:20:09

我的微积分没有缺陷，在 N 次试验后未发现一次碰撞的概率大致如下表所示：

  N      Probability
-------  -----------
 50,000  74.7%
 77,000  50.1%
 78,000  49.2%
102,000  29.8%
110,000  24.5%
128,000  14.8%
150,000   7.3%
200,000   0.95%

换句话说，必须计算超过 200,000 个 CRC32 值的概率在找到重复项之前小于1%，或者在102,000次尝试之前找到重复项的概率为70.2%
顺便说一句，这是值得注意的，因为在第 200,000 次尝试中发现一次碰撞的概率仍然约为 1% 的 1/1000 ((4M - 200,0000) / 4M) ，但是在第 200,000 次尝试之前发现一次碰撞的可能性是准确定性的（嗯，无论如何，超过 99%）。 这表明保留迄今为止计算的 CRC 数据库的兴趣。

我们当然可以花一些时间研究 CRC32 算法及其基础数学，试图找到更有可能产生 CRC32 冲突的消息，但至少需要相对少量的真正随机尝试才能找到与准确定性的一次碰撞使得这种密码分析方法几乎不值得付出努力。例如，假设我们可以找到一种方法来选择相互冲突的可能性高出 10 倍的消息，那么我们仍然需要尝试 63,000 次，才能达到 99% 的至少发生一次冲突的可能性（优于 200,000，但仍需要大致相同类型的应用程序。）
在这方面，我们可能要考虑的唯一一件事是避免长度短于 4 个字节的消息（我在某处读到 CRC32 在该消息空间中是双射的），并且避免太相似的消息（即仅相差一两个字符），因为 CRC32 的最初目的是检测（并可能自动更正）消息中如此小的差异。

因此，任务的难度似乎并不在于找到以极快的速度计算 CRC32 的方法（尽管我们也不应该太慢），而是管理一个可快速搜索的数据库最多 200,000 条消息（或消息“密钥”，更多信息见下文）及其关联的 CRC32 值。

实现这一切的一些想法

需要一个简单的 ISAM 库，或者更好的是一个正式的 DBMS 接口，例如 MySql 甚至 SqlLite。
通过使用伪随机数生成器 (PRNG) 来生成消息，我们可以保存消息密钥（即我们向 PRNG 提供的任何内容来生成给定消息），而不是存储整个密钥 >消息。这将使数据库插入和搜索更加高效，但存在错误选择 PRNG（或者更确切地说是基于 pm 随机数的消息生成器）的风险，即（一开始）生成不太可能通过 CRC32 的消息的风险。碰撞...
分批工作可能更好，即生成 1,000 个新的 CRC，然后检查碰撞并存储它们，而不是一次为一个 CRC 执行所有这些操作。如果我们使用现成的 DBMS，则尤其如此

Short of a flaw with my calculus, the probability of not having found one collision after N trials is approximated in the following table:

  N      Probability
-------  -----------
 50,000  74.7%
 77,000  50.1%
 78,000  49.2%
102,000  29.8%
110,000  24.5%
128,000  14.8%
150,000   7.3%
200,000   0.95%

In other words, the probability of having to compute more than 200,000 CRC32 values before finding a duplicate is less than 1%, or, the probability of finding a duplicate before 102,000 attempts is 70.2%
BTW this is remarkable because the probability of finding one collision on, say, the very 200,000th attempt is still in the order of 1/1000th of 1% ((4M - 200,0000) / 4M), but the probably to have found one collision before the 200,000th attempt is a quasi certainty (well, above 99% anyway). This shows the interest of keeping a database of CRCs computed so far.

We could certainly spend some time studying the CRC32 algorithm and its underlying mathematics, in an attempt to find messages more likely to produce a CRC32 collisions, but the relatively small number of truly random attempts required to find at least one collision with quasi certainty, makes this kind of cryptanalysis approach hardly worth the effort. For example assuming that we could discover a way to select messages that are 10 times more likely to collide with one another, we'd still have to try in the order of 63,000 times before reaching the 99% chances of having at least one collision (better than 200,000 but, still requiring roughly the same type of application.)
The only thing we may want to consider, in this area, is to avoid messages shorter than 4 bytes in length (I read somewhere that CRC32 was bijective in this message space), and to avoid messages that are too similar (i.e. only differing by one or two characters), since after CRC32's original purpose is to detect (and possibly auto-correct) such small differences in messages.

Therefore, it appears that the difficulty of the assignment is not so much that of finding ways of computing CRC32s at sizzling speed (though we shouldn't be too slow with this either), but rather to manage a quickly-searcheable database of up to 200,000 Messages (or message "key", more on this below) and their associated CRC32 value.

A few ideas to implement all this

Need a simple ISAM library, or better a formal DBMS interface such as MySql or even SqlLite.
By using a pseudo random number generator (PRNG), to produce the messages, we can save the message keys (i.e. whatever we feed the PRNG to produce a given message), rather than storing the whole message. This would make the database inserts and searches more efficient, at the risk of wrongly picking the PRNG, (or rather the message generator based pm random numbers), i.e. one which would produce (at first) messages that are somehow less likely to CRC32-collide...
It is probably better to work in batches, i.e. producing say 1,000 new CRCs and then checking for collisions and storing them, instead of doing all these thing for one CRC at a time. This is particularly true if we use off-the-shelf DBMS

回复收藏 0 原文