How likely is it that two blocks of data produce the same CRC64 value?
I have a caching application that uses a CRC64 value to ensure data integrity. I'm thinking about adding an extra field, a timestamp, to be passed around with the data between the various cache servers and compared to see whether the data has changed.
However, this requires protocol changes. While that's not a huge deal, I already have a CRC64 that could be used as an indicator that something has changed.
Does anyone know the stats around two blocks of data producing the same CRC64? If not, how could I compute it or estimate its likelihood?
Comments (3)
If you assume that CRC64 is 'perfect' (that is, its values are uniformly distributed over all 2^64 possibilities), then the numbers are pretty reasonable:
For a 1% probability of collision, you need about 6.1 × 10^8 entries. For a 50% probability of collision, you need about 5.1 × 10^9 entries.
Of course, if the data is potentially supplied by malicious sources, then collisions in a hash as simple as CRC64 can be generated easily, and collisions could be rampant. So whether or not you go this route depends on the source of the input data and the potential ramifications of collisions.
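The entry counts quoted above can be reproduced with the standard birthday-bound approximation, n ≈ sqrt(2 · 2^64 · ln(1 / (1 − p))). The sketch below is only an illustration: the function name is made up here, and it assumes the CRC64 values behave like uniformly random 64-bit numbers, which is the 'perfect' assumption from the answer.

```python
import math

def entries_for_collision_probability(p: float, bits: int = 64) -> float:
    """Approximate number of uniformly random `bits`-bit values needed
    before at least one collision occurs with probability p.

    Assumption: outputs are uniform over 2**bits values (idealized hash).
    """
    space = 2.0 ** bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))

for p in (0.01, 0.5):
    n = entries_for_collision_probability(p)
    print(f"{p:.0%} collision probability -> about {n:.2e} entries")
```

For p = 0.01 and p = 0.5 this gives roughly 6.1 × 10^8 and 5.1 × 10^9, matching the figures in the answer.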
The probability of any two given blocks colliding is 1/2^64, or 1 in about 1.8 × 10^19.
However, the probability grows rapidly if you are instead interested in a collision between any two blocks out of a population of size N.
For more information, see Birthday Problem on Wikipedia, which has formulas and approximations.
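As a rough sketch of the birthday-problem approximation, the probability of at least one collision among N blocks is about 1 − exp(−N(N − 1) / (2 · 2^64)). The helper below is hypothetical (its name and defaults are chosen here for illustration) and assumes uniformly distributed 64-bit values:

```python
import math

def collision_probability(n: int, bits: int = 64) -> float:
    """Approximate probability that at least two of n uniformly random
    `bits`-bit values are equal: 1 - exp(-n(n-1) / (2 * 2**bits)).

    Assumption: values are uniform and independent (idealized hash).
    """
    space = 2.0 ** bits
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-n * (n - 1) / (2.0 * space))

for n in (2, 10**6, 10**9, 5 * 10**9):
    print(f"N = {n:>13,}: P(collision) ~ {collision_probability(n):.3e}")
```

With N = 2 this reduces to the 1/2^64 figure for a single pair, and it approaches 50% around five billion blocks.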
The probability of two CRC64s over different random data being identical would be something close to 1 chance in 2^64. But since CRCs are somewhat sensitive to data patterns, there could be degenerate cases where you'd lose several binary orders of magnitude of protection. It's probably not possible to come up with a hard number, but you'd likely be safe in assuming the worst-case chance of collision would be less than 1 chance in 2^50 or so.
You'd be assured of getting closer to the theoretical limit if you used a cryptographic hash instead of a CRC64, but the crypto hash is generally much more expensive to compute.
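For illustration only, here is one way a 64-bit fingerprint could be derived from a cryptographic hash, using BLAKE2b from Python's standard `hashlib` with an 8-byte digest. The function name `digest64` is made up for this sketch, and nothing in the question says the cache is written in Python or could swap its checksum this way:

```python
import hashlib

def digest64(data: bytes) -> int:
    """Return a 64-bit fingerprint of `data` as an unsigned integer.

    Assumption: BLAKE2b with digest_size=8 stands in for "a cryptographic
    hash"; any other cryptographic hash truncated to 8 bytes would
    illustrate the same idea.
    """
    raw = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(raw, "big")

old = digest64(b"cached block, version 1")
new = digest64(b"cached block, version 2")
print("changed" if old != new else "unchanged (or a collision)")
```

A 64-bit digest is subject to the same birthday bound as CRC64. What the cryptographic construction buys is the absence of pattern-sensitive degenerate cases, not a larger output space.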