我应该使用什么校验和算法?

发布于 2024-10-03 16:44:32 字数 488 浏览 9 评论 0原文

我正在构建一个系统,该系统需要能够查找字节块是否已更新。 我认为我应该计算它的校验和,存储它并稍后计算相同的校验和,以查看该 blob 是否已更新,而不是存储整个 blob(它们最多可达 5MB)。

目标是最小化以下内容(按顺序):

  • 冲突可能性的校验和时间的大小
  • 计算
  • (即使内容已被修改,也会发生 2 个相同的校验和)。

我们的系统的碰撞次数不超过 1/1,000,000 是可以接受的。关心的不是安全性,而只是更新/错误检测,因此很少发生冲突是可以的。 (这就是为什么我把它放在要最小化的事情的最后)。

此外,我们无法自己修改文本块。

当然,我会想到 md5crcsha1,如果我想要一个快速的解决方案,我会这样做。然而,我不仅仅是在寻找快速解决方案,而是在寻找不同方法及其优缺点的比较

I'm building a system which needs to be able to find if blobs of bytes have been updated.
Rather than storing the whole blob (they can be up to 5MBs), I'm thinking I should compute a checksum of it, store this and compute the same checksum a little bit later, to see whether the blob has been updated.

The goal is to minimize the following (in that order) :

  • size of the checksum
  • time to compute
  • likeliness of collisions (2 identical checksums happening even if the content has been modified).

It is acceptable for our system to have collision not more than 1/1,000,000. The concern is not security, but simply update/error detection, so rare collisions are ok. (Which is why I put it last in the things to minimize).

Also, we cannot modify the blobs of text ourselves.

Of course, md5, crc or sha1 come to mind, and if I wanted a quick solution, I'd go for it. However, more than a quick solution, I'm looking for what could be a comparison of different methods as well as the pros and cons.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

甜味超标? 2024-10-10 16:44:33

Blake2 是您可以使用的最快的哈希函数,主要采用:

BLAKE2 不仅比其他好的哈希函数更快,而且
甚至比 MD5 或 SHA-1 更快
来源

SHA-3 竞赛的获胜者是 Keccak算法但尚未流行的实现,在 GNU/Linux 发行版中默认不采用。相反,作为 SHA-3 竞赛候选者的 Blake2 比 Keccak 更快,并且是 GNU coreutils。因此,在 GNU/Linux 发行版上,您可以使用 b2sum 来使用 Blake2 哈希算法。

Blake2 is the fastest hash function you can use and that is mainly adopted:

BLAKE2 is not only faster than the other good hash functions, it is
even faster than MD5 or SHA-1
Source

Winner of SHA-3 contest was Keccak algorithm but is not yet has a popular implementation is not adopted by default in GNU/Linux distributions. Instead, Blake2 that was a SHA-3 contest candidate is faster than Keccak and is part of GNU coreutils. So on you GNU/Linux distribution you can use b2sum to use Blake2 hash algorithm.

狼性发作 2024-10-10 16:44:32

我建议您查看此页面,CRC 与 MD5/SHA1。
这个其他线程中讨论了速度和碰撞。
一如既往,维基百科是您的朋友。

如果我必须选择,有一个重要的问题需要回答:你是否希望在任何情况下都不会发生碰撞——或者至少希望这种可能性如此之低,以至于接近月球与地球相撞的可能性在接下来的 5 分钟内?

如果是,请选择 SHA 系列。
对于您的情况,我会更改更新检查的完成方式。
例如,增量编号可以与 blob 关联,并代替哈希发送,如果另一个编号不同,则需要更新请求边。在这种情况下,碰撞概率从 ~10^-18 到 ~0(基本上是 0 + bug 概率)...

编辑以下评论

找到了这个算法,Adler-32 ,这对于 CRC 为 32 位的长消息 (MB) 很有用,即大约 ~1/10^9(MD5 的长度为 128 位)。
计算速度很快。
Adler-32。底部有一些示例(链接)。

I suggest you have a look to this SO page, CRC vs MD5/SHA1.
Speed and collisions are discussed in this other thread.
And as always Wikipedia is your friend.

If I had to choose, there is an important question to answer: do you want that in any case there is no collision - or, at least, that the probability is so low that it is close to the chance that the Moon collides with Earth within the next 5 minutes?

If yes, choose the SHA family.
In your case I would change the way the update check is being done.
For instance, an incremental number could be associated with the blob, and be sent instead of the hash, the request for update would be required if the number is different on the other side. The collision probability in this case goes from ~10^-18 to ~0 (basically 0 + bug probability )...

Edit following comments

Found this algorithm, Adler-32, which is good for long messages (MB) with a CRC of 32 bits, i.e. about ~1/10^9 (MD5 is 128 bits long).
It is fast to calculate.
Adler-32. There is some come sample (link) at the bottom.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文