When is CRC more appropriate to use than MD5/SHA1?
When is it appropriate to use CRC for error detection versus more modern hashing functions such as MD5 or SHA1? Is the former easier to implement on embedded hardware?
CRC works fine for detecting random errors in data that might occur, for example, from network interference, line noise, distortion, etc.
CRC is computationally much less complex than MD5 or SHA1. Using a hash function like MD5 is probably overkill for random error detection. However, using CRC for any kind of security check would be much less secure than a more complex hashing function such as MD5.
And yes, CRC is much easier to implement on embedded hardware; you can even get different packaged solutions for this as ICs.
Update
Yes, this answer is old. Please don't use SHA1 or MD5 for security purposes ;)
CRC is designed to guard against unintentional changes in the data.
That is, it's good for detecting unintentional errors, but it is useless as a way of making sure the data was not maliciously tampered with.
Also see this.
I found a study that shows how inappropriate CRC hashes are for hash tables. It also explains the actual characteristics of the algorithm. The study also includes evaluation of other hash algorithms and is a good reference to keep.
UPDATE
It seems the site is down. The internet archive has a copy though.
UPDATE 2
Oh dear. It turns out the study may have been faulty around the conclusions on CRC for use as a hash. Thanks @minexew for the link.
I ran every line of this PHP code in a loop of 1,000,000 iterations. Results are in comments (#).
My conclusion:
you do not care about security.
Use "sha256" (or higher) when you need added security layer.
Do not use "md5" or "sha1" because they have:
It all depends on your requirements and expectations.
Here are the brief differences between these hash function algorithms:
CRC (CRC-8/16/32/64)
MD5
SHA-1
is a cryptographic hash algorithm,
produces a 160-bit (20-byte) hash value known as a message digest,
has not been considered secure since 2005,
was designed for cryptographic purposes such as integrity checks and signatures (it is a hash, not an encryption algorithm),
an example of a SHA-1 collision has been found,
first published in 1993 (as SHA-0), then in 1995 as SHA-1,
series: SHA-0, SHA-1, SHA-2, SHA-3.
In summary, SHA-1 is no longer considered secure against well-funded opponents, because in 2005 cryptanalysts found attacks on SHA-1 which suggest it may not be secure enough for ongoing use (Schneier). NIST advised that U.S. federal agencies should stop using SHA-1 for applications that require collision resistance and must use SHA-2 after 2010 (NIST).
Therefore, if you're looking for a simple and quick way to check the integrity of files against corruption, or for simple caching where performance matters, you can consider CRC-32. For general non-security hashing you may consider MD5. However, if you're developing a professional application that must be secure and consistent, use SHA-2 or above (such as SHA-3), which makes collisions practically infeasible.
Performance
Some simple benchmark test in PHP:
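A rough way to compare throughput yourself, sketched here in Python rather than PHP. The function name `time_checks` and the buffer sizes are assumptions made for illustration; absolute numbers depend heavily on the machine and library build, so none are quoted.

```python
import hashlib
import timeit
import zlib


def time_checks(data: bytes, repeats: int = 200) -> dict:
    """Return the seconds spent computing each check `repeats` times over `data`."""
    candidates = {
        "crc32": lambda: zlib.crc32(data),
        "md5": lambda: hashlib.md5(data).digest(),
        "sha1": lambda: hashlib.sha1(data).digest(),
        "sha256": lambda: hashlib.sha256(data).digest(),
    }
    return {name: timeit.timeit(fn, number=repeats) for name, fn in candidates.items()}


if __name__ == "__main__":
    # 1 MiB of zero bytes; swap in representative data for a fairer comparison.
    for name, seconds in time_checks(bytes(1024 * 1024), repeats=50).items():
        print(f"{name:8s} {seconds:.4f} s")
```

Run it a few times and on the target hardware before drawing conclusions; a desktop CPU with hashing extensions can rank these very differently from a small microcontroller.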
For information on CRC implementation, speed and reliability, see A painless guide to CRC error detection algorithms. It has everything on CRCs.
Unless somebody is going to try to modify your data maliciously and hide the change, CRC is sufficient. Just use a "good" (standard) polynomial.
Short CRCs are much better than pseudorandom hashes of the same length at detecting random errors.
This question has accumulated a large number of answers, most unnecessary, over the years, but none has yet pointed out this crucial fact. You should never use a short random hash (such as truncated MD5 or SHA-1) to catch occasional flipped bits, even if you can afford the computational cost, because the false-negative rate will be high.
Here's a worked example. Say your messages are 12 octets (96 bits) of payload plus 1 octet for error detection. Also suppose that each bit has an independent 1-in-10,000 chance of being corrupted (flipped) in transit. Note that means that roughly 1% of packets will have at least one flipped bit, roughly 0.01% of packets will have at least 2 flipped bits, and so on.
If the error-detection bits are a pseudorandom hash (such as MD5 or SHA-1 truncated to 8 bits), then corruption confined to the check bits will always be detected, while corruption not confined to those bits will be detected around 255/256 of the time. In all, roughly (12/13)×(1/256) ≈ 0.36% of all corrupted packets will evade detection.
If the error-detection bits are a simple checksum (sum of the other bytes mod 256), then all single-bit-flip errors (99% of the total) will be detected, and of the remaining 1%, better than 7/8 will be detected. Less than 0.13% of corrupted packets will be missed. So even a simple checksum outperforms a random hash.
If the error-detection bits are a CRC-8 with an appropriately chosen polynomial (such as CRC-8-CCITT), then all errors of 1, 2, or 3 flipped bits will be detected, and roughly 127/128 of other errors will be detected. Less than 0.00000002% of corrupted packets will be missed.
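The guaranteed-detection claim can be checked by brute force. The sketch below is an illustration under stated assumptions: a CRC-8 with polynomial 0x07 (x^8 + x^2 + x + 1), zero initial value and no final XOR, an arbitrary 12-byte payload, and helper names (`crc8`, `md5_8`, `count_missed`) invented for this example. It flips every combination of 1, 2 or 3 bits in the 13-byte packet and counts how many corruptions each check fails to notice.

```python
import hashlib
from itertools import combinations

POLY = 0x07  # x^8 + x^2 + x + 1, the polynomial assumed for this sketch


def _make_table(poly: int) -> list:
    """Precompute the CRC-8 of every single byte (MSB-first)."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
        table.append(crc)
    return table


CRC8_TABLE = _make_table(POLY)


def crc8(data: bytes) -> int:
    """Table-driven CRC-8, zero initial value, no final XOR."""
    crc = 0
    for byte in data:
        crc = CRC8_TABLE[crc ^ byte]
    return crc


def md5_8(data: bytes) -> int:
    """A pseudorandom 8-bit check: the first byte of the MD5 digest."""
    return hashlib.md5(data).digest()[0]


def count_missed(check, payload: bytes, max_flips: int = 3) -> int:
    """Count corruptions of 1..max_flips bits that `check` fails to detect."""
    packet = payload + bytes([check(payload)])
    nbits = len(packet) * 8
    missed = 0
    for k in range(1, max_flips + 1):
        for positions in combinations(range(nbits), k):
            corrupted = bytearray(packet)
            for p in positions:
                corrupted[p // 8] ^= 0x80 >> (p % 8)
            # The receiver recomputes the check over the payload bytes
            # and compares it with the received check byte.
            if check(bytes(corrupted[:-1])) == corrupted[-1]:
                missed += 1
    return missed


if __name__ == "__main__":
    payload = b"hello, world"  # 12 octets, as in the worked example
    print("CRC-8 missed:", count_missed(crc8, payload))
    print("MD5/8 missed:", count_missed(md5_8, payload))
```

The CRC-8 count is zero for any payload of this length, because the code has minimum distance 4 at this packet size. The truncated-MD5 count varies with the payload but should land near 1/256 of the corruptions that touch the payload.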
CRCs are not just used because they're fast to compute (although they are—especially in hardware) but also because they're really really good at detecting certain types of errors. Even if you're working with hardware that can compute a truncated MD5 faster than it can compute a CRC-8, you should probably still use the CRC.
If you have far more space for the checksum, say 128 bits, then the situation is different. A CRC-128 still has a theoretical advantage over a 128-bit random hash, but the false negative rate of the random hash (about 2^-128) is already so low that it may as well be taken to be zero; there is no real benefit to making it any lower. If you can afford to use an MD5 hash in this situation then you may as well use it.
If you're trying to detect maliciously introduced errors then things become much more complicated. It's necessary in that situation to use some sort of cryptographic hash (not a CRC) but it's far from sufficient. If you really need to design a protocol that's safe against malicious interference then you should ask about it at the Cryptography Stack Exchange. Don't assume that using a modern hash like SHA-3 or BLAKE2 is enough to keep you safe. It likely isn't.
You do not say what it is that you are trying to protect.
A CRC is often used in embedded systems as a check against accidental data corruption, as opposed to preventing malicious system modification. One place where a CRC is useful is validating an EPROM image during system initialisation to guard against firmware corruption. The system bootloader calculates the CRC of the application code and compares it with the stored value before allowing the code to run. This protects against accidental program corruption or a failed download.
A CRC can also be used in a similar manner to protect configuration data stored in FLASH or EEPROM. If the CRC is incorrect then the data can be flagged as invalid and a default or backup data set used. The CRC may be invalid due to device failure or if the user removed power during an update of the configuration data store.
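A minimal sketch of that store-and-verify pattern, written in Python for readability rather than in firmware C. The record layout (payload followed by a little-endian CRC-32), the names `seal` and `load`, and the default configuration value are all assumptions made for illustration.

```python
import struct
import zlib

# Hypothetical fallback used when the stored record fails its check.
DEFAULT_CONFIG = b"baud=9600;mode=safe"


def seal(config: bytes) -> bytes:
    """Append a little-endian CRC-32 so the record can be verified later."""
    return config + struct.pack("<I", zlib.crc32(config))


def load(stored: bytes) -> bytes:
    """Return the stored config if its CRC matches, else the default."""
    if len(stored) < 4:
        return DEFAULT_CONFIG  # too short to even hold a CRC
    body = stored[:-4]
    (expected,) = struct.unpack("<I", stored[-4:])
    return body if zlib.crc32(body) == expected else DEFAULT_CONFIG


if __name__ == "__main__":
    record = seal(b"baud=115200;mode=fast")
    print(load(record))                    # intact record is accepted
    damaged = bytearray(record)
    damaged[3] ^= 0x10                     # simulate one flipped bit in storage
    print(load(bytes(damaged)))            # falls back to the default
```

A bootloader check follows the same shape: compute the CRC over the application image, compare it with the stored value, and refuse to jump to the image on a mismatch.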
There have been comments that a hash provides greater probability of detecting corruption than a CRC with multiple bit errors. This is true, and the decision on whether or not to use a 16 or 32 bit CRC will hinge upon the safety consequences of a corrupted data block being used and whether you can justify the 1 in 2^16 or 2^32 chance of a data block being incorrectly declared valid.
Many devices have a built-in CRC generator for standard algorithms. The MSP430F5X series from Texas Instruments has a hardware implementation of the CRC-CCITT standard.
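When checking a hardware result against a host-side reference, the CRC-CCITT polynomial (x^16 + x^12 + x^5 + 1, i.e. 0x1021) is available in Python's standard library as `binascii.crc_hqx`. The wrapper name `crc16_ccitt` and the 0xFFFF default seed below are assumptions; a given hardware module may use a different seed or bit order, so treat this as a sketch to adapt, not a statement of what any particular chip computes.

```python
import binascii


def crc16_ccitt(data: bytes, init: int = 0xFFFF) -> int:
    """CRC-16 with the CCITT polynomial 0x1021, MSB-first, no final XOR."""
    return binascii.crc_hqx(data, init)


if __name__ == "__main__":
    # "123456789" is the conventional check string for CRC catalogues.
    print(hex(crc16_ccitt(b"123456789")))          # seed 0xFFFF variant
    print(hex(crc16_ccitt(b"123456789", init=0)))  # seed 0x0000 (XMODEM) variant
```

If the device's output matches neither variant, the usual suspects are bit reflection of the input bytes and a different initial value.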
CRC32 is faster and the hash is only 32 bits long.
Use it when you just want a quick and light checksum. CRC is used in Ethernet.
If you need more reliability it's preferable to use a modern hashing function.
Only use CRC if computation resources are very tight (e.g. some embedded environments) or you need to store/transport many output values and space/bandwidth is tight (as CRCs are usually 32-bit, whereas an MD5 output is 128-bit, SHA1 is 160-bit, and other SHA variants go up to 512-bit).
Never use CRC for security checks as a CRC is very easy to "fake".
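One way to see why a CRC is easy to fake: it is a linear (strictly, affine) function of the message bits, so the effect of any modification on the CRC can be computed without knowing a secret. The sketch below uses `zlib.crc32` and made-up messages; the helper names are assumptions for illustration. For three messages of equal length, the CRC of their XOR equals the XOR of their CRCs.

```python
import zlib


def xor_bytes(*chunks: bytes) -> bytes:
    """XOR together byte strings of equal length."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, value in enumerate(chunk):
            out[i] ^= value
    return bytes(out)


def predicted_crc(a: bytes, b: bytes, c: bytes) -> int:
    """Predict crc32(a ^ b ^ c) from the three individual CRCs alone."""
    return zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)


if __name__ == "__main__":
    a = b"pay alice 0010 usd"
    b = b"pay alice 0020 usd"
    c = b"pay alice 9900 usd"
    combined = xor_bytes(a, b, c)
    print(predicted_crc(a, b, c) == zlib.crc32(combined))
```

A cryptographic hash has no such structure, which is one reason it is the starting point (though not the whole answer) when tampering is a concern.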
Even for accidental error detection (rather than malicious change detection) hashes are better than a simple CRC. Partly because of the simple way a CRC is calculated, and partly because CRC values are usually shorter than common hash outputs and so have a much smaller range of possible values, it is much more likely that, in a situation where there are two or more errors, one error will mask another so you end up with the same CRC despite two errors.
In short: unless you have reason not to use a decent hash algorithm, avoid simple CRCs.
I came across a smart use of CRC recently. The author of the jdupe file duplication identification and removal tool (the same author as the popular exif tool jhead) uses it during the first pass through the files. A CRC is computed on the first 32K of each file to mark files that appear to be the same; the files must also have the same size. These files are added to a list of files on which to do a full binary comparison. It speeds up checking large media files.
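The two-pass idea can be sketched as follows. This is not the tool's actual code; the function names, the use of `zlib.crc32`, and the grouping logic are assumptions chosen to illustrate the technique: a cheap key (size plus CRC of the first 32K) narrows the candidates, and only candidates that share a key are compared byte for byte.

```python
import filecmp
import os
import zlib
from collections import defaultdict

PREFIX = 32 * 1024  # bytes of each file covered by the quick CRC


def quick_key(path: str) -> tuple:
    """Cheap first-pass key: (file size, CRC-32 of the first 32K)."""
    with open(path, "rb") as handle:
        return os.path.getsize(path), zlib.crc32(handle.read(PREFIX))


def find_duplicates(paths: list) -> list:
    """Return groups of paths whose contents are byte-for-byte identical."""
    buckets = defaultdict(list)
    for path in paths:
        buckets[quick_key(path)].append(path)

    groups = []
    for candidates in buckets.values():
        # Only files sharing a key reach the expensive full comparison.
        while len(candidates) > 1:
            first, rest = candidates[0], candidates[1:]
            same = [first] + [p for p in rest if filecmp.cmp(first, p, shallow=False)]
            if len(same) > 1:
                groups.append(same)
            candidates = [p for p in rest if p not in same]
    return groups
```

The CRC here is only a filter, so a collision costs one extra full comparison rather than a wrong answer.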
Let's start with the basics.
In Cryptography, a hashing algorithm converts many bits to fewer bits through a digest operation. Hashes are used to confirm integrity of messages and files.
All hashing algorithms generate collisions. A collision is when several many-bit inputs produce the same fewer-bit output. The cryptographic strength of a hashing algorithm is defined by how hard it is for someone to find an input that produces a given output, because if they could, they could construct a file with a hash that matches a legitimate file and compromise the assumed integrity of the system.
The difference between CRC32 and MD5 is that MD5 generates a larger hash that's harder to predict.
When you want to implement message integrity, meaning the message hasn't been tampered with in transit, the inability to predict collisions is an important property. A 32-bit hash can describe about 4 billion different messages or files using 4 billion unique hashes. If you have 4 billion and 1 files, you are guaranteed to have at least one collision. A 1 TB bitspace has the possibility for billions of collisions. If I'm an attacker and I can predict what that 32-bit hash is going to be, I can construct an infected file that collides with the target file, that is, one that has the same hash.
Additionally, if I'm transmitting at 10 Mbps, the chance of a packet being corrupted in just the right way to bypass CRC32, continue to the destination and execute is very low. Let's say at 10 Mbps I get 10 errors/second. If I ramp that up to 1 Gbps, I'm getting 1,000 errors per second. If I ramp up to 1 exabit per second, I have 1,000,000,000,000 errors per second. Say we have a collision rate of 1/1,000,000 transmission errors, meaning 1 in a million transmission errors results in corrupt data getting through undetected. At 10 Mbps I'd get bad data getting through every 100,000 seconds, or about once a day. At 1 Gbps it'd happen about once every 17 minutes. At 1 exabit per second, we're talking about a million times a second.
If you pop open Wireshark you'll see that your typical Ethernet frame has a CRC32 (the frame check sequence), your IP header has a 16-bit checksum, and your TCP header has a 16-bit checksum, and that's in addition to what the higher-layer protocols may do; e.g. IPSEC might use MD5 or SHA for integrity checking in addition to the above. There are several layers of error checking in typical network communications, and they STILL goof now and again at sub-10 Mbps speeds.
Cyclic Redundancy Check (CRC) has several common versions and several uncommon ones, but it is generally designed just to tell when a message or file has been damaged in transit (multiple bits flipping). CRC32 by itself is not a very good error-checking protocol by today's standards in large-scale enterprise environments because of the collision rate; the average user's hard drive can have upwards of 100k files, and file shares at a company can have tens of millions. The ratio of hash space to the number of files is just too low. CRC32 is computationally cheap to implement whereas MD5 isn't.
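The hash-space argument can be put in numbers with the standard birthday approximation. The sketch below assumes check values behave like uniformly random numbers and that the goal is telling distinct files apart (not catching transmission errors); the function name is made up for this example.

```python
import math


def collision_probability(n_items: int, bits: int) -> float:
    """Approximate chance that at least two of n_items share a `bits`-bit value.

    Uses the birthday approximation 1 - exp(-n(n-1) / 2^(bits+1)).
    """
    pairs = n_items * (n_items - 1) / 2
    return -math.expm1(-pairs / 2 ** bits)


if __name__ == "__main__":
    for n in (1_000, 100_000, 10_000_000):
        p32 = collision_probability(n, 32)
        p128 = collision_probability(n, 128)
        print(f"{n:>10,} files: 32-bit {p32:.4f}, 128-bit {p128:.3e}")
```

With around 100k files, two of them sharing a 32-bit value is more likely than not, while at 128 bits the probability is negligible at any realistic file count.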
MD5 was designed to stop intentional use of collisions to make a malicious file look benign. It's considered insecure because the hashspace has been sufficiently mapped to enable some attacks to occur, and some collisions are predictable. SHA1 and SHA2 are the new kids on the block.
For file verification, MD5 is starting to be used by a lot of vendors because you can process multi-gigabyte or multi-terabyte files quickly with it and stack that on top of the general OS's use and support of CRC32. Do not be surprised if within the next decade filesystems start using MD5 for error checking.
CRC32 is way faster and sometimes has hardware support (e.g. on Nehalem processors). Really, the only time you'd use it is if you're interfacing with hardware, or if you're really tight on performance.
CRC code is simpler and faster.
What do you need it for?