MD5 checksums for large file downloads over HTTP
MD5 checksums are widely used to check the integrity of large files downloaded over HTTP. My question is this: TCP itself provides a reliability mechanism (i.e. a checksum on each TCP segment to ensure its integrity), so in short TCP is reliable. HTTP is built on TCP, so HTTP should be reliable too. Why, then, do we need another integrity-checking mechanism (i.e. the MD5 checksum)?
Thanks in advance,
George
Comments (4)
Most often you use the hash sum for an out-of-band check of the download's integrity (the hash is printed on the website, for example), not a programmatic one.
This guards against manipulation of the download artifact.
More than three times in my life I have downloaded a broken ISO or EXE, and when I downloaded it again it worked. To me, that proves the TCP mechanism alone isn't enough to ensure integrity.
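One reason this can happen: the checksum TCP uses is only a 16-bit ones' complement sum, so some corruptions leave it unchanged. The sketch below is my own illustration of that sum in the style of RFC 1071 (the function name `internet_checksum` is made up here, and a real TCP checksum also covers the header and a pseudo-header). It shows that swapping two 16-bit words in the payload goes undetected.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        # Combine each pair of bytes into one big-endian 16-bit word.
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = b"ABCDEFGH"
# Same bytes, but the first two 16-bit words ("AB" and "CD") are swapped.
reordered = b"CDABEFGH"

# Addition is commutative, so both payloads produce the same checksum.
print(internet_checksum(original) == internet_checksum(reordered))  # True
```

A cryptographic-style hash such as MD5 depends on the order of the bytes, so the same swap would change the digest.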
The answer is simple. The source file may already be corrupt before you even begin downloading. TCP only checks that the bytes you receive match the bytes the server sent. An MD5 checksum lets you find out whether the file is corrupt, whether the cause is a problem in transfer or the original file itself.
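For what it's worth, here is a minimal sketch of what that check looks like in practice, assuming the site publishes the MD5 as a hex string. The names `file_md5` and `matches_published` are just ones I picked for illustration. The file is read in chunks so a multi-gigabyte download does not have to fit in memory.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, published_hex: str) -> bool:
    """Compare the local file's MD5 with the hash published by the site."""
    return file_md5(path) == published_hex.strip().lower()
```

You would call something like `matches_published("image.iso", "<hash copied from the download page>")` after the download finishes and fetch the file again if it returns `False`.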
When it comes to the 35G TED-LIUM corpus, or the even larger 400G tiny-images, it seems there is some error in the downloaded file almost every time. For the 35G TED-LIUM corpus, I ran the download at least 20 times, about 700G of network transfer in total, over several months. CRC is just a nightmare.