How should I transfer a hash for file integrity checking?
I have an application that downloads a file from the server. The connection is very unstable and so we are implementing a feature to check for file integrity so that we can know if the file was not downloaded correctly and manage accordingly.
How should I go about this process? Right now I make a request to the server for the file's hash, then I make another request for the file itself, then compute the hash of the downloaded file and compare the two hashes.
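The compute-and-compare step described above can be sketched in Java on the client. This is a minimal sketch, not the asker's actual code; the class and method names are made up for illustration, and SHA-256 is assumed as the hash algorithm:

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FileHashCheck {
    // Hash an input stream in 8KB chunks so a large download never has
    // to be held in memory just to be hashed.
    static byte[] sha256(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        return md.digest();
    }

    // Render a digest as lowercase hex so it can be compared with the
    // hash string the server returned.
    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

The downloaded file's stream would be fed to `sha256`, and the hex result compared with the hash fetched in the first request.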
Is this the right approach? Something tells me it is not. If the hashes turn out to be different, I go through the exact same process a few times, including requesting the hash again (which should be the same). Should I bother requesting the hash every time? I do it in case the hash itself was not transferred correctly. Is this unnecessary? Is there a way for me to reduce the number of requests, since they are expensive and things are very slow right now?
Any ideas?
Just in case it matters: the server uses C# and the client is an Android device (Java).
Thanks,
2 Answers
TCP/IP does integrity checking on its own; you don't have to. Each data packet's integrity is protected by a checksum, and TCP detects lost packets and requests retransmission. So as long as your server sends a Content-Length header, you can be sure a mistransmission is detected and the client errors out.
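The truncation check this paragraph relies on is cheap to do explicitly on the client: count the bytes actually read and compare with the advertised Content-Length (available via `HttpURLConnection.getContentLengthLong()`). A minimal sketch of the counting side, with the comparison left to the caller:

```java
import java.io.IOException;
import java.io.InputStream;

public class LengthCheck {
    // Count how many bytes actually arrived on the stream. If this
    // falls short of the Content-Length the server advertised, the
    // transfer was truncated and should be retried.
    static long countBytes(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }
}
```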
That said, a custom HTTP header is a good place to carry the file hash. Prefix its name with "X-" so that it does not collide with existing or future standard headers.
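Carrying the hash in a response header means one request instead of two: the client reads the header (the name `X-File-Hash` below is made up for illustration; use whatever your server emits) via `connection.getHeaderField("X-File-Hash")`, then verifies the body as it streams in. A sketch of the verification side, assuming a SHA-256 hex string in the header:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;

public class HeaderVerify {
    // Parse a hex string (as sent in the hypothetical X-File-Hash
    // header) into raw bytes.
    static byte[] parseHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Stream the response body, hashing as we go, and compare against
    // the hash from the header. Returns the body on success; throws on
    // mismatch so the caller can retry the single download request.
    static byte[] readVerified(InputStream body, String expectedHex) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = body.read(buf)) != -1) {
            md.update(buf, 0, n);
            out.write(buf, 0, n);
        }
        if (!MessageDigest.isEqual(md.digest(), parseHex(expectedHex))) {
            throw new IOException("hash mismatch - re-download the file");
        }
        return out.toByteArray();
    }
}
```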
Yes, there is a better way. Firstly, instead of requesting a hash of the entire file, compress the file, segment the compressed data into (say) 100KB blocks, and supply a sequence of hashes, one per block, followed by a self-hash of that sequence. By a self-hash I just mean taking the vector of hashes, hashing it, and appending that hash to the end of the vector.
You can now verify that this vector of hashes transferred correctly by checking the self-hash. If it doesn't pass, re-request the hash vector.
The second phase is then to request the transfer of the compressed data. As this comes across, you can check at 100KB intervals that the transfer is correct, aborting as soon as you get an error. Then (if possible) start the re-request from where you left off, a "high tide mark".
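The second phase, checking at 100KB intervals and finding where to resume, might look like this; again a sketch under the same assumptions (SHA-256, 100KB blocks), with `blockHashes` repeated here so the example stands alone:

```java
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;

public class HighTide {
    static final int BLOCK = 100 * 1024; // 100KB blocks, as in the answer

    static byte[] sha256(byte[] data, int off, int len) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update(data, off, len);
        return md.digest();
    }

    // One hash per block: the per-block part of the hash vector.
    static List<byte[]> blockHashes(byte[] data) throws Exception {
        List<byte[]> hashes = new ArrayList<>();
        for (int off = 0; off < data.length; off += BLOCK) {
            hashes.add(sha256(data, off, Math.min(BLOCK, data.length - off)));
        }
        return hashes;
    }

    // Walk the received data block by block; return the offset of the
    // first block that fails its hash - the "high tide mark" to resume
    // the re-request from - or data.length if every block checks out.
    static int highTideMark(byte[] data, List<byte[]> hashes) throws Exception {
        int block = 0;
        for (int off = 0; off < data.length; off += BLOCK, block++) {
            int len = Math.min(BLOCK, data.length - off);
            if (!MessageDigest.isEqual(sha256(data, off, len), hashes.get(block))) {
                return off;
            }
        }
        return data.length;
    }
}
```

On HTTP the re-request from the high tide mark could use a Range header; over a custom protocol it is whatever "resume from offset" means there.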
Finally you can safely decompress the data. Many decompression algorithms perform a further integrity check of their own, which gives you another round of verification, defending against any programming mistakes. A free check is worth having.
This approach will work regardless of whether you're working over a checked protocol like TCP/IP or an unreliable one like UDP. Compressing the data, if you don't do so already, will be a significant improvement too.
The only downside - it is obviously a lot more work.