Calculating a file's MD5 to verify integrity
I am trying to guarantee the integrity of a file after download. I store the MD5 of the file in a database and compare that MD5 to the hash of the file after it is downloaded. However, I always get a different MD5 when I hash the downloaded file. I am wondering whether the byte array being hashed contains metadata, such as the last-modified time, and whether that is throwing off the hash. If anyone else has done this before, your help would be greatly appreciated.
Comments (6)
The MD5 hash is calculated on the file contents and is not affected by file metadata. It is a deterministic process that always produces the same result for the same content (although MD5 is not collision-resistant, so two deliberately crafted, different files can share the same hash).
How are you creating the MD5 hash for the file? Have you tried using another tool to reproduce the problem?
If there is a different MD5 signature, then your files are different somehow.
The causes suggested earlier, EOL conversion or transferring a binary file in ASCII mode, are very likely reasons why the file was changed. A diff tool can help identify where and how the files differ. If your file is in a binary format, try a binary diff tool.
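If no binary diff tool is at hand, the comparison can be sketched in a few lines of Python. This is only an illustration, not a replacement for a real diff tool: the function name `first_difference` and the chunk size are assumptions made for this example, and it reports only the first offset at which the two files diverge.

```python
def first_difference(path_a, path_b, chunk_size=65536):
    """Return the byte offset of the first difference, or None if identical."""
    offset = 0
    # Open both files in binary mode so no newline translation happens on read.
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Find the exact differing byte inside this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No differing byte found: one file is a prefix of the other.
                return offset + min(len(a), len(b))
            if not a:
                # Both files ended at the same point with no difference.
                return None
            offset += len(a)
```

Calling it on the original file and the downloaded copy gives an offset to inspect in a hex viewer. A difference at the position of the first line ending would point toward EOL conversion.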
A simple way to find out: run a diff (I assume binary but maybe not) against two different downloads. This should quickly pinpoint the problem.
If I'm not totally wrong here, the MD5 hash covers only the actual data, not the timestamps and other metadata. Maybe you are transferring text files over FTP; in that case the FTP client might rewrite the newline characters to fit your system, and then the hash will be different.
If you are using FTP to download, the problem could be:
Binary download option instead of ASCII (or vice versa).
Transferring across platforms, e.g. Windows to Unix, where the EOL is treated differently.
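To see why line-ending conversion breaks the comparison, here is a small in-memory sketch. It assumes, for illustration only, that the transfer rewrote every LF as CRLF, which is the kind of rewriting an ASCII-mode transfer to Windows can perform. The helper `md5_hex` and the sample data are made up for this example.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of a bytes object as a hex string."""
    return hashlib.md5(data).hexdigest()

# Sample content as it might be stored on the server (Unix line endings).
original = b"line one\nline two\n"

# Simulated ASCII-mode transfer: every LF becomes CRLF.
converted = original.replace(b"\n", b"\r\n")

print(md5_hex(original))
print(md5_hex(converted))
# The two digests differ, and the converted copy is two bytes longer.
print(len(original), len(converted))
```

A changed file size after download is therefore a cheap first hint that the transfer mode, not metadata, altered the file.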
You could test your theory by hashing only a particular part of the file, say, the middle 50%. If that differs, then you know it's not just a timestamp or something similar. That said, you really need to give us more info to get a better answer.
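A rough sketch of that test in Python might look like the following. The function names `md5_of_range` and `md5_of_middle_half`, and the choice of the middle half as "from one quarter in, for half the length", are assumptions made for this example.

```python
import hashlib
import os

def md5_of_range(path, start, length, chunk_size=65536):
    """Hash `length` bytes of the file starting at byte offset `start`."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        f.seek(start)
        remaining = length
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                break  # reached end of file early
            h.update(chunk)
            remaining -= len(chunk)
    return h.hexdigest()

def md5_of_middle_half(path):
    """Hash the middle 50% of the file: skip the first quarter, read half."""
    size = os.path.getsize(path)
    return md5_of_range(path, size // 4, size // 2)
```

Note that this comparison is only meaningful when both copies have the same size. If the sizes differ, the byte ranges no longer line up, and the size difference itself already shows the contents changed.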
Make sure you are actually calculating the MD5 on the bytes of the file, not the filename or some other string.
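The difference between the two is easy to show. The sketch below is in Python because the question does not say which language is in use; `md5_of_file` and `md5_of_name` are illustrative names, and the second function exists only to demonstrate the mistake.

```python
import hashlib

def md5_of_file(path, chunk_size=65536):
    """Hash the file's contents, reading in binary mode chunk by chunk."""
    h = hashlib.md5()
    with open(path, "rb") as f:  # "rb": raw bytes, no newline translation
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def md5_of_name(path):
    """The mistake to avoid: this hashes the path string, not the file."""
    return hashlib.md5(path.encode("utf-8")).hexdigest()
```

If the value stored in the database was produced one way and the check after download is done the other way, the two hashes will never match, no matter how cleanly the file was transferred.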