Why does a file's MD5 keep changing?
I have been asked to investigate why the MD5 value of a file keeps changing.
Example:
1. I generate the diagnostic file for a certain machine.
2. Generating it produces a .zip file, say Diag.zip, which contains all the information and files for that machine.
3. Diag.zip contains an .xls file, say Data.xls, which summarizes every file on that machine: its directory, file version, file size, creation time, and MD5.
4. All the information in Data.xls is then saved to a database.
5. After a day or so, I repeat steps 1-4.
When I then query the database for all Data.xls records saved over a two-week range, it shows that almost every file on that machine has had its MD5 value change.
The question is: why does the MD5 value change every time I generate a new diagnostic file?
Comments (2)
There seems to be an issue with Excel files, in particular Excel 2003 .xls files. Whenever they are opened in Excel, even if nothing is changed and nothing is saved, Excel automatically updates some of the file's metadata, such as the "Document Properties and Personal Information" and the "Last Accessed Statistics". The file therefore changes slightly every time it is opened, and so its MD5 changes as well.
One way to avoid this problem is to remove the "Document Properties and Personal Information".
Excel 2007: Remove Hidden Data and Personal Information from Office Documents
Excel 2013, Excel 2010: Remove Hidden Data and Personal Information by Inspecting Workbooks
Another way to avoid this would be to use .xlsx files. I have tried to replicate this behavior with .xlsx files, but it seems to happen only with .xls (2003).
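To check whether simply opening the files is what changes them, one option is to hash them before and after and compare the results. The following is a minimal Python sketch of that idea, not the diagnostic tool's actual method. The function names `file_md5`, `snapshot`, and `changed_files`, and the `*.xls` pattern, are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 of a file's bytes, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(directory, pattern="*.xls"):
    """Map each matching file path (as a string) to its current MD5."""
    return {str(p): file_md5(p) for p in sorted(Path(directory).rglob(pattern))}


def changed_files(before, after):
    """Return the paths present in both snapshots whose MD5 differs."""
    return sorted(p for p in before if p in after and before[p] != after[p])
```

A possible use: call `snapshot` on a folder, open and close the workbooks in Excel without saving, call `snapshot` again, and pass both results to `changed_files`. Any path it returns was modified on disk in between.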
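The MD5 is computed from the file's contents, byte for byte. The file name and creation date are not part of the hash unless they are stored inside the file itself.
If a single byte of the content changes, the MD5 hash changes. The exact same content will always return the exact same MD5 hash. A new file gets a new MD5 hash only if its bytes differ from the old one.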
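A small Python sketch can illustrate which changes affect the hash. It assumes the hash is taken over the file's bytes only, which is how `hashlib.md5` behaves; the helper names `md5_of` and `demo` and the file names are hypothetical.

```python
import hashlib
import os
import tempfile


def md5_of(path):
    """Return the hex MD5 of the bytes stored in the file."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "first.txt")
        second = os.path.join(tmp, "renamed_copy.txt")
        for path in (first, second):
            with open(path, "wb") as handle:
                handle.write(b"same bytes")

        # Different names, identical content: the hashes match.
        same_content = md5_of(first) == md5_of(second)

        # Changing the access/modification timestamps leaves the hash alone.
        before = md5_of(first)
        os.utime(first, (1_000_000_000, 1_000_000_000))
        same_after_touch = md5_of(first) == before

        # Appending a single byte changes the hash.
        with open(first, "ab") as handle:
            handle.write(b"!")
        differs_after_edit = md5_of(first) != before

    return same_content, same_after_touch, differs_after_edit


if __name__ == "__main__":
    print(demo())  # prints (True, True, True)
```

So if the stored MD5 values differ between two runs, the bytes on disk differed at the time they were hashed, whatever the file names and timestamps say.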