Determine whether any files have been added, removed, or modified in a directory
I'm trying to write a Python script that gets the md5sum of all files in a directory (on Linux), which I believe the code below does.
I want to be able to run it to make sure that no files in the directory have changed and that no files have been added or deleted.
The problem is that if I change a file in the directory and then change it back, the function below returns a different result, even though the file's contents are the same as before.
Can anyone explain this, and let me know if you can think of a workaround?
import hashlib
import os
import tarfile

def get_dir_md5(dir_path):
    """Build a tar file of the directory and return its md5 sum"""
    temp_tar_path = 'tests.tar'
    t = tarfile.TarFile(temp_tar_path, mode='w')
    t.add(dir_path)
    t.close()
    m = hashlib.md5()
    with open(temp_tar_path, 'rb') as f:
        m.update(f.read())
    ret_str = m.hexdigest()
    # delete tar file
    os.remove(temp_tar_path)
    return ret_str
Edit:
As these fine folks have answered, it looks like tar includes header information such as the modification date. Would zip or another format behave any differently?
Any other ideas for workarounds?
Comments (4)
As the other answers mentioned, two tar files can differ even when the contents are the same, because of either tar metadata changes or file order changes. You should run the checksum on the file data directly, sorting the directory listings so that they are always in the same order. If you want to include some metadata in the checksum, include it manually.
os.walk is a natural fit for the traversal.
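A minimal sketch of hashing the file data directly with os.walk, not a definitive implementation. It assumes the tree contains only readable regular files; empty directories do not affect the result, and the relative path of each file is mixed into the digest so that renames, additions, and deletions change it. The chunk size is an arbitrary choice.

```python
import hashlib
import os


def get_dir_md5(dir_path):
    """Return one MD5 hex digest covering relative paths and file contents."""
    digest = hashlib.md5()
    for root, dirs, files in os.walk(dir_path):
        dirs.sort()  # sorting in place fixes the order os.walk descends in
        for name in sorted(files):
            full_path = os.path.join(root, name)
            rel_path = os.path.relpath(full_path, dir_path)
            # Hash the relative path so added, removed, or renamed files register.
            digest.update(os.fsencode(rel_path))
            digest.update(b"\0")
            # Read in chunks so large files are never loaded whole into memory.
            with open(full_path, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    digest.update(chunk)
    return digest.hexdigest()
```

Because no timestamps are hashed, editing a file and then restoring its original contents yields the original digest again.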
TAR file headers include a field for the file's modification time. Changing a file, even if the change is later reverted, updates that time, so the TAR headers differ and the hash differs with them.
You do not need to make the TAR file to do what you propose.
Here is your workaround algorithm:
The single resulting signature will be what you are looking for.
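One way to arrive at such a single signature, sketched under the assumption that the steps are: hash each file, sort the results by path, then hash the sorted list. The function names (`file_md5`, `build_manifest`, `manifest_signature`, `diff_manifests`) are hypothetical, and the per-file manifest is an addition that also makes it possible to say which files were added, removed, or modified.

```python
import hashlib
import os


def file_md5(path, chunk_size=65536):
    """MD5 hex digest of one file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(dir_path):
    """Map each file's path (relative to dir_path) to its MD5 digest."""
    manifest = {}
    for root, _dirs, files in os.walk(dir_path):
        for name in files:
            full_path = os.path.join(root, name)
            manifest[os.path.relpath(full_path, dir_path)] = file_md5(full_path)
    return manifest


def manifest_signature(manifest):
    """Collapse a manifest into one digest; sorting keeps the order stable."""
    digest = hashlib.md5()
    for rel_path in sorted(manifest):
        line = "{}  {}\n".format(manifest[rel_path], rel_path)
        digest.update(line.encode("utf-8", "surrogateescape"))
    return digest.hexdigest()


def diff_manifests(old, new):
    """Return (added, removed, modified) as sorted lists of relative paths."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    modified = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, removed, modified
```

Storing the manifest from an earlier run and passing it to `diff_manifests` together with a fresh one reports what changed; comparing the two `manifest_signature` values answers only whether anything changed.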
Heck, you don't even need Python. You can do this:
tar files contain metadata beyond the actual file contents, such as file modification times, permissions, and ownership. Even if the file contents don't change, the tar file will in fact be different.
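If a tar-based digest is still wanted, the varying metadata can be reset before each member is written. The sketch below is one possible approach, not the method from any of the answers: it uses the `filter` argument of `TarFile.add` to zero the modification time and ownership fields, and builds the archive in memory instead of in a temporary file. It assumes Python 3.7 or later, where recursive `add` visits directory entries in sorted order; `_strip_metadata` and `get_dir_md5_via_tar` are hypothetical names.

```python
import hashlib
import io
import tarfile


def _strip_metadata(tarinfo):
    """Reset header fields that can vary while the content stays the same."""
    tarinfo.mtime = 0
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = ""
    return tarinfo


def get_dir_md5_via_tar(dir_path):
    """MD5 of an in-memory tar of dir_path with normalized headers."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        # arcname="." keeps the digest independent of where dir_path lives.
        archive.add(dir_path, arcname=".", filter=_strip_metadata)
    return hashlib.md5(buffer.getvalue()).hexdigest()
```

Permission bits are left in the headers on purpose, so a chmod still changes the digest; set `tarinfo.mode` in the filter as well if that is not wanted.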