Recursively compare two directories to ensure they have the same files and subdirectories
From what I observe, filecmp.dircmp is recursive, but inadequate for my needs, at least in py2. I want to compare two directories and all the files they contain. Does this exist, or do I need to build it (using os.walk, for example)? I prefer pre-built, where someone else has already done the unit testing :)
The actual 'comparison' can be sloppy (ignoring permissions, for example), if that helps.
I would like something boolean, and report_full_closure is a printed report. It also only goes down common subdirs. AFAIC, if there is anything in only the left or only the right dir, the directories are different. I built this using os.walk instead.
Here's an alternative implementation of the comparison function using the filecmp module. It uses recursion instead of os.walk, so it is a little simpler. However, it does not recurse simply by using the common_dirs and subdirs attributes, since in that case we would be implicitly using the default "shallow" implementation of file comparison, which is probably not what you want. In the implementation below, when comparing files with the same name, we always compare only their contents.
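A minimal sketch of a function along these lines, not necessarily the answer's original code. The name are_dir_trees_equal is chosen here for illustration. It uses filecmp.dircmp only for the directory listing and filecmp.cmpfiles with shallow=False for the file comparison, so file contents are always read:

```python
import filecmp
import os


def are_dir_trees_equal(dir1, dir2):
    """Return True if both trees have the same names and file contents."""
    dirs_cmp = filecmp.dircmp(dir1, dir2)
    # Names present on one side only, or with mismatched types
    # (file on one side, directory on the other), mean "different".
    if dirs_cmp.left_only or dirs_cmp.right_only or dirs_cmp.common_funny:
        return False
    # shallow=False forces a content comparison instead of an os.stat check.
    _, mismatch, errors = filecmp.cmpfiles(
        dir1, dir2, dirs_cmp.common_files, shallow=False
    )
    if mismatch or errors:
        return False
    # Recurse into the subdirectories that exist on both sides.
    return all(
        are_dir_trees_equal(os.path.join(dir1, name), os.path.join(dir2, name))
        for name in dirs_cmp.common_dirs
    )
```

Because the recursion goes through common_dirs by hand, every level gets the same content-based check rather than dircmp's default comparison.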
filecmp.dircmp is the way to go. But it does not compare the content of files found at the same path in the two compared directories. Instead, filecmp.dircmp only looks at file attributes. Since dircmp is a class, you can fix that with a dircmp subclass that overrides its phase3 function, which compares files, to ensure content is compared instead of only os.stat attributes.
Then you can use this to return a boolean:
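A minimal sketch of the subclass approach, under stated assumptions: the names ContentDircmp and is_same are chosen here for illustration and are not part of filecmp. One detail worth noting is that dircmp computes its attributes lazily through its methodmap class attribute, which holds the base-class functions, so the sketch also remaps the entries served by phase3:

```python
import filecmp
import os


class ContentDircmp(filecmp.dircmp):
    """dircmp variant whose file comparison always reads file contents."""

    def phase3(self):
        # shallow=False compares bytes instead of trusting os.stat signatures.
        self.same_files, self.diff_files, self.funny_files = filecmp.cmpfiles(
            self.left, self.right, self.common_files, shallow=False
        )

    # Point the lazily computed attributes at the overriding phase3.
    methodmap = dict(
        filecmp.dircmp.methodmap,
        same_files=phase3, diff_files=phase3, funny_files=phase3,
    )


def is_same(dir1, dir2):
    """Return True if the two trees match in names and file contents."""
    compared = ContentDircmp(dir1, dir2)
    if (compared.left_only or compared.right_only or compared.common_funny
            or compared.diff_files or compared.funny_files):
        return False
    # Recurse by hand so every level uses ContentDircmp.
    return all(
        is_same(os.path.join(dir1, sub), os.path.join(dir2, sub))
        for sub in compared.common_dirs
    )
```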
In case you want to reuse this code snippet, it is hereby dedicated to the Public Domain or the Creative Commons CC0 at your choice (in addition to the default license CC-BY-SA provided by SO).
Here is a simple solution with a recursive function:
The report_full_closure() method is recursive:
Edit: After the OP's edit, I would say that it's best to just use the other functions in filecmp. I think os.walk is unnecessary; it is better to simply recurse through the lists produced by common_dirs, etc., although in some cases (large directory trees) this might risk a maximum recursion depth error if implemented poorly.
dircmp can be recursive: see report_full_closure.
As far as I know, dircmp does not offer a directory comparison function. It would be very easy to write your own, though; use left_only and right_only on dircmp to check that the files in the directories are the same, and then recurse on the subdirs attribute.
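The idea of checking left_only and right_only and then recursing on subdirs could be sketched as follows. The function name same_tree is hypothetical, and the sketch relies on dircmp's default file comparison, which trusts matching os.stat signatures:

```python
import filecmp


def same_tree(dcmp):
    """Return True if a dircmp object reports no differences at any depth."""
    if (dcmp.left_only or dcmp.right_only or dcmp.common_funny
            or dcmp.diff_files or dcmp.funny_files):
        return False
    # subdirs maps each common subdirectory name to its own dircmp object.
    return all(same_tree(sub) for sub in dcmp.subdirs.values())
```

It would be called as same_tree(filecmp.dircmp(dir1, dir2)).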
Another solution: compare the layout of dir1 and dir2, ignoring the content of files.
See the gist here: https://gist.github.com/4164344
Edit: here's the code, in case the gist gets lost for some reason:
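A layout-only comparison could look roughly like the sketch below. This is not the gist's code; tree_layout and same_layout are names made up for illustration. It collects every relative path with os.walk and compares the two sets, never opening a file:

```python
import os


def tree_layout(root):
    """Return the set of (relative path, kind) entries found under root."""
    entries = set()
    for current, dirs, files in os.walk(root):
        rel = os.path.relpath(current, root)
        for name in dirs:
            entries.add((os.path.normpath(os.path.join(rel, name)), "dir"))
        for name in files:
            entries.add((os.path.normpath(os.path.join(rel, name)), "file"))
    return entries


def same_layout(dir1, dir2):
    """Return True if both trees contain the same file and directory names."""
    return tree_layout(dir1) == tree_layout(dir2)
```

Since directories are recorded as entries too, an empty subdirectory present on one side only counts as a difference.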
This recursive function seems to work for me:
Assuming I haven't overlooked anything, you could just negate the result if you want to know whether the directories are the same:
Since a True or False result is all you want, if you have diff installed:
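One way to do this from Python, as a sketch that assumes a diff executable on the PATH that accepts -q and -r (as GNU diff does). The function name is hypothetical. diff exits with 0 when it finds no differences, 1 when it finds some, and a higher status on trouble:

```python
import subprocess


def dirs_equal_via_diff(dir1, dir2):
    """Return True if `diff -qr` reports no differences between the trees."""
    result = subprocess.run(
        ["diff", "-qr", dir1, dir2],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if result.returncode > 1:
        # Exit status 2 or higher means diff itself failed (e.g. missing path).
        raise RuntimeError(f"diff failed with exit status {result.returncode}")
    return result.returncode == 0
```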
Based on Python issue 12932 and the filecmp documentation, you may use the following example:
Here's a tiny hack without our own recursion and algorithm:
Here is my solution: gist
This will check whether files are in the same locations and whether their content is the same. It will not correctly validate empty subfolders.
To anyone looking for a simple library:
https://github.com/mitar/python-deep-dircmp
DeepDirCmp basically subclasses filecmp.dircmp and shows output identical to diff -qr dir1 dir2.
Usage:
Based on @Mateusz Kobos's currently accepted answer, it turns out that the second filecmp.cmpfiles call with shallow=False is not necessary, so we've removed it. One can get dirs_cmp.diff_files from the first dircmp. A common misunderstanding (one that we made as well!) is that dircmp is shallow only and doesn't compare file contents. It turns out that is not true! The meaning of shallow=True is only to save time; it does not actually consider two files with differing last modification times to be different. If the last modified time differs between two files, it goes on to read each file's contents and compare them. If the contents are identical, then it's a match even if the last modification date is different! (The contents are skipped only when the os.stat signatures, that is file type, size and modification time, already match; in that case the files are reported as equal without being read.) We've added verbose prints here for added clarity. See elsewhere (filecmp.cmp() ignoring differing os.stat() signatures?) if you want differences in st_mtime to be considered a mismatch. We also changed to use the newer pathlib instead of the os library.
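A rough sketch of this variant, not the answer's original code: the name trees_match and the verbose flag are assumptions made for illustration. It relies on a single dircmp per directory level and reads diff_files from it, so files whose os.stat signatures match are treated as equal without being read:

```python
import filecmp
from pathlib import Path


def trees_match(dir1, dir2, verbose=False):
    """Return True if dircmp reports no differences at any level."""
    dir1, dir2 = Path(dir1), Path(dir2)
    dirs_cmp = filecmp.dircmp(str(dir1), str(dir2))
    problems = (dirs_cmp.left_only + dirs_cmp.right_only
                + dirs_cmp.common_funny + dirs_cmp.diff_files
                + dirs_cmp.funny_files)
    if problems:
        if verbose:
            print(f"{dir1} vs {dir2} differ in: {sorted(problems)}")
        return False
    # Recurse into the subdirectories present on both sides.
    return all(
        trees_match(dir1 / name, dir2 / name, verbose)
        for name in dirs_cmp.common_dirs
    )
```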