filecmp.cmp() ignoring differing os.stat() signatures?
The Python 2 docs for filecmp.cmp() say:
Unless shallow is given and is false, files with identical os.stat() signatures are taken to be equal.
That sounds as if two files that are identical except for their os.stat() signatures will be considered unequal. However, this does not seem to be the case, as running the following code snippet illustrates:
import filecmp
import os
import shutil
import time
with open('test_file_1', 'w') as f:
    f.write('file contents')
shutil.copy('test_file_1', 'test_file_2')
time.sleep(5) # pause to get a different time-stamp
os.utime('test_file_2', None) # change copied file's time-stamp
print 'test_file_1:', os.stat('test_file_1')
print 'test_file_2:', os.stat('test_file_2')
print 'filecmp.cmp():', filecmp.cmp('test_file_1', 'test_file_2')
Output:
test_file_1: nt.stat_result(st_mode=33206, st_ino=0L, st_dev=0, st_nlink=0,
st_uid=0, st_gid=0, st_size=13L, st_atime=1320719522L, st_mtime=1320720444L,
st_ctime=1320719522L)
test_file_2: nt.stat_result(st_mode=33206, st_ino=0L, st_dev=0, st_nlink=0,
st_uid=0, st_gid=0, st_size=13L, st_atime=1320720504L, st_mtime=1320720504L,
st_ctime=1320719539L)
filecmp.cmp(): True
As you can see, the two files' time stamps (st_atime, st_mtime, and st_ctime) are clearly not the same, yet filecmp.cmp() indicates that the two are identical. Am I misunderstanding something, or is there a bug in either filecmp.cmp()'s implementation or its documentation?
Update
The Python 3 documentation has been rephrased and currently says the following, which IMHO is an improvement only in the sense that it better implies that files with different time stamps might still be considered equal even when shallow is True.
If shallow is true, files with identical os.stat() signatures are taken to be equal. Otherwise, the contents of the files are compared.
FWIW I think it would have been better to simply have said something like this:
If shallow is true, file content is compared only when os.stat() signatures are unequal.
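A small experiment makes the proposed wording concrete. This is only a sketch under some assumptions: it is written for Python 3, the file names and the helpers `make_file` and `run_demo` are made up for illustration, and it relies on the filesystem honouring `os.utime()`. It compares two files with equal content but different mtimes, and two files with different content but the same size and mtime.

```python
import filecmp
import os
import shutil
import tempfile


def make_file(path, text, mtime):
    """Write text to path, then force its atime and mtime to the given value."""
    with open(path, 'w') as f:
        f.write(text)
    os.utime(path, (mtime, mtime))


def run_demo():
    """Return filecmp.cmp() results for three small scratch files."""
    workdir = tempfile.mkdtemp()
    try:
        a = os.path.join(workdir, 'a.txt')
        b = os.path.join(workdir, 'b.txt')
        c = os.path.join(workdir, 'c.txt')

        make_file(a, 'file contents', 1320720444)
        make_file(b, 'file contents', 1320720504)  # same bytes as a, later mtime
        make_file(c, 'FILE CONTENTS', 1320720444)  # same size and mtime as a, other bytes

        return {
            # Signatures differ, so the contents are read and found equal.
            'same_content_diff_mtime': filecmp.cmp(a, b),
            # Signatures match, so a shallow comparison never reads the contents.
            'same_sig_diff_content_shallow': filecmp.cmp(a, c),
            # shallow=False forces the byte-by-byte comparison.
            'same_sig_diff_content_deep': filecmp.cmp(a, c, shallow=False),
        }
    finally:
        shutil.rmtree(workdir)


if __name__ == '__main__':
    for name, value in sorted(run_demo().items()):
        print('%s: %s' % (name, value))
```

If the assumptions hold, the first two comparisons should come out True and the third False, which is the case where a shallow comparison can report different files as equal.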
Comments (2)
You're misunderstanding the documentation. Line #2 says that files with identical os.stat() signatures are taken to be equal, but the logical inverse is not true: files with unequal os.stat() signatures are not necessarily taken to be unequal. Rather, they may or may not be equal, so the actual file contents are compared. Since the file contents are found to be identical, filecmp.cmp() returns True.
As per the third clause, once it determines that the files are equal, it will cache that result and not bother re-reading the file contents if you ask it to compare the same files again, so long as those files' os.stat structures don't change.
It seems that 'rolling your own' is indeed what is required to produce a desirable result. It would simply be nice if the documentation were clear enough to make a casual reader reach that conclusion.
Here's the function I am presently using: