filecmp.cmp() ignoring differing os.stat() signatures?

Posted 2024-12-14 08:18:14

The Python 2 docs for filecmp.cmp() say:

Unless shallow is given and is false, files with identical os.stat() signatures are taken to be equal.

This sounds as though two files that are identical except for their os.stat() signatures will be considered unequal. However, that does not seem to be the case, as running the following code snippet illustrates:

import filecmp
import os
import shutil
import time

with open('test_file_1', 'w') as f:
    f.write('file contents')
shutil.copy('test_file_1', 'test_file_2')
time.sleep(5)  # pause to get a different time-stamp
os.utime('test_file_2', None)  # change copied file's time-stamp

print 'test_file_1:', os.stat('test_file_1')
print 'test_file_2:', os.stat('test_file_2')
print 'filecmp.cmp():', filecmp.cmp('test_file_1', 'test_file_2')

Output:

test_file_1: nt.stat_result(st_mode=33206, st_ino=0L, st_dev=0, st_nlink=0,
  st_uid=0, st_gid=0, st_size=13L, st_atime=1320719522L, st_mtime=1320720444L, 
  st_ctime=1320719522L)
test_file_2: nt.stat_result(st_mode=33206, st_ino=0L, st_dev=0, st_nlink=0, 
  st_uid=0, st_gid=0, st_size=13L, st_atime=1320720504L, st_mtime=1320720504L, 
  st_ctime=1320719539L)
filecmp.cmp(): True

As you can see, the two files' time stamps (st_atime, st_mtime, and st_ctime) are clearly not the same, yet filecmp.cmp() indicates that the two are identical. Am I misunderstanding something, or is there a bug in either filecmp.cmp()'s implementation or its documentation?

Update

The Python 3 documentation has been rephrased and currently says the following, which IMHO is an improvement only in the sense that it better implies that files with different time stamps might still be considered equal even when shallow is True.

If shallow is true, files with identical os.stat() signatures are
taken to be equal. Otherwise, the contents of the files are compared.

FWIW I think it would have been better to simply have said something like this:

If shallow is true, file content is compared only when os.stat() signatures are unequal.

Comments (2)

紫南 2024-12-21 08:18:14

You're misunderstanding the documentation. Line #2 says:

Unless shallow is given and is false, files with identical os.stat() signatures are taken to be equal.

Files with identical os.stat() signatures are taken to be equal, but the logical inverse is not true: files with unequal os.stat() signatures are not necessarily taken to be unequal. Rather, they may be unequal, in which case the actual file contents are compared. Since the file contents are found to be identical, filecmp.cmp() returns True.

As per the third clause, once it determines that the files are equal, it will cache that result and not bother re-reading the file contents if you ask it to compare the same files again, so long as those files' os.stat structures don't change.
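To make that decision order concrete, here is a rough sketch of how I understand the comparison to proceed. It is not the library's source: `signature`, `cmp_sketch`, and `demo` are names made up for illustration, the caching step is left out, and the sketch is written for Python 3. It assumes the "signature" consists of file type, size, and modification time.

```python
import filecmp
import os
import stat
import tempfile


def signature(path):
    """Return (file type, size, mtime) for path (hypothetical helper)."""
    st = os.stat(path)
    return (stat.S_IFMT(st.st_mode), st.st_size, st.st_mtime)


def cmp_sketch(f1, f2, shallow=True):
    """Illustrative restatement of the decision order, without caching."""
    s1, s2 = signature(f1), signature(f2)
    if s1[0] != stat.S_IFREG or s2[0] != stat.S_IFREG:
        return False  # only regular files can compare equal
    if shallow and s1 == s2:
        return True   # identical signatures: contents are never read
    if s1[1] != s2[1]:
        return False  # different sizes: contents cannot match
    # Signatures differ (or shallow is false): fall back to the contents.
    # Reading whole files at once is fine for a sketch, not for large files.
    with open(f1, 'rb') as a, open(f2, 'rb') as b:
        return a.read() == b.read()


def demo():
    """Same contents, different time stamps, as in the question."""
    with tempfile.TemporaryDirectory() as d:
        p1 = os.path.join(d, 'test_file_1')
        p2 = os.path.join(d, 'test_file_2')
        for p in (p1, p2):
            with open(p, 'w') as f:
                f.write('file contents')
        os.utime(p1, (1000000000, 1000000000))
        os.utime(p2, (1000000060, 1000000060))  # one minute later
        return cmp_sketch(p1, p2), filecmp.cmp(p1, p2)
```

Calling `demo()` compares two files whose contents match but whose modification times differ, using both the sketch and the real `filecmp.cmp()`. Both fall through to the content comparison, which is why the question's snippet prints True.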

冷情妓 2024-12-21 08:18:14

It seems that 'rolling your own' is indeed what is required to produce a desirable result. It would simply be nice if the documentation were clear enough to make a casual reader reach that conclusion.

Here's the function I am presently using:

def cmp_stat_weak(a, b):
    sa = os.stat(a)
    sb = os.stat(b)
    return (sa.st_size == sb.st_size and sa.st_mtime == sb.st_mtime)
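For comparison, here is a small Python 3 sketch that runs this function next to `filecmp.cmp()` on two pairs of files. The helpers `write_file`, `compare_all`, and `demo`, the file names, and the fixed time stamps are all made up for the illustration; `cmp_stat_weak` is repeated only so the block runs on its own.

```python
import filecmp
import os
import tempfile


def cmp_stat_weak(a, b):
    # Same check as above: size and mtime only, contents are never read.
    sa = os.stat(a)
    sb = os.stat(b)
    return sa.st_size == sb.st_size and sa.st_mtime == sb.st_mtime


def write_file(path, text, mtime):
    """Create a file and force its access/modification time (hypothetical helper)."""
    with open(path, 'w') as f:
        f.write(text)
    os.utime(path, (mtime, mtime))


def compare_all(a, b):
    """Return (cmp_stat_weak, shallow filecmp.cmp, deep filecmp.cmp)."""
    return (cmp_stat_weak(a, b),
            filecmp.cmp(a, b, shallow=True),
            filecmp.cmp(a, b, shallow=False))


def demo():
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, 'a')
        b = os.path.join(d, 'b')
        c = os.path.join(d, 'c')
        write_file(a, 'file contents', 1000000000)
        write_file(b, 'file contents', 1000000060)  # same content, later mtime
        write_file(c, 'FILE CONTENTS', 1000000000)  # same size and mtime as a
        return compare_all(a, b), compare_all(a, c)
```

For the first pair (same content, different mtime), only `cmp_stat_weak` reports a difference. For the second pair (different content, same size and mtime), only `filecmp.cmp(..., shallow=False)` does. So the stat-only check trades content safety for strictness about time stamps, which may or may not be what you want.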