Checking whether files are equal

Posted 2024-10-04 11:58:50

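What is the most elegant way to check whether two files are equal in Python? A checksum? A byte-by-byte comparison? Assume the files will be no larger than 100-200 MB.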

花开雨落又逢春i 2024-10-11 11:58:50

How about the filecmp module? It can compare files in several different ways, each with different tradeoffs.

Even better, it is part of the standard library:

http://docs.python.org/library/filecmp.html
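
To illustrate one of those tradeoffs, here is a minimal sketch. filecmp.cmp accepts a shallow argument: with the default shallow=True, two files whose os.stat() signatures (type, size, modification time) match are reported equal without their contents being read, while shallow=False forces a content comparison. The helper name same_content is made up for this example.

```python
import filecmp

def same_content(path_a, path_b):
    """Compare two files by content rather than by os.stat() signature."""
    # shallow=False tells filecmp to read and compare the file contents
    # instead of trusting matching size and modification time.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

The shallow default is faster, but it can report two files as equal without ever reading them, so shallow=False is probably the safer choice when correctness matters more than speed.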

肥爪爪 2024-10-11 11:58:50

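Use hashlib to compute the MD5 digest of each file, and compare the results.

#!/usr/bin/env python
import hashlib

def filemd5(filename, block_size=2**20):
    """Return the MD5 digest of a file, read in block_size chunks."""
    md5 = hashlib.md5()
    # Open in binary mode: hashing needs raw bytes, not decoded text.
    with open(filename, 'rb') as f:
        while True:
            data = f.read(block_size)
            if not data:
                break
            md5.update(data)
    return md5.digest()

if __name__ == "__main__":
    a = filemd5('/home/neo/todo')
    b = filemd5('/home/neo/todo2')
    print(a == b)

Update: As of Python 2.1 there is a filecmp module that does just what you want, and it has methods to compare directories too. I never knew about this module; I'm still learning Python myself :-)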

>>> import filecmp
>>> filecmp.cmp('undoc.rst', 'undoc.rst')
True
>>> filecmp.cmp('undoc.rst', 'index.rst')
False
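
For the directory comparison mentioned in the update, a rough sketch using filecmp.cmpfiles, which compares a list of file names that exist in both directories and returns three lists (match, mismatch, errors). The function name compare_dirs is hypothetical, and this version only looks at regular files in the top level of each directory.

```python
import filecmp
import os

def compare_dirs(dir_a, dir_b):
    """Compare same-named files in two directories by content.

    Returns a (match, mismatch, errors) tuple of file-name lists.
    """
    def regular_files(directory):
        return {name for name in os.listdir(directory)
                if os.path.isfile(os.path.join(directory, name))}

    # Only names present in both directories can be compared.
    common = sorted(regular_files(dir_a) & regular_files(dir_b))
    # shallow=False forces a content comparison for each pair.
    return filecmp.cmpfiles(dir_a, dir_b, common, shallow=False)
```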

ぇ气 2024-10-11 11:58:50

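Well, this probably needs two separate answers.

If you have many files to compare, go for checksums and cache the checksum of each file. To be sure, compare the files whose checksums match byte by byte afterwards.

If you have only two files, go straight to a byte comparison, because you have to read the files anyway to compute a checksum.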
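
A direct byte comparison of two files could look roughly like this. It is only a sketch: the name files_equal and the 1 MiB block size are arbitrary choices for the example. Unlike a checksum, it can stop at the first block that differs.

```python
def files_equal(path_a, path_b, block_size=2**20):
    """Return True if both files contain exactly the same bytes."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(block_size)
            chunk_b = fb.read(block_size)
            if chunk_a != chunk_b:
                return False  # first differing block: stop reading
            if not chunk_a:
                return True   # both files reached EOF together
```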

In both cases, use the file size as an early check for inequality.
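
For the many-files case, the steps above might be combined as in the sketch below: group by size first, hash only the files that share a size, then confirm each candidate group byte by byte. The names file_digest and find_identical are made up for the example, and SHA-256 is my own pick here; any hashlib algorithm, including MD5, would fit the same pattern.

```python
import filecmp
import hashlib
import os
from collections import defaultdict

def file_digest(path, block_size=2**20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()

def find_identical(paths):
    """Return lists of paths whose contents are identical (2+ per list)."""
    # Step 1: cheap early check. Files of different sizes cannot be equal.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        # Step 2: each file is hashed once; the dict acts as the cache.
        by_digest = defaultdict(list)
        for path in same_size:
            by_digest[file_digest(path)].append(path)
        # Step 3: confirm matching digests with a byte comparison.
        for candidates in by_digest.values():
            clusters = []
            for path in candidates:
                for cluster in clusters:
                    if filecmp.cmp(cluster[0], path, shallow=False):
                        cluster.append(path)
                        break
                else:
                    clusters.append([path])
            groups.extend(c for c in clusters if len(c) > 1)
    return groups
```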

负佳期 2024-10-11 11:58:50

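Before attempting any of the other solutions, you might want to call os.path.getsize(...) on both files.
If the sizes differ, there is no need to compare bytes or calculate a checksum.

Of course, this only helps if the file size isn't fixed.

Example:

import os

def foo(f1, f2):
    # Files of different sizes cannot be equal, so skip the expensive work.
    if os.path.getsize(f1) != os.path.getsize(f2):
        return False # Or similar

    ... # Checksumming / byte-comparing / whatever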

儭儭莪哋寶赑 2024-10-11 11:58:50

I would use a checksum (MD5, for example) rather than a byte comparison, plus a date check and, depending on your needs, a name check.

终难遇 2024-10-11 11:58:50

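What about shelling out to cmp?

import subprocess

# The Python 2 commands module was removed in Python 3;
# subprocess.getstatusoutput is its replacement and returns the exit code.
# cmp exits with 0 if the files are identical, 1 if they differ, >1 on error.
status, output = subprocess.getstatusoutput("/usr/bin/cmp file1 file2")
if status == 0:
    print("files are same")
elif status == 1:
    print("files differ")
else:
    print("uh oh!")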
