Best way to check two files for newline-independent identity using Python

Posted on 2024-09-10 03:38:32

I tried

filecmp.cmp(file1,file2)

but it doesn't work, since the files are identical except for their newline characters. Is there an option for that in filecmp or some other convenience function/library, or do I have to read both files line by line and compare those?

╰つ倒转 2024-09-17 03:38:32

I think a simple convenience function like this should do the job:

from itertools import zip_longest

def areFilesIdentical(filename1, filename2):
    # Text mode with the default newline=None translates \r\n and \r to \n.
    with open(filename1, "r") as a, open(filename2, "r") as b:
        # "all" is lazy: it stops at the first pair of lines that differ.
        # zip_longest pads the shorter file with None, so a file that is
        # merely a prefix of the other is not reported as identical.
        return all(lineA == lineB for lineA, lineB in zip_longest(a, b))

凡尘雨 2024-09-17 03:38:32

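Try the difflib module - it provides classes and functions for comparing sequences.

For your needs, the difflib.Differ class looks interesting.

class difflib.Differ
This is a class for comparing sequences of lines of text, and producing human-readable differences or deltas. Differ uses SequenceMatcher both to compare sequences of lines, and to compare sequences of characters within similar (near-matching) lines.

See the Differ example, which compares two texts. The sequences being compared can also be obtained from the readlines() method of file-like objects.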
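As a rough sketch of how difflib could be applied to the question: opening both files in text mode normalizes the line endings, after which difflib.unified_diff yields nothing when the contents match. The helper name newline_insensitive_diff is made up for this illustration, and unified_diff is used here instead of Differ only because its output is empty for identical inputs.

```python
import difflib

def newline_insensitive_diff(path1, path2):
    """Return a list of unified-diff lines; empty when the files match."""
    # Text mode with the default newline=None maps \r\n and \r to \n,
    # so line-ending differences never reach difflib.
    with open(path1, "r") as f1, open(path2, "r") as f2:
        lines1 = f1.readlines()
        lines2 = f2.readlines()
    return list(difflib.unified_diff(lines1, lines2,
                                     fromfile=path1, tofile=path2))
```

An empty result means the files are identical apart from line endings; a non-empty result can be printed with "".join(...) to see where they differ.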

汹涌人海 2024-09-17 03:38:32

The source code for filecmp.cmp() has this for the comparison part:

BUFSIZE = 8*1024

def _do_cmp(f1, f2):
    bufsize = BUFSIZE
    with open(f1, 'rb') as fp1, open(f2, 'rb') as fp2:
        while True:
            b1 = fp1.read(bufsize)
            b2 = fp2.read(bufsize)
            if b1 != b2:
                return False
            if not b1:
                return True

I modified that to make:

def universal_filecmp(f1, f2):
    with open(f1, 'r') as fp1, open(f2, 'r') as fp2:
        while True:
            b1 = fp1.readline()
            b2 = fp2.readline()
            if b1 != b2:
                return False
            if not b1:
                return True

In Python 3, opening a file in text mode ('r') translates '\r\n' and '\r' to '\n' by default, so both readline() calls return normalized lines. In Python 2 you can add 'U' to the mode to get the same behavior. I tested this code in a test bench for a package I am working on and it seems to work.
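To make the difference concrete, here is a small self-contained sketch that writes the same text once with LF and once with CRLF line endings, then compares the two files byte-wise with filecmp.cmp and line-wise in text mode. The names text_files_equal and demo, and the file names, are placeholders chosen for this example.

```python
import filecmp
import os
import tempfile

def text_files_equal(path1, path2):
    """Compare two files line by line in universal-newlines text mode."""
    with open(path1, "r") as fp1, open(path2, "r") as fp2:
        while True:
            line1 = fp1.readline()
            line2 = fp2.readline()
            if line1 != line2:
                return False
            if not line1:  # both files exhausted at the same time
                return True

def demo():
    """Write the same text with LF and CRLF endings and compare both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        unix_path = os.path.join(tmp, "unix.txt")
        dos_path = os.path.join(tmp, "dos.txt")
        with open(unix_path, "wb") as f:
            f.write(b"first\nsecond\n")
        with open(dos_path, "wb") as f:
            f.write(b"first\r\nsecond\r\n")
        # shallow=False forces a content comparison instead of os.stat() only
        byte_equal = filecmp.cmp(unix_path, dos_path, shallow=False)
        text_equal = text_files_equal(unix_path, dos_path)
    return byte_equal, text_equal

print(demo())  # (False, True)
```

The byte-wise comparison reports a difference because the CRLF file is two bytes longer, while the text-mode comparison treats the two files as equal.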

冰葑 2024-09-17 03:38:32

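It looks like you just need to check whether the files are the same, ignoring newlines.

You can use a function like this:

def do_cmp(f1, f2):
    # Read both files whole: comparing fixed-size chunks breaks once the
    # files contain different numbers of newline bytes, because the chunks
    # no longer line up. For very large files, compare line by line instead.
    with open(f1, 'rb') as fp1, open(f2, 'rb') as fp2:
        return is_same(fp1.read(), fp2.read())

def is_same(data1, data2):
    return strip_newlines(data1) == strip_newlines(data2)

def strip_newlines(data):
    # The files are opened in binary mode, so the patterns must be bytes.
    return data.replace(b"\r", b"").replace(b"\n", b"")

You can improve is_same so that it matches according to your requirements, e.g. you may want to ignore case too.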
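One possible way to extend is_same along those lines is sketched below. It assumes the contents have already been read as text (str), for example with open(path, "r").read(); the ignore_case parameter is a hypothetical addition, not part of the answer above.

```python
def is_same(text1, text2, ignore_case=False):
    """Compare two strings after dropping every \r and \n character."""
    def normalize(text):
        text = text.replace("\r", "").replace("\n", "")
        # casefold() is a more aggressive lower() intended for caseless matching
        return text.casefold() if ignore_case else text
    return normalize(text1) == normalize(text2)

print(is_same("Hello\r\nWorld\n", "hello\nworld"))                    # False
print(is_same("Hello\r\nWorld\n", "hello\nworld", ignore_case=True))  # True
```

Because every line break is dropped, "ab\ncd" and "abcd" also compare equal, which may or may not be what you want.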
