diff files, comparing only the first n characters of each line

Posted 2024-11-08 06:03:00

I have two files; let's call them md5s1.txt and md5s2.txt. Both contain the output of a

find -type f -print0 | xargs -0 md5sum | sort > md5s.txt

command, run in two different directories. Many files were renamed, but their content stayed the same, so they should still have the same md5sum. I want to generate a diff like

diff md5s1.txt md5s2.txt

but it should compare only the first 32 characters of each line, i.e. only the md5sum, not the filename. Lines with equal md5sum should be considered equal. The output should be in normal diff format.

Comments (3)

初心 2024-11-15 06:03:00

Easy starter:

diff <(cut -d' ' -f1 md5s1.txt)  <(cut -d' ' -f1 md5s2.txt)

Also, consider just

diff -EwburqN folder1/ folder2/
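To see what the cut-based command from this answer actually reports, here is a small self-contained run of it. The directory names, file names and contents are made up for illustration. The lists are written outside the scanned directories so they don't list themselves, and cut -c -32 is used, which for ordinary md5sum lines selects the same 32-character digest as cut -d' ' -f1.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the question's setup in throwaway directories.
# All directory names, file names and contents here are hypothetical.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/old" "$work/new"
printf 'alpha\n' > "$work/old/a.txt"
printf 'beta\n'  > "$work/old/b.txt"
printf 'alpha\n' > "$work/new/renamed-a.txt"   # same content, new name
printf 'gamma\n' > "$work/new/c.txt"           # content not in the old tree

# The question's pipeline, with each list written outside the scanned tree.
(cd "$work/old" && find . -type f -print0 | xargs -0 md5sum | sort) > "$work/md5s1.txt"
(cd "$work/new" && find . -type f -print0 | xargs -0 md5sum | sort) > "$work/md5s2.txt"

# Compare only the first 32 characters (the MD5 digest) of each line.
# diff exits with 1 when the inputs differ; only exit status 2 is an error.
result=$(diff <(cut -c -32 "$work/md5s1.txt") <(cut -c -32 "$work/md5s2.txt") || [ $? -eq 1 ])
printf '%s\n' "$result"
```

Only the digest of b.txt (as a "<" line) and the digest of c.txt (as a ">" line) show up; renamed-a.txt counts as unchanged because its digest matches a.txt. The trade-off is that the file names are no longer in the output.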
も星光 2024-11-15 06:03:00

Compare only the md5 column by running diff on <(cut -c -32 md5sums.sort.XXX), and tell diff to print just the line numbers of added or removed lines, using --old/new-line-format='%dn'$'\n'. Pipe this into ed -s md5sums.sort.XXX so that it prints only those lines of the md5sums.sort.XXX file (-s stops ed from first printing the file's size in bytes).

diff \
    --new-line-format='%dn'$'\n' \
    --old-line-format='' \
    --unchanged-line-format='' \
    <(cut -c -32 md5sums.sort.old) \
    <(cut -c -32 md5sums.sort.new) \
    | ed -s md5sums.sort.new \
    > files-added
diff \
    --new-line-format='' \
    --old-line-format='%dn'$'\n' \
    --unchanged-line-format='' \
    <(cut -c -32 md5sums.sort.old) \
    <(cut -c -32 md5sums.sort.new) \
    | ed -s md5sums.sort.old \
    > files-removed

The problem with ed is that it loads the entire file into memory, which can be a problem if you have a lot of checksums. Instead of piping the output of diff into ed, pipe it into the following command, which uses much less memory.

diff … | (
    lnum=0
    while IFS= read -r lprint; do
        # skip ahead in the checksum file to line number $lprint
        while [ "$lnum" -lt "$lprint" ]; do IFS= read -r line <&3; ((lnum++)); done
        printf '%s\n' "$line"   # quoted, so the two spaces after the hash survive
    done
) 3<md5sums.sort.XXX
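A similar low-memory variant can also be written with awk instead of the read loop: awk first collects only the line numbers that diff prints, then streams the checksum file and prints the matching lines. This is a sketch assuming GNU diff (for the line-format options) and bash (for process substitution); the function names added_lines and removed_lines are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: hypothetical helpers wrapping the diff/%dn line-number trick.
# Only the reported line numbers are held in memory; the checksum file
# itself is read line by line by awk.
set -euo pipefail

# added_lines OLD NEW: print lines of NEW whose digest (first 32 chars) diff reports as added.
added_lines() {
    local old=$1 new=$2
    # diff exits 1 when the inputs differ; only exit status 2 is an error.
    { diff --new-line-format='%dn'$'\n' \
           --old-line-format='' \
           --unchanged-line-format='' \
           <(cut -c -32 "$old") <(cut -c -32 "$new") || [ $? -eq 1 ]; } \
    | awk 'NR == FNR { want[$1]; next } FNR in want' - "$new"
    # If diff prints nothing, every line of $new hits the first awk rule
    # and nothing is printed, which is the correct (empty) result.
}

# removed_lines OLD NEW: print lines of OLD whose digest diff reports as removed.
removed_lines() {
    local old=$1 new=$2
    { diff --new-line-format='' \
           --old-line-format='%dn'$'\n' \
           --unchanged-line-format='' \
           <(cut -c -32 "$old") <(cut -c -32 "$new") || [ $? -eq 1 ]; } \
    | awk 'NR == FNR { want[$1]; next } FNR in want' - "$old"
}
```

Used like the answer's commands: added_lines md5sums.sort.old md5sums.sort.new > files-added, and removed_lines md5sums.sort.old md5sums.sort.new > files-removed.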

很糊涂小朋友 2024-11-15 06:03:00

If you are looking for duplicate files, fdupes can do this for you:

$ fdupes --recurse folder1/ folder2/

On Ubuntu you can install it with:

$ sudo apt-get install fdupes
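If fdupes is not available, a rough equivalent can be assembled from the same md5sum pipeline as in the question, because GNU uniq can compare only the first N characters of each line (-w) and print every member of each repeated group (--all-repeated). This is a sketch assuming GNU coreutils and findutils; find_dupes is a hypothetical name, and it relies on the MD5 digest alone, with no further check of file contents.

```shell
#!/usr/bin/env bash
# Sketch: list groups of files with identical MD5 digests (GNU tools assumed).
set -euo pipefail

# find_dupes DIR...: hypothetical helper; prints duplicate groups separated by blank lines.
find_dupes() {
    find "$@" -type f -print0 \
    | xargs -0 -r md5sum \
    | sort \
    | uniq -w32 --all-repeated=separate   # compare only the 32-char digest
}
```

For example, find_dupes folder1/ folder2/ (placeholder directory names) prints each set of identical files as a block of md5sum lines.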