Diff files comparing only the first n characters of each line
I have two files; let us call them md5s1.txt and md5s2.txt. Each contains the output of the command
find -type f -print0 | xargs -0 md5sum | sort > md5s.txt
run in a different directory. Many files were renamed, but their content stayed the same, so they should have the same md5sum. I want to generate a diff like
diff md5s1.txt md5s2.txt
but it should compare only the first 32 characters of each line, i.e. only the md5sum, not the filename. Lines with equal md5sums should be considered equal. The output should be in normal diff format.
Easy starter:
Also, consider just
Compare only the md5 column by using diff on <(cut -c -32 md5sums.sort.XXX), and tell diff to print just the line numbers of added or removed lines, using --old-line-format='%dn'$'\n' or --new-line-format='%dn'$'\n'. Pipe this into ed md5sums.sort.XXX so that it prints only those lines from the md5sums.sort.XXX file.

The problem with ed is that it loads the entire file into memory, which can be a problem if you have a lot of checksums. Instead of piping the output of diff into ed, pipe it into the following command, which will use much less memory.
If you are looking for duplicate files, fdupes can do this for you:
On Ubuntu, you can install it by doing