Diffing directories recursively, ignoring all binary files
Working on a Fedora Constantine box. I am looking to diff two directories recursively to check for source changes. Due to the setup of the project (prior to my own engagement with said project! sigh), the directories contain both source and binaries, as well as large binary datasets. While diffing eventually works on these directories, it would take perhaps only twenty seconds if I could ignore the binary files.
As far as I understand, diff does not have an 'ignore binary file' mode, but it does have an ignore argument which will ignore a regular expression within a file. I don't know what to write there to ignore binary files, regardless of extension.
I'm using the following command, but it does not ignore binary files. Does anyone know how to modify this command to do this?
diff -rq dir1 dir2
Comments (6)
Kind of cheating, but what I used was diff piped through sed.
This recursively compares dir1 to dir2, sed removes the lines for binary files (those beginning with "Binary files "), and the result is redirected to an output file.
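A minimal sketch of the pipeline described above, not necessarily the exact command from the answer. The function name diff_without_binary_lines is made up for illustration, and the sed pattern assumes GNU diff's English message for differing binaries. Note that -q is left out: with -q, diff reports every differing file as "Files ... differ", so there would be no "Binary files" lines to filter.

```shell
# Hypothetical helper: recursive diff with the "Binary files ... differ"
# report lines removed from the output.
diff_without_binary_lines() {
    # LC_ALL=C keeps diff's messages in English so the sed pattern matches.
    LC_ALL=C diff -r "$1" "$2" | sed '/^Binary files /d'
}

# Usage: diff_without_binary_lines dir1 dir2 > outputfile
```

This only tidies the output. diff presumably still has to read the binary files to compare them, so it is unlikely to make the run any faster.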
Maybe use grep -I (which is equivalent to grep --binary-files=without-match) as a filter to sort out binary files.
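One way the grep -I idea could be wired up, as a rough sketch: let grep list the files it considers text, then diff only those. The function name diff_text_files is hypothetical, -r and -I are assumed to be available (they are in GNU grep), and file names containing newlines are not handled. Files that exist only in the second directory are not examined.

```shell
# Hypothetical helper: diff only the files under $1 that grep treats as text.
diff_text_files() {
    # -r recurses, -I treats binary files as non-matching, -l prints file names.
    # The empty pattern matches any line, so every non-empty text file is listed.
    (cd "$1" && grep -rIl '' .) | while IFS= read -r f; do
        diff -q "$1/$f" "$2/$f"
    done
}

# Usage: diff_text_files dir1 dir2
```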
I came to this (old) question looking for something similar (config files on a legacy production server compared to a default Apache installation). Following @fearlesstost's suggestion in the comments, git is sufficiently lightweight and fast that it's probably more straightforward than any of the above suggestions. Copy version1 to a new directory, then initialise a repository there and commit everything. Now delete all the files from version 1 in this directory, copy version 2 into the directory, commit again, and ask Git for the diff between the two commits.
This will show you Git's version of all the differences between the first commit and the second. For binary files it will just say that they differ. Alternatively, you could create a branch for each version and try to merge them using git's merge tools.
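The two-commit procedure might look roughly like the sketch below. The exact commands are an assumption based on the description, the function name git_compare_dirs and the throwaway identity passed with -c are invented for illustration, and neither input directory is assumed to contain its own .git directory.

```shell
# Hypothetical helper: commit version 1, replace it with version 2, show the diff.
# The body runs in a subshell, so the cd and the variables stay local.
git_compare_dirs() (
    v1=$(cd "$1" && pwd) || exit 1
    v2=$(cd "$2" && pwd) || exit 1
    work=$(mktemp -d) || exit 1
    cd "$work" || exit 1
    git init -q
    cp -R "$v1"/. .
    git add -A
    git -c user.name=compare -c user.email=compare@example.invalid \
        commit -q --allow-empty -m 'version 1'
    # Delete everything from version 1, then copy version 2 in its place.
    git rm -r -q --ignore-unmatch .
    cp -R "$v2"/. .
    git add -A
    git -c user.name=compare -c user.email=compare@example.invalid \
        commit -q --allow-empty -m 'version 2'
    git --no-pager diff HEAD~1 HEAD
    cd / && rm -rf "$work"
)

# Usage: git_compare_dirs version1 version2
```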
If the names of the binary files in your project follow a specific pattern (*.o, *.so, ...) as they usually do, you can put those patterns in a file, one per line, and specify it using -X (hyphen X). In my case the file is called exclude_file and it is passed to diff with -X.
UPDATE: -x can be used instead of -X to specify exclusion patterns on the command line rather than in a file.
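As an illustration of both options, here is a small sketch. The pattern list (including *.bin) and the function names are placeholders rather than the answer's actual exclude_file, so adjust them to whatever the project's binaries are called. As far as I know, diff matches these patterns against file base names.

```shell
# Hypothetical exclude list: one shell-style pattern per line.
write_exclude_file() {
    printf '%s\n' '*.o' '*.so' '*.bin' > "$1"
}

# -X reads the exclusion patterns from a file.
diff_with_exclude_file() {
    diff -rq -X "$1" "$2" "$3"
}

# -x takes one pattern on the command line and can be repeated.
diff_with_patterns() {
    diff -rq -x '*.o' -x '*.so' -x '*.bin' "$1" "$2"
}

# Usage:
#   write_exclude_file exclude_file
#   diff_with_exclude_file exclude_file dir1 dir2
#   diff_with_patterns dir1 dir2
```

Because excluded files are never opened, this variant should also avoid the time spent reading the large binaries.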
Use a combination of find and the file command. This requires you to do some research on the output of the file command in your directory; I'm assuming that the files you want to diff are reported as ASCII. Or, use grep -v to filter out the binary files.
Since you probably know the names of the huge binaries, place them in a hash-array and only do the diff when a file is not in the hash.
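A rough sketch of the find plus file combination, under the stated assumption that the interesting files are described as ASCII by file. The function name diff_ascii_files is hypothetical and the *ASCII* test is only a guess at a suitable match; check what file actually prints for your sources (UTF-8 files, for instance, are described differently) and widen the pattern if needed.

```shell
# Hypothetical helper: diff only the files that `file` describes as ASCII.
# The body runs in a subshell, so the cd does not leak into the caller.
diff_ascii_files() (
    d1=$(cd "$1" && pwd) || exit 1
    d2=$(cd "$2" && pwd) || exit 1
    cd "$d1" || exit 1
    find . -type f | while IFS= read -r f; do
        # file -b prints only the description, e.g. "ASCII text" or "data".
        case $(file -b "$f") in
            *ASCII*) diff -q "$d1/$f" "$d2/$f" ;;
        esac
    done
)

# Usage: diff_ascii_files dir1 dir2
```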
Well, as a crude sort of check, you could ignore files that match /\0/.
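One possible reading of this, sketched in shell: treat any file containing a NUL byte as binary and skip it. Both function names are made up for illustration. The check reads each file in full, so it stays crude and is not especially fast on large datasets.

```shell
# Succeeds (exit status 0) when the file contains at least one NUL byte.
has_nul_byte() {
    # tr deletes NUL bytes; if the result differs from the file, it had one.
    ! tr -d '\000' < "$1" | cmp -s - "$1"
}

# Hypothetical helper: diff every file under $1 that has no NUL byte.
diff_skipping_nul_files() (
    d1=$(cd "$1" && pwd) || exit 1
    d2=$(cd "$2" && pwd) || exit 1
    cd "$d1" || exit 1
    find . -type f | while IFS= read -r f; do
        if has_nul_byte "$f"; then continue; fi
        diff -q "$d1/$f" "$d2/$f"
    done
)

# Usage: diff_skipping_nul_files dir1 dir2
```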