Recursively compare directories, ignoring all binary files

Posted 2024-11-24 08:35:28

Working on a Fedora Constantine box. I am looking to diff two directories recursively to check for source changes. Due to the setup of the project (prior to my own engagement with said project! sigh), the directories contain both source and binaries, as well as large binary datasets. Diffing these directories does eventually finish, but it would take perhaps twenty seconds if I could ignore the binary files.

As far as I understand, diff does not have an 'ignore binary files' mode, but it does have an ignore option (-I) that ignores changes whose lines match a regular expression. I don't know what to write there to ignore binary files, regardless of extension.

I'm using the following command, but it does not ignore binary files. Does anyone know how to modify this command to do this?

diff -rq dir1 dir2

Comments (6)

Saygoodbye 2024-12-01 08:35:28

Kind of cheating, but here's what I used:

diff -r dir1/ dir2/ | sed '/^Binary files /d' >outputfile

This recursively compares dir1 to dir2. sed deletes the lines that report binary files (those beginning with "Binary files "), and the result is redirected to outputfile.
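The same filter can be wrapped in a small function. This is a minimal sketch that assumes Bash and GNU diff, and the function name diff_skip_binary_notices is made up for illustration. In a plain pipeline the exit status would come from grep, so the sketch returns diff's own status from PIPESTATUS instead.

```shell
#!/bin/bash
# Hypothetical helper (name invented for this sketch): recursive diff with
# GNU diff's "Binary files X and Y differ" notices filtered out.
diff_skip_binary_notices() {
    # grep exits 1 when it prints nothing, so hand back diff's own status
    # (0 = same, 1 = different, 2 = trouble) from PIPESTATUS instead.
    diff -r -- "$1" "$2" | grep -v '^Binary files '
    return "${PIPESTATUS[0]}"
}
```

Usage: `diff_skip_binary_notices dir1 dir2 > outputfile`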

最偏执的依靠 2024-12-01 08:35:28

Maybe use grep -I (which is equivalent to grep --binary-files=without-match) as a filter to weed out binary files.

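dir1='folder-1'
dir2='folder-2'
# Split the command substitution below on newlines only, so names with spaces survive.
IFS=$'\n'
# -I skips binary files, -l prints only file names, -s suppresses error messages,
# -r recurses, -m 1 stops at the first match; '.' matches any non-empty line.
for file in $(grep -Ilsr -m 1 '.' "$dir1"); do
   # Replace the dir1 part of the path with dir2 to get the counterpart file.
   diff -q "$file" "${file/${dir1}/${dir2}}"
done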
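Here is a variant of the same grep -I idea that copes with any file name, including spaces, newlines and glob characters. It reads NUL-terminated names from GNU grep's -Z option. This is a sketch: the function name is hypothetical, and the "Only in" line it prints for files missing from dir2 merely imitates diff's wording.

```shell
#!/bin/bash
# Hypothetical helper (name invented for this sketch). GNU grep options used:
#   -r recurse, -I skip binary files, -l list matching file names,
#   -Z terminate each name with a NUL byte so any file name is safe.
# The empty pattern '' matches every line, so empty files are not listed.
diff_text_files() {
    local dir1=${1%/} dir2=${2%/} file rel status=0
    while IFS= read -r -d '' file; do
        rel=${file#"$dir1"/}                 # path relative to dir1
        if [[ -e "$dir2/$rel" ]]; then
            diff -q -- "$file" "$dir2/$rel" || status=1
        else
            echo "Only in $dir1: $rel"       # mimics diff's wording
            status=1
        fi
    done < <(grep -rIlZ -e '' -- "$dir1")
    return "$status"
}
```

The function returns 1 if any text file differs or is missing from dir2, and 0 otherwise. Differences in binary files are not reported at all.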
寒冷纷飞旳雪 2024-12-01 08:35:28

I came to this (old) question looking for something similar (config files on a legacy production server compared to a default Apache installation). As @fearlesstost suggested in the comments, git is lightweight and fast enough that it's probably more straightforward than any of the above suggestions. Copy version 1 to a new directory. Then do:

git init
git add .
git commit -m 'Version 1'

Now delete all of version 1's files from this directory (leaving the .git directory in place) and copy version 2 into it. Then do:

git add .
git commit -m 'Version 2'
git show

This will show you Git's version of all the differences between the first commit and the second. For binary files it will just say that they differ. Alternatively, you could create a branch for each version and try to merge them using git's merge tools.
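The steps above can be scripted so that the throwaway repository is created and removed automatically. This is a minimal sketch that assumes git and GNU cp are installed and that neither input directory already contains a .git directory. The function name and the placeholder commit identity are hypothetical.

```shell
#!/bin/bash
# Sketch: snapshot two trees as consecutive commits in a temporary repository
# and print the second commit with `git show`.
# Assumptions: git is installed, neither input contains its own .git
# directory, and the function name and commit identity are placeholders.
git_snapshot_diff() {
    local v1 v2 repo status
    v1=$(cd "$1" && pwd) || return      # absolute paths, since we cd below
    v2=$(cd "$2" && pwd) || return
    repo=$(mktemp -d) || return
    (
        cd "$repo" || exit
        # Identity passed with -c so commits work without any git config.
        id=(-c user.name=snapshot -c user.email=snapshot@example.invalid)
        git init -q
        cp -a "$v1"/. .
        git add -A && git "${id[@]}" commit -q --allow-empty -m 'Version 1'
        # Delete version 1's files but keep the .git directory.
        find . -mindepth 1 -maxdepth 1 ! -name .git -exec rm -rf {} +
        cp -a "$v2"/. .
        git add -A && git "${id[@]}" commit -q --allow-empty -m 'Version 2'
        git --no-pager show
    )
    status=$?
    rm -rf "$repo"
    return "$status"
}
```

Usage: `git_snapshot_diff /path/to/version1 /path/to/version2 | less`. As the answer says, binary files show up only as a one-line "Binary files ... differ" notice.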

○闲身 2024-12-01 08:35:28

If the names of the binary files in your project follow a specific pattern (*.o, *.so, ...) as they usually do, you can put those patterns in a file and specify it using -X (hyphen X).

Contents of my exclude_file:

*.o
*.so
*.git

Command:

diff -X exclude_file -r . other_tree > my_diff_file

UPDATE:

-x can be used instead of -X, to specify exclusion patterns on the command line rather than in a file:

diff -r -x '*.o' -x '*.so' -x '*.git' dir1 dir2
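When the pattern list grows, the -x options can be built from a Bash array. Below is a sketch with a made-up function name. It passes one -x per pattern to GNU diff, which matches each pattern against the base names of files and subdirectories.

```shell
#!/bin/bash
# Hypothetical helper: "diff -rq" on two trees, excluding every base-name
# pattern given after the two directories, e.g.
#   diff_rq_excluding dir1 dir2 '*.o' '*.so' '*.git'
diff_rq_excluding() {
    local dir1=$1 dir2=$2 pattern
    shift 2
    local -a args=()
    for pattern in "$@"; do
        args+=(-x "$pattern")   # one -x option per pattern
    done
    diff -rq "${args[@]}" -- "$dir1" "$dir2"
}
```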
酒几许 2024-12-01 08:35:28

Use a combination of find and the file command. This requires some research into what file reports for the files in your directory; below I'm assuming that the files you want to diff are reported as ASCII. Alternatively, use grep -v to filter out the binary files.

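#!/bin/bash

dir1=/path/to/first/folder
dir2=/path/to/second/folder

cd "$dir1" || exit 1
# `file` prints "name: description"; keep the names whose description
# mentions ASCII. (Names containing ':' or newlines will confuse this.)
find . -type f -print0 | xargs -0 file | grep ASCII | cut -d: -f1 |
while IFS= read -r i; do
    echo "diffing $i ---- $dir2/$i"
    diff -q "$i" "$dir2/$i"
done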
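Grepping file's description for "ASCII" skips text in other encodings, such as UTF-8 sources. As a variation on the same find + file approach (a sketch, not the answer's own code), you can ask file(1) for the MIME encoding instead and skip only the files it labels "binary". This assumes a libmagic-based file(1) that supports -b and --mime-encoding. The function names are hypothetical.

```shell
#!/bin/bash
# Succeeds unless file(1) reports the MIME encoding of $1 as "binary".
is_text_file() {
    [[ $(file -b --mime-encoding -- "$1") != binary ]]
}

# Hypothetical driver: quietly diff every non-binary file under $1
# against the same relative path under $2.
diff_text_by_mime() {
    local dir1=$1 dir2
    dir2=$(cd "$2" && pwd) || return   # absolute, because we cd into dir1 below
    (
        cd "$dir1" || exit
        find . -type f -print0 |
        while IFS= read -r -d '' f; do
            if is_text_file "$f"; then
                diff -q -- "$f" "$dir2/$f"
            fi
        done
    )
}
```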

Since you probably know the names of the huge binaries, you can put them in a hash array (a Bash associative array) and only run diff when a file's name is not in it, something like this:

#!/bin/bash

dir1=/path/to/first/directory
dir2=/path/to/second/directory

content_dir1=$(mktemp)
content_dir2=$(mktemp)
trap 'rm -f "$content_dir1" "$content_dir2"' EXIT

# Sorted listings, so diff can compare them line by line.
(cd "$dir1" && find . -type f -print | sort > "$content_dir1")
(cd "$dir2" && find . -type f -print | sort > "$content_dir2")

echo Files that only exist in one of the paths
echo -----------------------------------------
diff "$content_dir1" "$content_dir2"

# Files 2 Ignore: base names of the large binaries (needs Bash 4+)
declare -A F2I
F2I=( [sqlite3]=1 [binfile2]=1 )

while IFS= read -r f; do
    b=$(basename "$f")
    if ! [[ ${F2I[$b]} ]]; then
        diff "$dir1/$f" "$dir2/$f"
    fi
done < "$content_dir1"
冷情妓 2024-12-01 08:35:28

Well, as a crude sort of check, you could ignore files that match /\0/, i.e. files that contain a NUL byte.
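One way to apply the /\0/ check from the shell is a minimal sketch like the one below. It deletes NUL bytes with tr and compares byte counts, so a file counts as binary if anything was removed. The 8000-byte window is an arbitrary assumption; drop `head -c` to scan whole files. The function names are hypothetical.

```shell
#!/bin/bash
# Succeeds if the first 8000 bytes of $1 contain at least one NUL byte.
looks_binary() {
    local n_all n_text
    n_all=$(head -c 8000 -- "$1" | wc -c)
    n_text=$(head -c 8000 -- "$1" | tr -d '\000' | wc -c)
    (( n_text < n_all ))   # fewer bytes after deleting NULs => NULs present
}

# Hypothetical driver: quietly diff every file under $1 that has no NULs
# against the same relative path under $2.
diff_skipping_nul_files() {
    local dir1=${1%/} dir2=${2%/} f
    find "$dir1" -type f -print0 |
    while IFS= read -r -d '' f; do
        looks_binary "$f" || diff -q -- "$f" "$dir2/${f#"$dir1"/}"
    done
}
```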
