Recursive diff is very slow - checking directory contents
I am running diff recursively on two directories with a few options. The directories are fairly large. However, I only want to see differences in the contents of the folders (which files exist), not differences between the files themselves, so I am using the -q option (am I using this right?).
I have also tried an rsync dry run, which seems to take just as long. The output goes through sed; I have tried without it, and it doesn't seem to affect anything. I also ignore hidden files. I think I may be misusing diff -q to compare just the contents of two directories.
I used a code block from another tip to time the comparison of just ONE of these directories (1 directory, 14 subdirectories): it took 88 minutes. However, every file is a 30-minute TV show, so if diff is comparing the files themselves, that makes sense. But I thought -q would prevent that?
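To see what -q actually changes, here is a small self-contained demo (the function name demo_diff_brief and the throwaway files are hypothetical). It suggests that -q only shortens the report to one line per differing or missing file; diff still has to open and compare files that exist on both sides, and for two identical files that means reading them to the end.

```shell
#!/bin/bash
# Hypothetical demo: build two tiny throwaway trees and show what
# "diff -rq" prints for them.
demo_diff_brief() {
    local tmp
    tmp="$(mktemp -d)"
    mkdir "$tmp/a" "$tmp/b"
    printf 'same\n' > "$tmp/a/same.txt"      # identical on both sides
    printf 'same\n' > "$tmp/b/same.txt"
    printf 'one\n'  > "$tmp/a/changed.txt"   # same name, different content
    printf 'two\n'  > "$tmp/b/changed.txt"
    printf 'x\n'    > "$tmp/a/extra.txt"     # exists only under a
    # -r recurses into subdirectories; -q prints one line per differing or
    # missing file instead of the line-by-line differences.
    # Run from inside $tmp so the printed paths are short.
    (cd "$tmp" && diff -rq a b)
    rm -rf "$tmp"
}

demo_diff_brief
# With GNU diff this prints:
# Files a/changed.txt and b/changed.txt differ
# Only in a: extra.txt
```

Note that same.txt produces no output at all, even though diff had to read both copies to confirm they match.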
Also, one directory is mounted over AFP and the other is on an external drive connected over FireWire. That doesn't seem to be the cause, because I copied both directories locally and the diff took the same amount of time.
I have a workaround (I ran ls -1 on both directories and diffed the output), but why is diff taking so long to run?
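The ls -1 workaround can be written as a one-liner with bash process substitution. A minimal sketch; compare_listings is a hypothetical helper name. Like the workaround itself, it only looks at the top level of each directory, so changes inside subdirectories go unnoticed, and ls without -a already skips hidden entries.

```shell
#!/bin/bash
# Hypothetical helper mirroring "ls -1 both directories and diff the output".
# Requires bash for the <( ... ) process substitution.
# ls -1 lists only the top level and, without -a, omits names starting with ".".
compare_listings() {
    diff <(ls -1 "$1") <(ls -1 "$2")
}

# Usage:
# compare_listings /Volumes/directory1 /Volumes/directory2
```

Because only directory listings are read, this finishes quickly regardless of how large the video files are.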
Here is the code; any suggestions?
#!/bin/bash
# Record the start time in seconds since the epoch
before="$(date +%s)"
# Empty the report once; every diff below appends (>>) to it, so later
# runs no longer overwrite the results of earlier ones
: > /Volumes/stuff.txt
# -r: recurse, -q: only report which files differ, -x '.*': skip hidden files
diff -rq -x '.*' /Volumes/directory1/ /Volumes/directory2/ | sed 's/^.\{24\}//g' >> /Volumes/stuff.txt
diff -rq -x '.*' /Volumes/directory3/ /Volumes/directory4/ | sed 's/^.\{24\}//g' >> /Volumes/stuff.txt
diff -rq -x '.*' /Volumes/directory5/ /Volumes/directory6/ | sed 's/^.\{24\}//g' >> /Volumes/stuff.txt
diff -rq -x '.*' /Volumes/directory7/ /Volumes/directory8/ | sed 's/^.\{24\}//g' >> /Volumes/stuff.txt
diff -rq -x '.*' /Volumes/directory9/ /Volumes/directory10/ | sed 's/^.\{24\}//g' >> /Volumes/stuff.txt
diff -rq -x '.*' /Volumes/directory11/ /Volumes/directory12/ | sed 's/^.\{24\}//g' >> /Volumes/stuff.txt
after="$(date +%s)"
# Shell arithmetic instead of the external expr command
elapsed_seconds=$(( after - before ))
echo "Elapsed time for code block: $elapsed_seconds seconds"
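As a side note on the timing code, bash's built-in SECONDS variable (seconds since the shell started, or since it was last assigned) can replace the date +%s / expr arithmetic. A minimal sketch; time_block is a hypothetical helper name, and the result has whole-second resolution only.

```shell
#!/bin/bash
# Hypothetical helper: run a command and report the elapsed whole seconds
# using bash's built-in SECONDS counter instead of date +%s and expr.
time_block() {
    local start=$SECONDS
    "$@"                      # run the command exactly as given
    local status=$?           # keep its exit status (diff returns 1 on differences)
    echo "Elapsed time for code block: $(( SECONDS - start )) seconds"
    return "$status"
}

# Usage (paths taken from the question):
# time_block diff -rq -x '.*' /Volumes/directory1/ /Volumes/directory2/
```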
When files are different, diff will be able to figure that out fairly quickly. When they're the same, though, it has to scan the files in full to verify that they are indeed byte-for-byte identical. If all you care about is differences in file names and you don't want to inspect the contents of the files, try something like:
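A minimal sketch of a name-only comparison, assuming GNU find (for -printf) and bash (for process substitution). The function name compare_names_gnu is hypothetical, and this is not necessarily the answer's exact command. The -name '.*' -prune part is an addition meant to mirror the question's diff -x '.*'.

```shell
#!/bin/bash
# Sketch assuming GNU find (-printf) and bash (process substitution).
# compare_names_gnu is a hypothetical name.
compare_names_gnu() {
    # -mindepth 1 skips the starting directory itself;
    # -name '.*' -prune drops hidden files and hidden directories;
    # %P prints each path relative to the starting directory, so both
    # listings line up. File contents are never read.
    diff <(find "$1" -mindepth 1 -name '.*' -prune -o -printf '%P\n' | sort) \
         <(find "$2" -mindepth 1 -name '.*' -prune -o -printf '%P\n' | sort)
}

# Usage:
# compare_names_gnu /Volumes/directory1 /Volumes/directory2
```

Sorting both listings keeps the comparison independent of the order in which each filesystem returns directory entries.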
This assumes you have GNU find with the -printf action. If you don't, use some subshell magic per Gordon's comment:
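One way to do that without -printf (for example with the BSD find that ships with macOS), sketched under the same bash assumption. compare_names_portable is a hypothetical name, and this is not necessarily the exact command from Gordon's comment.

```shell
#!/bin/bash
# Sketch for finds without -printf (e.g. BSD find on macOS); needs bash.
# compare_names_portable is a hypothetical name.
compare_names_portable() {
    # Each <( ... ) runs in its own subshell, so cd does not affect the caller;
    # find then prints paths relative to "." (e.g. ./sub/file).
    # -mindepth 1 keeps "." itself from being matched by -name '.*' and pruned.
    diff <(cd "$1" && find . -mindepth 1 -name '.*' -prune -o -print | sort) \
         <(cd "$2" && find . -mindepth 1 -name '.*' -prune -o -print | sort)
}

# Usage:
# compare_names_portable /Volumes/directory1 /Volumes/directory2
```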