Comparing two large directories

Posted on 2024-07-14 14:45:54

I have a large directory that contains only computer science and math material. It is over 16 GB in size, and the file types are text, PNG, PDF, and CHM. I currently have two branches of it: my brother's and mine. The initial files were the same, and I need to compare the two branches. I have tried using Git, but loading takes a long time.

What is the best way to compare two big directories?

[Mixed Solution]

  1. Run "ls -R > different_files" in both directories [1]
  2. "sdiff <(cd dir1 && md5deep -rl . | sort -k 2) <(cd dir2 && md5deep -rl . | sort -k 2)" [2]

What do you think? Any drawbacks?

[1] Thanks to Paul Tomblin.
[2] Many thanks to everyone who replied!


Comments (6)

给不了的爱 2024-07-21 14:45:54

Use FSlint (see its website). One of the tool's options is "Duplicates". From the description on the site:
One of the most commonly used features of FSlint is the ability to find duplicate files. The easiest way to remove lint from a hard drive is to discard any duplicate files that may exist. Often a computer user may not know that they have four, five, or more copies of the exact same song in their music collection under different names or directories. Any file type whether it be music, photos, or work documents can easily be copied and replicated on your computer. As the duplicates are collected, they eat away at the available hard drive space. The first menu option offered by FSlint allows you to find and remove these duplicate files.
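FSlint's own code isn't reproduced here. As a hedged sketch of the general idea behind duplicate detection (group files by size, then confirm matches by hashing only the same-size candidates), a Python version might look like this. The function names "file_sha256" and "find_duplicates" are hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib
import os
from collections import defaultdict


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large PDFs/CHMs are never loaded fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(*roots):
    """Return sorted groups of paths (under any of the roots) with identical content."""
    # Pass 1: group regular files by size; files of different sizes can't be duplicates.
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.isfile(path) and not os.path.islink(path):
                    by_size[os.path.getsize(path)].append(path)
    # Pass 2: hash only files that share their size with at least one other file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[file_sha256(path)].append(path)
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Calling "find_duplicates("mine", "brothers")" with both branches would list files that exist in both trees with the same content, even if they were renamed or moved; the size pre-pass avoids reading most of the 16 GB.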

月光色 2024-07-21 14:45:54

How to compare two folders without existing commands or tools:

Simply write a program that scans a directory and computes a hash of each file. It outputs a file that lists each relative file path together with its hash.

Run this program on both folders.

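Then compare the two output files to see whether they are identical. To do that, you can simply load each one into a string and compare the strings. This only works if both files list their entries in the same order, so sort the entries by path before writing them out.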

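The choice of hashing algorithm doesn't matter much here: MD5, SHA, CRC, ... will all do, although a longer hash such as SHA-256 makes an accidental collision less likely.
You could also record each file's size in the output files to further reduce the chance of a collision going unnoticed.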
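A minimal Python sketch of the program described above, assuming SHA-256 as the hash and a tab-separated "path, size, hash" line format (both are illustrative choices, and "build_manifest" / "trees_identical" are hypothetical names):

```python
import hashlib
import os


def build_manifest(root, chunk_size=1 << 20):
    """Return one 'relative/path<TAB>size<TAB>sha256' line per file, sorted by path."""
    lines = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path) or os.path.islink(path):
                continue  # skip symlinks and special files
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(chunk_size), b""):
                    h.update(chunk)
            # Use '/' separators so manifests produced on different OSes compare equal.
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            lines.append(f"{rel}\t{os.path.getsize(path)}\t{h.hexdigest()}")
    # os.walk order is not guaranteed, so sort before comparing.
    return "\n".join(sorted(lines)) + "\n"


def trees_identical(root_a, root_b):
    """Plain string comparison of the two manifests."""
    return build_manifest(root_a) == build_manifest(root_b)
```

In practice you would write each manifest to a file (for example "manifest_mine.txt" and "manifest_brother.txt") so that each brother can generate his own on his machine and only the small text files need to be exchanged.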

How to compare two folders with existing commands or tools:

If you just want a program that does it, use "diff -r" on Unix-like systems or WinDiff on Windows.
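If Python is at hand, the standard library's "filecmp.dircmp" offers a similar recursive comparison. The sketch below walks its results to produce a report roughly like "diff -rq"; "report_differences" is a hypothetical helper. Note that by default "dircmp" treats two files with the same type, size, and modification time as equal without reading their contents, and it ignores some names such as ".git".

```python
import filecmp


def report_differences(dcmp, prefix=""):
    """Recursively collect differences from a filecmp.dircmp object."""
    out = []
    for name in sorted(dcmp.left_only):
        out.append(f"only in left: {prefix}{name}")
    for name in sorted(dcmp.right_only):
        out.append(f"only in right: {prefix}{name}")
    for name in sorted(dcmp.diff_files):
        out.append(f"differs: {prefix}{name}")
    # subdirs maps each common subdirectory name to its own dircmp instance.
    for name, sub in sorted(dcmp.subdirs.items()):
        out.extend(report_differences(sub, f"{prefix}{name}/"))
    return out
```

Usage would be along the lines of "print("\n".join(report_differences(filecmp.dircmp("mine", "brothers"))))".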

后知后觉 2024-07-21 14:45:54

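Use md5deep to create recursive md5sum listings of every file in those directories (for example "md5deep -r -l ." run inside each directory, where "-l" prints relative paths).

You can then use a diff tool to compare the generated listings. Sort them by path first, since md5deep does not guarantee the order in which it prints files.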
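md5deep itself is a command-line tool; the following is only a hedged Python imitation of the workflow, producing md5deep-style "hash  relative/path" lines with "hashlib.md5" and diffing two listings with "difflib.unified_diff". The helper names "md5_listing" and "diff_listings" are hypothetical.

```python
import difflib
import hashlib
import os


def md5_listing(root):
    """Emit 'md5  relative/path' lines, sorted by path."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path) or os.path.islink(path):
                continue
            h = hashlib.md5()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            entries.append((rel, f"{h.hexdigest()}  {rel}"))
    # Sort by path so both listings line up regardless of traversal order.
    return [line for _rel, line in sorted(entries)]


def diff_listings(root_a, root_b):
    """Return only the changed lines: '-' means left side, '+' means right side."""
    diff = difflib.unified_diff(md5_listing(root_a), md5_listing(root_b), lineterm="")
    return [line for line in diff
            if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))]
```

A modified file shows up as a "-" line and a "+" line with the same path but different hashes; a file that appears only once exists in just one of the two trees.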

沦落红尘 2024-07-21 14:45:54

Are you just trying to discover what files are present in one that aren't in the other, and vice versa? A couple of suggestions:

  1. Run "ls -R" in both directories, redirect the output to files, and diff the files.

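  2. Run "rsync -n" between them to see what rsync would copy if it were allowed to. ("-n" is a dry run: rsync transfers nothing and only shows what it would do if you ran it without "-n". In practice you'll want something like "rsync -avn dir1/ dir2/"; add "-c" to compare files by checksum instead of by size and modification time.)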
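The output of "ls -R" is grouped per directory, which can make the diff noisy. As a hedged sketch of the same presence check, this Python version collects each tree's relative file paths and takes set differences; "relative_files" and "presence_diff" are hypothetical names, and file contents are deliberately not compared.

```python
import os


def relative_files(root):
    """Set of file paths relative to root, using '/' separators."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.add(rel.replace(os.sep, "/"))
    return found


def presence_diff(root_a, root_b):
    """Return (only_in_a, only_in_b) as sorted lists of relative paths."""
    a, b = relative_files(root_a), relative_files(root_b)
    return sorted(a - b), sorted(b - a)
```

Because it only reads directory entries, this runs quickly even on a 16 GB tree; files present in both trees can then be checked for content changes with one of the hashing approaches.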

半世蒼涼 2024-07-21 14:45:54

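I would diff the output of "md5sum * | sort" run in each directory.

That will point you to the files that are different or missing. (Note that "*" only matches the top level of a directory; for a whole tree, something like "find . -type f -exec md5sum {} + | sort -k 2" is needed.)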

梦年海沫深 2024-07-21 14:45:54

I know this question has already been answered, but if you'd rather not write such a tool yourself, there's a well-working open-source project called tardiff, available on SourceForge, which does essentially what you want and even supports automatically creating patches (in tar format, naturally) to account for the differences.

Hope this helps
