Detecting duplicate binary files in the same directory (Windows)
I have about 30 BIN files in a directory, ranging from 64 KB to 4 MB. I need to find out whether there are duplicate files among them. Many of the files have the same size.
I would like to find out whether any of them are binary identical.
Does anyone know a way to do this? I'm on Windows XP Pro.
Thanks!
Comments (6)
That's pretty easy. You can use two nested `for` loops on the command line. If you want to use this in a batch file, you need to double the `%` signs.
The command simply loops twice over all files in the current directory. Then, if the two file names aren't equal (a file is trivially equal to itself, so that pair is skipped), it runs the `fc` utility, which compares files. If `fc` exits with code `0`, the files are equal (thus duplicates), and in that case the `echo` after the `&&` is triggered. `&&` means "execute the following command only if the previous one exited with a `0` exit code".
For 30 files this is certainly fast enough. I once implemented something more elaborate in batch, but this should suffice.
ETA: Found the other batch; still nowhere publicly explained but I once posted it at Super User.
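
The pairwise comparison described above can also be sketched in Python. This is not the answer's original command, only a minimal illustration of the same idea, assuming Python is available; the function name `find_identical_pairs` is made up for this example. It uses `filecmp.cmp` with `shallow=False` in place of `fc`.

```python
import filecmp
import itertools
from pathlib import Path


def find_identical_pairs(directory="."):
    """Return (name_a, name_b) pairs of files in `directory` with identical bytes."""
    files = sorted(p for p in Path(directory).iterdir() if p.is_file())
    pairs = []
    # combinations() visits each unordered pair once, so a file is never
    # compared with itself and (a, b) / (b, a) are not both reported.
    for a, b in itertools.combinations(files, 2):
        # shallow=False forces a comparison of the file contents,
        # not just of size and modification time.
        if filecmp.cmp(a, b, shallow=False):
            pairs.append((a.name, b.name))
    return pairs


# Example use:
#   for a, b in find_identical_pairs("."):
#       print(f"{a} and {b} are equal")
```

Like the nested loops, this compares every pair of files, so the work grows quadratically with the number of files. For about 30 files that should not matter.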
Hash them with md5deep (or similar), or try a duplicate file checker:
http://www.portablefreeware.com/index.php?sc=77
Personally, I would sort the files by file size first. Files of different sizes cannot be binary identical.
Files of the same size could potentially be the same, so I would then generate a hash of each file's contents (MD5, SHA-1, etc.). Files with the same hash are, for all practical purposes, identical.
And to keep everything "on-topic" from a programming perspective (otherwise this question is perhaps better suited to superuser.com), here is a C# project that implements a "shell extension" (i.e. additional items in Windows Explorer's context menu) that computes various hashes of the files selected in Windows Explorer:
File Hash Generator Shell Extension
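
The size-first, hash-second approach might look like the following Python sketch. It is an illustration under assumptions, not the linked C# project: the function names, the `*.bin` pattern, and the choice of SHA-1 are all made up for this example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, algorithm="sha1", chunk_size=65536):
    """Hash a file in chunks so it is never read into memory all at once."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_groups(directory=".", pattern="*.bin"):
    """Return sorted lists of file names whose size and hash both match."""
    # Step 1: bucket files by size; files of different sizes cannot match.
    by_size = defaultdict(list)
    for p in Path(directory).glob(pattern):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        # Step 2: hash only the files that share a size with another file.
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[file_digest(p)].append(p.name)
        groups.extend(sorted(n) for n in by_hash.values() if len(n) > 1)
    return sorted(groups)
```

Calling `find_duplicate_groups(".")` returns one list per set of matching files. If hash collisions are a concern, each group can be confirmed afterwards with a byte-by-byte comparison.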
Generate a hash (MD5 or SHA-1) of each file and compare.
Obviously, if two files differ in size, you can rule them out immediately.
You don't specify how this should happen. Maybe this question belongs on superuser.com, but you could use a tool like WinMerge.
If you have to do this in code, you could calculate a hash value for each file and compare the hash values.
You can use `fc` or `fciv` (for checksums).
Or you could download the GNU utilities: get Textutils, which contains `md5sum`, and coreutils, which contains `sort` and `uniq`, then pipe the checksums through them to find the repeated ones.
To iterate over the results and do something with them, use a `for` loop.
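
As a rough illustration of the "find repeated checksums, then loop over the results" step, here is a Python sketch that groups `md5sum`-style output lines by digest. The function name and the sample lines are made up for this example, and it assumes the usual `md5sum` line layout (32 hex digits, a space, a mode character, then the file name).

```python
from collections import defaultdict


def repeated_checksums(md5sum_lines):
    """Group file names by digest and keep only digests seen more than once."""
    by_digest = defaultdict(list)
    for line in md5sum_lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # Assumed layout: 32 hex digits, a space, a mode character
        # (' ' for text, '*' for binary), then the file name.
        digest, name = line[:32], line[34:]
        by_digest[digest.lower()].append(name)
    return {d: names for d, names in by_digest.items() if len(names) > 1}


# Made-up sample lines for illustration; not hashes of real BIN files.
sample = [
    "d41d8cd98f00b204e9800998ecf8427e *a.bin",
    "0cc175b9c0f1b6a831c399e269772661 *b.bin",
    "d41d8cd98f00b204e9800998ecf8427e *c.bin",
]
for digest, names in repeated_checksums(sample).items():
    print(digest, "->", ", ".join(names))
```

The sketch does not handle the escaped form `md5sum` uses for file names containing backslashes or newlines.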