Detect whether a file is a duplicate; if not, rename it
I have some code that moves files into a single directory. What would be a good strategy to:
- Detect if a file is a duplicate of an existing file in the directory? This is to decide whether to delete the source or simply leave it.
- Rename the source file if a destination file with the same name but different content already exists?
Comments (3)
Most programming languages will have a function called something similar to FileExists, taking a filename and returning a boolean indicating whether a file with the given name exists on the file system.
Calculate a SHA1 checksum for both files and compare those hashes. Once again, most languages will have a sha1 function taking an array of bytes representing the contents of a file and returning an array of bytes representing the SHA1 hash.
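A minimal Python sketch of this answer, assuming Python as the language: os.path.isfile plays the role of FileExists and hashlib.sha1 computes the checksum. The helper names sha1_of_file and is_duplicate are hypothetical. The file is hashed in chunks so large files are not loaded into memory at once.

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=65536):
    """Return the SHA1 hex digest of a file, reading it in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_duplicate(source, target):
    """True if `target` exists and has the same SHA1 digest as `source`."""
    if not os.path.isfile(target):  # the FileExists step
        return False
    return sha1_of_file(source) == sha1_of_file(target)
```

If is_duplicate(source, target) is true, the source can be deleted; otherwise the name is either free or taken by a different file.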
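Well, an easy way to do it is a cyclic redundancy check (CRC); several languages have functions for this built in. You can also calculate the MD5 sum of your files. Neither is 100% reliable, though: two different files can produce the same checksum, so matching checksums only make a duplicate very likely, while different checksums prove the files differ.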
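A minimal Python sketch of the checksum approach, assuming the standard library's zlib.crc32 (which accepts a running CRC as its second argument, so it can be fed chunk by chunk) and hashlib.md5. The helper names are hypothetical.

```python
import hashlib
import zlib


def crc32_of_file(path, chunk_size=65536):
    """CRC32 of a file's contents as an unsigned int, computed incrementally."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc


def md5_of_file(path, chunk_size=65536):
    """MD5 hex digest of a file's contents."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def probably_same(path_a, path_b):
    # Different checksums prove the files differ; equal checksums only make
    # it very likely that the contents match (collisions are possible).
    return (crc32_of_file(path_a) == crc32_of_file(path_b)
            and md5_of_file(path_a) == md5_of_file(path_b))
```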
If you need to check if they are IDENTICAL, you need to open up a stream to both files, and compare them byte by byte.
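A sketch of the exact comparison in Python (the function name files_identical is hypothetical). It reads both files in equal-sized chunks rather than one byte at a time, which gives the same result as a byte-by-byte comparison but is much faster, and it stops at the first difference.

```python
def files_identical(path_a, path_b, chunk_size=65536):
    """Compare two files chunk by chunk; stop at the first difference."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # contents differ, or one file ended earlier
            if not chunk_a:
                return True  # both files reached EOF together
```

Python's standard library also offers filecmp.cmp(path_a, path_b, shallow=False), which compares file contents in a similar way.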
The check for duplicate filenames is obvious: compare the names.
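For the second part of the question (the name is taken but the content differs), one option is to pick a free name before moving. This Python sketch appends " (1)", " (2)", ... before the extension; that naming convention and the function name unique_destination are assumptions, not something the answer specifies.

```python
import os


def unique_destination(directory, filename):
    """Return a path in `directory` for `filename` that does not exist yet.

    Tries "name.ext", then "name (1).ext", "name (2).ext", ... in turn.
    """
    candidate = os.path.join(directory, filename)
    stem, ext = os.path.splitext(filename)
    n = 1
    while os.path.exists(candidate):
        candidate = os.path.join(directory, f"{stem} ({n}){ext}")
        n += 1
    return candidate
```

Another process could still create the same name between this check and the actual move, so this is only safe when a single script writes to the directory.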
Edit: If you have many files, compare the file sizes first. If the sizes do not match, the files cannot be equal.
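A Python sketch of the size filter applied to a whole directory, assuming the goal is to find any existing file whose content equals the source. Files of a different size are skipped without being read; only same-size candidates get a full content comparison with filecmp.cmp(..., shallow=False). The function name find_duplicate_in_dir is hypothetical.

```python
import filecmp
import os


def find_duplicate_in_dir(source, directory):
    """Return a path in `directory` whose content equals `source`, or None."""
    source_size = os.path.getsize(source)
    for name in os.listdir(directory):
        candidate = os.path.join(directory, name)
        if not os.path.isfile(candidate):
            continue  # skip subdirectories and other non-files
        if os.path.abspath(candidate) == os.path.abspath(source):
            continue  # don't report the source as a duplicate of itself
        if os.path.getsize(candidate) != source_size:
            continue  # different size: cannot be identical, no need to read it
        # Same size: shallow=False makes filecmp compare the actual contents
        # instead of trusting os.stat() metadata.
        if filecmp.cmp(source, candidate, shallow=False):
            return candidate
    return None
```

filecmp.cmp would also reject files of different sizes on its own; the explicit getsize check just makes the answer's filter visible.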
If you don't need the original file names, a handy trick is to calculate the MD5 or SHA1 hash of the file's content and rename the file to that hash. :-)
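A Python sketch of the hash-as-filename idea, using SHA1. Two assumptions go beyond the answer: the original extension is kept, and if the hash-named file already exists the source is treated as a duplicate and deleted (which relies on there being no SHA1 collision). The function name rename_to_content_hash is hypothetical.

```python
import hashlib
import os


def rename_to_content_hash(path):
    """Rename `path` to <sha1 of content><original extension>; return the new path.

    If a file with that name already exists, it has the same SHA1, so the
    source is assumed to be a duplicate and is removed instead.
    """
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    _, ext = os.path.splitext(path)
    new_path = os.path.join(os.path.dirname(path), h.hexdigest() + ext)
    if os.path.abspath(new_path) == os.path.abspath(path):
        return new_path  # already named after its hash
    if os.path.exists(new_path):
        os.remove(path)  # same hash: assume duplicate content
    else:
        os.rename(path, new_path)
    return new_path
```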