How can I compute the MD5 checksum of a directory with Java or Groovy?
I am looking to use Java or Groovy to get the MD5 checksum of a complete directory.
I have to copy directories from source to target, checksum both source and target, and then delete the source directories.
I found this script for files, but how do I do the same thing for directories?
import java.security.MessageDigest

// Compute the MD5 checksum of a single file as a 32-character hex string.
def generateMD5(final file) {
    MessageDigest digest = MessageDigest.getInstance("MD5")
    file.withInputStream { is ->
        byte[] buffer = new byte[8192]
        int read = 0
        // Feed the file to the digest in 8 KB chunks.
        while ((read = is.read(buffer)) > 0) {
            digest.update(buffer, 0, read)
        }
    }
    byte[] md5sum = digest.digest()
    BigInteger bigInt = new BigInteger(1, md5sum)
    // Left-pad so that leading zero bytes are not dropped from the hex string.
    return bigInt.toString(16).padLeft(32, '0')
}
Is there a better approach?
Answers (7)
I had the same requirement and chose my 'directory hash' to be an MD5 hash of the concatenated streams of all (non-directory) files within the directory. As crozin mentioned in comments on a similar question, you can use SequenceInputStream to act as a stream concatenating a load of other streams. I'm using Apache Commons Codec for the MD5 algorithm.
Basically, you recurse through the directory tree, adding FileInputStream instances to a Vector for non-directory files. Vector then conveniently has the elements() method to provide the Enumeration that SequenceInputStream needs to loop through. To the MD5 algorithm, this just appears as one InputStream.
A gotcha is that you need the files presented in the same order every time for the hash to be the same with the same inputs. The listFiles() method in File doesn't guarantee an ordering, so I sort by filename.
I was doing this for SVN-controlled files, and wanted to avoid hashing the hidden SVN files, so I implemented a flag to avoid hidden files.
The relevant basic code is as below. (Obviously it could be 'hardened'.)
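A minimal Java sketch of the approach described in this answer. It is not the answer's original listing: java.security.MessageDigest stands in for Apache Commons Codec so that the example is self-contained, and the class name DirectoryStreamHash, the method md5OfDirectory, and the includeHidden flag are hypothetical names chosen for illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Vector;

class DirectoryStreamHash {

    /** MD5 (hex) of the concatenated contents of all regular files under dir. */
    static String md5OfDirectory(File dir, boolean includeHidden)
            throws IOException, NoSuchAlgorithmException {
        Vector<FileInputStream> streams = new Vector<>();
        try {
            collectStreams(dir, streams, includeHidden);
            MessageDigest digest = MessageDigest.getInstance("MD5");
            // To the digest, the whole tree looks like one InputStream.
            try (InputStream all = new SequenceInputStream(streams.elements())) {
                byte[] buffer = new byte[8192];
                int read;
                while ((read = all.read(buffer)) != -1) {
                    digest.update(buffer, 0, read);
                }
            }
            return String.format("%032x", new BigInteger(1, digest.digest()));
        } finally {
            // Safety net if something failed early; closing twice is harmless.
            for (FileInputStream s : streams) {
                try { s.close(); } catch (IOException ignored) { }
            }
        }
    }

    /** Recursively opens every regular file, visiting children sorted by name. */
    private static void collectStreams(File dir, Vector<FileInputStream> out,
                                       boolean includeHidden) throws IOException {
        File[] children = dir.listFiles();
        if (children == null) {
            throw new IOException("Cannot list directory: " + dir);
        }
        // listFiles() guarantees no order, so sort to keep the hash stable.
        Arrays.sort(children, Comparator.comparing(File::getName));
        for (File child : children) {
            if (!includeHidden && child.isHidden()) {
                continue;
            }
            if (child.isDirectory()) {
                collectStreams(child, out, includeHidden);
            } else if (child.isFile()) {
                out.add(new FileInputStream(child));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(md5OfDirectory(new File(args[0]), false));
    }
}
```

Because every file is opened before hashing starts, a very large tree could run out of file handles; an Enumeration that opens each file lazily would avoid that. Note also that only file contents feed the digest, so renaming a file without changing the sort order leaves the hash unchanged.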
I made a function to calculate the MD5 checksum of a directory.
First, I'm using FastMD5: http://www.twmacinta.com/myjava/fast_md5.php
Here is my code:
Based on Stuart Rossiter's answer, but with cleaner code and hidden files handled properly:
HashCopy is a Java application. It can generate and verify MD5 and SHA on a single file or a directory recursively. I am not sure if it has an API. It can be downloaded from www.jdxsoftware.org.
If you need to do this in a Gradle build file, it's much simpler than with plain Groovy.
Here's an example:
MessageDigest is from the Java standard library: https://docs.oracle.com/javase/8/docs/api/java/security/MessageDigest.html
Algorithms supported in all JVMs are:
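For reference, the Java 8 MessageDigest documentation linked above requires every Java platform implementation to support MD5, SHA-1, and SHA-256. One way to check what a particular JVM offers is sketched below; the class name DigestAlgorithms and its two helper methods are hypothetical, introduced only for this illustration.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.Security;
import java.util.Set;
import java.util.TreeSet;

class DigestAlgorithms {

    /** True if this JVM can create a MessageDigest for the given name. */
    static boolean isSupported(String algorithm) {
        try {
            MessageDigest.getInstance(algorithm);
            return true;
        } catch (NoSuchAlgorithmException e) {
            return false;
        }
    }

    /** Sorted names of every MessageDigest algorithm the installed providers offer. */
    static Set<String> available() {
        return new TreeSet<>(Security.getAlgorithms("MessageDigest"));
    }

    public static void main(String[] args) {
        for (String name : new String[] {"MD5", "SHA-1", "SHA-256"}) {
            System.out.println(name + " supported: " + isSupported(name));
        }
        System.out.println("All available: " + available());
    }
}
```

Anything beyond those three names depends on the installed security providers, so it is safer to probe with isSupported than to assume.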
It's not clear what it means to take the md5sum of a directory. You might want the checksum of the file listing; you might want the checksum of the file listings and their contents. If you're already summing the file data themselves, I'd suggest you spec an unambiguous representation for a directory listing (watch out for evil characters in filenames), then compute and hash that each time. You also need to consider how you will handle special files (sockets, pipes, devices and symlinks in the unix world; NTFS has file streams and I believe something akin to symlinks as well).
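One possible reading of that advice, sketched in Java under stated assumptions: each regular file contributes its relative path (UTF-8, prefixed with its byte length so odd characters cannot blur boundaries), its size, and then its contents, in sorted path order. Symlinks and other special files are simply skipped, and empty directories leave no trace. The class name DirectoryManifestHash and this particular byte layout are illustrative choices, not an established format.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

class DirectoryManifestHash {

    /** Hex digest over (name length, name, size, contents) of each regular file. */
    static String hash(Path root, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        // Sorted map of portable relative name -> file keeps the order deterministic.
        TreeMap<String, Path> files = new TreeMap<>();
        try (Stream<Path> walk = Files.walk(root)) { // does not follow symlinks
            walk.filter(p -> Files.isRegularFile(p, LinkOption.NOFOLLOW_LINKS))
                .forEach(p -> files.put(portableName(root.relativize(p)), p));
        }
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        byte[] buffer = new byte[8192];
        for (Map.Entry<String, Path> entry : files.entrySet()) {
            byte[] name = entry.getKey().getBytes(StandardCharsets.UTF_8);
            // Length prefixes make the boundary between name and data unambiguous.
            digest.update(ByteBuffer.allocate(4).putInt(name.length).array());
            digest.update(name);
            long size = Files.size(entry.getValue());
            digest.update(ByteBuffer.allocate(8).putLong(size).array());
            try (InputStream in = Files.newInputStream(entry.getValue())) {
                int read;
                while ((read = in.read(buffer)) != -1) {
                    digest.update(buffer, 0, read);
                }
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Joins path segments with '/' so the name is the same on every OS. */
    private static String portableName(Path relative) {
        StringBuilder sb = new StringBuilder();
        for (Path part : relative) {
            if (sb.length() > 0) {
                sb.append('/');
            }
            sb.append(part.toString());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(hash(Paths.get(args[0]), "MD5"));
    }
}
```

Unlike hashing the concatenated contents alone, this layout changes the digest when a file is renamed or when bytes move from one file to its neighbour. Calling hash on both the source and the copied target and comparing the two strings gives the check the question asks for before the source is deleted.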
I calculated SHA-512 instead of MD5 (since it's more secure), but the idea is that you can define this in your Gradle file or in raw Groovy.
Then call calcDirHash in any task (and pass in the directory you want hashed).
You can use other hash algorithms instead of SHA-512.