How to calculate an MD5 checksum on a directory with Java or Groovy?

Posted on 2024-09-04 19:33:39

I am looking to use Java or Groovy to get the MD5 checksum of a complete directory.

I have to copy directories from source to target, checksum the source and the target, and then delete the source directories.

I found this script for files, but how can I do the same thing with directories?

import java.security.MessageDigest

def generateMD5(final file) {
    MessageDigest digest = MessageDigest.getInstance("MD5")
    file.withInputStream(){ is ->
        byte[] buffer = new byte[8192]
        int read = 0
        while( (read = is.read(buffer)) > 0) {
            digest.update(buffer, 0, read);
        }
    }
    byte[] md5sum = digest.digest()
    BigInteger bigInt = new BigInteger(1, md5sum)

    return bigInt.toString(16).padLeft(32, '0')
}

Is there a better approach?

Comments (7)

夜光 2024-09-11 19:33:39

I had the same requirement and chose my 'directory hash' to be an MD5 hash of the concatenated streams of all (non-directory) files within the directory. As crozin mentioned in comments on a similar question, you can use SequenceInputStream to act as a stream concatenating a load of other streams. I'm using Apache Commons Codec for the MD5 algorithm.

Basically, you recurse through the directory tree, adding FileInputStream instances to a Vector for non-directory files. Vector then conveniently has the elements() method to provide the Enumeration that SequenceInputStream needs to loop through. To the MD5 algorithm, this just appears as one InputStream.

A gotcha is that you need the files presented in the same order every time for the hash to be the same with the same inputs. The listFiles() method in File doesn't guarantee an ordering, so I sort by filename.

I was doing this for SVN controlled files, and wanted to avoid hashing the hidden SVN files, so I implemented a flag to avoid hidden files.

The relevant basic code is as below. (Obviously it could be 'hardened'.)

import org.apache.commons.codec.digest.DigestUtils;

import java.io.*;
import java.util.*;

public String calcMD5HashForDir(File dirToHash, boolean includeHiddenFiles) {

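    // Fail explicitly: assert statements are skipped unless the JVM runs with -ea.
    if (!dirToHash.isDirectory()) {
        throw new IllegalArgumentException(dirToHash + " is not a directory");
    }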
    Vector<FileInputStream> fileStreams = new Vector<FileInputStream>();

    System.out.println("Found files for hashing:");
    collectInputStreams(dirToHash, fileStreams, includeHiddenFiles);

    SequenceInputStream seqStream = 
            new SequenceInputStream(fileStreams.elements());

    try {
        String md5Hash = DigestUtils.md5Hex(seqStream);
        seqStream.close();
        return md5Hash;
    }
    catch (IOException e) {
        throw new RuntimeException("Error reading files to hash in "
                                   + dirToHash.getAbsolutePath(), e);
    }

}

private void collectInputStreams(File dir,
                                 List<FileInputStream> foundStreams,
                                 boolean includeHiddenFiles) {

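    File[] fileList = dir.listFiles();
    if (fileList == null) {             // listFiles() returns null on an I/O error
        throw new IllegalStateException("Cannot list " + dir.getAbsolutePath());
    }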
    Arrays.sort(fileList,               // Need in reproducible order
                new Comparator<File>() {
                    public int compare(File f1, File f2) {                       
                        return f1.getName().compareTo(f2.getName());
                    }
                });

    for (File f : fileList) {
        if (!includeHiddenFiles && f.getName().startsWith(".")) {
            // Skip it
        }
        else if (f.isDirectory()) {
            collectInputStreams(f, foundStreams, includeHiddenFiles);
        }
        else {
            try {
                System.out.println("\t" + f.getAbsolutePath());
                foundStreams.add(new FileInputStream(f));
            }
            catch (FileNotFoundException e) {
                throw new AssertionError(e.getMessage()
                            + ": file should never not be found!");
            }
        }
    }

}
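
For what it's worth, the same concatenated-content idea can be sketched with only the JDK, reading one file at a time instead of opening every FileInputStream up front (which may hit the open-file limit on large trees). This is a rough sketch under stated assumptions, not the code above: the names DirDigest and md5OfDirectory are made up for illustration, it sorts by relative path rather than by file name within each directory (so it will not produce the same value as calcMD5HashForDir), and it does no hidden-file filtering.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class; the names are illustrative only.
public class DirDigest {

    // MD5 over the contents of every regular file under root, fed to the
    // digest in sorted relative-path order so the result is reproducible.
    public static String md5OfDirectory(Path root)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");

        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile)
                        .sorted(Comparator.comparing(
                                (Path p) -> root.relativize(p).toString()))
                        .collect(Collectors.toList());
        }

        // Only one file is open at any time.
        byte[] buffer = new byte[8192];
        for (Path file : files) {
            try (InputStream in = Files.newInputStream(file)) {
                int read;
                while ((read = in.read(buffer)) != -1) {
                    digest.update(buffer, 0, read);
                }
            }
        }
        return toHex(digest.digest());
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Calling DirDigest.md5OfDirectory on the source and on the target should give the same string when both trees hold the same files with the same contents. Like the answer above, it hashes contents only, so renaming a file in a way that keeps the sort order would go unnoticed.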

风情万种。 2024-09-11 19:33:39

I wrote a function to calculate the MD5 checksum of a directory.

It uses FastMD5: http://www.twmacinta.com/myjava/fast_md5.php

Here is my code:

  def MD5HashDirectory(String fileDir) {
    MD5 md5 = new MD5();
    def files = []
    new File(fileDir).eachFileRecurse{ file ->
      if (file.isFile()) {
        files << file
      }
    }
    // Sort first so the combined hash does not depend on traversal order
    files.sort { it.path }.each { file ->
      // Feed each file's hex hash into the directory-level hash
      md5.Update(MD5.asHex(MD5.getHash(file)), null);
    }
    return md5.asHex()
  }

卸妝后依然美 2024-09-11 19:33:39

Based on Stuart Rossiter's answer, but with cleaner code and hidden files handled properly:

import org.apache.commons.codec.digest.DigestUtils;

import java.io.*;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Vector;

public class Hashing {
    public static String hashDirectory(String directoryPath, boolean includeHiddenFiles) throws IOException {
        File directory = new File(directoryPath);
        
        if (!directory.isDirectory()) {
            throw new IllegalArgumentException("Not a directory");
        }

        Vector<FileInputStream> fileStreams = new Vector<>();
        collectFiles(directory, fileStreams, includeHiddenFiles);

        try (SequenceInputStream sequenceInputStream = new SequenceInputStream(fileStreams.elements())) {
            return DigestUtils.md5Hex(sequenceInputStream);
        }
    }

    private static void collectFiles(File directory, List<FileInputStream> fileInputStreams,
                                     boolean includeHiddenFiles) throws IOException {
        File[] files = directory.listFiles();

        if (files != null) {
            Arrays.sort(files, Comparator.comparing(File::getName));

            for (File file : files) {
                if (includeHiddenFiles || !Files.isHidden(file.toPath())) {
                    if (file.isDirectory()) {
                        collectFiles(file, fileInputStreams, includeHiddenFiles);
                    } else {
                        fileInputStreams.add(new FileInputStream(file));
                    }
                }
            }
        }
    }
}

笛声青案梦长安 2024-09-11 19:33:39

HashCopy is a Java application. It can generate and verify MD5 and SHA on a single file or a directory recursively. I am not sure if it has an API. It can be downloaded from www.jdxsoftware.org.

只等公子 2024-09-11 19:33:39

If you need to do this in a Gradle build file, it's much simpler than with plain Groovy.

Here's an example:

import java.security.MessageDigest

def sources = fileTree('rootDir').matching {
    include 'src/*', 'build.gradle'
}.sort { it.name }
def digest = MessageDigest.getInstance('SHA-1')
sources.each { digest.update(it.bytes) }
digest.digest().encodeHex().toString()

MessageDigest is part of the Java standard library: https://docs.oracle.com/javase/8/docs/api/java/security/MessageDigest.html

Every implementation of the Java platform is required to support these algorithms:

MD5
SHA-1
SHA-256

夕嗳→ 2024-09-11 19:33:39

It's not clear what it means to take the md5sum of a directory. You might want the checksum of the file listing; you might want the checksum of the file listings and their contents. If you're already summing the file data themselves, I'd suggest you spec an unambiguous representation for a directory listing (watch out for evil characters in filenames), then compute and hash that each time. You also need to consider how you will handle special files (sockets, pipes, devices and symlinks in the unix world; NTFS has file streams and I believe something akin to symlinks as well).
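
One way to read "an unambiguous representation" is sketched below, purely as an illustration: for each regular file, feed the digest a length-prefixed relative path followed by the fixed-length digest of that file's contents. The names TreeDigest and digestTree are hypothetical, and skipping symlinks and other special files is an assumption of this sketch, not a recommendation.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper class; the names are illustrative only.
public class TreeDigest {

    public static String digestTree(Path root, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        // TreeMap keeps entries sorted by normalized relative path.
        Map<String, Path> files = new TreeMap<>();
        try (Stream<Path> walk = Files.walk(root)) {
            // Assumption: symlinks and special files are simply skipped.
            walk.filter(p -> Files.isRegularFile(p, LinkOption.NOFOLLOW_LINKS))
                .forEach(p -> files.put(
                        root.relativize(p).toString()
                            .replace(File.separatorChar, '/'), p));
        }

        MessageDigest tree = MessageDigest.getInstance(algorithm);
        for (Map.Entry<String, Path> e : files.entrySet()) {
            byte[] name = e.getKey().getBytes(StandardCharsets.UTF_8);
            // Length prefix: odd characters in a name cannot blur into
            // whatever is hashed next.
            tree.update(ByteBuffer.allocate(4).putInt(name.length).array());
            tree.update(name);
            // The per-file digest has a fixed length, so no prefix is needed.
            tree.update(digestFile(e.getValue(), algorithm));
        }
        return hex(tree.digest());
    }

    private static byte[] digestFile(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
        }
        return md.digest();
    }

    private static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

With this layout, two trees that hold the same bytes under different file names give different digests, which a contents-only hash would not detect. Empty directories are not represented at all in this sketch.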

旧时浪漫 2024-09-11 19:33:39

I calculated SHA-512 instead of MD5 (since it's more secure), but the idea is that you can define this in your Gradle file or in raw Groovy.

import java.security.MessageDigest
import java.io.File

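def calcDirHash(fileDir) {
  def hash = MessageDigest.getInstance("SHA-512")
  def files = []
  new File(fileDir).eachFileRecurse{ file ->
    if (file.isFile()) {
      files << file
    }
  }
  // Sort first so the result does not depend on the order eachFileRecurse visits files
  files.sort { it.path }.each { file ->
    file.eachByte 4096, {bytes, size ->
      hash.update(bytes, 0, size);
    }
  }
  // encodeHex() returns a Writable; convert it to a plain String
  return hash.digest().encodeHex().toString()
}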

Then call calcDirHash in any task (and pass in the directory you want hashed).

You can use other hash algorithms instead of SHA-512.
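
Coming back to the copy, verify, then delete workflow from the question, here is a rough plain-Java sketch of how the pieces might fit together. It swaps in a different technique from the single directory hash above: it builds a per-file manifest (relative path to hex digest) for each tree and compares the two maps. VerifiedMove, manifest and moveVerified are made-up names, and the sketch assumes the trees contain only regular files and directories.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class; the names are illustrative only.
public class VerifiedMove {

    // Every path under root, root first, parents before their children.
    private static List<Path> list(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.collect(Collectors.toList());
        }
    }

    // Maps each regular file's relative path to the hex digest of its contents.
    static Map<String, String> manifest(Path root, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        Map<String, String> result = new TreeMap<>();
        byte[] buffer = new byte[8192];
        for (Path p : list(root)) {
            if (!Files.isRegularFile(p)) {
                continue;
            }
            MessageDigest md = MessageDigest.getInstance(algorithm);
            try (InputStream in = Files.newInputStream(p)) {
                int n;
                while ((n = in.read(buffer)) != -1) {
                    md.update(buffer, 0, n);
                }
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            result.put(root.relativize(p).toString(), hex.toString());
        }
        return result;
    }

    // Copies source into target, then deletes source only if both manifests match.
    public static boolean moveVerified(Path source, Path target, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        for (Path p : list(source)) {
            Path dest = target.resolve(source.relativize(p).toString());
            if (Files.isDirectory(p)) {
                Files.createDirectories(dest);
            } else {
                Files.copy(p, dest, StandardCopyOption.REPLACE_EXISTING);
            }
        }
        if (!manifest(source, algorithm).equals(manifest(target, algorithm))) {
            return false; // leave the source untouched if anything differs
        }
        // Walk the listing backwards so children are deleted before parents.
        List<Path> all = list(source);
        for (int i = all.size() - 1; i >= 0; i--) {
            Files.delete(all.get(i));
        }
        return true;
    }
}
```

Something like VerifiedMove.moveVerified(Paths.get("src"), Paths.get("dst"), "MD5") would return false and keep the source whenever the target differs, including when the target already held extra files. Comparing manifests also makes it possible to report which file differs, which a single directory hash cannot do.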
