Generating an MD5 for large files with Java is very slow

Posted on 2025-01-06 19:15:25

I am using Java to generate the MD5 hash for some files. I need to generate one MD5 for several files with a total size of about 1 gigabyte.
Here's my code:

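import java.io.IOException;
import java.io.SequenceInputStream;
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

private String generateMD5(SequenceInputStream inputStream){
    if(inputStream==null){
        return null;
    }
    MessageDigest md;
    try {
        int read =0;
        byte[] buf = new byte[2048];
        md = MessageDigest.getInstance("MD5");
        // Feed each chunk that was read into the digest
        while((read = inputStream.read(buf))>0){
            md.update(buf,0,read);
        }
        byte[] hashValue = md.digest();
        // Hex-encode the 16 raw digest bytes; new String(hashValue) would decode them as text
        return String.format("%032x", new BigInteger(1, hashValue));
    } catch (NoSuchAlgorithmException e) {
        return null;
    } catch (IOException e) {
        return null;
    }finally{
        try {
            if(inputStream!=null)inputStream.close();
        } catch (IOException e) {
            // ...
        }
    }
}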

This seems to run forever.
How can I make it more efficient?

Comments (3)

提笔书几行 2025-01-13 19:15:25

You may want to use the Fast MD5 library. It's much faster than Java's built-in MD5 provider and getting a hash is as simple as:

String hash = MD5.asHex(MD5.getHash(new File(filename)));

Be aware that the slowness may also be caused by slow file I/O rather than by the MD5 computation itself.
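To illustrate the I/O side of this, here is a minimal sketch that hashes a stream with a larger read buffer than the 2048 bytes used in the question, and returns the digest as a hex string. The names `Md5BufferSketch` and `md5Hex` and the 64 KiB buffer size are assumptions made for illustration; they are not part of the original code or of the Fast MD5 library, and whether a larger buffer helps depends on the disk and should be measured.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5BufferSketch {
    // Assumed buffer size for illustration only; tune it by measuring.
    private static final int BUFFER_SIZE = 64 * 1024;

    /** Reads the stream to the end and returns its MD5 as 32 lowercase hex digits. */
    static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[BUFFER_SIZE];
        int read;
        while ((read = in.read(buf)) != -1) {
            md.update(buf, 0, read);
        }
        // BigInteger(1, bytes) treats the digest as an unsigned number;
        // %032x pads with leading zeros to the full 32 hex digits.
        return String.format("%032x", new BigInteger(1, md.digest()));
    }

    public static void main(String[] args) throws Exception {
        // Two in-memory streams stand in for the several files of the question.
        InputStream first = new ByteArrayInputStream("a".getBytes(StandardCharsets.US_ASCII));
        InputStream second = new ByteArrayInputStream("bc".getBytes(StandardCharsets.US_ASCII));
        try (InputStream in = new SequenceInputStream(first, second)) {
            System.out.println(md5Hex(in)); // prints 900150983cd24fb0d6963f7d28e17f72
        }
    }
}
```

The example prints the MD5 of "abc" because the two streams are read back to back, which is the same way one MD5 is produced for several files.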

池木 2025-01-13 19:15:25

I rewrote your code with NIO; it looks roughly like this:

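import java.io.FileInputStream;
import java.io.IOException;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

private static String generateMD5(FileInputStream inputStream){
    if(inputStream==null){
        return null;
    }
    MessageDigest md;
    try {
        md = MessageDigest.getInstance("MD5");
        FileChannel channel = inputStream.getChannel();
        ByteBuffer buff = ByteBuffer.allocate(2048);
        while(channel.read(buff) != -1)
        {
            buff.flip();      // switch the buffer from being filled to being read
            md.update(buff);  // feed the bytes just read into the digest
            buff.clear();     // make the whole buffer available for the next read
        }
        byte[] hashValue = md.digest();
        // Hex-encode the 16 raw digest bytes; new String(hashValue) would decode them as text
        return String.format("%032x", new BigInteger(1, hashValue));
    }
    catch (NoSuchAlgorithmException e)
    {
        return null;
    }
    catch (IOException e)
    {
        return null;
    }
    finally
    {
        try {
            if(inputStream!=null)inputStream.close();
        } catch (IOException e) {
            // nothing useful can be done if close fails
        }
    }
}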

On my machine, it takes about 30 s to generate the MD5 hash for a large file. I also tested your code, and the results indicate that NIO doesn't improve the program's performance.

Then I measured the time spent on I/O and on MD5 separately. The numbers indicate that slow file I/O is the bottleneck, because about 5/6 of the time is spent on I/O.

By using the Fast MD5 library mentioned by @Sticky, it takes only 15 s to generate the MD5 hash, which is a remarkable improvement.
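The answer does not show how the I/O and MD5 times were separated, so the following is only one possible way to take such a measurement, not the code that was actually used. The class name `Md5TimingSketch` and the method `timeReadAndDigest` are hypothetical. Note that calling `System.nanoTime()` around every 2048-byte chunk adds overhead of its own, so the numbers are a rough split rather than a precise profile.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5TimingSketch {
    /** Returns { nanoseconds spent in read(), nanoseconds spent in MessageDigest }. */
    static long[] timeReadAndDigest(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[2048]; // same buffer size as the code in this thread
        long ioNanos = 0;
        long md5Nanos = 0;
        while (true) {
            long beforeRead = System.nanoTime();
            int read = in.read(buf);
            long afterRead = System.nanoTime();
            ioNanos += afterRead - beforeRead;
            if (read == -1) {
                break;
            }
            md.update(buf, 0, read);
            md5Nanos += System.nanoTime() - afterRead;
        }
        long beforeDigest = System.nanoTime();
        md.digest(); // the final padding and finishing step also counts as MD5 time
        md5Nanos += System.nanoTime() - beforeDigest;
        return new long[] { ioNanos, md5Nanos };
    }

    public static void main(String[] args) throws Exception {
        // Assumed usage: the path of the file to measure is the first argument.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            long[] t = timeReadAndDigest(in);
            System.out.printf("io: %d ms, md5: %d ms%n", t[0] / 1_000_000, t[1] / 1_000_000);
        }
    }
}
```

Because the operating system caches recently read files, a second run over the same file can show a much smaller I/O share than the first.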

む无字情书 2025-01-13 19:15:25

Whenever speed is an issue and you download a file from a URL and want to calculate its MD5 at the same time (i.e. not save the file, reopen and read again just to get its MD5), my solution at https://stackoverflow.com/a/11189634/1082681 might be helpful. It is based on Bloodwulf's code snippet here in this thread (thanks!) and just extends it a bit.
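The linked answer is not reproduced here, so the sketch below is not that solution. It only illustrates the general idea of hashing in the same pass as the transfer, using the standard `java.security.DigestInputStream`. The class name `Md5WhileCopyingSketch`, the method `copyAndMd5`, and the 8192-byte buffer are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5WhileCopyingSketch {
    /**
     * Copies source to target and returns the MD5 of the copied bytes as hex.
     * The source is closed when the copy finishes; the target is left open.
     */
    static String copyAndMd5(InputStream source, OutputStream target)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        // DigestInputStream updates md with every byte that is read through it.
        try (DigestInputStream in = new DigestInputStream(source, md)) {
            byte[] buf = new byte[8192];
            int read;
            while ((read = in.read(buf)) != -1) {
                target.write(buf, 0, read);
            }
        }
        return String.format("%032x", new BigInteger(1, md.digest()));
    }

    public static void main(String[] args) throws Exception {
        // An in-memory stream stands in for the download stream.
        InputStream download = new ByteArrayInputStream("abc".getBytes(StandardCharsets.US_ASCII));
        ByteArrayOutputStream saved = new ByteArrayOutputStream();
        System.out.println(copyAndMd5(download, saved)); // prints 900150983cd24fb0d6963f7d28e17f72
    }
}
```

For a real download, the source would be the stream obtained from the URL connection and the target would be wherever the bytes need to go; the data is read only once either way.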
