Generating an MD5 hash for large files in Java is very slow
I am using Java to generate the MD5 hash for some files. I need to generate one MD5 for several files with a total size of about 1 gigabyte.
Here's my code:
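import java.io.IOException;
import java.io.SequenceInputStream;
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

private String generateMD5(SequenceInputStream inputStream) {
    if (inputStream == null) {
        return null;
    }
    try {
        byte[] buf = new byte[2048];
        MessageDigest md = MessageDigest.getInstance("MD5");
        int read;
        // read() returns -1 at end of stream
        while ((read = inputStream.read(buf)) != -1) {
            md.update(buf, 0, read);
        }
        byte[] hashValue = md.digest();
        // Hex-encode the 16 raw digest bytes; new String(hashValue) would
        // decode them as text and garble the hash.
        return String.format("%032x", new BigInteger(1, hashValue));
    } catch (NoSuchAlgorithmException | IOException e) {
        return null;
    } finally {
        try {
            inputStream.close();
        } catch (IOException e) {
            // ...
        }
    }
}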
This seems to run forever.
How can I make it more efficient?
Comments (3)
You may want to use the Fast MD5 library. It's much faster than Java's built-in MD5 provider, and getting a hash with it is simple.
Be aware that the slow speed may also be due to slow file I/O.
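One way to check whether I/O or hashing dominates is to time a plain read of the file against a read that also feeds the digest. The following is a minimal sketch using only the JDK. The class name `Md5Timing`, the method names, and the 64 KB buffer size are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;

// Hypothetical helper: separates read time from read-plus-hash time.
final class Md5Timing {

    /**
     * Reads the whole file once and returns the elapsed time in nanoseconds.
     * If md is non-null, every chunk is also fed into the digest.
     */
    static long readNanos(Path file, MessageDigest md) throws IOException {
        byte[] buf = new byte[64 * 1024]; // assumed buffer size, larger than the 2 KB in the question
        long start = System.nanoTime();
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                if (md != null) {
                    md.update(buf, 0, n);
                }
            }
        }
        return System.nanoTime() - start;
    }

    /** Hex-encodes a digest as 32 lowercase hex characters. */
    static String toHex(byte[] digest) {
        return String.format("%032x", new BigInteger(1, digest));
    }
}
```

Calling `readNanos(file, null)` and then `readNanos(file, md)` gives a rough split between the two costs. The operating system's file cache can make the second pass look faster than it really is, so it is probably worth running both orders, or using files larger than the available memory, before drawing conclusions.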
I rewrote your code with NIO.
On my machine, it takes about 30 s to generate the MD5 hash for a large file. I also tested your original code, and the results show that NIO does not improve the program's performance.
I then timed the I/O and the MD5 computation separately. The measurements indicate that slow file I/O is the bottleneck: about 5/6 of the time is spent on I/O.
With the Fast MD5 library mentioned by @Sticky, generating the MD5 hash takes only 15 s, which is a remarkable improvement.
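For reference, an NIO-based version could look roughly like the sketch below. It is not the code that was benchmarked above. It assumes the files are given as a list of paths and are hashed in order into a single digest, and the class name `NioMd5`, the method name `md5Of`, and the 1 MB direct buffer are illustrative choices.

```java
import java.io.IOException;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical helper: one MD5 over several files, read through FileChannel.
final class NioMd5 {

    static String md5Of(List<Path> files) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20); // assumed 1 MB buffer
        for (Path file : files) {
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                while (ch.read(buf) != -1) {
                    buf.flip();      // switch the buffer from filling to draining
                    md.update(buf);  // consumes all remaining bytes in the buffer
                    buf.clear();     // make the buffer ready for the next read
                }
            }
        }
        // Hex-encode the 16-byte digest as 32 lowercase hex characters.
        return String.format("%032x", new BigInteger(1, md.digest()));
    }
}
```

Because the bytes of all files are fed into one `MessageDigest`, the result is the same as hashing their concatenation, which matches what the `SequenceInputStream` in the question does.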
If speed is an issue and you are downloading a file from a URL, you may want to calculate its MD5 while it downloads (i.e. without saving the file, reopening it, and reading it again just to get its MD5). In that case, my solution at https://stackoverflow.com/a/11189634/1082681 might be helpful. It is based on Bloodwulf's code snippet in this thread (thanks!) and just extends it a bit.
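The general idea of hashing in the same pass as the transfer can be sketched with the JDK's `DigestInputStream`. This is not the code from the linked answer, only a minimal illustration. The class name `HashWhileCopying`, the method name `copyAndMd5`, and the 64 KB buffer are assumptions, and the source could be any `InputStream`, for example one obtained from a URL connection.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.math.BigInteger;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: copies a stream and computes its MD5 in one pass.
final class HashWhileCopying {

    /**
     * Copies source to target and returns the MD5 of the copied bytes as hex.
     * The source stream is closed when the copy finishes; the target is left open.
     */
    static String copyAndMd5(InputStream source, OutputStream target)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (DigestInputStream in = new DigestInputStream(source, md)) {
            byte[] buf = new byte[64 * 1024]; // assumed buffer size
            int n;
            // Each read updates the digest as a side effect.
            while ((n = in.read(buf)) != -1) {
                target.write(buf, 0, n);
            }
        }
        return String.format("%032x", new BigInteger(1, md.digest()));
    }
}
```

Since the digest is updated on every read, the data is only traversed once, so the extra cost over a plain copy is the MD5 computation itself.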