Confirming file content against a hash
I have a requirement to 'check the integrity' of the content of files. The files will be written to CD/DVD, which might be copied many times. The idea is to identify the copies that were copied correctly (after they are removed from Nero etc.).
I am rather new to this, but a quick search suggests that Arrays.hashCode(byte[]) will fit the need. We can include a file on the disk that contains the result of that call for each resource of interest, then compare it to the result for the byte[] of the File as read from disk when checked.
Do I understand the method correctly? Is this a valid way to go about checking file content?
If not, suggestions as to search keywords or strategies/methods/classes would be appreciated.
Working code based on the answer of Brendan. It takes care of the problem identified by VoidStar (needing to hold the entire byte[] in memory to get the hash).
import java.io.File;
import java.io.FileInputStream;
import java.util.zip.CRC32;

class TestHash {
    public static void main(String[] args) throws Exception {
        File f = new File("TestHash.java");
        CRC32 crcMaker = new CRC32();
        byte[] buffer = new byte[65536];
        // Read the file in 64 KiB blocks so it never has to fit in memory at once.
        try (FileInputStream fis = new FileInputStream(f)) {
            int bytesRead;
            while ((bytesRead = fis.read(buffer)) != -1) {
                crcMaker.update(buffer, 0, bytesRead);
            }
        }
        long crc = crcMaker.getValue(); // This is your error checking code
        System.out.println("CRC code is " + crc);
    }
}
Comments (3)
Arrays.hashCode() is designed to be very fast (it is used in hash tables). I highly recommend not using it for this purpose. What you want is some sort of error-checking code like a CRC.
Java happens to have a class for calculating these: CRC32 (java.util.zip.CRC32).
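To make the difference concrete, here is a small sketch comparing the two on a pair of two-byte arrays. The byte values and the class name HashCodeCollision are made up for illustration. Arrays.hashCode(byte[]) starts at 1 and computes 31 * result + element for each byte, so distinct inputs can collide easily, whereas CRC32 is guaranteed to tell apart same-length inputs that differ only within a 32-bit span.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

class HashCodeCollision {
    // CRC32 of a whole in-memory array (fine here because the arrays are tiny).
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        // Illustrative values chosen so the polynomial hash collides.
        byte[] a = {0, 31};
        byte[] b = {1, 0};

        // 31 * (31 * 1 + 0) + 31 == 31 * (31 * 1 + 1) + 0 == 992
        System.out.println(Arrays.hashCode(a) == Arrays.hashCode(b)); // prints "true"

        // The two arrays differ within a 16-bit span, which CRC32 always detects.
        System.out.println(crc32(a) == crc32(b)); // prints "false"
    }
}
```

This does not make CRC32 collision-free (it is only 32 bits), but it is built for catching accidental corruption, which is the copy-error case described in the question.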
You need to create a checksum file. Here is an example:
http://www.jguru.com/faq/view.jsp?EID=216274
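A checksum file of this kind could be sketched roughly as follows. This is not the code from the linked page. The class name ChecksumManifest, its methods, and the one-line-per-file layout ("8 hex digits, a space, the relative path") are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

class ChecksumManifest {
    // Streams the file through CRC32 in 64 KiB blocks.
    static long crcOf(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[65536];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                crc.update(buffer, 0, n);
            }
        }
        return crc.getValue();
    }

    // Writes one "<crc as 8 hex digits> <relative path>" line per file (hypothetical format).
    static void write(Path root, List<String> relativeNames, Path manifest) throws IOException {
        List<String> lines = new ArrayList<>();
        for (String name : relativeNames) {
            lines.add(String.format("%08x %s", crcOf(root.resolve(name)), name));
        }
        Files.write(manifest, lines, StandardCharsets.UTF_8);
    }

    // Returns the names whose current CRC differs from the recorded one,
    // or that cannot be read at all (missing file, read error on a bad disc).
    static List<String> verify(Path root, Path manifest) throws IOException {
        List<String> bad = new ArrayList<>();
        for (String line : Files.readAllLines(manifest, StandardCharsets.UTF_8)) {
            if (line.isEmpty()) continue;
            String[] parts = line.split(" ", 2);
            long expected = Long.parseLong(parts[0], 16);
            boolean ok;
            try {
                ok = crcOf(root.resolve(parts[1])) == expected;
            } catch (IOException e) {
                ok = false;
            }
            if (!ok) bad.add(parts[1]);
        }
        return bad;
    }
}
```

The manifest would be written once before burning and placed on the disc with the files. Each copy is then checked by calling verify with the disc's root directory, and an empty result means every listed file matched.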
Yes, as long as you load the entire file and pass it in, it will perform as expected. However, it will consume as much RAM as the size of the file, which is not necessary for this task. If you instead hash the file in smaller blocks as you stream it from storage, you can avoid wasting memory. You could, for example, XOR together the hashes of each block to create a final hash, or find a hash implementation that expects data to be streamed.
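One hash implementation that accepts streamed data is java.security.MessageDigest, which can be fed block by block. The sketch below assumes SHA-256 is an acceptable algorithm for this use. The class name StreamingDigest and the helper sha256Hex are made up for illustration. Compared with XOR-ing per-block hashes, a streaming digest also notices when blocks are swapped, since XOR gives the same result in any order.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class StreamingDigest {
    // Hashes a file in 64 KiB blocks, so memory use stays constant
    // no matter how large the file is.
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[65536];
        try (InputStream in = Files.newInputStream(file)) {
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                digest.update(buffer, 0, bytesRead);
            }
        }
        // Render the 32-byte digest as lowercase hex.
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java StreamingDigest <path-to-file>
        System.out.println(sha256Hex(Paths.get(args[0])));
    }
}
```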