I need to calculate checksums of quite large files (gigabytes). This can be accomplished using the following method:
private byte[] calcHash(string file)
{
    // using blocks release the hash algorithm and the stream even if ComputeHash throws.
    using (System.Security.Cryptography.HashAlgorithm ha = System.Security.Cryptography.MD5.Create())
    using (FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read))
    {
        return ha.ComputeHash(fs);
    }
}
However, the files are normally written just beforehand in a buffered manner (say, 32 MB at a time). I am convinced that I once saw an overload of a hash function that allowed me to calculate an MD5 (or other) hash at the same time as writing, i.e. calculating the hash of one buffer, then feeding that resulting hash into the next iteration.
Something like this: (pseudocode-ish)
byte[] hash = new byte[] { 0,0,0,0,0,0,0,0 };
while (!eof)
{
    buffer = readFromSourceFile();
    writefile(buffer);
    hash = calchash(buffer, hash);
}
hash is now similar to what would be accomplished by running the calcHash function on the entire file.
Now, I can't find any overloads like that in the .NET 3.5 Framework. Am I dreaming? Has it never existed, or am I just lousy at searching? The reason for doing the writing and the checksum calculation in one pass is that, with files this large, reading everything back a second time just to hash it is wasteful.
Comments (5)
You use the TransformBlock and TransformFinalBlock methods to process the data in chunks.
Note: It works (at least with the MD5 provider) to send all blocks to TransformBlock and then send an empty block to TransformFinalBlock to finalise the process.
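The TransformBlock / TransformFinalBlock approach can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: the class name HashingCopy, the method name CopyAndHash and the bufferSize parameter are made up for the example, and it assumes the source and destination are already open streams.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class HashingCopy
{
    // Hypothetical helper: copies source to destination chunk by chunk and
    // returns the MD5 of exactly the bytes that were written.
    public static byte[] CopyAndHash(Stream source, Stream destination, int bufferSize)
    {
        using (HashAlgorithm md5 = MD5.Create())
        {
            byte[] buffer = new byte[bufferSize];
            int read;
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                destination.Write(buffer, 0, read);
                // Feed the same bytes into the running hash. Passing null as
                // the output buffer skips the pass-through copy.
                md5.TransformBlock(buffer, 0, read, null, 0);
            }
            // An empty final block completes the hash computation.
            md5.TransformFinalBlock(buffer, 0, 0);
            return md5.Hash;
        }
    }
}
```

The result should match what ComputeHash returns for the whole file, because the algorithm keeps its intermediate state internally between TransformBlock calls. There is no need to pass the previous hash back in by hand.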
I like the answer above, but for the sake of completeness, and as a more general solution, refer to the CryptoStream class. If you are already handling streams, it is easy to wrap your stream in a CryptoStream, passing a HashAlgorithm as the ICryptoTransform parameter.
You might have to close the stream before getting the hash (so the HashAlgorithm knows it's done).
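As a rough sketch of the CryptoStream idea, the destination stream can be wrapped in write mode so that every chunk written is hashed on its way through. The names CryptoStreamHashing and WriteAndHash are invented for this example, and the code calls FlushFinalBlock explicitly instead of relying on closing the stream to finish the hash.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class CryptoStreamHashing
{
    // Hypothetical helper: writes source to destination through a CryptoStream
    // and returns the MD5 of the bytes written.
    // Note: disposing the CryptoStream also closes the destination stream.
    public static byte[] WriteAndHash(Stream source, Stream destination, int bufferSize)
    {
        using (HashAlgorithm md5 = MD5.Create())
        using (CryptoStream hashing = new CryptoStream(destination, md5, CryptoStreamMode.Write))
        {
            byte[] buffer = new byte[bufferSize];
            int read;
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                // The hash transform passes the data through unchanged,
                // so destination receives the original bytes.
                hashing.Write(buffer, 0, read);
            }
            // Tell the HashAlgorithm that no more data is coming.
            hashing.FlushFinalBlock();
            return md5.Hash;
        }
    }
}
```

If only the hash is needed and nothing should be written, the same pattern works with Stream.Null as the destination.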
It seems you can use TransformBlock / TransformFinalBlock, as shown in this sample: Displaying progress updates when hashing large files
Hash algorithms are expected to handle this situation and are typically implemented with 3 functions:
hash_init() - Called to allocate resources and begin the hash.
hash_update() - Called with new data as it arrives.
hash_final() - Complete the calculation and free resources.
Look at http://www.openssl.org/docs/crypto/md5.html or http://www.openssl.org/docs/crypto/sha.html for good, standard examples in C; I'm sure there are similar libraries for your platform.
I've just had to do something similar, but wanted to read the file asynchronously. It's using TransformBlock and TransformFinalBlock and is giving me answers consistent with Azure, so I think it is correct!
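The code for this answer is not shown here, so the following is only a guess at what such an approach might look like, not the author's implementation. It assumes .NET 4.5 or later (for async/await and Stream.ReadAsync, which are not available in .NET 3.5), and the names AsyncHashing and ComputeMd5Async are hypothetical.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Threading.Tasks;

static class AsyncHashing
{
    // Hypothetical helper: reads the stream asynchronously in chunks and
    // feeds each chunk into an incremental MD5 computation.
    public static async Task<byte[]> ComputeMd5Async(Stream source, int bufferSize)
    {
        using (HashAlgorithm md5 = MD5.Create())
        {
            byte[] buffer = new byte[bufferSize];
            int read;
            while ((read = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                // Only the read is asynchronous; hashing the chunk is synchronous.
                md5.TransformBlock(buffer, 0, read, null, 0);
            }
            // An empty final block finalises the hash.
            md5.TransformFinalBlock(buffer, 0, 0);
            return md5.Hash;
        }
    }
}
```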