How to pass a larger buffer size to the DCPCrypt "UpdateStream" procedure
I have a program that currently hashes files using SHA1 only, with no other options. It uses the SHA1 hash function from the sha1 unit that is part of Lazarus and the Free Pascal Compiler.
I've since added the ability to use MD5, SHA256 and SHA512 via the DCPCrypt library (http://wiki.lazarus.freepascal.org/DCPcrypt or http://www.cityinthesky.co.uk/opensource). Everything works fine. However, my earlier version hashed the file in 2 MB buffers if the file was larger than 1 MB; if it was smaller than 1 MB, it used the default buffer of 1024 bytes, like this:
if SizeOfFile > 1048576 then // larger than 1 MB
begin
  fileHashValue := SHA1Print(SHA1File(NameOfFileToHash, 2097152)); // 2 MB buffer
end
else
  fileHashValue := SHA1Print(SHA1File(NameOfFileToHash)); // default 1024-byte buffer
However, to make my code more object-oriented, my hashing functions and procedures have now been moved into a single function controlled by radio buttons. It contains the code for all 4 hashing options, and which section runs depends on which RadioButton is checked. The SHA1 code, for example, now looks like this:
..
SourceData := TFileStream.Create(FileToBeHashed, fmOpenRead);
..
else if SHA1RadioButton2.Checked then
begin
  varSHA1Hash := TDCP_SHA1.Create(nil);
  try
    varSHA1Hash.Init;
    varSHA1Hash.UpdateStream(SourceData, SourceData.Size); // HOW DO I ADD A BUFFER HERE?
    varSHA1Hash.Final(DigestSHA1);
  finally
    varSHA1Hash.Free; // freed even if hashing raises an exception
  end;
  for i := 0 to 19 do // 20 digest bytes -> 40 hex characters
    GeneratedHash := GeneratedHash + IntToHex(DigestSHA1[i], 2);
end // End of SHA1 if
My question is: how do I add a buffer size to varSHA1Hash.UpdateStream if the file is "large" (say, bigger than 1 MB)? This matters because a 300 MB file, for example, takes 4 seconds with my earlier version but 9 seconds with my "improved" version that uses the DCPCrypt library. So although my code reads much better, large files now take more than twice as long. If I could get varSHA1Hash.UpdateStream to read several MB at a time instead of 8 KB (the buffer size UpdateStream uses, as the library source shows), it should be faster. As it stands, my understanding is that varSHA1Hash.UpdateStream(SourceData, SourceData.Size) basically treats the entire size of the file as the buffer. Is that right?
If it helps, here is the UpdateStream procedure from DCPCrypt:
procedure TDCP_hash.UpdateStream(Stream: TStream; Size: longword);
var
  Buffer: array[0..8191] of byte;
  i, read: integer;
begin
  dcpFillChar(Buffer, SizeOf(Buffer), 0);
  for i:= 1 to (Size div Sizeof(Buffer)) do
  begin
    read:= Stream.Read(Buffer,Sizeof(Buffer));
    Update(Buffer,read);
  end;
  if (Size mod Sizeof(Buffer))<> 0 then
  begin
    read:= Stream.Read(Buffer,Size mod Sizeof(Buffer));
    Update(Buffer,read);
  end;
end;
I have also looked at some other libraries, such as the Delphi Encryption Compendium (http://home.netsurf.de/wolfgang.ehrhardt/crchash_en.html), the Wolfgang Ehrhardt library (http://www.torry.net/pages.php?id=519#939342) and the one included with DoubleCommander, but for various reasons (simplicity being one) I am trying to do this with DCPCrypt.
To answer your question: you cannot pass a different buffer size, but you can change the array size in the method you mentioned (in dcpcrypt2.pas) and recompile DCPCrypt; it is open source, after all.
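Without recompiling the library, a similar effect can be had by moving the UpdateStream loop into your own code, so the buffer size becomes a parameter. The sketch below assumes DCPCrypt's public TDCP_hash methods Init, Update, Final and GetHashSize (assumed to return the digest size in bits) and the unit names DCPcrypt2 and DCPsha1; HashStreamBuffered is a hypothetical helper name, not part of DCPCrypt. Because it takes the TDCP_hash base class, the same helper should also work with TDCP_md5, TDCP_sha256 and TDCP_sha512, which would also remove the hard-coded `for i := 0 to 19` hex loop. Do not expect a large speed-up from the buffer size alone.

```pascal
program HashWithBuffer;

{$mode objfpc}{$H+}

uses
  SysUtils, Classes, DCPcrypt2, DCPsha1;

{ Hypothetical helper: the same read/Update loop as TDCP_hash.UpdateStream,
  but with a caller-chosen buffer size. Accepts any TDCP_hash descendant. }
function HashStreamBuffered(Hash: TDCP_hash; Stream: TStream;
  BufSize: Integer): string;
var
  Buffer, Digest: array of Byte;
  BytesRead, i: Integer;
begin
  if BufSize <= 0 then
    raise Exception.Create('BufSize must be positive');
  SetLength(Buffer, BufSize);
  SetLength(Digest, Hash.GetHashSize div 8); // assumed: GetHashSize is in bits
  Hash.Init;
  repeat
    BytesRead := Stream.Read(Buffer[0], BufSize);
    if BytesRead > 0 then
      Hash.Update(Buffer[0], BytesRead);
  until BytesRead <= 0; // Read returns 0 once the end of the stream is reached
  Hash.Final(Digest[0]);
  Result := '';
  for i := 0 to High(Digest) do
    Result := Result + IntToHex(Digest[i], 2); // two hex characters per byte
end;

var
  Sha1: TDCP_sha1;
  Source: TFileStream;
begin
  if ParamCount < 2 then
  begin
    WriteLn('Usage: HashWithBuffer <file> <buffer size in bytes>');
    Halt(1);
  end;
  Source := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
  try
    Sha1 := TDCP_sha1.Create(nil);
    try
      WriteLn(HashStreamBuffered(Sha1, Source, StrToInt(ParamStr(2))));
    finally
      Sha1.Free;
    end;
  finally
    Source.Free;
  end;
end.
```

The digest does not depend on the buffer size, so hashing the same file with, say, 8192 and 2097152 should print the same 40-character string.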
But this will not help much: FPC's sha1 unit is faster not because of its larger buffer but because its SHA-1 implementation is faster. It uses compiler intrinsics to rotate values, an operation the SHA-1 algorithm uses heavily.
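To illustrate the rotation mentioned above: SHA-1 rotates 32-bit words left by 1, 5 and 30 bits. The sketch below compares a portable shift-and-OR rotation with RolDWord, which is declared in FPC's system unit. RotL32 is a hypothetical helper, and whether a given compiler also turns the shift-and-OR form into a single rotate instruction depends on the compiler and target.

```pascal
program RotateDemo;

{$mode objfpc}{$H+}

uses
  SysUtils;

{ Hypothetical helper: rotate-left built from two shifts and an OR.
  Valid for 1 <= N <= 31, which covers the 1-, 5- and 30-bit rotations
  used by SHA-1. The DWord cast keeps only the low 32 bits. }
function RotL32(X: DWord; N: Byte): DWord;
begin
  Result := DWord((X shl N) or (X shr (32 - N)));
end;

var
  X: DWord;
begin
  X := $67452301; // SHA-1's initial H0 word, used here only as sample input
  // Both lines print the same value; RolDWord needs no extra uses clause
  WriteLn(IntToHex(RotL32(X, 5), 8));
  WriteLn(IntToHex(RolDWord(X, 5), 8));
end.
```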
Just run the following program with different numeric command-line parameters (e.g. 8192 and 8388608):
At least on my PC, it makes no difference whether the buffer is 8 KB or 8 MB. With lower values such as 1024, you will see a slight slowdown (10-20%).
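A benchmark along these lines can be sketched with FPC's sha1 unit, whose SHA1File takes the buffer size as its second argument. This is an illustration, not the program the answer refers to; the program name, the argument order and the use of GetTickCount64 for timing are assumptions.

```pascal
program Sha1BufferBench;

{$mode objfpc}{$H+}

uses
  SysUtils, sha1;

var
  FileName: string;
  BufSize: PtrUInt;
  StartTicks: QWord;
  Digest: TSHA1Digest;
begin
  if ParamCount < 2 then
  begin
    WriteLn('Usage: Sha1BufferBench <file> <buffer size in bytes>');
    Halt(1);
  end;
  FileName := ParamStr(1);
  BufSize := StrToInt(ParamStr(2)); // e.g. 1024, 8192 or 8388608
  StartTicks := GetTickCount64;
  Digest := SHA1File(FileName, BufSize); // sha1 unit, explicit buffer size
  WriteLn(SHA1Print(Digest));
  WriteLn('Buffer: ', BufSize, ' bytes, time: ', GetTickCount64 - StartTicks, ' ms');
end.
```

For example, run `Sha1BufferBench big.iso 8192` and then `Sha1BufferBench big.iso 8388608` (big.iso being any large test file). Running each size more than once helps ensure the file is in the OS cache, so the comparison measures hashing rather than disk reads.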