缓冲文件(用于更快的磁盘访问)

发布于 2024-10-31 16:59:31 字数 2742 浏览 0 评论 0原文

我正在处理大文件,直接写入磁盘很慢。由于文件很大,我无法将其加载到 TMemoryStream 中。

TFileStream 没有缓冲,所以我想知道是否有一个自定义库可以提供缓冲流,或者我应该只依赖操作系统提供的缓冲。操作系统缓冲可靠吗?我的意思是,如果缓存已满,旧文件(我的)可能会从缓存中刷新,以便为新文件腾出空间。

我的文件在 GB 范围内。它包含数百万条记录。不幸的是,记录的大小不固定。因此,我必须进行数百万次读取(4 到 500 字节之间)。阅读(和写作)是连续的。我不会在文件中上下跳动(我认为这对于缓冲来说是理想的)。

最后,我必须将这样的文件写回磁盘(再次进行数百万次小写入)。


David 提供了他的个人库,该库提供缓冲磁盘访问。

   Speed tests:
     Input file: 317MB.SFF
     Delphi stream: 9.84sec
     David's stream: 2.05sec
     ______________________________________

   More tests:
     Input file: input2_700MB.txt
     Lines: 19 millions
     Compiler optimization: ON
     I/O check: On
     FastMM: release mode
     **HDD**   

     Reading: **linear** (ReadLine) (PS: multiply time with 10)      
      We see clear performance drop at 8KB. Recommended 16 or 32KB
        Time: 618 ms  Cache size: 64KB.
        Time: 622 ms  Cache size: 128KB.
        Time: 622 ms  Cache size: 24KB.
        Time: 622 ms  Cache size: 32KB.
        Time: 622 ms  Cache size: 64KB.
        Time: 624 ms  Cache size: 256KB.
        Time: 625 ms  Cache size: 18KB.
        Time: 626 ms  Cache size: 26KB.
        Time: 626 ms  Cache size: 1024KB.
        Time: 626 ms  Cache size: 16KB.
        Time: 628 ms  Cache size: 42KB.
        Time: 644 ms  Cache size: 8KB.      <--- no difference until 8K
        Time: 664 ms  Cache size: 4KB.
        Time: 705 ms  Cache size: 2KB.
        Time: 791 ms  Cache size: 1KB.
        Time: 795 ms  Cache size: 1KB.

      **SSD**
      We see a small improvement as we go towards higher buffers. Recommended 16 or 32KB
        Time: 610 ms  Cache size: 128KB.
        Time: 611 ms  Cache size: 256KB.
        Time: 614 ms  Cache size: 32KB.
        Time: 623 ms  Cache size: 16KB.
        Time: 625 ms  Cache size: 66KB.
        Time: 639 ms  Cache size: 8KB.       <--- definitively not good with 8K
        Time: 660 ms  Cache size: 4KB.
     ______

     Reading: **Random** (ReadInteger) (100000 reads)
     SSD
       Time: 064 ms. Cache size: 1KB.   Count: 100000.  RAM: 13.27 MB         <-- probably the best buffer size for ReadInteger is 4bytes!
       Time: 067 ms. Cache size: 2KB.   Count: 100000.  RAM: 13.27 MB
       Time: 080 ms. Cache size: 4KB.   Count: 100000.  RAM: 13.27 MB
       Time: 098 ms. Cache size: 8KB.   Count: 100000.  RAM: 13.27 MB
       Time: 140 ms. Cache size: 16KB.  Count: 100000.  RAM: 13.27 MB
       Time: 213 ms. Cache size: 32KB.  Count: 100000.  RAM: 13.27 MB
       Time: 360 ms. Cache size: 64KB.  Count: 100000.  RAM: 13.27 MB
       Conclusion: don't use it for "random" reading   

2020 年更新:
按顺序读取时,新的 System.Classes.TBufferedFileStream 似乎比上面提供的库快 70%。

I am working with large files and writing directly to disk is slow. Because the file is large I cannot load it in a TMemoryStream.

TFileStream is not buffered so I want to know if there is a custom library that can offer buffered streams or should I rely only on the buffering offered by OS. Is the OS buffering reliable? I mean if the cache is full an old file (mine) might be flushed from cache in order to make room for a new file.

My file is in the GB range. It contains millions of records. Unfortunately, the records are not of fix size. So, I have to do millions of readings (between 4 and 500 bytes). The reading (and the writing) is sequential. I don't jump up and down into the file (which I think is ideal for buffering).

In the end, I have to write such file back to disk (again millions of small writes).


David provided the his personal library that provides buffered disk access.

   Speed tests:
     Input file: 317MB.SFF
     Delphi stream: 9.84sec
     David's stream: 2.05sec
     ______________________________________

   More tests:
     Input file: input2_700MB.txt
     Lines: 19 millions
     Compiler optimization: ON
     I/O check: On
     FastMM: release mode
     **HDD**   

     Reading: **linear** (ReadLine) (PS: multiply time with 10)      
      We see clear performance drop at 8KB. Recommended 16 or 32KB
        Time: 618 ms  Cache size: 64KB.
        Time: 622 ms  Cache size: 128KB.
        Time: 622 ms  Cache size: 24KB.
        Time: 622 ms  Cache size: 32KB.
        Time: 622 ms  Cache size: 64KB.
        Time: 624 ms  Cache size: 256KB.
        Time: 625 ms  Cache size: 18KB.
        Time: 626 ms  Cache size: 26KB.
        Time: 626 ms  Cache size: 1024KB.
        Time: 626 ms  Cache size: 16KB.
        Time: 628 ms  Cache size: 42KB.
        Time: 644 ms  Cache size: 8KB.      <--- no difference until 8K
        Time: 664 ms  Cache size: 4KB.
        Time: 705 ms  Cache size: 2KB.
        Time: 791 ms  Cache size: 1KB.
        Time: 795 ms  Cache size: 1KB.

      **SSD**
      We see a small improvement as we go towards higher buffers. Recommended 16 or 32KB
        Time: 610 ms  Cache size: 128KB.
        Time: 611 ms  Cache size: 256KB.
        Time: 614 ms  Cache size: 32KB.
        Time: 623 ms  Cache size: 16KB.
        Time: 625 ms  Cache size: 66KB.
        Time: 639 ms  Cache size: 8KB.       <--- definitively not good with 8K
        Time: 660 ms  Cache size: 4KB.
     ______

     Reading: **Random** (ReadInteger) (100000 reads)
     SSD
       Time: 064 ms. Cache size: 1KB.   Count: 100000.  RAM: 13.27 MB         <-- probably the best buffer size for ReadInteger is 4bytes!
       Time: 067 ms. Cache size: 2KB.   Count: 100000.  RAM: 13.27 MB
       Time: 080 ms. Cache size: 4KB.   Count: 100000.  RAM: 13.27 MB
       Time: 098 ms. Cache size: 8KB.   Count: 100000.  RAM: 13.27 MB
       Time: 140 ms. Cache size: 16KB.  Count: 100000.  RAM: 13.27 MB
       Time: 213 ms. Cache size: 32KB.  Count: 100000.  RAM: 13.27 MB
       Time: 360 ms. Cache size: 64KB.  Count: 100000.  RAM: 13.27 MB
       Conclusion: don't use it for "random" reading   

Update 2020:
When reading sequentially, the new System.Classes.TBufferedFileStream seems to be 70% faster than the library presented above.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

人间☆小暴躁 2024-11-07 16:59:31

Windows 文件缓存非常有效,特别是当您使用 Vista 或更高版本时。 TFileStream 是 Windows ReadFile()WriteFile() API 函数的松散包装,对于许多用例来说,唯一更快的是内存映射文件。

但是,在一种常见情况下,TFileStream 会成为性能瓶颈。也就是说,如果您每次调用流读取或写入函数时读取或写入少量数据。例如,如果您一次读取一个整数数组中的一项,那么在调用 ReadFile() 时一次读取 4 个字节会产生很大的开销。

同样,内存映射文件是解决此瓶颈的绝佳方法,但另一种常用的方法是读取更大的缓冲区(例如数千字节),然后解决将来从内存缓存中读取流的问题,而不是进一步调用ReadFile()。这种方法仅适用于顺序访问。


从更新的问题中描述的使用模式来看,我认为您可能会发现以下类可以提高您的性能:

unit BufferedFileStream;

interface

uses
  SysUtils, Math, Classes, Windows;

type
  TBaseCachedFileStream = class(TStream)
  private
    function QueryInterface(const IID: TGUID; out Obj): HResult; stdcall;
    function _AddRef: Integer; stdcall;
    function _Release: Integer; stdcall;
  protected
    FHandle: THandle;
    FOwnsHandle: Boolean;
    FCache: PByte;
    FCacheSize: Integer;
    FPosition: Int64;//the current position in the file (relative to the beginning of the file)
    FCacheStart: Int64;//the postion in the file of the start of the cache (relative to the beginning of the file)
    FCacheEnd: Int64;//the postion in the file of the end of the cache (relative to the beginning of the file)
    FFileName: string;
    FLastError: DWORD;
    procedure HandleError(const Msg: string);
    procedure RaiseSystemError(const Msg: string; LastError: DWORD); overload;
    procedure RaiseSystemError(const Msg: string); overload;
    procedure RaiseSystemErrorFmt(const Msg: string; const Args: array of const);
    function CreateHandle(FlagsAndAttributes: DWORD): THandle; virtual; abstract;
    function GetFileSize: Int64; virtual;
    procedure SetSize(NewSize: Longint); override;
    procedure SetSize(const NewSize: Int64); override;
    function FileRead(var Buffer; Count: Longword): Integer;
    function FileWrite(const Buffer; Count: Longword): Integer;
    function FileSeek(const Offset: Int64; Origin: TSeekOrigin): Int64;
  public
    constructor Create(const FileName: string); overload;
    constructor Create(const FileName: string; CacheSize: Integer); overload;
    constructor Create(const FileName: string; CacheSize: Integer; Handle: THandle); overload; virtual;
    destructor Destroy; override;
    property CacheSize: Integer read FCacheSize;
    function Read(var Buffer; Count: Longint): Longint; override;
    function Write(const Buffer; Count: Longint): Longint; override;
    function Seek(const Offset: Int64; Origin: TSeekOrigin): Int64; override;
  end;
  TBaseCachedFileStreamClass = class of TBaseCachedFileStream;

  IDisableStreamReadCache = interface
    ['{0B6D0004-88D1-42D5-BC0F-447911C0FC21}']
    procedure DisableStreamReadCache;
    procedure EnableStreamReadCache;
  end;

  TReadOnlyCachedFileStream = class(TBaseCachedFileStream, IDisableStreamReadCache)
  (* This class works by filling the cache each time a call to Read is made and
     FPosition is outside the existing cache.  By filling the cache we mean
     reading from the file into the temporary cache.  Calls to Read when
     FPosition is in the existing cache are then dealt with by filling the
     buffer with bytes from the cache.
  *)
  private
    FUseAlignedCache: Boolean;
    FViewStart: Int64;
    FViewLength: Int64;
    FDisableStreamReadCacheRefCount: Integer;
    procedure DisableStreamReadCache;
    procedure EnableStreamReadCache;
    procedure FlushCache;
  protected
    function CreateHandle(FlagsAndAttributes: DWORD): THandle; override;
    function GetFileSize: Int64; override;
  public
    constructor Create(const FileName: string; CacheSize: Integer; Handle: THandle); overload; override;
    property UseAlignedCache: Boolean read FUseAlignedCache write FUseAlignedCache;
    function Read(var Buffer; Count: Longint): Longint; override;
    procedure SetViewWindow(const ViewStart, ViewLength: Int64);
  end;

  TWriteCachedFileStream = class(TBaseCachedFileStream, IDisableStreamReadCache)
  (* This class works by caching calls to Write.  By this we mean temporarily
     storing the bytes to be written in the cache.  As each call to Write is
     processed the cache grows.  The cache is written to file when:
       1.  A call to Write is made when the cache is full.
       2.  A call to Write is made and FPosition is outside the cache (this
           must be as a result of a call to Seek).
       3.  The class is destroyed.

     Note that data can be read from these streams but the reading is not
     cached and in fact a read operation will flush the cache before
     attempting to read the data.
  *)
  private
    FFileSize: Int64;
    FReadStream: TReadOnlyCachedFileStream;
    FReadStreamCacheSize: Integer;
    FReadStreamUseAlignedCache: Boolean;
    procedure DisableStreamReadCache;
    procedure EnableStreamReadCache;
    procedure CreateReadStream;
    procedure FlushCache;
  protected
    function CreateHandle(FlagsAndAttributes: DWORD): THandle; override;
    function GetFileSize: Int64; override;
  public
    constructor Create(const FileName: string; CacheSize, ReadStreamCacheSize: Integer; ReadStreamUseAlignedCache: Boolean); overload;
    destructor Destroy; override;
    function Read(var Buffer; Count: Longint): Longint; override;
    function Write(const Buffer; Count: Longint): Longint; override;
  end;

implementation

function GetFileSizeEx(hFile: THandle; var FileSize: Int64): BOOL; stdcall; external kernel32;
function SetFilePointerEx(hFile: THandle; DistanceToMove: Int64; lpNewFilePointer: PInt64; dwMoveMethod: DWORD): BOOL; stdcall; external kernel32;

{ TBaseCachedFileStream }

constructor TBaseCachedFileStream.Create(const FileName: string);
begin
  Create(FileName, 0);
end;

constructor TBaseCachedFileStream.Create(const FileName: string; CacheSize: Integer);
begin
  Create(FileName, CacheSize, 0);
end;

constructor TBaseCachedFileStream.Create(const FileName: string; CacheSize: Integer; Handle: THandle);
const
  DefaultCacheSize = 16*1024;
  //16kb - this was chosen empirically - don't make it too large otherwise the progress report is 'jerky'
begin
  inherited Create;
  FFileName := FileName;
  FOwnsHandle := Handle=0;
  if FOwnsHandle then begin
    FHandle := CreateHandle(FILE_ATTRIBUTE_NORMAL);
  end else begin
    FHandle := Handle;
  end;
  FCacheSize := CacheSize;
  if FCacheSize<=0 then begin
    FCacheSize := DefaultCacheSize;
  end;
  GetMem(FCache, FCacheSize);
end;

destructor TBaseCachedFileStream.Destroy;
begin
  FreeMem(FCache);
  if FOwnsHandle and (FHandle<>0) then begin
    CloseHandle(FHandle);
  end;
  inherited;
end;

function TBaseCachedFileStream.QueryInterface(const IID: TGUID; out Obj): HResult;
begin
  if GetInterface(IID, Obj) then begin
    Result := S_OK;
  end else begin
    Result := E_NOINTERFACE;
  end;
end;

function TBaseCachedFileStream._AddRef: Integer;
begin
  Result := -1;
end;

function TBaseCachedFileStream._Release: Integer;
begin
  Result := -1;
end;

procedure TBaseCachedFileStream.HandleError(const Msg: string);
begin
  if FLastError<>0 then begin
    RaiseSystemError(Msg, FLastError);
  end;
end;

procedure TBaseCachedFileStream.RaiseSystemError(const Msg: string; LastError: DWORD);
begin
  raise EStreamError.Create(Trim(Msg+'  ')+SysErrorMessage(LastError));
end;

procedure TBaseCachedFileStream.RaiseSystemError(const Msg: string);
begin
  RaiseSystemError(Msg, GetLastError);
end;

procedure TBaseCachedFileStream.RaiseSystemErrorFmt(const Msg: string; const Args: array of const);
var
  LastError: DWORD;
begin
  LastError := GetLastError; // must call GetLastError before Format
  RaiseSystemError(Format(Msg, Args), LastError);
end;

function TBaseCachedFileStream.GetFileSize: Int64;
begin
  if not GetFileSizeEx(FHandle, Result) then begin
    RaiseSystemErrorFmt('GetFileSizeEx failed for %s.', [FFileName]);
  end;
end;

procedure TBaseCachedFileStream.SetSize(NewSize: Longint);
begin
  SetSize(Int64(NewSize));
end;

procedure TBaseCachedFileStream.SetSize(const NewSize: Int64);
begin
  Seek(NewSize, soBeginning);
  if not Windows.SetEndOfFile(FHandle) then begin
    RaiseSystemErrorFmt('SetEndOfFile for %s.', [FFileName]);
  end;
end;

function TBaseCachedFileStream.FileRead(var Buffer; Count: Longword): Integer;
begin
  if Windows.ReadFile(FHandle, Buffer, Count, LongWord(Result), nil) then begin
    FLastError := 0;
  end else begin
    FLastError := GetLastError;
    Result := -1;
  end;
end;

function TBaseCachedFileStream.FileWrite(const Buffer; Count: Longword): Integer;
begin
  if Windows.WriteFile(FHandle, Buffer, Count, LongWord(Result), nil) then begin
    FLastError := 0;
  end else begin
    FLastError := GetLastError;
    Result := -1;
  end;
end;

function TBaseCachedFileStream.FileSeek(const Offset: Int64; Origin: TSeekOrigin): Int64;
begin
  if not SetFilePointerEx(FHandle, Offset, @Result, ord(Origin)) then begin
    RaiseSystemErrorFmt('SetFilePointerEx failed for %s.', [FFileName]);
  end;
end;

function TBaseCachedFileStream.Read(var Buffer; Count: Integer): Longint;
begin
  raise EAssertionFailed.Create('Cannot read from this stream');
end;

function TBaseCachedFileStream.Write(const Buffer; Count: Integer): Longint;
begin
  raise EAssertionFailed.Create('Cannot write to this stream');
end;

function TBaseCachedFileStream.Seek(const Offset: Int64; Origin: TSeekOrigin): Int64;
//Set FPosition to the value specified - if this has implications for the
//cache then overriden Write and Read methods must deal with those.
begin
  case Origin of
  soBeginning:
    FPosition := Offset;
  soEnd:
    FPosition := GetFileSize+Offset;
  soCurrent:
    inc(FPosition, Offset);
  end;
  Result := FPosition;
end;

{ TReadOnlyCachedFileStream }

constructor TReadOnlyCachedFileStream.Create(const FileName: string; CacheSize: Integer; Handle: THandle);
begin
  inherited;
  SetViewWindow(0, inherited GetFileSize);
end;

function TReadOnlyCachedFileStream.CreateHandle(FlagsAndAttributes: DWORD): THandle;
begin
  Result := Windows.CreateFile(
    PChar(FFileName),
    GENERIC_READ,
    FILE_SHARE_READ,
    nil,
    OPEN_EXISTING,
    FlagsAndAttributes,
    0
  );
  if Result=INVALID_HANDLE_VALUE then begin
    RaiseSystemErrorFmt('Cannot open %s.', [FFileName]);
  end;
end;

procedure TReadOnlyCachedFileStream.DisableStreamReadCache;
begin
  inc(FDisableStreamReadCacheRefCount);
end;

procedure TReadOnlyCachedFileStream.EnableStreamReadCache;
begin
  dec(FDisableStreamReadCacheRefCount);
end;

procedure TReadOnlyCachedFileStream.FlushCache;
begin
  FCacheStart := 0;
  FCacheEnd := 0;
end;

function TReadOnlyCachedFileStream.GetFileSize: Int64;
begin
  Result := FViewLength;
end;

procedure TReadOnlyCachedFileStream.SetViewWindow(const ViewStart, ViewLength: Int64);
begin
  if ViewStart<0 then begin
    raise EAssertionFailed.Create('Invalid view window');
  end;
  if (ViewStart+ViewLength)>inherited GetFileSize then begin
    raise EAssertionFailed.Create('Invalid view window');
  end;
  FViewStart := ViewStart;
  FViewLength := ViewLength;
  FPosition := 0;
  FCacheStart := 0;
  FCacheEnd := 0;
end;

function TReadOnlyCachedFileStream.Read(var Buffer; Count: Longint): Longint;
var
  NumOfBytesToCopy, NumOfBytesLeft, NumOfBytesRead: Longint;
  CachePtr, BufferPtr: PByte;
begin
  if FDisableStreamReadCacheRefCount>0 then begin
    FileSeek(FPosition+FViewStart, soBeginning);
    Result := FileRead(Buffer, Count);
    if Result=-1 then begin
      Result := 0;//contract is to return number of bytes that were read
    end;
    inc(FPosition, Result);
  end else begin
    Result := 0;
    NumOfBytesLeft := Count;
    BufferPtr := @Buffer;
    while NumOfBytesLeft>0 do begin
      if (FPosition<FCacheStart) or (FPosition>=FCacheEnd) then begin
        //the current position is not available in the cache so we need to re-fill the cache
        FCacheStart := FPosition;
        if UseAlignedCache then begin
          FCacheStart := FCacheStart - (FCacheStart mod CacheSize);
        end;
        FileSeek(FCacheStart+FViewStart, soBeginning);
        NumOfBytesRead := FileRead(FCache^, CacheSize);
        if NumOfBytesRead=-1 then begin
          exit;
        end;
        Assert(NumOfBytesRead>=0);
        FCacheEnd := FCacheStart+NumOfBytesRead;
        if NumOfBytesRead=0 then begin
          FLastError := ERROR_HANDLE_EOF;//must be at the end of the file
          break;
        end;
      end;

      //read from cache to Buffer
      NumOfBytesToCopy := Min(FCacheEnd-FPosition, NumOfBytesLeft);
      CachePtr := FCache;
      inc(CachePtr, FPosition-FCacheStart);
      Move(CachePtr^, BufferPtr^, NumOfBytesToCopy);
      inc(Result, NumOfBytesToCopy);
      inc(FPosition, NumOfBytesToCopy);
      inc(BufferPtr, NumOfBytesToCopy);
      dec(NumOfBytesLeft, NumOfBytesToCopy);
    end;
  end;
end;

{ TWriteCachedFileStream }

constructor TWriteCachedFileStream.Create(const FileName: string; CacheSize, ReadStreamCacheSize: Integer; ReadStreamUseAlignedCache: Boolean);
begin
  inherited Create(FileName, CacheSize);
  FReadStreamCacheSize := ReadStreamCacheSize;
  FReadStreamUseAlignedCache := ReadStreamUseAlignedCache;
end;

destructor TWriteCachedFileStream.Destroy;
begin
  FlushCache;//make sure that the final calls to Write get recorded in the file
  FreeAndNil(FReadStream);
  inherited;
end;

function TWriteCachedFileStream.CreateHandle(FlagsAndAttributes: DWORD): THandle;
begin
  Result := Windows.CreateFile(
    PChar(FFileName),
    GENERIC_READ or GENERIC_WRITE,
    0,
    nil,
    CREATE_ALWAYS,
    FlagsAndAttributes,
    0
  );
  if Result=INVALID_HANDLE_VALUE then begin
    RaiseSystemErrorFmt('Cannot create %s.', [FFileName]);
  end;
end;

procedure TWriteCachedFileStream.DisableStreamReadCache;
begin
  CreateReadStream;
  FReadStream.DisableStreamReadCache;
end;

procedure TWriteCachedFileStream.EnableStreamReadCache;
begin
  Assert(Assigned(FReadStream));
  FReadStream.EnableStreamReadCache;
end;

function TWriteCachedFileStream.GetFileSize: Int64;
begin
  Result := FFileSize;
end;

procedure TWriteCachedFileStream.CreateReadStream;
begin
  if not Assigned(FReadStream) then begin
    FReadStream := TReadOnlyCachedFileStream.Create(FFileName, FReadStreamCacheSize, FHandle);
    FReadStream.UseAlignedCache := FReadStreamUseAlignedCache;
  end;
end;

procedure TWriteCachedFileStream.FlushCache;
var
  NumOfBytesToWrite: Longint;
begin
  if Assigned(FCache) then begin
    NumOfBytesToWrite := FCacheEnd-FCacheStart;
    if NumOfBytesToWrite>0 then begin
      FileSeek(FCacheStart, soBeginning);
      if FileWrite(FCache^, NumOfBytesToWrite)<>NumOfBytesToWrite then begin
        RaiseSystemErrorFmt('FileWrite failed for %s.', [FFileName]);
      end;
      if Assigned(FReadStream) then begin
        FReadStream.FlushCache;
      end;
    end;
    FCacheStart := FPosition;
    FCacheEnd := FPosition;
  end;
end;

function TWriteCachedFileStream.Read(var Buffer; Count: Integer): Longint;
begin
  FlushCache;
  CreateReadStream;
  Assert(FReadStream.FViewStart=0);
  if FReadStream.FViewLength<>FFileSize then begin
    FReadStream.SetViewWindow(0, FFileSize);
  end;
  FReadStream.Position := FPosition;
  Result := FReadStream.Read(Buffer, Count);
  inc(FPosition, Result);
end;

function TWriteCachedFileStream.Write(const Buffer; Count: Longint): Longint;
var
  NumOfBytesToCopy, NumOfBytesLeft: Longint;
  CachePtr, BufferPtr: PByte;
begin
  Result := 0;
  NumOfBytesLeft := Count;
  BufferPtr := @Buffer;
  while NumOfBytesLeft>0 do begin
    if ((FPosition<FCacheStart) or (FPosition>FCacheEnd))//the current position is outside the cache
    or (FPosition-FCacheStart=FCacheSize)//the cache is full
    then begin
      FlushCache;
      Assert(FCacheStart=FPosition);
    end;

    //write from Buffer to the cache
    NumOfBytesToCopy := Min(FCacheSize-(FPosition-FCacheStart), NumOfBytesLeft);
    CachePtr := FCache;
    inc(CachePtr, FPosition-FCacheStart);
    Move(BufferPtr^, CachePtr^, NumOfBytesToCopy);
    inc(Result, NumOfBytesToCopy);
    inc(FPosition, NumOfBytesToCopy);
    FCacheEnd := Max(FCacheEnd, FPosition);
    inc(BufferPtr, NumOfBytesToCopy);
    dec(NumOfBytesLeft, NumOfBytesToCopy);
  end;
  FFileSize := Max(FFileSize, FPosition);
end;

end.

Windows file caching is very effective, especially if you are using Vista or later. TFileStream is a loose wrapper around the Windows ReadFile() and WriteFile() API functions and for many use cases the only thing faster is a memory mapped file.

However, there is one common scenario where TFileStream becomes a performance bottleneck. That is if you read or write small amounts of data with each call to the stream read or write functions. For example if you read an array of integers one item at a time then you incur a significant overhead by reading 4 bytes at a time in the calls to ReadFile().

Again, memory mapped files are an excellent way to solve this bottleneck, but the other commonly used approach is to read a much larger buffer, many kilobytes say, and then resolve future reads of the stream from this in memory cache rather than further calls to ReadFile(). This approach only really works for sequential access.


From the use pattern described in your updated question, I think you may find the following classes would improve performance for you:

unit BufferedFileStream;

interface

uses
  SysUtils, Math, Classes, Windows;

type
  TBaseCachedFileStream = class(TStream)
  private
    function QueryInterface(const IID: TGUID; out Obj): HResult; stdcall;
    function _AddRef: Integer; stdcall;
    function _Release: Integer; stdcall;
  protected
    FHandle: THandle;
    FOwnsHandle: Boolean;
    FCache: PByte;
    FCacheSize: Integer;
    FPosition: Int64;//the current position in the file (relative to the beginning of the file)
    FCacheStart: Int64;//the postion in the file of the start of the cache (relative to the beginning of the file)
    FCacheEnd: Int64;//the postion in the file of the end of the cache (relative to the beginning of the file)
    FFileName: string;
    FLastError: DWORD;
    procedure HandleError(const Msg: string);
    procedure RaiseSystemError(const Msg: string; LastError: DWORD); overload;
    procedure RaiseSystemError(const Msg: string); overload;
    procedure RaiseSystemErrorFmt(const Msg: string; const Args: array of const);
    function CreateHandle(FlagsAndAttributes: DWORD): THandle; virtual; abstract;
    function GetFileSize: Int64; virtual;
    procedure SetSize(NewSize: Longint); override;
    procedure SetSize(const NewSize: Int64); override;
    function FileRead(var Buffer; Count: Longword): Integer;
    function FileWrite(const Buffer; Count: Longword): Integer;
    function FileSeek(const Offset: Int64; Origin: TSeekOrigin): Int64;
  public
    constructor Create(const FileName: string); overload;
    constructor Create(const FileName: string; CacheSize: Integer); overload;
    constructor Create(const FileName: string; CacheSize: Integer; Handle: THandle); overload; virtual;
    destructor Destroy; override;
    property CacheSize: Integer read FCacheSize;
    function Read(var Buffer; Count: Longint): Longint; override;
    function Write(const Buffer; Count: Longint): Longint; override;
    function Seek(const Offset: Int64; Origin: TSeekOrigin): Int64; override;
  end;
  TBaseCachedFileStreamClass = class of TBaseCachedFileStream;

  IDisableStreamReadCache = interface
    ['{0B6D0004-88D1-42D5-BC0F-447911C0FC21}']
    procedure DisableStreamReadCache;
    procedure EnableStreamReadCache;
  end;

  TReadOnlyCachedFileStream = class(TBaseCachedFileStream, IDisableStreamReadCache)
  (* This class works by filling the cache each time a call to Read is made and
     FPosition is outside the existing cache.  By filling the cache we mean
     reading from the file into the temporary cache.  Calls to Read when
     FPosition is in the existing cache are then dealt with by filling the
     buffer with bytes from the cache.
  *)
  private
    FUseAlignedCache: Boolean;
    FViewStart: Int64;
    FViewLength: Int64;
    FDisableStreamReadCacheRefCount: Integer;
    procedure DisableStreamReadCache;
    procedure EnableStreamReadCache;
    procedure FlushCache;
  protected
    function CreateHandle(FlagsAndAttributes: DWORD): THandle; override;
    function GetFileSize: Int64; override;
  public
    constructor Create(const FileName: string; CacheSize: Integer; Handle: THandle); overload; override;
    property UseAlignedCache: Boolean read FUseAlignedCache write FUseAlignedCache;
    function Read(var Buffer; Count: Longint): Longint; override;
    procedure SetViewWindow(const ViewStart, ViewLength: Int64);
  end;

  TWriteCachedFileStream = class(TBaseCachedFileStream, IDisableStreamReadCache)
  (* This class works by caching calls to Write.  By this we mean temporarily
     storing the bytes to be written in the cache.  As each call to Write is
     processed the cache grows.  The cache is written to file when:
       1.  A call to Write is made when the cache is full.
       2.  A call to Write is made and FPosition is outside the cache (this
           must be as a result of a call to Seek).
       3.  The class is destroyed.

     Note that data can be read from these streams but the reading is not
     cached and in fact a read operation will flush the cache before
     attempting to read the data.
  *)
  private
    FFileSize: Int64;
    FReadStream: TReadOnlyCachedFileStream;
    FReadStreamCacheSize: Integer;
    FReadStreamUseAlignedCache: Boolean;
    procedure DisableStreamReadCache;
    procedure EnableStreamReadCache;
    procedure CreateReadStream;
    procedure FlushCache;
  protected
    function CreateHandle(FlagsAndAttributes: DWORD): THandle; override;
    function GetFileSize: Int64; override;
  public
    constructor Create(const FileName: string; CacheSize, ReadStreamCacheSize: Integer; ReadStreamUseAlignedCache: Boolean); overload;
    destructor Destroy; override;
    function Read(var Buffer; Count: Longint): Longint; override;
    function Write(const Buffer; Count: Longint): Longint; override;
  end;

implementation

function GetFileSizeEx(hFile: THandle; var FileSize: Int64): BOOL; stdcall; external kernel32;
function SetFilePointerEx(hFile: THandle; DistanceToMove: Int64; lpNewFilePointer: PInt64; dwMoveMethod: DWORD): BOOL; stdcall; external kernel32;

{ TBaseCachedFileStream }

constructor TBaseCachedFileStream.Create(const FileName: string);
begin
  Create(FileName, 0);
end;

constructor TBaseCachedFileStream.Create(const FileName: string; CacheSize: Integer);
begin
  Create(FileName, CacheSize, 0);
end;

constructor TBaseCachedFileStream.Create(const FileName: string; CacheSize: Integer; Handle: THandle);
const
  DefaultCacheSize = 16*1024;
  //16kb - this was chosen empirically - don't make it too large otherwise the progress report is 'jerky'
begin
  inherited Create;
  FFileName := FileName;
  FOwnsHandle := Handle=0;
  if FOwnsHandle then begin
    FHandle := CreateHandle(FILE_ATTRIBUTE_NORMAL);
  end else begin
    FHandle := Handle;
  end;
  FCacheSize := CacheSize;
  if FCacheSize<=0 then begin
    FCacheSize := DefaultCacheSize;
  end;
  GetMem(FCache, FCacheSize);
end;

destructor TBaseCachedFileStream.Destroy;
begin
  FreeMem(FCache);
  if FOwnsHandle and (FHandle<>0) then begin
    CloseHandle(FHandle);
  end;
  inherited;
end;

function TBaseCachedFileStream.QueryInterface(const IID: TGUID; out Obj): HResult;
begin
  if GetInterface(IID, Obj) then begin
    Result := S_OK;
  end else begin
    Result := E_NOINTERFACE;
  end;
end;

function TBaseCachedFileStream._AddRef: Integer;
begin
  Result := -1;
end;

function TBaseCachedFileStream._Release: Integer;
begin
  Result := -1;
end;

procedure TBaseCachedFileStream.HandleError(const Msg: string);
begin
  if FLastError<>0 then begin
    RaiseSystemError(Msg, FLastError);
  end;
end;

procedure TBaseCachedFileStream.RaiseSystemError(const Msg: string; LastError: DWORD);
begin
  raise EStreamError.Create(Trim(Msg+'  ')+SysErrorMessage(LastError));
end;

procedure TBaseCachedFileStream.RaiseSystemError(const Msg: string);
begin
  RaiseSystemError(Msg, GetLastError);
end;

procedure TBaseCachedFileStream.RaiseSystemErrorFmt(const Msg: string; const Args: array of const);
var
  LastError: DWORD;
begin
  LastError := GetLastError; // must call GetLastError before Format
  RaiseSystemError(Format(Msg, Args), LastError);
end;

function TBaseCachedFileStream.GetFileSize: Int64;
begin
  if not GetFileSizeEx(FHandle, Result) then begin
    RaiseSystemErrorFmt('GetFileSizeEx failed for %s.', [FFileName]);
  end;
end;

procedure TBaseCachedFileStream.SetSize(NewSize: Longint);
begin
  SetSize(Int64(NewSize));
end;

procedure TBaseCachedFileStream.SetSize(const NewSize: Int64);
begin
  Seek(NewSize, soBeginning);
  if not Windows.SetEndOfFile(FHandle) then begin
    RaiseSystemErrorFmt('SetEndOfFile for %s.', [FFileName]);
  end;
end;

function TBaseCachedFileStream.FileRead(var Buffer; Count: Longword): Integer;
begin
  if Windows.ReadFile(FHandle, Buffer, Count, LongWord(Result), nil) then begin
    FLastError := 0;
  end else begin
    FLastError := GetLastError;
    Result := -1;
  end;
end;

function TBaseCachedFileStream.FileWrite(const Buffer; Count: Longword): Integer;
begin
  if Windows.WriteFile(FHandle, Buffer, Count, LongWord(Result), nil) then begin
    FLastError := 0;
  end else begin
    FLastError := GetLastError;
    Result := -1;
  end;
end;

function TBaseCachedFileStream.FileSeek(const Offset: Int64; Origin: TSeekOrigin): Int64;
begin
  if not SetFilePointerEx(FHandle, Offset, @Result, ord(Origin)) then begin
    RaiseSystemErrorFmt('SetFilePointerEx failed for %s.', [FFileName]);
  end;
end;

function TBaseCachedFileStream.Read(var Buffer; Count: Integer): Longint;
begin
  raise EAssertionFailed.Create('Cannot read from this stream');
end;

function TBaseCachedFileStream.Write(const Buffer; Count: Integer): Longint;
begin
  raise EAssertionFailed.Create('Cannot write to this stream');
end;

function TBaseCachedFileStream.Seek(const Offset: Int64; Origin: TSeekOrigin): Int64;
//Set FPosition to the value specified - if this has implications for the
//cache then overriden Write and Read methods must deal with those.
begin
  case Origin of
  soBeginning:
    FPosition := Offset;
  soEnd:
    FPosition := GetFileSize+Offset;
  soCurrent:
    inc(FPosition, Offset);
  end;
  Result := FPosition;
end;

{ TReadOnlyCachedFileStream }

constructor TReadOnlyCachedFileStream.Create(const FileName: string; CacheSize: Integer; Handle: THandle);
begin
  inherited;
  SetViewWindow(0, inherited GetFileSize);
end;

function TReadOnlyCachedFileStream.CreateHandle(FlagsAndAttributes: DWORD): THandle;
begin
  Result := Windows.CreateFile(
    PChar(FFileName),
    GENERIC_READ,
    FILE_SHARE_READ,
    nil,
    OPEN_EXISTING,
    FlagsAndAttributes,
    0
  );
  if Result=INVALID_HANDLE_VALUE then begin
    RaiseSystemErrorFmt('Cannot open %s.', [FFileName]);
  end;
end;

procedure TReadOnlyCachedFileStream.DisableStreamReadCache;
begin
  inc(FDisableStreamReadCacheRefCount);
end;

procedure TReadOnlyCachedFileStream.EnableStreamReadCache;
begin
  dec(FDisableStreamReadCacheRefCount);
end;

procedure TReadOnlyCachedFileStream.FlushCache;
begin
  FCacheStart := 0;
  FCacheEnd := 0;
end;

function TReadOnlyCachedFileStream.GetFileSize: Int64;
begin
  Result := FViewLength;
end;

procedure TReadOnlyCachedFileStream.SetViewWindow(const ViewStart, ViewLength: Int64);
begin
  if ViewStart<0 then begin
    raise EAssertionFailed.Create('Invalid view window');
  end;
  if (ViewStart+ViewLength)>inherited GetFileSize then begin
    raise EAssertionFailed.Create('Invalid view window');
  end;
  FViewStart := ViewStart;
  FViewLength := ViewLength;
  FPosition := 0;
  FCacheStart := 0;
  FCacheEnd := 0;
end;

function TReadOnlyCachedFileStream.Read(var Buffer; Count: Longint): Longint;
var
  NumOfBytesToCopy, NumOfBytesLeft, NumOfBytesRead: Longint;
  CachePtr, BufferPtr: PByte;
begin
  if FDisableStreamReadCacheRefCount>0 then begin
    FileSeek(FPosition+FViewStart, soBeginning);
    Result := FileRead(Buffer, Count);
    if Result=-1 then begin
      Result := 0;//contract is to return number of bytes that were read
    end;
    inc(FPosition, Result);
  end else begin
    Result := 0;
    NumOfBytesLeft := Count;
    BufferPtr := @Buffer;
    while NumOfBytesLeft>0 do begin
      if (FPosition<FCacheStart) or (FPosition>=FCacheEnd) then begin
        //the current position is not available in the cache so we need to re-fill the cache
        FCacheStart := FPosition;
        if UseAlignedCache then begin
          FCacheStart := FCacheStart - (FCacheStart mod CacheSize);
        end;
        FileSeek(FCacheStart+FViewStart, soBeginning);
        NumOfBytesRead := FileRead(FCache^, CacheSize);
        if NumOfBytesRead=-1 then begin
          exit;
        end;
        Assert(NumOfBytesRead>=0);
        FCacheEnd := FCacheStart+NumOfBytesRead;
        if NumOfBytesRead=0 then begin
          FLastError := ERROR_HANDLE_EOF;//must be at the end of the file
          break;
        end;
      end;

      //read from cache to Buffer
      NumOfBytesToCopy := Min(FCacheEnd-FPosition, NumOfBytesLeft);
      CachePtr := FCache;
      inc(CachePtr, FPosition-FCacheStart);
      Move(CachePtr^, BufferPtr^, NumOfBytesToCopy);
      inc(Result, NumOfBytesToCopy);
      inc(FPosition, NumOfBytesToCopy);
      inc(BufferPtr, NumOfBytesToCopy);
      dec(NumOfBytesLeft, NumOfBytesToCopy);
    end;
  end;
end;

{ TWriteCachedFileStream }

constructor TWriteCachedFileStream.Create(const FileName: string; CacheSize, ReadStreamCacheSize: Integer; ReadStreamUseAlignedCache: Boolean);
begin
  inherited Create(FileName, CacheSize);
  FReadStreamCacheSize := ReadStreamCacheSize;
  FReadStreamUseAlignedCache := ReadStreamUseAlignedCache;
end;

destructor TWriteCachedFileStream.Destroy;
begin
  FlushCache;//make sure that the final calls to Write get recorded in the file
  FreeAndNil(FReadStream);
  inherited;
end;

function TWriteCachedFileStream.CreateHandle(FlagsAndAttributes: DWORD): THandle;
begin
  Result := Windows.CreateFile(
    PChar(FFileName),
    GENERIC_READ or GENERIC_WRITE,
    0,
    nil,
    CREATE_ALWAYS,
    FlagsAndAttributes,
    0
  );
  if Result=INVALID_HANDLE_VALUE then begin
    RaiseSystemErrorFmt('Cannot create %s.', [FFileName]);
  end;
end;

procedure TWriteCachedFileStream.DisableStreamReadCache;
begin
  CreateReadStream;
  FReadStream.DisableStreamReadCache;
end;

procedure TWriteCachedFileStream.EnableStreamReadCache;
begin
  Assert(Assigned(FReadStream));
  FReadStream.EnableStreamReadCache;
end;

function TWriteCachedFileStream.GetFileSize: Int64;
begin
  Result := FFileSize;
end;

procedure TWriteCachedFileStream.CreateReadStream;
begin
  if not Assigned(FReadStream) then begin
    FReadStream := TReadOnlyCachedFileStream.Create(FFileName, FReadStreamCacheSize, FHandle);
    FReadStream.UseAlignedCache := FReadStreamUseAlignedCache;
  end;
end;

procedure TWriteCachedFileStream.FlushCache;
var
  NumOfBytesToWrite: Longint;
begin
  if Assigned(FCache) then begin
    NumOfBytesToWrite := FCacheEnd-FCacheStart;
    if NumOfBytesToWrite>0 then begin
      FileSeek(FCacheStart, soBeginning);
      if FileWrite(FCache^, NumOfBytesToWrite)<>NumOfBytesToWrite then begin
        RaiseSystemErrorFmt('FileWrite failed for %s.', [FFileName]);
      end;
      if Assigned(FReadStream) then begin
        FReadStream.FlushCache;
      end;
    end;
    FCacheStart := FPosition;
    FCacheEnd := FPosition;
  end;
end;

function TWriteCachedFileStream.Read(var Buffer; Count: Integer): Longint;
begin
  FlushCache;
  CreateReadStream;
  Assert(FReadStream.FViewStart=0);
  if FReadStream.FViewLength<>FFileSize then begin
    FReadStream.SetViewWindow(0, FFileSize);
  end;
  FReadStream.Position := FPosition;
  Result := FReadStream.Read(Buffer, Count);
  inc(FPosition, Result);
end;

function TWriteCachedFileStream.Write(const Buffer; Count: Longint): Longint;
var
  NumOfBytesToCopy, NumOfBytesLeft: Longint;
  CachePtr, BufferPtr: PByte;
begin
  Result := 0;
  NumOfBytesLeft := Count;
  BufferPtr := @Buffer;
  while NumOfBytesLeft>0 do begin
    if ((FPosition<FCacheStart) or (FPosition>FCacheEnd))//the current position is outside the cache
    or (FPosition-FCacheStart=FCacheSize)//the cache is full
    then begin
      FlushCache;
      Assert(FCacheStart=FPosition);
    end;

    //write from Buffer to the cache
    NumOfBytesToCopy := Min(FCacheSize-(FPosition-FCacheStart), NumOfBytesLeft);
    CachePtr := FCache;
    inc(CachePtr, FPosition-FCacheStart);
    Move(BufferPtr^, CachePtr^, NumOfBytesToCopy);
    inc(Result, NumOfBytesToCopy);
    inc(FPosition, NumOfBytesToCopy);
    FCacheEnd := Max(FCacheEnd, FPosition);
    inc(BufferPtr, NumOfBytesToCopy);
    dec(NumOfBytesLeft, NumOfBytesToCopy);
  end;
  FFileSize := Max(FFileSize, FPosition);
end;

end.
离去的眼神 2024-11-07 16:59:31

TFileStream 类在内部使用 CreateFile 函数始终使用缓冲区来管理文件,除非您指定 FILE_FLAG_NO_BUFFERING 标志(请注意,您不能直接使用 TFileStream 指定此标志)。为了
有关详细信息,您可以查看这些链接

您也可以尝试 TGpHugeFileStream,它是 的一部分来自 Primoz Gabrijelcic 的 GpHugeFile 单元。

The TFileStream class internally uses the CreateFile function which always uses a buffer to manage the file, unless which you specify the FILE_FLAG_NO_BUFFERING flag (be aware which you can't specify this flag directly using the TFileStream). for
more information you can check these links

also you can try the TGpHugeFileStream which is part of the GpHugeFile unit from Primoz Gabrijelcic.

悲喜皆因你 2024-11-07 16:59:31

为了大家的兴趣:Embarcadero 添加了 TBufferedFileStream请参阅文档)在Delphi 10.1 Berlin的最新版本中。

不幸的是,我无法说出它如何与此处给出的解决方案竞争,因为我还没有购买更新。我也知道这个问题是在 Delphi 7 上提出的,但我确信对 Delphi 自己的实现的引用将来会很有用。

For everybody's interest: Embarcadero added TBufferedFileStream (see the documentation) in the latest Release of Delphi 10.1 Berlin.

Unfortunately, I can't say how it competes with the solutions given here as I haven't bought the update yet. I am also aware of that the question was asked on Delphi 7 but I am sure the reference to Delphi's own implementation can be useful in the future.

凉墨 2024-11-07 16:59:31

如果您有很多这种代码:

while Stream.Position < Stream.Size do

您可以通过将 FileStream.Size 缓存到变量来优化它,它会加快速度。 Stream.Size 使用三个虚拟函数调用来找出实际大小。

If you have this kind of code a lot:

while Stream.Position < Stream.Size do

You can optimize it by caching the FileStream.Size to a variable and it will speed up. Stream.Size uses three virtual function calls to find out the actual size.

め可乐爱微笑 2024-11-07 16:59:31

所以 TFileStream 速度很慢,它从磁盘读取所有内容。
并且TMemoryStream不能足够大。 (如果你这么说的话)

那为什么不使用一个 TFileStream 将一个最大 100MB 的块加载到 TMemoryStream 中进行处理呢?这可以通过一个简单的预解析器来完成,该预解析器仅查看数据中的大小标头,但这会恢复您的问题。

让你的代码意识到它的大文件可能会出现错误并完全避免这种情况并不是一件坏事:允许它处理来自 TMemoryStream 的(不完整)块,如果需要的话,这也会带来线程增强功能(硬盘访问不是瓶颈)。

So a TFileStream is to slow, it reads everything from disk.
And a TMemoryStream can not be large enough. (if you say so)

Then why not use a TFileStream that loads one chunks upto 100MB into a TMemoryStream for processing? This could be done by a simple preparser that just looks at size headers in your data, but that would reinstate your problem.

Its not a bad thing to have your code realize its big file can misbehave and avoid that altogether: allow it to handle (incomplete) chunks from TMemoryStream, this also bring threading enhancements into view (hdd access not being the bottleneck) if needed.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文