What is the fastest way to read a large file in Delphi?

Posted on 2024-10-10 19:30:41

My program needs to read chunks from a huge binary file with random access. I have a list of offsets and lengths, which may have several thousand entries. The user selects an entry, and the program seeks to the offset and reads Length bytes.

The program internally uses a TMemoryStream to store and process the chunks read from the file. Reading the data is done via a TFileStream like this:

FileStream.Position := Offset;
MemoryStream.CopyFrom(FileStream, Size);
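A minimal Delphi sketch of the read path above, assuming a single TFileStream is kept open for the lifetime of the file instead of being recreated for every chunk. TChunkReader, ReadChunk and the scratch file chunkdemo.bin are hypothetical names used only for illustration. Offset is declared as Int64 because the files reach tens of gigabytes, and the Size check matters because TStream.CopyFrom treats a Count of 0 as "copy the whole source stream from the start", which on a multi-gigabyte file would be very costly.

```pascal
program ChunkReaderDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Hypothetical helper: keeps one read-only TFileStream open instead of
  // reopening the file for every chunk the user selects.
  TChunkReader = class
  private
    FStream: TFileStream;
    FSize: Int64; // cached; the file is not modified while it is read
  public
    constructor Create(const FileName: string);
    destructor Destroy; override;
    // Appends Size bytes starting at Offset to Dest. Offset is Int64 so
    // positions beyond 2 GB work.
    procedure ReadChunk(Offset: Int64; Size: Integer; Dest: TMemoryStream);
  end;

constructor TChunkReader.Create(const FileName: string);
begin
  inherited Create;
  FStream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  FSize := FStream.Size;
end;

destructor TChunkReader.Destroy;
begin
  FStream.Free;
  inherited Destroy;
end;

procedure TChunkReader.ReadChunk(Offset: Int64; Size: Integer;
  Dest: TMemoryStream);
begin
  // CopyFrom treats Count = 0 as "copy the whole source stream",
  // so empty or out-of-range entries are rejected up front.
  if (Size <= 0) or (Offset < 0) or (Offset + Size > FSize) then
    raise EReadError.CreateFmt('Invalid chunk: offset %d, size %d',
      [Offset, Size]);
  FStream.Position := Offset;
  Dest.CopyFrom(FStream, Size);
end;

const
  DemoFile = 'chunkdemo.bin'; // hypothetical scratch file

var
  F: TFileStream;
  Reader: TChunkReader;
  Chunk: TMemoryStream;
  I: Integer;
  B: Byte;
  ChunkSize: Int64;
  FirstByte: Byte;

begin
  // Demo file: the byte at position N holds N mod 256.
  F := TFileStream.Create(DemoFile, fmCreate);
  try
    for I := 0 to 999 do
    begin
      B := Byte(I mod 256);
      F.WriteBuffer(B, 1);
    end;
  finally
    F.Free;
  end;

  Reader := TChunkReader.Create(DemoFile);
  Chunk := TMemoryStream.Create;
  try
    Reader.ReadChunk(300, 10, Chunk);
    ChunkSize := Chunk.Size;
    Chunk.Position := 0;
    Chunk.ReadBuffer(FirstByte, 1);
    Writeln('Chunk size: ', ChunkSize, ', first byte: ', FirstByte);
  finally
    Chunk.Free;
    Reader.Free;
  end;
  DeleteFile(DemoFile);
end.
```

Run as-is, the demo should print "Chunk size: 10, first byte: 44", since the byte at position 300 holds 300 mod 256.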

This works fine, but unfortunately it becomes increasingly slow as the files get larger. The file size starts at a few megabytes but frequently reaches several tens of gigabytes. The chunks read are around 100 KB in size.

The file's content is only read by my program, and it is the only program accessing the file at the time. Also, the files are stored locally, so this is not a network issue.

I am using Delphi 2007 on a Windows XP box.

What can I do to speed up this file access?

Edit:

The file access is slow for large files, regardless of which part of the file is being read.
The program usually does not read the file sequentially. The order of the chunks is user driven and cannot be predicted.
It is always slower to read a chunk from a large file than to read an equally large chunk from a small file.
I am talking about the performance for reading a chunk from the file, not about the overall time it takes to process a whole file. The latter would obviously take longer for larger files, but that's not the issue here.
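Because the read order is user-driven and unpredictable, one option is to pass that information to Windows when opening the file. The sketch below is an illustration under assumptions, not a confirmed fix: it opens the file with CreateFile and the FILE_FLAG_RANDOM_ACCESS caching hint, then wraps the handle in a THandleStream so the existing Position/CopyFrom code can stay unchanged. OpenRandomAccessStream, CloseRandomAccessStream and randomdemo.bin are hypothetical names. Whether the hint makes a measurable difference depends on the workload, so it is worth timing before adopting it.

```pascal
program RandomAccessDemo;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils, Classes;

// Hypothetical helper: opens FileName read-only with the
// FILE_FLAG_RANDOM_ACCESS caching hint and wraps the handle in a
// THandleStream, so existing Position/CopyFrom code keeps working.
function OpenRandomAccessStream(const FileName: string): THandleStream;
var
  H: THandle;
begin
  H := CreateFile(PChar(FileName), GENERIC_READ, FILE_SHARE_READ, nil,
    OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL or FILE_FLAG_RANDOM_ACCESS, 0);
  if H = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  Result := THandleStream.Create(H);
end;

// THandleStream does not close its handle, so close it explicitly.
procedure CloseRandomAccessStream(var S: THandleStream);
var
  H: THandle;
begin
  if S = nil then
    Exit;
  H := S.Handle;
  FreeAndNil(S);
  CloseHandle(H);
end;

const
  DemoFile = 'randomdemo.bin'; // hypothetical scratch file

var
  F: TFileStream;
  S: THandleStream;
  Chunk: TMemoryStream;
  Data: array[0..4095] of Byte;
  I: Integer;
  ChunkSize: Int64;
  FirstByte: Byte;

begin
  // Demo file: the byte at position N holds N mod 251.
  for I := 0 to High(Data) do
    Data[I] := Byte(I mod 251);
  F := TFileStream.Create(DemoFile, fmCreate);
  try
    F.WriteBuffer(Data, SizeOf(Data));
  finally
    F.Free;
  end;

  S := OpenRandomAccessStream(DemoFile);
  Chunk := TMemoryStream.Create;
  try
    // Same access pattern as in the question: seek, then copy Size bytes.
    S.Position := 1000;
    Chunk.CopyFrom(S, 100);
    ChunkSize := Chunk.Size;
    Chunk.Position := 0;
    Chunk.ReadBuffer(FirstByte, 1);
    Writeln('Read ', ChunkSize, ' bytes, first byte: ', FirstByte);
  finally
    Chunk.Free;
    CloseRandomAccessStream(S);
  end;
  DeleteFile(DemoFile);
end.
```

The only change from a plain TFileStream is how the handle is opened; the seek-and-copy code is identical, which keeps the experiment cheap to undo.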

I need to apologize to everybody: after I implemented file access using a memory-mapped file as suggested, it turned out that it did not make much of a difference. But after I added more timing code, it also turned out that it is not the file access that slows down the program. The file access actually takes nearly constant time regardless of the file size. Some part of the user interface (which I have yet to identify) seems to have a performance problem with large amounts of data, and somehow I failed to see the difference when I first timed the processes.
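The bottleneck above was found by timing the file access on its own. A hedged sketch of that kind of measurement, combined with the memory-mapped approach mentioned there: on 32-bit Windows XP a file of tens of gigabytes cannot be mapped as one view, so this maps only a view around each chunk. A view offset must be a multiple of the allocation granularity reported by GetSystemInfo (typically 64 KB). The chunk is copied into a TMemoryStream and only that step is timed with QueryPerformanceCounter. ReadMappedChunk, the offsets and mappeddemo.bin are hypothetical; in the real program the same timing pattern would be wrapped separately around the UI update to locate the slow part.

```pascal
program MappedChunkTiming;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils, Classes;

// Hypothetical helper: appends Size bytes at Offset from an open file
// mapping to Dest. The view starts at the aligned position below Offset
// and the copy skips the first Delta bytes of the view.
procedure ReadMappedChunk(Mapping: THandle; Offset: Int64; Size: Cardinal;
  Dest: TMemoryStream);
var
  SysInfo: TSystemInfo;
  ViewStart: Int64;
  Delta: Cardinal;
  View: Pointer;
begin
  if Size = 0 then
    Exit; // nothing to copy
  GetSystemInfo(SysInfo);
  ViewStart := Offset - (Offset mod SysInfo.dwAllocationGranularity);
  Delta := Cardinal(Offset - ViewStart);
  View := MapViewOfFile(Mapping, FILE_MAP_READ, Int64Rec(ViewStart).Hi,
    Int64Rec(ViewStart).Lo, Delta + Size);
  if View = nil then
    RaiseLastOSError;
  try
    Dest.WriteBuffer((PAnsiChar(View) + Delta)^, Size);
  finally
    UnmapViewOfFile(View);
  end;
end;

const
  DemoFile = 'mappeddemo.bin'; // hypothetical scratch file

var
  F: TFileStream;
  Data: array[0..65535] of Byte;
  I: Integer;
  FileHandle, Mapping: THandle;
  Chunk: TMemoryStream;
  Freq, T0, T1, ChunkSize: Int64;
  FirstByte: Byte;

begin
  // Demo file of 4 x 64 KB; the byte at position N holds N mod 256.
  for I := 0 to High(Data) do
    Data[I] := Byte(I);
  F := TFileStream.Create(DemoFile, fmCreate);
  try
    for I := 1 to 4 do
      F.WriteBuffer(Data, SizeOf(Data));
  finally
    F.Free;
  end;

  FileHandle := CreateFile(PChar(DemoFile), GENERIC_READ, FILE_SHARE_READ,
    nil, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if FileHandle = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  try
    // Maximum size 0/0 means "current file size"; views are mapped per chunk.
    Mapping := CreateFileMapping(FileHandle, nil, PAGE_READONLY, 0, 0, nil);
    if Mapping = 0 then
      RaiseLastOSError;
    Chunk := TMemoryStream.Create;
    try
      QueryPerformanceFrequency(Freq);
      QueryPerformanceCounter(T0);
      ReadMappedChunk(Mapping, 70000, 1000, Chunk); // time only file access
      QueryPerformanceCounter(T1);
      ChunkSize := Chunk.Size;
      Chunk.Position := 0;
      Chunk.ReadBuffer(FirstByte, 1);
      Writeln(Format('Read %d bytes in %.3f ms, first byte: %d',
        [ChunkSize, (T1 - T0) * 1000 / Freq, FirstByte]));
    finally
      Chunk.Free;
      CloseHandle(Mapping);
    end;
  finally
    CloseHandle(FileHandle);
  end;
  DeleteFile(DemoFile);
end.
```

The printed time will vary by machine; the byte count (1000) and first byte (112, i.e. 70000 mod 256) should not.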

I am sorry for being sloppy in identifying the bottleneck.
