What is the fastest way to read large files in Delphi?

Posted 2024-10-10 19:30:41

My program needs to read chunks from a huge binary file with random access. I have got a list of offsets and lengths which may have several thousand entries. The user selects an entry and the program seeks to the offset and reads length bytes.

The program internally uses a TMemoryStream to store and process the chunks read from the file. Reading the data is done via a TFileStream like this:

FileStream.Position := Offset;
MemoryStream.CopyFrom(FileStream, Size);
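In context, the two lines above amount to something like the following (a minimal sketch; `ReadChunk` and the parameter names are illustrative, not taken from the original program):

```pascal
// Read one chunk (Offset/Size from the entry list) into a fresh
// TMemoryStream, exactly as the question describes.
function ReadChunk(FileStream: TFileStream; Offset: Int64;
  Size: Integer): TMemoryStream;
begin
  Result := TMemoryStream.Create;
  try
    FileStream.Position := Offset;      // seek to the chunk's offset
    Result.CopyFrom(FileStream, Size);  // read exactly Size bytes
    Result.Position := 0;               // rewind for processing
  except
    Result.Free;
    raise;
  end;
end;
```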

This works fine but unfortunately it becomes increasingly slower as the files get larger. The file size starts at a few megabytes but frequently reaches several tens of gigabytes. The chunks read are around 100 kbytes in size.

The file's content is only read by my program. It is the only program accessing the file at the time. Also the files are stored locally so this is not a network issue.

I am using Delphi 2007 on a Windows XP box.

What can I do to speed up this file access?

Edit:

  • The file access is slow for large files, regardless of which part of the file is being read.
  • The program usually does not read the file sequentially. The order of the chunks is user driven and cannot be predicted.
  • It is always slower to read a chunk from a large file than to read an equally large chunk from a small file.
  • I am talking about the performance for reading a chunk from the file, not about the overall time it takes to process a whole file. The latter would obviously take longer for larger files, but that's not the issue here.

I need to apologize to everybody: After I implemented file access using a memory mapped file as suggested it turned out that it did not make much of a difference. But it also turned out after I added some more timing code that it is not the file access that slows down the program. The file access takes actually nearly constant time regardless of the file size. Some part of the user interface (which I have yet to identify) seems to have a performance problem with large amounts of data and somehow I failed to see the difference when I first timed the processes.

I am sorry for being sloppy in identifying the bottleneck.

Comments (3)

苏大泽ㄣ 2024-10-17 19:30:42

If you open the help topic for the CreateFile() WinAPI function, you will find interesting flags there such as FILE_FLAG_NO_BUFFERING and FILE_FLAG_RANDOM_ACCESS. You can experiment with them to gain some performance.

Next, copying the file data, even at 100 KB per chunk, is an extra step that slows operations down. It is a good idea to use the CreateFileMapping and MapViewOfFile functions to get a ready-to-use pointer to the data. This way you avoid the copy and may also get certain performance benefits (but you need to measure the speed carefully).
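A sketch of reading one chunk through a file mapping, assuming a 32-bit Delphi 2007 target as in the question (error handling is minimal and the names are illustrative):

```pascal
uses
  Windows, SysUtils, Classes;

// Map a view around [Offset, Offset+Size) and copy the bytes out.
// MapViewOfFile requires the view offset to be a multiple of the
// allocation granularity (usually 64 KB), so align the base down.
procedure ReadChunkMapped(const FileName: string; Offset: Int64;
  Size: Cardinal; Dest: TMemoryStream);
var
  hFile, hMap: THandle;
  SysInfo: TSystemInfo;
  Base: Int64;
  Delta: Cardinal;
  View: Pointer;
begin
  GetSystemInfo(SysInfo);
  Base := (Offset div SysInfo.dwAllocationGranularity)
          * SysInfo.dwAllocationGranularity;
  Delta := Cardinal(Offset - Base);

  hFile := CreateFile(PChar(FileName), GENERIC_READ, FILE_SHARE_READ,
    nil, OPEN_EXISTING, FILE_FLAG_RANDOM_ACCESS, 0);
  if hFile = INVALID_HANDLE_VALUE then RaiseLastOSError;
  try
    hMap := CreateFileMapping(hFile, nil, PAGE_READONLY, 0, 0, nil);
    if hMap = 0 then RaiseLastOSError;
    try
      View := MapViewOfFile(hMap, FILE_MAP_READ,
        Int64Rec(Base).Hi, Int64Rec(Base).Lo, Delta + Size);
      if View = nil then RaiseLastOSError;
      try
        // Pointer arithmetic via Cardinal assumes a 32-bit process.
        Dest.WriteBuffer(Pointer(Cardinal(View) + Delta)^, Size);
      finally
        UnmapViewOfFile(View);
      end;
    finally
      CloseHandle(hMap);
    end;
  finally
    CloseHandle(hFile);
  end;
end;
```

In a real program you would keep the file and mapping handles open across reads and only remap the view per chunk, instead of reopening everything each time.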

囍孤女 2024-10-17 19:30:42

The stock TMemoryStream in Delphi is slow due to the way it allocates memory. The NexusDB company has TnxMemoryStream, which is much more efficient. There might be some free alternatives out there that work even better.

The stock Delphi TFileStream is also not the most efficient component. Way back in history, Julian Bucknall published a component named BufferedFileStream in a magazine or somewhere that handled file streams very efficiently.

Good luck.
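The buffered-stream idea can be sketched as follows (this is not Bucknall's original BufferedFileStream, just a minimal illustration of the technique; the class and method names are made up):

```pascal
uses
  Classes;

type
  // Keep one large block cached in memory and serve small reads from
  // it, hitting the disk only when a request falls outside the block.
  TSimpleBufferedReader = class
  private
    FStream: TFileStream;
    FBuffer: array of Byte;
    FBufStart: Int64;   // file offset of FBuffer[0]
    FBufLen: Integer;   // number of valid bytes in FBuffer
  public
    constructor Create(AStream: TFileStream; BufSize: Integer);
    procedure ReadAt(Offset: Int64; Size: Integer; Dest: TMemoryStream);
  end;

constructor TSimpleBufferedReader.Create(AStream: TFileStream;
  BufSize: Integer);
begin
  inherited Create;
  FStream := AStream;
  SetLength(FBuffer, BufSize);
  FBufLen := 0;
end;

// Assumes Size <= Length(FBuffer); a production version would also
// handle requests larger than the buffer.
procedure TSimpleBufferedReader.ReadAt(Offset: Int64; Size: Integer;
  Dest: TMemoryStream);
begin
  // Refill the buffer when the requested range is not fully cached.
  if (FBufLen = 0) or (Offset < FBufStart) or
     (Offset + Size > FBufStart + FBufLen) then
  begin
    FBufStart := Offset;
    FStream.Position := Offset;
    FBufLen := FStream.Read(FBuffer[0], Length(FBuffer));
  end;
  Dest.WriteBuffer(FBuffer[Offset - FBufStart], Size);
end;
```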

青春有你 2024-10-17 19:30:42

Maybe you can take this approach:

Sort the entries by file position and then do the following:

  1. Take the entries that only need the first X MB of the file (up to a certain file position)
  2. Read X MB from the file into a buffer (a TMemoryStream)
  3. Now read those entries from the buffer (maybe multithreaded)
  4. Repeat this for all the entries

In short: cache a part of the file and read all entries that fit into it (multithreaded), then cache the next part, etc.

Maybe you can gain speed even if you just take your original approach but sort the entries by position.
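The batching steps above can be sketched like this (a rough illustration under assumptions: `TEntry`, `ProcessSorted`, and the 16 MB window size are all made up, and the entry list is assumed to be pre-sorted by offset):

```pascal
uses
  Classes;

type
  TEntry = record
    Offset: Int64;
    Size: Integer;
  end;

procedure ProcessSorted(FileStream: TFileStream;
  const Entries: array of TEntry);
const
  WindowSize = 16 * 1024 * 1024;  // "X MB"; an assumed value
var
  Buf: TMemoryStream;
  i, j: Integer;
  WinStart, WinEnd: Int64;
begin
  Buf := TMemoryStream.Create;
  try
    i := 0;
    while i <= High(Entries) do
    begin
      WinStart := Entries[i].Offset;
      WinEnd := WinStart + WindowSize;
      // Find the run of consecutive entries lying inside the window.
      j := i;
      while (j <= High(Entries)) and
            (Entries[j].Offset + Entries[j].Size <= WinEnd) do
        Inc(j);
      if j = i then
        Inc(j);  // single entry larger than the window: read it alone
      // One sequential read covers the whole run.
      FileStream.Position := WinStart;
      Buf.Size := 0;
      Buf.CopyFrom(FileStream,
        Entries[j - 1].Offset + Entries[j - 1].Size - WinStart);
      // Each entry k in [i, j) now starts at Entries[k].Offset - WinStart
      // inside Buf; process it here (possibly in worker threads).
      i := j;
    end;
  finally
    Buf.Free;
  end;
end;
```

The point of the sort is that each disk pass moves strictly forward through the file, turning scattered random reads into a few large sequential ones.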
