Reading a file is slow the first time, but fast on consecutive reads
(This isn't my program, but I'll try to provide all the relevant information to the best of my knowledge.)
There is a program which reads binary files that are roughly 300MB in size, processes them and outputs some information. The program uses ifstream for file input and streams are correctly initialized and closed for each read.
The program has to read each file multiple times. Reading a file for the first time takes about 3 seconds, and each consecutive read takes about 0.1 seconds. If several files are processed, going back to the first file will still yield fast read speeds, but after some time re-reading a file becomes slow.
Additionally, if a file is copied to another location, the speed of the first read of the new file is roughly 0.1 seconds.
If you do the math, the speed of consecutive reads is roughly the advertised read speed of the hard drive.
All this looks like file locations are cached by either the OS or the hard drive, so that on consecutive reads you don't have to seek out file locations.
Does anyone know what exactly is causing the slowdown on the initial read, and if it can be prevented? Three seconds may not seem like a lot, but they add about 5 hours to the total time needed to correctly process every file.
Also, the program runs on Fedora 14 and Scientific Linux, both using their default file systems.
Any ideas would be appreciated.
Comments (4)
Linux will try to keep a copy of the file in RAM to make the next read faster. I am guessing this is what is happening. The initial read actually comes off the disk; subsequent reads are served from the file cache because the entire file has been copied into RAM.
The OS (Linux) has a disk cache. After you read the file once, it's in the cache.
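One way to check this is to time two back-to-back reads of the same file. The sketch below is only an illustration, not the asker's program: the function names `read_whole_file`, `time_read_seconds` and `compare_reads` are made up here, and the 1 MiB chunk size is an arbitrary choice. With the numbers from the question, 300 MB in 3 s is about 100 MB/s, while 300 MB in 0.1 s is about 3 GB/s, which is more in line with reading from memory than from a spinning disk.

```cpp
#include <chrono>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Read the whole file in 1 MiB chunks and return the number of bytes read.
// Returns 0 if the file cannot be opened.
std::size_t read_whole_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::vector<char> buffer(1 << 20);
    std::size_t total = 0;
    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        total += static_cast<std::size_t>(in.gcount());
    }
    return total;
}

// Time one full read of the file and return the elapsed seconds.
// If bytes_out is not null, it receives the number of bytes read.
double time_read_seconds(const std::string& path, std::size_t* bytes_out = nullptr) {
    const auto start = std::chrono::steady_clock::now();
    const std::size_t bytes = read_whole_file(path);
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    if (bytes_out != nullptr) {
        *bytes_out = bytes;
    }
    return elapsed.count();
}

// Read the same file twice and print how long each pass took.
void compare_reads(const std::string& path) {
    std::size_t bytes = 0;
    const double first = time_read_seconds(path, &bytes);
    const double second = time_read_seconds(path);
    std::cout << bytes << " bytes, first read: " << first
              << " s, second read: " << second << " s\n";
}
```

Calling `compare_reads` on one of the 300MB files should show the gap described in the question, provided the file is not already cached when the first pass starts. A file that was read or copied recently will look fast on both passes.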
My guess is that the first read of the file takes longer because it loads the data into the cache.
After the first time, reads are served from the cache instead of the disk.
Yes, the data becomes cached. You can force that caching with the readahead syscall (or simply by having another process read the file). If you use mmap, you could also use madvise.