Need fast file access options
I want my code to process a file very fast. The file size will vary from a single KB up to 2 GB.
I am even ready to create a separate file system for that single file.
I will split the file into constant-size blocks (probably 8 KB) and access them for data reads and writes. Code-wise, the algorithm cannot be changed, because it gives good performance and is stable, so I don't want to change it. I am also using mmap() to map blocks into memory on demand.
Is it possible to get a file system as a single block so that file access and read/write operations can be faster?
Please give all your suggestions; even a small thing will help me.
The suggestions can be across platforms and file systems.
Thanks,
Naga
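The on-demand block mapping described in the question might look roughly like this (a minimal POSIX sketch; the 8 KB block size comes from the question, while the function names and error handling are illustrative assumptions):

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#define BLOCK_SIZE 8192  /* 8 KB blocks, as described in the question */

/* Map the n-th fixed-size block of an open file for read/write.
   Returns NULL on failure. The offset passed to mmap must be
   page-aligned; 8192 is a multiple of the usual 4096-byte page. */
static void *map_block(int fd, size_t block_index)
{
    off_t offset = (off_t)block_index * BLOCK_SIZE;
    void *p = mmap(NULL, BLOCK_SIZE, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, offset);
    return (p == MAP_FAILED) ? NULL : p;
}

static void unmap_block(void *p)
{
    munmap(p, BLOCK_SIZE);
}
```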
General, OS-independent rules:
Use physical reads (rather than streams).
Use large I/O buffers for your reads. The initialization of an I/O operation (and the sync with the spinning hardware) is time-costly; several small reads take longer than one large read.
Create some benchmarks to figure out the most efficient buffer size. After a given size, efficiency will not improve, and you don't want to gobble up all your precious RAM needlessly. The optimal buffer size depends on your hardware and OS. On current hardware, using buffer sizes in the 500 KB to 1 MB range is usually efficient enough.
Minimize disk head seeks. I.e., if you have to write the data back, alternating reads and writes can be very costly if they go to the same physical disk.
If you have some significant processing to do, use double buffering and asynchronous I/O to overlap I/O and processing.
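As a sketch of the "large I/O buffer with physical reads" advice above (the function name and path are placeholders; the buffer size is the tunable the answer suggests benchmarking):

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Drain a file with plain read() calls into one large buffer,
   instead of byte- or line-oriented stream functions. buf_size
   is the knob to benchmark (e.g. 64 KB .. 1 MB). Returns the
   total number of bytes consumed, or 0 on open failure. */
static size_t drain_file(const char *path, size_t buf_size)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return 0;
    char *buf = malloc(buf_size);
    size_t total = 0;
    ssize_t n;
    while ((n = read(fd, buf, buf_size)) > 0)
        total += (size_t)n;   /* process buf[0..n) here */
    free(buf);
    close(fd);
    return total;
}
```

Timing the same call with different buf_size values gives the benchmark the answer recommends.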
mmap or MapViewOfFile let you access files directly in memory. The OS will transparently fault in pages as needed, or possibly even read ahead (which can be hinted at with madvise or FILE_FLAG_* flags). Depending on your access pattern and the size of your files, this could be noticeably faster than reading/writing the files normally.
On the downside, you will have to worry a bit more about consistency (make sure to use msync or FlushViewOfFile with care), and because of the page-table manipulations necessary, it might be slower too.
Windows permits you to open a partition for raw reads and writes. It will also let you open a physical device for raw IO. So if you are willing to treat a hard disk or a partition as a single file, you are guaranteed that the 'file' is logically contiguous on disk. (Because of the way hard disks do hotfixes for bad sectors, it may not actually be physically contiguous.)
If you choose to do raw IO, then you will have to read and write in multiples of the block size of the device. This is usually 512 bytes, but it would probably be wiser to use 4k as your block size, since that is what newer disks are using and that is the page size for Win32.
To open a partition for raw reads, use CreateFile with the filename "\\.\X:", where X: is the drive letter of the partition. See the CreateFile documentation under the section heading Physical Disks and Volumes.
On the other hand, it's pretty hard to beat the performance of memory-mapped files; see this question for an example: How to scan through really huge files on disk?
Always try to access your file sequentially, in blocks of 64 kB to 1 MB. That way you can take advantage of prefetching and maximize the amount of data per I/O operation.
Also, try to make sure that the file is contiguous in the first place, so that the disk head doesn't have to move a lot between sequential reads. Many filesystems will create a file as contiguous as possible if you start out by setting the end of file or doing a write() of the whole file at once. On Windows you can use the sysinternals.com utility contig.exe to make a file contiguous.