Why is fseeko() faster on big files than on small files?
I'm getting some strange performance results here and I'm hoping someone on stackoverflow.com can shed some light on this!
My goal was a program that I could use to test whether large seeks were more expensive than small seeks...
First, I created two files by dd'ing /dev/zero to separate files... one is 1 MB, the other is 9.8 GB.
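The dd invocations were along these lines (the file names and block counts here are illustrative, not the originals; 10035 MiB is roughly 9.8 GiB):

[developer@stinger ~]# dd if=/dev/zero of=smallfile bs=1M count=1
[developer@stinger ~]# dd if=/dev/zero of=bigfile bs=1M count=10035

Then I wrote this code: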
#define _LARGE_FILE_API
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main( int argc, char* argv[] )
{
    struct stat64 fileInfo;
    stat64( argv[1], &fileInfo );

    FILE* inFile = fopen( argv[1], "r" );
    for( int i = 0; i < 1000000; i++ )
    {
        double seekFrac = ((double)(random() % 100)) / ((double)100);
        unsigned long long seekOffset = (unsigned long long)(seekFrac * fileInfo.st_size);
        fseeko( inFile, seekOffset, SEEK_SET );
    }
    fclose( inFile );
}
Basically, this code does one million random seeks across the whole range of the file. When I run this under time, I get results like this for smallfile:
[developer@stinger ~]# time ./seeker ./smallfile
real 0m1.863s
user 0m0.504s
sys 0m1.358s
When I run it against the 9.8 gig file, I get results like this:
[developer@stinger ~]# time ./seeker ./bigfile
real 0m0.670s
user 0m0.337s
sys 0m0.333s
I ran against each file a couple dozen times and the results are consistent. Seeking in the large file is more than twice as fast as seeking in the small file. Why?
Answers (2)
You're not measuring disk performance, you're measuring how long it takes for fseek to set a pointer and return. I recommend you do a file read from the location you're seeking to, if you want to test real IO.
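A minimal sketch of that suggestion, reusing the loop from the question; the single-byte fread is the only addition (error handling omitted for brevity):

#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

int main( int argc, char* argv[] )
{
    struct stat fileInfo;
    stat( argv[1], &fileInfo );

    FILE* inFile = fopen( argv[1], "r" );
    char byte;
    for( int i = 0; i < 1000000; i++ )
    {
        double seekFrac = ((double)(random() % 100)) / ((double)100);
        off_t seekOffset = (off_t)(seekFrac * fileInfo.st_size);
        fseeko( inFile, seekOffset, SEEK_SET );
        /* actually touch the data so the kernel must do real IO */
        fread( &byte, 1, 1, inFile );
    }
    fclose( inFile );
    return 0;
}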
I would assume that it has to do with the implementation of fseeko. The man page of fseek indicates that it merely "sets the file position indicator for the indicated stream." Since setting an integer should be independent of the file size, perhaps there is an "optimization" that will perform an automatic read (and cache the resulting information) after an fseek for small files and not large files.
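One way to test this hypothesis (a sketch of my own, not from the answer) is to bypass stdio entirely and drive the same loop through the raw lseek(2) system call; if the timing gap between the two files disappears, the difference lives in the stdio layer rather than the kernel:

#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

int main( int argc, char* argv[] )
{
    struct stat fileInfo;
    stat( argv[1], &fileInfo );

    int fd = open( argv[1], O_RDONLY );
    for( int i = 0; i < 1000000; i++ )
    {
        double seekFrac = ((double)(random() % 100)) / ((double)100);
        off_t seekOffset = (off_t)(seekFrac * fileInfo.st_size);
        /* direct syscall: no stdio buffer to manage or invalidate */
        lseek( fd, seekOffset, SEEK_SET );
    }
    close( fd );
    return 0;
}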