Why is fseeko() faster for large files than for small files?

Posted 2024-09-10 09:01:55

I'm getting some strange performance results here and I'm hoping someone on stackoverflow.com can shed some light on this!

My goal was a program that I could use to test whether large seeks were more expensive than small seeks...

First, I created two files by dd'ing /dev/zero to separate files... One is 1 MB, the other is 9.8 GB... Then I wrote this code:

#define _LARGE_FILE_API
#define _FILE_OFFSET_BITS 64

#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main( int argc, char* argv[] )
{
  struct stat64 fileInfo;
  stat64( argv[1], &fileInfo );

  FILE* inFile = fopen( argv[1], "r" );

  for( int i = 0; i < 1000000; i++ )
    {
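      /* random() % 100 yields one of 100 evenly spaced fractions of the file size */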
      double seekFrac = ((double)(random() % 100)) / ((double)100);

      unsigned long long seekOffset = (unsigned long long)(seekFrac * fileInfo.st_size);

      fseeko( inFile, seekOffset, SEEK_SET );
    }

    fclose( inFile );
}

Basically, this code does one million random seeks across the whole range of the file. When I run this under time, I get results like this for smallfile:

[developer@stinger ~]# time ./seeker ./smallfile

real    0m1.863s
user    0m0.504s
sys  0m1.358s

When I run it against the 9.8 gig file, I get results like this:

[developer@stinger ~]# time ./seeker ./bigfile

real    0m0.670s
user    0m0.337s
sys  0m0.333s

I ran it against each file a couple dozen times and the results are consistent. Seeking in the large file is more than twice as fast as seeking in the small file. Why?


Comments (2)

淤浪 2024-09-17 09:01:55

You're not measuring disk performance; you're measuring how long it takes for fseek to set a pointer and return.

If you want to test real I/O, I recommend doing a read from the location you're seeking to.
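As a minimal sketch of that suggestion (not code from the answer, same command-line interface assumed as in the original program), the loop can read one byte at every offset; fseeko() by itself only repositions the stream, while the fread forces an actual data access (or at least a page-cache lookup):

#define _FILE_OFFSET_BITS 64

#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

int main( int argc, char* argv[] )
{
  struct stat fileInfo;
  if( argc < 2 || stat( argv[1], &fileInfo ) != 0 )
    return 1;

  FILE* inFile = fopen( argv[1], "r" );
  if( inFile == NULL )
    return 1;

  char byte;
  for( int i = 0; i < 1000000; i++ )
    {
      off_t seekOffset = (off_t)( ((double)(random() % 100) / 100.0) * fileInfo.st_size );

      fseeko( inFile, seekOffset, SEEK_SET );

      /* actually touch the data at the new position; fseeko() alone
         reads nothing */
      if( fread( &byte, 1, 1, inFile ) != 1 )
        break;
    }

  fclose( inFile );
  return 0;
}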

乜一 2024-09-17 09:01:55

I would assume that it has to do with the implementation of fseeko.

The man page of fseek indicates that it merely "sets the file position indicator for the indicated stream." Since setting an integer should be independent of the file size, perhaps there is an "optimization" that performs an automatic read (and caches the result) after an fseek on small files but not on large files.
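One way to probe that guess (a rough sketch assuming a POSIX system, not code from the answer) is to repeat the experiment with the raw lseek() syscall on a file descriptor instead of fseeko() on a FILE*. lseek() only updates the kernel's file offset, so if the small-file/large-file gap disappears here, the difference comes from the stdio layer rather than from the seek itself:

#define _FILE_OFFSET_BITS 64

#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main( int argc, char* argv[] )
{
  struct stat fileInfo;
  if( argc < 2 || stat( argv[1], &fileInfo ) != 0 )
    return 1;

  int fd = open( argv[1], O_RDONLY );
  if( fd < 0 )
    return 1;

  for( int i = 0; i < 1000000; i++ )
    {
      off_t seekOffset = (off_t)( ((double)(random() % 100) / 100.0) * fileInfo.st_size );

      /* lseek() just sets the kernel's file offset; no stdio buffer is involved */
      lseek( fd, seekOffset, SEEK_SET );
    }

  close( fd );
  return 0;
}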
