Fastest way to create a large file in C++?

Posted on 2024-07-07 17:13:50

I need to create a flat text file of around 50-100 MB in C++.
The line 'Added first line' should be written to the file 4 million times.

Comments (6)

我恋#小黄人 2024-07-14 17:13:50

Use old-style file I/O:

fopen the file for writing.

fseek to the desired file size - 1.

fwrite a single byte.

fclose the file.
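The four steps above can be sketched roughly as follows. The function name `preallocate_file` and its parameters are hypothetical, chosen only for illustration. Note that this only gives the file its final size; it does not write the repeated text, and the sketch assumes the platform allows seeking past the end of a binary stream (the common desktop platforms do).

```cpp
#include <cstdio>

// Hypothetical helper: make `path` exactly `size` bytes long by seeking
// to the position of the last byte and writing a single byte there.
// Returns true on success.
bool preallocate_file(const char* path, long size)
{
    if (size <= 0) return false;

    std::FILE* fp = std::fopen(path, "wb");      // step 1: open for writing
    if (fp == nullptr) return false;

    const char zero = '\0';
    bool ok = std::fseek(fp, size - 1, SEEK_SET) == 0   // step 2: seek to size - 1
           && std::fwrite(&zero, 1, 1, fp) == 1;        // step 3: write one byte

    ok = (std::fclose(fp) == 0) && ok;           // step 4: close
    return ok;
}
```

A call such as `preallocate_file("big.txt", 68000000L)` would reserve roughly 68 MB, which is what 4 million copies of a 17-byte line add up to.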

岛歌少女 2024-07-14 17:13:50

The fastest way to create a file of a certain size is to create a zero-length file using creat() or open() and then change its size using chsize(). This simply allocates blocks on the disk for the file; the contents will be whatever happened to be in those blocks. It is very fast because no buffer writing needs to take place.
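A rough, portable sketch of the same idea is shown below. It swaps in C++17's `std::filesystem::resize_file` instead of chsize(), which is a Windows C runtime call (POSIX offers ftruncate() for the same job). The helper name `create_sized_file` is hypothetical. One caveat: `resize_file` is specified to behave like POSIX truncate(), under which the extended region reads back as zero bytes rather than leftover disk contents.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <system_error>

// Hypothetical helper: create an empty file, then set its size in one call.
// Returns true on success.
bool create_sized_file(const std::filesystem::path& path, std::uintmax_t size)
{
    {
        // Create (or truncate) the file; it has length 0 at this point.
        std::ofstream out(path, std::ios::binary | std::ios::trunc);
        if (!out) return false;
    }   // the stream is closed here

    std::error_code ec;
    std::filesystem::resize_file(path, size, ec);   // grow to the requested size
    return !ec;
}
```

As with the answer above, this only sets the size. The 4 million text lines from the question would still have to be written afterwards.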

倾其所爱 2024-07-14 17:13:50

Not sure I understand the question. Do you want to ensure that every character in the file is a printable ASCII character? If so, what about this? It fills the file with "abcdefghabc....".

#include <stdio.h>

int main(void)
{
   const int FILE_SIZE = 50000;   /* size in KB */
   const int BUFFER_SIZE = 1024;
   char buffer[BUFFER_SIZE + 1];
   int i;

   /* Fill the buffer with the repeating pattern "abcdefgh". */
   for (i = 0; i < BUFFER_SIZE; i++)
      buffer[i] = (char)(i % 8 + 'a');
   buffer[BUFFER_SIZE] = '\0';

   FILE *pFile = fopen("somefile.txt", "w");
   if (pFile == NULL)
      return 1;

   /* Write the 1 KB buffer FILE_SIZE times; fputs avoids treating
      the buffer as a printf format string. */
   for (i = 0; i < FILE_SIZE; i++)
      fputs(buffer, pFile);

   fclose(pFile);

   return 0;
}

冰雪之触 2024-07-14 17:13:50

You haven't mentioned the OS, but I'll assume creat/open/close/write are available.

For truly efficient writing, assuming, say, a 4k page and disk block size and a repeated string:

  1. Open the file.
  2. Allocate 4k * (number of chars in your repeated string) bytes, ideally aligned to a page boundary.
  3. Print the repeated string into the memory 4k times, filling the blocks precisely.
  4. Use write() to write the blocks out to disk as many times as necessary. You may need to write a partial piece for the last block to get the size to come out right.
  5. Close the file.

This bypasses the buffering of fopen() and friends, which is both good and bad: their buffering makes them nice and fast, but they will still not be as efficient as this approach, which has no overhead of working with the buffer.

This can easily be written in C++ or C, but it does assume that you will use POSIX calls rather than iostream or stdio for efficiency's sake, so it is outside the core library specification.
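The numbered steps above might look something like the sketch below on a POSIX system. The function name `write_repeated`, the constant `kPage`, and the file mode 0644 are assumptions made for illustration, and 4096 is only the assumed page size from the answer. It is a sketch, not a tuned implementation: it does not retry on EINTR, for example.

```cpp
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <cstddef>
#include <cstring>

// Hypothetical helper: write `line` to `path` `count` times using one
// page-aligned block that holds the line exactly 4096 times.
bool write_repeated(const char* path, const char* line, std::size_t count)
{
    const std::size_t kPage = 4096;               // assumed page/block size
    const std::size_t len = std::strlen(line);
    if (len == 0) return false;
    const std::size_t block_size = kPage * len;   // always a multiple of kPage

    // Step 2: allocate the block, aligned to a page boundary.
    void* mem = nullptr;
    if (posix_memalign(&mem, kPage, block_size) != 0) return false;
    char* block = static_cast<char*>(mem);

    // Step 3: copy the line into the block 4096 times.
    for (std::size_t i = 0; i < kPage; ++i)
        std::memcpy(block + i * len, line, len);

    // Step 1: open the file.
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { free(mem); return false; }

    // Step 4: write full blocks, then a partial block for the remainder.
    bool ok = true;
    std::size_t remaining = count * len;          // total bytes still to write
    while (ok && remaining > 0) {
        const std::size_t chunk = remaining < block_size ? remaining : block_size;
        std::size_t done = 0;
        while (done < chunk) {                    // write() may be partial
            const ssize_t n = write(fd, block + done, chunk - done);
            if (n <= 0) { ok = false; break; }
            done += static_cast<std::size_t>(n);
        }
        remaining -= done;
    }

    // Step 5: close the file.
    ok = (close(fd) == 0) && ok;
    free(mem);
    return ok;
}
```

For the question's case, `write_repeated("big.txt", "Added first line\n", 4000000)` would issue 976 full-block writes plus one partial write, assuming each write() call completes in full.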

末骤雨初歇 2024-07-14 17:13:50

I faced the same problem: creating a ~500 MB file on Windows very quickly.
The larger the buffer you pass to fwrite(), the faster it will be.

/* Needs <stdint.h>, <stdio.h> and <string.h>.
   fname and TOT_BLOCKS are not shown in this post; they are assumed
   to be defined elsewhere. */
int i;
FILE *fp;

fp = fopen(fname, "wb");

if (fp != NULL) {

    // create big block's data
    uint8_t b[278528]; // some big chunk size

    memset(b, 0xFF, sizeof(b)); // custom initialization if != 0x00

    // write all blocks to file
    for (i = 0; i < TOT_BLOCKS; i++)
        fwrite(b, sizeof(b), 1, fp);

    fclose(fp);
}

Now, at least on my Win7 machine with MinGW, this creates the file almost instantly.
By comparison, calling fwrite() with 1 byte at a time completes in 10 seconds.
Passing a 4k buffer completes in 2 seconds.

同展鸳鸯锦 2024-07-14 17:13:50

Fastest way to create a large file in C++?
OK. I assume the fastest way means the one with the shortest run time.

Create a flat text file in C++ of around 50-100 MB; the content 'Added first line' should be inserted into the file 4 million times.

Preallocate the file using old-style file I/O:

fopen the file for writing.
fseek to the desired file size - 1.
fwrite a single byte.
fclose the file.

Create a string containing "Added first line\n" a thousand times.
Find its length.

Preallocate the file using old-style file I/O:

fopen the file for writing.
fseek to the string length * 4000 - 1.
fwrite a single byte.
fclose the file.

Open the file for read/write.
Loop 4000 times,
    writing the string to the file.
Close the file.

That's my best guess.
I'm sure there are a lot of ways to do it.
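As a rough illustration of the plan above, the sketch below builds the chunk, preallocates the file, and then reopens it to write the chunk repeatedly. The function name `write_in_chunks` and its parameters are hypothetical. It assumes the total size fits in a `long`, which holds for the 68 MB this question works out to, and it makes no claim that the preallocation step actually speeds things up on a given system.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: write `line` to `path` lines_per_chunk * chunks times.
// Returns true on success.
bool write_in_chunks(const char* path, const std::string& line,
                     std::size_t lines_per_chunk, std::size_t chunks)
{
    // Build one chunk that holds the line lines_per_chunk times.
    std::string chunk;
    chunk.reserve(line.size() * lines_per_chunk);
    for (std::size_t i = 0; i < lines_per_chunk; ++i)
        chunk += line;
    if (chunk.empty() || chunks == 0) return false;

    // Preallocate: seek to the last byte of the final size and write one byte.
    const long total = static_cast<long>(chunk.size() * chunks);
    std::FILE* fp = std::fopen(path, "wb");
    if (fp == nullptr) return false;
    bool ok = std::fseek(fp, total - 1, SEEK_SET) == 0
           && std::fputc('\n', fp) != EOF;
    ok = (std::fclose(fp) == 0) && ok;
    if (!ok) return false;

    // Reopen for read/write (no truncation) and fill in the real data.
    fp = std::fopen(path, "r+b");
    if (fp == nullptr) return false;
    for (std::size_t i = 0; ok && i < chunks; ++i)
        ok = std::fwrite(chunk.data(), 1, chunk.size(), fp) == chunk.size();
    ok = (std::fclose(fp) == 0) && ok;
    return ok;
}
```

The numbers from the answer correspond to `write_in_chunks("big.txt", "Added first line\n", 1000, 4000)`: each chunk is 17,000 bytes, and 4000 chunks give 4 million lines.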
