Why does buffering in C++ matter?

Posted on 2024-10-19 10:01:17

I tried to print "Hello world!" 200,000 times, and it took so long that I had to stop the program. But after I added a char array to act as a buffer, it took less than 10 seconds. Why?

Before adding a buffer:

#include <iostream>
using namespace std;

int main() {
    int count = 0;
    std::ios_base::sync_with_stdio(false);
    for (int i = 1; i < 200000; i++)
    {
        cout << "Hello world!\n";
        count++;
    }
    cout << "Count:%d\n" << count;
    return 0;
}

And this is after adding a buffer:

#include <iostream>
using namespace std;

int main() {
    int count = 0;
    std::ios_base::sync_with_stdio(false);
    char buffer[1024];
    cout.rdbuf()->pubsetbuf(buffer, 1024);
    for (int i = 1; i < 200000; i++)
    {
        cout << "Hello world!\n";
        count++;
    }
    cout << "Count:%d\n" << count;
    return 0;
}

This makes me think about Java. What are the advantages of using a BufferedReader to read a file?

Comments (5)

嗳卜坏 2024-10-26 10:01:17

From the standpoint of file operations, writing to memory (RAM) is always faster than writing directly to a file on disk.

For illustration, let's define:

  • each write IO operation to a file on the disk costs 1 ms
  • each write IO operation to a file on the disk over a network costs 5 ms
  • each write IO operation to the memory costs 0.5 ms

Let's say we have to write some data to a file 100 times.

Case 1: Directly Writing to File On Disk

100 times x 1 ms = 100 ms

Case 2: Directly Writing to File On Disk Over Network

100 times x 5 ms = 500 ms

Case 3: Buffering in Memory before Writing to File on Disk

(100 times x 0.5 ms) + 1 ms = 51 ms

Case 4: Buffering in Memory before Writing to File on Disk Over Network

(100 times x 0.5 ms) + 5 ms = 55 ms

Conclusion

Under these assumptions, buffering in memory is faster than writing directly, because the slow operation is paid once instead of 100 times. However, if your system is low on memory and has to swap to the page file, it will be slow again. You therefore have to balance your I/O operations between memory and disk/network.

囚我心虐我身 2024-10-26 10:01:17

The main issue with writing to disk is that the time taken is not a linear function of the number of bytes, but an affine one with a huge constant term.

In computing terms, this means that I/O has good throughput (lower than memory, but still quite good) but poor latency (normally a bit better than the network).

If you look at reviews of HDDs or SSDs, you'll notice that the read/write tests are split into two categories:

  • throughput in random reads
  • throughput in contiguous reads

The latter is normally significantly greater than the former.

Normally, the OS and the I/O library should abstract this for you, but as you noticed, if your routine is I/O intensive, you might gain by increasing the buffer size. This is normal: the library is generally tailored for all kinds of uses and thus offers a good middle ground for average applications. If your application is not "average", then it might not perform as fast as it could.
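
The affine cost mentioned at the start of this answer can be written as a rough model. The symbols are assumptions introduced here for illustration only: $c$ is the fixed cost of one I/O operation, $b$ the sustained bandwidth, $n$ the total number of bytes, and $k$ the number of separate write operations.

```latex
\begin{align}
T(n) &= c + \frac{n}{b} \\
T_{\text{unbuffered}} &= k\left(c + \frac{n/k}{b}\right) = k\,c + \frac{n}{b} \\
T_{\text{buffered}} &\approx c + \frac{n}{b}
\end{align}
```

The bandwidth term $n/b$ is the same in both cases. Only the constant is multiplied by the number of operations, which is why many small writes can be much slower than one large write of the same data.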

帅气称霸 2024-10-26 10:01:17

What compiler/platform are you using? I see no significant difference here (RedHat, gcc 4.1.2); both programs take 5-6 seconds to finish (but "user" time is about 150 ms). If I redirect output to a file (through the shell), total time is about 300 ms (so most of the 6 seconds is spent waiting for my console to catch up to the program).

In other words, output should be buffered by default, so I'm curious why you're seeing such a huge speedup.

3 tangentially-related notes:

  1. Your program has an off-by-one error in that you only print 199999 times instead of the stated 200000 (either start with i = 0 or end with i <= 200000)
  2. You're mixing printf syntax with cout syntax when outputting count...the fix for that is obvious enough.
  3. Disabling sync_with_stdio produces a small speedup (about 5%) for me when outputting to console, but the impact is negligible when redirecting to file. This is a micro-optimization which you probably wouldn't need in most cases (IMHO).

晌融 2024-10-26 10:01:17

Writing through cout involves a lot of hidden and complex logic that goes all the way down to the kernel so that your text can reach the screen. When you use a buffer in that way, you are essentially making a batch request instead of repeating the complex I/O calls.

浪荡不羁 2024-10-26 10:01:17

If you have a buffer, you get fewer actual I/O calls, which are the slow part. First the buffer gets filled, then one I/O call is made to flush it. Buffering is equally helpful in Java or any other system where I/O is slow.
