Why is buffering in C++ important?
I tried to print Hello World 200,000 times and it took forever, so I had to stop it. But right after I added a char array to act as a buffer, it took less than 10 seconds. Why?
Before adding a buffer:
#include <iostream>
using namespace std;
int main() {
    int count = 0;
    std::ios_base::sync_with_stdio(false);
    for(int i = 1; i < 200000; i++)
    {
        cout << "Hello world!\n";
        count++;
    }
    cout<<"Count:%d\n"<<count;
    return 0;
}
And this is after adding a buffer:
#include <iostream>
using namespace std;
int main() {
    int count = 0;
    std::ios_base::sync_with_stdio(false);
    char buffer[1024];
    cout.rdbuf()->pubsetbuf(buffer, 1024);
    for(int i = 1; i < 200000; i++)
    {
        cout << "Hello world!\n";
        count++;
    }
    cout<<"Count:%d\n"<<count;
    return 0;
}
This makes me think about Java. What are the advantages of using a BufferedReader to read in a file?
Comments (5)
For file operations, writing to memory (RAM) is always faster than writing directly to a file on disk.
For illustration, let's define:
Let's say we have to write some data to a file 100 times.
Case 1: Directly Writing to File On Disk
Case 2: Directly Writing to File On Disk Over Network
Case 3: Buffering in Memory before Writing to File on Disk
Case 4: Buffering in Memory before Writing to File on Disk Over Network
Conclusion
Buffering in memory is almost always faster than writing directly. However, if your system is low on memory and has to swap to the page file, it will be slow again. Thus you have to balance your I/O operations between memory and disk/network.
The main issue with writing to the disk is that the time taken to write is not a linear function of the number of bytes, but an affine one with a huge constant term.
In computing terms, it means that I/O has good throughput (less than memory, but still quite good) but poor latency (normally a tad better than the network).
If you look at evaluation articles of HDDs or SSDs, you'll notice that the read/write tests are separated into two categories:
The latter is normally significantly greater than the former.
Normally, the OS and the I/O library should abstract this for you, but as you noticed, if your routine is I/O intensive, you might gain by increasing the buffer size. This is normal: the library is generally tailored for all kinds of uses and thus offers a good middle ground for average applications. If your application is not "average", then it might not perform as fast as it could.
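The affine cost model can be written out as a rough sketch. The symbols are assumptions introduced here, not taken from the answer: $t_0$ is the fixed cost of one I/O request, $B$ the throughput, $N$ the number of writes, $s$ the bytes per write, and $b \ge s$ the buffer capacity.

```latex
% Assumed symbols: t_0 = fixed cost per I/O request, B = throughput (bytes per unit time),
% N = number of writes, s = bytes per write, b = buffer capacity in bytes (b >= s).
\begin{align*}
T(n) &= t_0 + \frac{n}{B}
  && \text{one request of } n \text{ bytes} \\
T_{\text{unbuffered}} &= N\left(t_0 + \frac{s}{B}\right) = N\,t_0 + \frac{N s}{B}
  && \text{one request per write} \\
T_{\text{buffered}} &\approx \left\lceil \frac{N s}{b} \right\rceil t_0 + \frac{N s}{B}
  && \text{one request per full buffer}
\end{align*}
```

With the question's numbers taken at face value ($N = 200000$, $s = 13$ bytes for "Hello world!\n", $b = 1024$), the buffered version issues $\lceil 2600000 / 1024 \rceil = 2540$ requests instead of 200000, so the fixed-cost term shrinks by a factor of roughly 79 while the transfer term is unchanged. This ignores the cost of copying into the buffer and any buffering the library or OS already performs.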
What compiler/platform are you using? I see no significant difference here (RedHat, gcc 4.1.2); both programs take 5-6 seconds to finish (but "user" time is about 150 ms). If I redirect output to a file (through the shell), total time is about 300 ms (so most of the 6 seconds is spent waiting for my console to catch up to the program).
In other words, output should be buffered by default, so I'm curious why you're seeing such a huge speedup.
3 tangentially-related notes:
1. Your loop prints only 199,999 times rather than 200,000 (either start with i = 0 or end with i <= 200000).
2. You're mixing printf syntax with cout syntax when outputting count... the fix for that is obvious enough.
3. Disabling sync_with_stdio produces a small speedup (about 5%) for me when outputting to console, but the impact is negligible when redirecting to file. This is a micro-optimization which you probably wouldn't need in most cases (IMHO).
Writing through cout involves a lot of hidden and complex logic going all the way down to the kernel so that your text can reach the screen. When you use a buffer in that way, you're essentially doing a batch request instead of repeating the complex I/O calls.
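The batching idea can be illustrated without touching real I/O. In this sketch, `CountingSink` is a hypothetical stand-in for an expensive output device; it only records how many requests it received, so it shows the shape of the saving rather than real timings.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for an expensive output device. It does no real
// I/O; it only records how it was used.
struct CountingSink {
    std::size_t calls = 0;  // number of simulated I/O requests
    std::size_t bytes = 0;  // total bytes received

    void write(const std::string& data) {
        ++calls;
        bytes += data.size();
    }
};

// One request per message.
void write_each(CountingSink& sink, const std::string& msg, int times) {
    for (int i = 0; i < times; ++i) {
        sink.write(msg);
    }
}

// Accumulate all messages in memory, then issue a single request.
void write_batched(CountingSink& sink, const std::string& msg, int times) {
    std::string batch;
    batch.reserve(msg.size() * static_cast<std::size_t>(times));
    for (int i = 0; i < times; ++i) {
        batch += msg;
    }
    sink.write(batch);
}
```

Both functions deliver the same bytes, but `write_each` makes `times` requests while `write_batched` makes one. A real buffer sits between these extremes by flushing whenever it fills up.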
If you have a buffer, you get fewer actual I/O calls, which is the slow part. First, the buffer gets filled, then one I/O call is made to flush the buffer. Buffering will be equally helpful in Java or any other system where I/O is slow.
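The fill-then-flush cycle described above can be sketched with a custom `std::streambuf`. This is an illustration under assumptions, not how `std::cout` is implemented: the class name `CountingBuf` is made up, and the "device" is just a `std::string` plus a counter of how many times it was written to.

```cpp
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>
#include <vector>

// A stream buffer with a fixed-size put area. When the area is full, or
// when the stream is flushed, the pending bytes go to the "device" in one
// call. The device here is only a std::string and a call counter.
class CountingBuf : public std::streambuf {
public:
    // A capacity of 0 is bumped to 1 so the put area is never empty.
    explicit CountingBuf(std::size_t capacity)
        : buffer_(capacity > 0 ? capacity : 1) {
        setp(buffer_.data(), buffer_.data() + buffer_.size());
    }

    std::size_t device_calls() const { return device_calls_; }
    const std::string& device_data() const { return device_; }

protected:
    // Called when the put area is full and another character arrives.
    int_type overflow(int_type ch) override {
        flush_pending();
        if (!traits_type::eq_int_type(ch, traits_type::eof())) {
            *pptr() = traits_type::to_char_type(ch);
            pbump(1);
        }
        return traits_type::not_eof(ch);
    }

    // Called by std::ostream::flush().
    int sync() override {
        flush_pending();
        return 0;
    }

private:
    // Hands the pending bytes to the device in one call and resets the area.
    void flush_pending() {
        const std::ptrdiff_t pending = pptr() - pbase();
        if (pending > 0) {
            device_.append(pbase(), static_cast<std::size_t>(pending));
            ++device_calls_;
        }
        setp(buffer_.data(), buffer_.data() + buffer_.size());
    }

    std::vector<char> buffer_;
    std::string device_;
    std::size_t device_calls_ = 0;
};
```

Wrapping it as `CountingBuf buf(1024); std::ostream out(&buf);`, writing "Hello world!\n" 1,000 times and then calling `out.flush()` sends all 13,000 bytes to the device in 13 calls rather than 1,000.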