std::cin is really slow
So I was trying to write myself a command for a Linux pipeline. Think of it as a replica of GNU 'cat' or 'sed' that takes input from stdin, does some processing, and writes to stdout.
I originally wrote an AWK script but wanted more performance, so I used the following C++ code:
std::string crtLine;
crtLine.reserve(1000);
while (true)
{
    std::getline(std::cin, crtLine);
    if (!std::cin) // failbit (EOF immediately found) or badbit (I/O error)
        break;
    std::cout << crtLine << "\n";
}
This is exactly what cat (without any parameters) does.
As it turns out, this program is about as slow as its awk counterpart, and nowhere near as fast as cat.
Testing on a 1GB file:
$ time cat 'file' | cat | wc -l
real 0m0.771s
$ time cat 'file' | filter-range.sh | wc -l
real 0m44.267s
Instead of getline(istream, string) I tried cin.getline(buffer, size), but saw no improvement. This is embarrassing; is it a buffering issue? I also tried fetching 100KB at a time instead of just one line, but that did not help either. Any ideas?
EDIT:
What you folks say makes sense, BUT the culprit is not string building/copying and neither is scanning for newlines. (And neither is the size of the buffer). Take a look at these 2 programs:
char buf[200];
while (fgets(buf, 200, stdin))
    std::cout << buf;
$ time cat 'file' | ./FilterRange > /dev/null
real 0m3.276s
char buf[200];
while (std::cin.getline(buf, 200))
    std::cout << buf << "\n";
$ time cat 'file' | ./FilterRange > /dev/null
real 0m55.031s
Neither of them manipulates strings and both of them do newline scanning, yet one is 17 times slower than the other. They differ only in how the input is read: fgets on stdin versus std::cin.getline.
I think we can safely conclude that cin screws up the timing.
Comments (4)
The first thing you want to do to get good performance from the standard I/O stream objects is to turn off synchronization with the standard C stream objects (std::ios_base::sync_with_stdio(false)).
Once you have done this you should get much better performance. Whether you get good performance is a different question, though.
Some people claimed funny things about what cat would do inside, so it is worth looking at what is supposed to be the fastest approach to copy one stream to another. I would love it if you could properly std::copy() one stream to another, but this won't work too well with most I/O stream implementations. I hope I get to this being the best eventually...
Not really. This has exactly the same effect as /bin/cat, but it does not use the same method. /bin/cat looks more like a bare loop of read and write system calls.
Notice that /bin/cat does no processing on its input. It doesn't build a std::string out of it, it doesn't scan it for \n, it just does one system call after another.
Your program, on the other hand, builds strings, makes copies of them, scans for \n, and so on. A small, complete program built that way ran 2-3 orders of magnitude slower than /bin/cat when I timed the two.
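To make the contrast concrete, here is a minimal sketch of the kind of system-call loop described above, assuming POSIX read() and write(). The helper name copy_fd and the 64 KiB buffer size are assumptions, and the real cat does more than this.

```cpp
#include <unistd.h>  // read, write (POSIX)
#include <cerrno>

// Hypothetical helper: copies everything from in_fd to out_fd without looking
// at the bytes. Returns true on success, false on an I/O error.
bool copy_fd(int in_fd, int out_fd) {
    char buffer[64 * 1024];
    for (;;) {
        ssize_t n = read(in_fd, buffer, sizeof buffer);
        if (n == 0) return true;           // end of input
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal, retry
            return false;
        }
        ssize_t off = 0;
        while (off < n) {                  // write() may accept fewer bytes than asked
            ssize_t w = write(out_fd, buffer + off, static_cast<size_t>(n - off));
            if (w < 0) {
                if (errno == EINTR) continue;
                return false;
            }
            off += w;
        }
    }
}
```

A cat-like program would call copy_fd(STDIN_FILENO, STDOUT_FILENO). There is no string building and no newline scanning, only one system call after another.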
EDIT
A program that copies its input with unformatted I/O gets within 50% of the performance of /bin/cat.
In short, if your requirement is to perform line-by-line analysis of the input, then you will have to pay some price to use formatted input. If, on the other hand, you need to perform byte-by-byte analysis, then you can use unformatted input and go faster.
If you really want much better performance with stdin, you should try to use pure C.
I think a faster solution would be based on sendfile.