std::cin is really slow

Posted on 2024-12-29 17:50:23

I was trying to write myself a command for a Linux pipeline. Think of it as a replica of GNU 'cat' or 'sed': it takes input from stdin, does some processing, and writes to stdout.

I originally wrote an AWK script, but I wanted more performance, so I used the following C++ code:

std::string crtLine;
crtLine.reserve(1000);
while (true)
{
    std::getline(std::cin, crtLine);
    if (!std::cin) // failbit (EOF immediately found) or badbit (I/O error)
        break;

    std::cout << crtLine << "\n";
}

This is exactly what cat (without any parameters) does.
As it turns out, this program is about as slow as its awk counterpart, and nowhere near as fast as cat.

Testing on a 1GB file:

$ time cat 'file' | cat | wc -l
real    0m0.771s

$ time cat 'file' | filter-range.sh | wc -l
real    0m44.267s

Instead of getline(istream, string) I tried cin.getline(buffer, size), but saw no improvement. This is embarrassing. Is it a buffering issue? I also tried fetching 100KB at a time instead of just one line, and that did not help either. Any ideas?

EDIT:
What you folks say makes sense, BUT the culprit is not string building/copying and neither is scanning for newlines. (And neither is the size of the buffer). Take a look at these 2 programs:

char buf[200];
while (fgets(buf, 200, stdin))
    std::cout << buf;

$ time cat 'file' | ./FilterRange > /dev/null
real    0m3.276s




char buf[200];
while (std::cin.getline(buf, 200))
    std::cout << buf << "\n";

$ time cat 'file' | ./FilterRange > /dev/null
real    0m55.031s

Neither of them manipulates strings and both of them scan for newlines, yet one is about 17 times slower than the other. They differ only in the use of cin.
I think we can safely conclude that cin is what ruins the timing.

Comments (4)

那一片橙海, 2025-01-05 17:50:23

The first thing you want to do to get good performance from the standard I/O stream objects is to turn off synchronization with the standard C stream objects:

std::ios_base::sync_with_stdio(false);

Once you have done this you should get much better performance. Whether you get good performance is a different question though.

Since some people claimed funny things about what cat would do inside, here is what is supposed to be the fastest approach to copy one stream to another:

std::cout << std::cin.rdbuf();

I would love it if you could properly std::copy() one stream to another, but this doesn't work too well with most I/O stream implementations:

std::copy(std::istreambuf_iterator<char>(std::cin), std::istreambuf_iterator<char>(),
          std::ostreambuf_iterator<char>(std::cout));

I hope to get this to be the best approach eventually...

姜生凉生 2025-01-05 17:50:23

"This is exactly what cat (without any parameters) does."

Not really. This has exactly the same effect as /bin/cat, but it does not use the same method.

/bin/cat looks more like this:

while( (readSize = read(inFd, buffer, sizeof buffer)) > 0)
  write(outFd, buffer, readSize);

Notice that /bin/cat does no processing on its input. It doesn't build a std::string out of it, it doesn't scan it for \n, it just does one system call after another.

Your program, on the other hand, builds strings, makes copies of them, scans for \n, and so on.

This small, complete program runs 2-3 orders of magnitude slower than /bin/cat:

#include <string>
#include <iostream>

int main (int ac, char **av) {
  std::string crtLine;
  crtLine.reserve(1000);
  while(std::getline(std::cin, crtLine)) {
    std::cout << crtLine << "\n";
  }
}

I timed it thus:

$ time ./x < inputFile > /dev/null
$ time /bin/cat < inputFile > /dev/null


EDIT
This program gets within 50% of the performance of /bin/cat:

#include <string>
#include <iostream>
#include <vector>

int main (int ac, char **av) {
  std::vector<char> v(4096);
  do {
    std::cin.read(&v[0], v.size());
    std::cout.write(&v[0], std::cin.gcount());
  } while(std::cin);
}

In short, if your requirement is to perform line-by-line analysis of the input, then you will have to pay some price to use formatted input. If, on the other hand, you need to perform byte-by-byte analysis, then you can use unformatted input and go faster.

时光清浅 2025-01-05 17:50:23

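If you really want much better performance from stdin, you should try using plain C I/O.

std::vector<char> line(0x1000);
// Loop on the return value of fgets: feof only becomes true after a read has already failed.
while (fgets(&line.front(), static_cast<int>(line.size()), stdin))
    ; // process the line held in `line` here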

秋千易 2025-01-05 17:50:23

I think a faster solution would be based on sendfile.
