Why is reading lines from stdin much slower in C++ than in Python?

Posted 2025-01-07 16:04:09 · 3,747 characters · 4 views · 0 comments

I wanted to compare reading lines of string input from stdin using Python and C++ and was shocked to see my C++ code run an order of magnitude slower than the equivalent Python code. Since my C++ is rusty and I'm not yet an expert Pythonista, please tell me if I'm doing something wrong or if I'm misunderstanding something.


(TL;DR answer: include the statement cin.sync_with_stdio(false), or just use fgets instead.

TL;DR results: scroll all the way down to the bottom of my question and look at the table.)


C++ code:

#include <iostream>
#include <time.h>

using namespace std;

int main() {
    string input_line;
    long line_count = 0;
    time_t start = time(NULL);
    int sec;
    int lps;

    while (cin) {
        getline(cin, input_line);
        if (!cin.eof())
            line_count++;
    };

    sec = (int) time(NULL) - start;
    cerr << "Read " << line_count << " lines in " << sec << " seconds.";
    if (sec > 0) {
        lps = line_count / sec;
        cerr << " LPS: " << lps << endl;
    } else
        cerr << endl;
    return 0;
}

// Compiled with:
// g++ -O3 -o readline_test_cpp foo.cpp

Python Equivalent:

#!/usr/bin/env python
import time
import sys

count = 0
start = time.time()

for line in sys.stdin:
    count += 1

delta_sec = int(time.time() - start)
if delta_sec > 0:
    lines_per_sec = int(round(count / delta_sec))
    print("Read {0} lines in {1} seconds. LPS: {2}".format(count, delta_sec,
          lines_per_sec))

Here are my results:

$ cat test_lines | ./readline_test_cpp
Read 5570000 lines in 9 seconds. LPS: 618889

$ cat test_lines | ./readline_test.py
Read 5570000 lines in 1 seconds. LPS: 5570000

I should note that I tried this both under Mac OS X v10.6.8 (Snow Leopard) and Linux 2.6.32 (Red Hat Linux 6.2). The former is a MacBook Pro, and the latter is a very beefy server, not that this is too pertinent.

$ for i in {1..5}; do echo "Test run $i at `date`"; echo -n "CPP:"; cat test_lines | ./readline_test_cpp ; echo -n "Python:"; cat test_lines | ./readline_test.py ; done
Test run 1 at Mon Feb 20 21:29:28 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 2 at Mon Feb 20 21:29:39 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 3 at Mon Feb 20 21:29:50 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 4 at Mon Feb 20 21:30:01 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 5 at Mon Feb 20 21:30:11 EST 2012
CPP:   Read 5570001 lines in 10 seconds. LPS: 557000
Python:Read 5570000 lines in  1 seconds. LPS: 5570000

Tiny benchmark addendum and recap

For completeness, I thought I'd update the read speed for the same file on the same box with the original (synced) C++ code. Again, this is for a 100M line file on a fast disk. Here's the comparison, with several solutions/approaches:

Implementation              Lines per second
python (default)                   3,571,428
cin (default/naive)                  819,672
cin (no sync)                     12,500,000
fgets                             14,285,714
wc (not fair comparison)          54,644,808


倾城°AllureLove 2025-01-14 16:04:09

tl;dr: Because of different default settings in C++ requiring more system calls.

By default, cin is synchronized with stdio, which causes it to avoid any input buffering. If you add this to the top of your main, you should see much better performance:

std::ios_base::sync_with_stdio(false);

Normally, when an input stream is buffered, instead of reading one character at a time, the stream will be read in larger chunks. This reduces the number of system calls, which are typically relatively expensive. However, since the FILE* based stdio and iostreams often have separate implementations and therefore separate buffers, this could lead to a problem if both were used together. For example:

int myvalue1;
cin >> myvalue1;
int myvalue2;
scanf("%d",&myvalue2);

If more input was read by cin than it actually needed, then the second integer value wouldn't be available for the scanf function, which has its own independent buffer. This would lead to unexpected results.

To avoid this, by default, streams are synchronized with stdio. One common way to achieve this is to have cin read each character one at a time as needed using stdio functions. Unfortunately, this introduces a lot of overhead. For small amounts of input, this isn't a big problem, but when you are reading millions of lines, the performance penalty is significant.

Fortunately, the library designers decided that you should also be able to disable this feature to get improved performance if you knew what you were doing, so they provided the sync_with_stdio method. From this link (emphasis added):

If the synchronization is turned off, the C++ standard streams are allowed to buffer their I/O independently, which may be considerably faster in some cases.

漫漫岁月 2025-01-14 16:04:09

Just out of curiosity I've taken a look at what happens under the hood, and I've used dtruss/strace on each test.

C++

./a.out < in
Saw 6512403 lines in 8 seconds.  Crunch speed: 814050

syscalls sudo dtruss -c ./a.out < in

CALL                                        COUNT
__mac_syscall                                   1
<snip>
open                                            6
pread                                           8
mprotect                                       17
mmap                                           22
stat64                                         30
read_nocancel                               25958

Python

./a.py < in
Read 6512402 lines in 1 seconds. LPS: 6512402

syscalls sudo dtruss -c ./a.py < in

CALL                                        COUNT
__mac_syscall                                   1
<snip>
open                                            5
pread                                           8
mprotect                                       17
mmap                                           21
stat64                                         29
赠我空喜 2025-01-14 16:04:09

I'm a few years behind here, but:

In 'Edit 4/5/6' of the original post, you are using the construction:

$ /usr/bin/time cat big_file | program_to_benchmark

This is wrong in a couple of different ways:

  1. You're actually timing the execution of cat, not your benchmark. The 'user' and 'sys' CPU usage displayed by time are those of cat, not your benchmarked program. Even worse, the 'real' time is also not necessarily accurate. Depending on the implementation of cat and of pipelines in your local OS, it is possible that cat writes a final giant buffer and exits long before the reader process finishes its work.

  2. Use of cat is unnecessary and in fact counterproductive; you're adding moving parts. If you were on a sufficiently old system (i.e. with a single CPU and -- in certain generations of computers -- I/O faster than CPU) -- the mere fact that cat was running could substantially color the results. You are also subject to whatever input and output buffering and other processing cat may do. (This would likely earn you a 'Useless Use Of Cat' award if I were Randal Schwartz.)

A better construction would be:

$ /usr/bin/time program_to_benchmark < big_file

In this statement it is the shell which opens big_file, passing it to your program (well, actually to time which then executes your program as a subprocess) as an already-open file descriptor. 100% of the file reading is strictly the responsibility of the program you're trying to benchmark. This gets you a real reading of its performance without spurious complications.

I will mention two possible, but actually wrong, 'fixes' which could also be considered (but I 'number' them differently as these are not things which were wrong in the original post):

A. You could 'fix' this by timing only your program:

$ cat big_file | /usr/bin/time program_to_benchmark

B. or by timing the entire pipeline:

$ /usr/bin/time sh -c 'cat big_file | program_to_benchmark'

These are wrong for the same reasons as #2: they're still using cat unnecessarily. I mention them for a few reasons:

  • they're more 'natural' for people who aren't entirely comfortable with the I/O redirection facilities of the POSIX shell

  • there may be cases where cat is needed (e.g.: the file to be read requires some sort of privilege to access, and you do not want to grant that privilege to the program to be benchmarked: sudo cat /dev/sda | /usr/bin/time my_compression_test --no-output)

  • in practice, on modern machines, the added cat in the pipeline is probably of no real consequence.

But I say that last thing with some hesitation. If we examine the last result in 'Edit 5' --

$ /usr/bin/time cat temp_big_file | wc -l
0.01user 1.34system 0:01.83elapsed 74%CPU ...

-- this claims that cat consumed 74% of the CPU during the test; and indeed 1.34/1.83 is approximately 74%. Perhaps a run of:

$ /usr/bin/time wc -l < temp_big_file

would have taken only the remaining .49 seconds! Probably not: cat here had to pay for the read() system calls (or equivalent) which transferred the file from 'disk' (actually buffer cache), as well as the pipe writes to deliver them to wc. The correct test would still have had to do those read() calls; only the write-to-pipe and read-from-pipe calls would have been saved, and those should be pretty cheap.

Still, I predict you would be able to measure the difference between cat file | wc -l and wc -l < file and find a noticeable (2-digit percentage) difference. Each of the slower tests will have paid a similar penalty in absolute time; which would however amount to a smaller fraction of its larger total time.

In fact I did some quick tests with a 1.5 gigabyte file of garbage, on a Linux 3.13 (Ubuntu 14.04) system, obtaining these results (these are actually 'best of 3' results; after priming the cache, of course):

$ time wc -l < /tmp/junk
real 0.280s user 0.156s sys 0.124s (total cpu 0.280s)
$ time cat /tmp/junk | wc -l
real 0.407s user 0.157s sys 0.618s (total cpu 0.775s)
$ time sh -c 'cat /tmp/junk | wc -l'
real 0.411s user 0.118s sys 0.660s (total cpu 0.778s)

Notice that the two pipeline results claim to have taken more CPU time (user+sys) than real wall-clock time. This is because I'm using the shell (bash)'s built-in 'time' command, which is cognizant of the pipeline; and I'm on a multi-core machine where separate processes in a pipeline can use separate cores, accumulating CPU time faster than realtime. Using /usr/bin/time I see smaller CPU time than realtime -- showing that it can only time the single pipeline element passed to it on its command line. Also, the shell's output gives milliseconds while /usr/bin/time only gives hundredths of a second.

So at the efficiency level of wc -l, the cat makes a huge difference: 409 / 283 = 1.453 or 45.3% more realtime, and 775 / 280 = 2.768, or a whopping 177% more CPU used! On my random it-was-there-at-the-time test box.

I should add that there is at least one other significant difference between these styles of testing, and I can't say whether it is a benefit or fault; you have to decide this yourself:

When you run cat big_file | /usr/bin/time my_program, your program is receiving input from a pipe, at precisely the pace sent by cat, and in chunks no larger than written by cat.

When you run /usr/bin/time my_program < big_file, your program receives an open file descriptor to the actual file. Your program -- or in many cases the I/O libraries of the language in which it was written -- may take different actions when presented with a file descriptor referencing a regular file. It may use mmap(2) to map the input file into its address space, instead of using explicit read(2) system calls. These differences could have a far larger effect on your benchmark results than the small cost of running the cat binary.

Of course it is an interesting benchmark result if the same program performs significantly differently between the two cases. It shows that, indeed, the program or its I/O libraries are doing something interesting, like using mmap(). So in practice it might be good to run the benchmarks both ways; perhaps discounting the cat result by some small factor to "forgive" the cost of running cat itself.

拒绝两难 2025-01-14 16:04:09

I reproduced the original result on my computer using g++ on a Mac.

Adding the following statements to the C++ version just before the while loop brings it in line with the Python version:

std::ios_base::sync_with_stdio(false);
char buffer[1048576];
std::cin.rdbuf()->pubsetbuf(buffer, sizeof(buffer));

sync_with_stdio improved speed to 2 seconds, and setting a larger buffer brought it down to 1 second.

_失温 2025-01-14 16:04:09

getline, stream operators, scanf, can be convenient if you don't care about file loading time or if you are loading small text files. But, if the performance is something you care about, you should really just buffer the entire file into memory (assuming it will fit).

Here's an example:

//open file in binary mode
std::fstream file( filename, std::ios::in | std::ios::binary );
if( !file ) return NULL;

//read the size...
file.seekg(0, std::ios::end);
size_t length = (size_t)file.tellg();
file.seekg(0, std::ios::beg);

//read into memory buffer, then close it.
char *filebuf = new char[length+1];
file.read(filebuf, length);
filebuf[length] = '\0'; //make it null-terminated
file.close();

If you want, you can wrap a stream around that buffer for more convenient access like this:

std::istrstream header(&filebuf[0], length);

Also, if you are in control of the file, consider using a flat binary data format instead of text. It's more reliable to read and write because you don't have to deal with all the ambiguities of whitespace. It's also smaller and much faster to parse.

西瑶 2025-01-14 16:04:09

The following code was faster for me than the other code posted here so far:
(Visual Studio 2013, 64-bit, 500 MB file with line length uniformly in [0, 1000)).

#include <algorithm>
#include <cstdio>
#include <vector>

const int buffer_size = 500 * 1024;  // Too large/small a buffer is not good.
std::vector<char> buffer(buffer_size);
long line_count = 0;
size_t size;
while ((size = fread(buffer.data(), sizeof(char), buffer_size, stdin)) > 0) {
    line_count += std::count_if(buffer.begin(), buffer.begin() + size,
                                [](char ch) { return ch == '\n'; });
}

It beats all my Python attempts by more than a factor 2.

森末i 2025-01-14 16:04:09

By the way, the reason the line count for the C++ version is one greater than the count for the Python version is that the eof flag only gets set when an attempt is made to read beyond eof. So the correct loop would be:

while (cin) {
    getline(cin, input_line);

    if (!cin.eof())
        line_count++;
};
苦妄 2025-01-14 16:04:09

In your second example (with scanf()), the reason it is still slower might be that scanf("%s") parses the string and looks for any whitespace character (space, tab, or newline).

Also, yes, CPython does some caching to avoid harddisk reads.

忘羡 2025-01-14 16:04:09

A first element of an answer: <iostream> is slow. Damn slow. I get a huge performance boost with scanf as in the below, but it is still two times slower than Python.

#include <iostream>
#include <time.h>
#include <cstdio>

using namespace std;

int main() {
    char buffer[10000];
    long line_count = 0;
    time_t start = time(NULL);
    int sec;
    int lps;

    int read = 1;
    while(read > 0) {
        read = scanf("%s", buffer);
        line_count++;
    };
    sec = (int) time(NULL) - start;
    line_count--;
    cerr << "Saw " << line_count << " lines in " << sec << " seconds." ;
    if (sec > 0) {
        lps = line_count / sec;
        cerr << "  Crunch speed: " << lps << endl;
    } 
    else
        cerr << endl;
    return 0;
}
夏尔 2025-01-14 16:04:09

Well, I see that in your second solution you switched from cin to scanf, which was the first suggestion I was going to make (cin is sloooooooooooow). Now, if you switch from scanf to fgets, you would see another boost in performance: fgets is the fastest C function for string input.

BTW, didn't know about that sync thing, nice. But you should still try fgets.

岁月如刀 2025-01-14 16:04:09

I am way late to this game, but I thought I'd put my two cents in:

The python line:

for line in  sys.stdin:
    count += 1

Does NOT read data from the stream. It merely counts the number of lines that the stream encounters - nothing more.

Peter Mortensen was on to something with his dtruss analysis:
https://stackoverflow.com/a/9657502/1043530

Note the

read_nocancel                               25958

which is not done by the Python version. The Python interpreter is just like any other program: if it does I/O, it will show up via strace or dtruss.

Redo this 'super fast' python program with actual reads and I believe you'll see some changes in the dtruss output.
Yeah, I know I'm late . . . but this one caught my eye because of how folks are talking about disabling this and that, fgets vs. whatever . . . If program 'A' does not do disk I/O but program 'B' does, in many/most/all cases that explains why program 'A' is faster.

TLDR: There is a false equivalence between the programs, as shown by Peter Mortensen's dtruss run, and his post should be marked as the answer.

-Mark
