Why is this C code faster than this C++ code? Getting the longest line in a file

Posted 2024-12-26 13:35:04

I have two versions of a program that do basically the same thing: finding the greatest length of a line in a file. I have a file with about 8,000 lines, and my C code is a little more primitive (of course!) than my C++ code. The C program takes about 2 seconds to run, while the C++ program takes 10 seconds (same file in both cases). But why? I expected it to take the same amount of time, or a little more, but not 8 seconds longer!

my code in C:

#include <stdio.h>
#include <stdlib.h> 
#include <string.h>

#ifdef _DEBUG
    #define DEBUG_PATH "../Debug/"
#else
    #define DEBUG_PATH ""
#endif

const char FILE_NAME[] = DEBUG_PATH "data.noun";

int main()
{   
    int sPos = 0;
    int maxCount = 0;
    int cPos = 0;
    int ch;
    FILE *in_file;              

    in_file = fopen(FILE_NAME, "r");
    if (in_file == NULL) 
    {
        printf("Cannot open %s\n", FILE_NAME);
        exit(8);
    }       

    while (1) 
    {
        ch = fgetc(in_file);
        if(ch == 0x0A || ch == EOF) // newline or end of file
        {           
            if ((cPos - sPos) > maxCount)
                maxCount = (cPos - sPos);

            if(ch == EOF)
                break;

            sPos = cPos;
        }
        else
            cPos++;
    }

    fclose(in_file);

    printf("Max line length: %i\n",  maxCount); 

    getchar();
    return (0);
}

my code in C++:

#include <iostream>
#include <fstream>
#include <stdio.h>
#include <string>

using namespace std;

#ifdef _DEBUG
    #define FILE_PATH "../Debug/data.noun"
#else
    #define FILE_PATH "data.noun"
#endif

int main()
{
    string fileName = FILE_PATH;
    string s = "";
    ifstream file;
    int size = 0;

    file.open(fileName.c_str());
    if(!file)
    {
        printf("could not open file!");
        return 0;
    }

    while(getline(file, s) )
            size = (s.length() > size) ? s.length() : size;
    file.close();

    printf("biggest line in file: %i", size);   

    getchar();
    return 0;
}

Comments (8)

眼趣 2025-01-02 13:35:04

My guess is that it is a problem with the compiler options you are using, the compiler itself, or the file system. I just now compiled both versions (with optimizations on) and ran them against a 92,000 line text file:

c++ version:  113 ms
c version:    179 ms

And I suspect that the reason the C++ version is faster is that fgetc is most likely slower. fgetc does use buffered I/O, but it makes a function call to retrieve every character. I've tested it before, and fgetc is not as fast as reading an entire line in one call (e.g., with fgets).

撕心裂肺的伤痛 2025-01-02 13:35:04

So in a few comments I echoed people's answers that the problem was likely the extra copying done by your C++ version, where it copies the lines into memory as a string. But I wanted to test that.

First I implemented the fgetc and getline versions and timed them. I confirmed that in debug mode the getline version is slower, about 130 µs vs. 60 µs for the fgetc version. This is unsurprising given the conventional wisdom that iostreams are slower than stdio. However, in my past experience iostreams get a significant speed-up from optimization. This was confirmed when I compared my release mode times: about 20 µs using getline and 48 µs with fgetc.

The fact that using getline with iostreams is faster than fgetc, at least in release mode, runs counter to the reasoning that copying all that data must be slower than not copying it. I'm not sure what the optimizer is able to avoid, and I didn't really look for an explanation, but it would be interesting to understand what's being optimized away. edit: when I looked at the programs with a profiler, it wasn't obvious how to compare the performance, since the different methods looked so different from each other.

Anyway, I wanted to see if I could get a faster version by avoiding the copying, using the get() method on the fstream object and just doing exactly what the C version is doing. When I did this I was quite surprised to find that fstream::get() was quite a bit slower than both the fgetc and getline methods, in both debug and release: about 230 µs in debug, and 80 µs in release.

To narrow down whatever the slow-down is, I went ahead and did another version, this time using the streambuf attached to the fstream object, and the snextc() method on that. This version is by far the fastest: 25 µs in debug and 6 µs in release.

I'm guessing that what makes the fstream::get() method so much slower is that it constructs a sentry object for every call. Though I haven't tested this, I can't see that get() does much beyond getting the next character from the streambuf, apart from these sentry objects.

Anyway, the moral of the story is that if you want fast I/O you're probably best off using high-level iostream functions rather than stdio, and for really fast I/O, access the underlying streambuf. edit: actually this moral may only apply to MSVC; see the update at the bottom for results from a different toolchain.

For reference:

I used VS2010 and chrono from boost 1.47 for timing. I built 32-bit binaries (this seems to be required by boost chrono, because it can't seem to find a 64-bit version of that lib). I didn't tweak the compile options, but they may not be completely standard, since I did this in a scratch VS project I keep around.

The file I tested with was the 1.1 MB 20,000 line plain text version of Oeuvres Complètes de Frédéric Bastiat, tome 1 by Frédéric Bastiat from Project Gutenberg, http://www.gutenberg.org/ebooks/35390

Release mode times

fgetc time is: 48150 microseconds
snextc time is: 6019 microseconds
get time is: 79600 microseconds
getline time is: 19881 microseconds

Debug mode times:

fgetc time is: 59593 microseconds
snextc time is: 24915 microseconds
get time is: 228643 microseconds
getline time is: 130807 microseconds

Here's my fgetc() version:

{
    auto begin = boost::chrono::high_resolution_clock::now();
    FILE *cin = fopen("D:/bames/automata/pg35390.txt","rb");
    assert(cin);
    unsigned maxLength = 0;
    unsigned i = 0;
    int ch;
    while(1) {
        ch = fgetc(cin);
        if(ch == 0x0A || ch == EOF) {
            maxLength = std::max(i,maxLength);
            i = 0;
            if(ch==EOF)
                break;
        } else {
            ++i;
        }
    }
    fclose(cin);
    auto end = boost::chrono::high_resolution_clock::now();
    std::cout << "max line is: " << maxLength << '\n';
    std::cout << "fgetc time is: " << boost::chrono::duration_cast<boost::chrono::microseconds>(end-begin) << '\n';
}

Here's my getline() version:

{
    auto begin = boost::chrono::high_resolution_clock::now();
    std::ifstream fin("D:/bames/automata/pg35390.txt",std::ios::binary);
    std::string::size_type maxLength = 0;
    std::string line;
    while(std::getline(fin,line)) {
        maxLength = std::max(line.size(),maxLength);
    }
    auto end = boost::chrono::high_resolution_clock::now();
    std::cout << "max line is: " << maxLength << '\n';
    std::cout << "getline time is: " << boost::chrono::duration_cast<boost::chrono::microseconds>(end-begin) << '\n';
}

the fstream::get() version

{
    auto begin = boost::chrono::high_resolution_clock::now();
    std::ifstream fin("D:/bames/automata/pg35390.txt",std::ios::binary);
    unsigned maxLength = 0;
    unsigned i = 0;
    while(1) {
        int ch = fin.get();
        if(fin.good() && ch == 0x0A || fin.eof()) {
            maxLength = std::max(i,maxLength);
            i = 0;
            if(fin.eof())
                break;
        } else {
            ++i;
        }
    }
    auto end = boost::chrono::high_resolution_clock::now();
    std::cout << "max line is: " << maxLength << '\n';
    std::cout << "get time is: " << boost::chrono::duration_cast<boost::chrono::microseconds>(end-begin) << '\n';
}

and the snextc() version

{
    auto begin = boost::chrono::high_resolution_clock::now();
    std::ifstream fin("D:/bames/automata/pg35390.txt",std::ios::binary);
    std::filebuf &buf = *fin.rdbuf();
    unsigned maxLength = 0;
    unsigned i = 0;
    while(1) {
        int ch = buf.snextc();
        if(ch == 0x0A || ch == std::char_traits<char>::eof()) {
            maxLength = std::max(i,maxLength);
            i = 0;
            if(ch == std::char_traits<char>::eof())
                break;
        } else {
            ++i;
        }
    }
    auto end = boost::chrono::high_resolution_clock::now();
    std::cout << "max line is: " << maxLength << '\n';
    std::cout << "snextc time is: " << boost::chrono::duration_cast<boost::chrono::microseconds>(end-begin) << '\n';
}

update:

I reran the tests using clang (trunk) on OS X with libc++. The results for the iostream-based implementations stayed relatively the same (with optimization turned on): fstream::get() was much slower than std::getline(), which was much slower than filebuf::snextc(). But the performance of fgetc() improved relative to the getline() implementation and became faster than it. Perhaps this is because the copying done by getline() becomes an issue with this toolchain, whereas it wasn't with MSVC? Maybe Microsoft's CRT implementation of fgetc() is just bad or something?

Anyway, here are the times (I used a much larger file, 5.3 MB):

using -Os

fgetc time is: 39004 microseconds
snextc time is: 19374 microseconds
get time is: 145233 microseconds
getline time is: 67316 microseconds

using -O0

fgetc time is: 44061 microseconds
snextc time is: 92894 microseconds
get time is: 184967 microseconds
getline time is: 209529 microseconds

-O2

fgetc time is: 39356 microseconds
snextc time is: 21324 microseconds
get time is: 149048 microseconds
getline time is: 63983 microseconds

-O3

fgetc time is: 37527 microseconds
snextc time is: 22863 microseconds
get time is: 145176 microseconds
getline time is: 67899 microseconds
皇甫轩 2025-01-02 13:35:04

The C++ version constantly allocates and deallocates instances of std::string. Memory allocation is a costly operation, and on top of that the constructors/destructors are executed.

The C version, however, uses constant memory and does only what's necessary: read single characters and, at each newline, set the line-length counter to the new value if it is higher. That's it.

一口甜 2025-01-02 13:35:04

You are not comparing apples to apples. Your C program does no copying of data from FILE* buffer into your program's memory. It also operates on raw files.

Your C++ program needs to traverse the length of each string several times - once in the stream code to know when to terminate the string that it returns to you, once in the constructor of std::string, and once in your code's call to s.length().

It is possible that you could improve the performance of your C program, for example by using getc_unlocked if it is available to you. But the biggest win comes from not having to copy your data.

EDIT: edited in response to a comment by bames53

逆蝶 2025-01-02 13:35:04

2 seconds for just 8,000 lines? I don't know how long your lines are, but chances are that you are doing something very wrong.

This trivial Python program executes almost instantly with El Quijote downloaded from Project Gutenberg (40006 lines, 2.2MB):

import sys
print max(len(s) for s in sys.stdin)

The timing:

~/test$ time python maxlen.py < pg996.txt
76

real    0m0.034s
user    0m0.020s
sys     0m0.010s

You could improve your C code by buffering the input rather than reading char by char.

As to why the C++ is slower than the C: it's probably related to building the string objects and then calling the length method. In C you are just counting the chars as you go.

物价感观 2025-01-02 13:35:04

I tried compiling and running your programs against 40K lines of C++ source and they both completed in about 25ms or so. I can only conclude that your input files have extremely long lines, possibly 10K-100K characters per line. In that case the C version doesn't have any negative performance from the long line length while the C++ version would have to keep increasing the size of the string and copying the old data into the new buffer. If it had to increase in size a sufficient number of times that could account for the excessive performance difference.

The key here is that the two programs don't do the same thing so you can't really compare their results. If you were able to provide the input file we might be able to provide additional details.

You could probably use tellg and ignore to do this faster in C++.

鹤仙姿 2025-01-02 13:35:04

The C++ program builds string objects of the lines, while the C program just reads characters and looks at the characters.

EDIT:

Thanks for the upvotes, but after the discussion I now think this answer is wrong. It was a reasonable first guess, but in this case it seems that the different (and very slow) execution times are caused by other things.

带上头具痛哭 2025-01-02 13:35:04

I'm alright with the theory folks. But let's get empirical.

I generated a text file with 13 million lines to work with.

~$ for i in {0..1000}; do cat /etc/* | strings; done &> huge.txt

With the original code edited to read from stdin (which shouldn't affect performance much), it took almost 2 minutes.

C++ code:

#include <iostream>
#include <stdio.h>

using namespace std;

int main(void)
{
    string s;
    size_t size = 0;

    // test the stream *after* each getline, so a failed
    // final read doesn't go through the loop body
    while (getline(cin, s))
        size = (s.length() > size) ? s.length() : size;
    printf("Biggest line in file: %zu\n", size);

    return 0;
}

C++ time:

~$ time ./cplusplus < huge.txt
real    1m53.122s
user    1m29.254s
sys     0m0.544s

A 'C' version:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *line = NULL;
    size_t len = 0;          /* getline wants a size_t, not an int */
    ssize_t read;
    int max = 0;

    while ((read = getline(&line, &len, stdin)) != -1)
        if (max < read)
            max = (int)read - 1;  /* drop the trailing '\n' */
    free(line);
    printf("Biggest line in file %d\n", max);

    return 0;
}

C performance:

~$ time ./ansic < huge.txt
real    0m4.015s
user    0m3.432s
sys     0m0.328s

Do your own math...
