C++ text file reading performance

Posted 2024-11-30 19:18:27 | Word count: 1035 | Views: 1 | Comments: 0

I'm trying to migrate a C# program to C++.
The C# program reads a text file of 1 to 5 GB line by line and does some analysis on each line.
The C# code looks like this:

using (var f = File.OpenRead(fname))
using (var reader = new StreamReader(f))
    while (!reader.EndOfStream) {
        var line = reader.ReadLine();
        // do some analysis
    }

For a 1.6 GB file with 7 million lines, this code takes about 18 seconds.

The first C++ version I wrote for the migration looks like this:

ifstream f(fname);
string line;    
while (getline(f, line)) {
    // do some analysis
}

The C++ code above takes about 420 seconds. The second C++ version I wrote looks like this:

ifstream f(fname);
char line[2000];
while (f.getline(line, 2000)) {
    // do some analysis
}

This C++ version takes about 85 seconds.

The last version I tried is plain C:

FILE *file = fopen(fname, "r");
char line[2000];
while (fgets(line, sizeof line, file) != NULL) {
    // do some analysis
}
fclose(file);

The C code above takes about 33 seconds.

Both of the last two versions read each line into a char[] instead of a string, and converting the char[] to a string costs about 30 seconds more.
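
For reference, the char[] to string step could be sketched as follows. This is only an illustration, not the code that was timed above: the `Stats` struct and the function name `read_lines_reusing_string` are hypothetical, and the counting stands in for the real analysis. It keeps a single `std::string` alive across iterations and fills it with `assign`, which typically reuses the string's existing capacity instead of allocating a new string for every line. Whether this actually removes the extra 30 seconds would have to be measured.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical result type: a stand-in for the "do some analysis" step.
struct Stats {
    std::size_t lines;
    std::size_t chars;
};

// Reads the file with fgets and copies each line into one reused std::string.
// Assumption: lines are shorter than the 2000-byte buffer; a longer line
// would be delivered in several pieces.
Stats read_lines_reusing_string(const char* fname) {
    Stats stats = {0, 0};
    std::FILE* file = std::fopen(fname, "r");
    if (file == NULL) {
        return stats;  // file could not be opened
    }
    char buf[2000];
    std::string line;
    line.reserve(sizeof buf);  // allocate once, up front
    while (std::fgets(buf, sizeof buf, file) != NULL) {
        std::size_t len = std::strlen(buf);
        if (len > 0 && buf[len - 1] == '\n') {
            --len;  // drop the trailing newline
        }
        line.assign(buf, len);  // overwrite the same string object
        ++stats.lines;
        stats.chars += line.size();
    }
    std::fclose(file);
    return stats;
}
```

Calling `read_lines_reusing_string(fname)` returns the number of lines and the total number of characters excluding newlines.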

Is there a way to improve the performance of the C/C++ code that reads a text file line by line so that it matches the C# performance?
(Added: I'm using Windows 7 64-bit with VC++ 10.0, x64.)
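
One direction that could be tried, shown here only as an unmeasured sketch, is to read the file in large binary blocks with `fread` and split the lines by hand, so that the per-line work is reduced to a `memchr` scan. The names `for_each_line`, `LineHandler`, `LineStats`, and `count_line` are hypothetical, and the 1 MiB default block size is an arbitrary assumption. Because the file is opened in binary mode, a trailing `'\r'` from Windows line endings is stripped explicitly.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical callback type standing in for "do some analysis".
typedef void (*LineHandler)(const char* data, std::size_t len, void* ctx);

// Strips a trailing '\r' (binary mode keeps "\r\n") and calls the handler.
static void emit_line(const char* data, std::size_t len,
                      LineHandler handler, void* ctx) {
    if (len > 0 && data[len - 1] == '\r') {
        --len;
    }
    handler(data, len, ctx);
}

// Reads the file in blocks and calls handler once per line.
// Returns false if the file cannot be opened.
bool for_each_line(const char* fname, LineHandler handler, void* ctx,
                   std::size_t block_size = 1 << 20) {
    std::FILE* file = std::fopen(fname, "rb");
    if (file == NULL) return false;
    if (block_size == 0) block_size = 1;

    std::vector<char> block(block_size);
    std::string carry;  // partial line left over from earlier blocks
    std::size_t got;
    while ((got = std::fread(&block[0], 1, block.size(), file)) > 0) {
        const char* pos = &block[0];
        const char* end = pos + got;
        while (pos < end) {
            std::size_t remaining = static_cast<std::size_t>(end - pos);
            const char* nl =
                static_cast<const char*>(std::memchr(pos, '\n', remaining));
            if (nl == NULL) {
                carry.append(pos, remaining);  // line continues in next block
                break;
            }
            std::size_t len = static_cast<std::size_t>(nl - pos);
            if (carry.empty()) {
                emit_line(pos, len, handler, ctx);  // no copy needed
            } else {
                carry.append(pos, len);
                emit_line(carry.data(), carry.size(), handler, ctx);
                carry.clear();
            }
            pos = nl + 1;
        }
    }
    if (!carry.empty()) {
        emit_line(carry.data(), carry.size(), handler, ctx);  // last line
    }
    std::fclose(file);
    return true;
}

// Example handler: counts lines and characters.
struct LineStats {
    std::size_t lines;
    std::size_t chars;
};

void count_line(const char*, std::size_t len, void* ctx) {
    LineStats* stats = static_cast<LineStats*>(ctx);
    ++stats->lines;
    stats->chars += len;
}
```

A call such as `LineStats s = {0, 0}; for_each_line(fname, count_line, &s);` walks the whole file. The handler receives a pointer and a length rather than a `std::string`, so a string is only built if the analysis really needs one.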
