How can I improve the speed of my C++ program that reads a delimited text file?

Posted on 2024-11-30 01:49:17

Below are C# and C++ programs that do the same job: read a text file delimited by "|" and save it as "#"-delimited text.

When I run the C++ program, the elapsed time is 169 seconds.

UPDATE 1: Thanks to Seth (compiling with: cl /EHsc /Ox /Ob2 /Oi) and to GWW for suggesting that the string declarations be moved outside the loops, the elapsed time dropped to 53 seconds. I have updated the code as well.

UPDATE 2: Do you have any other suggestions to improve the C++ code?

When I run the C# program, the elapsed time is 34 seconds!

The question is: how can I make the C++ program as fast as, or faster than, the C# one?

C++ Program:

int main ()
{
    Timer t;
    cout << t.ShowStart() << endl;

    ifstream input("in.txt");
    ofstream output("out.txt", ios::out);
    char const row_delim = '\n';
    char const field_delim = '|';
    string s1, s2;

    while (input)
    {
        if (!getline( input, s1, row_delim ))
            break;
        istringstream iss(s1);
        while (iss)
        {
            if (!getline(iss, s2, field_delim ))
                break;
            output << s2 << "#";
        }
        output << "\n";
    }

    t.Stop();
    cout << t.ShowEnd() << endl;
    cout << "Executed in: " << t.ElapsedSeconds() << " seconds." << endl;
    return 0;
}
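
The Timer class used above is not shown in the question. To try the program as posted, a minimal stand-in could look like the sketch below. It is an assumption about what the class does, based only on the four member functions the program calls (ShowStart, Stop, ShowEnd, ElapsedSeconds); the timestamp format is arbitrary.

```cpp
#include <chrono>
#include <ctime>
#include <string>

// Hypothetical stand-in for the question's Timer class (the original is not shown).
class Timer {
    using Clock = std::chrono::steady_clock;  // monotonic clock for measuring intervals

public:
    Timer()
        : start_(Clock::now()), end_(start_),
          wall_start_(std::time(nullptr)), wall_end_(wall_start_) {}

    // Records the end of the measured interval.
    void Stop() {
        end_ = Clock::now();
        wall_end_ = std::time(nullptr);
    }

    // Wall-clock timestamps as text, e.g. "2024-11-30 01:49:17".
    std::string ShowStart() const { return Format(wall_start_); }
    std::string ShowEnd() const { return Format(wall_end_); }

    // Seconds between construction and the last call to Stop().
    double ElapsedSeconds() const {
        return std::chrono::duration<double>(end_ - start_).count();
    }

private:
    static std::string Format(std::time_t t) {
        char buf[32];
        const std::size_t n =
            std::strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S", std::localtime(&t));
        return std::string(buf, n);
    }

    Clock::time_point start_;
    Clock::time_point end_;
    std::time_t wall_start_;
    std::time_t wall_end_;
};
```

With this stand-in plus the headers <iostream>, <fstream>, <sstream> and <string> and a using namespace std; directive, the program above should compile as posted.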

C# program:

    static void Main(string[] args)
    {
        long i;
        Stopwatch sw = new Stopwatch();
        Console.WriteLine(DateTime.Now);
        sw.Start();
        StreamReader sr = new StreamReader("in.txt", Encoding.Default);
        StreamWriter wr = new StreamWriter("out.txt", false, Encoding.Default);
        object[] cols = new object[0];  // allocates more elements automatically when filling
        string line;
        while (!string.Equals(line = sr.ReadLine(), null)) // Fastest way
        {
            cols = line.Split('|');  // Faster than using a List<>
            foreach (object col in cols)
                wr.Write(col + "#");
            wr.WriteLine();
        }
        sw.Stop();
        Console.WriteLine("Conteo tomó {0} secs", sw.Elapsed);
        Console.WriteLine(DateTime.Now);
    }

UPDATE 3:

Well, I must say I am very happy with the help I received, and my question has been answered.

I changed the text of the question a little to make it more specific, and I tested the solutions kindly proposed by molbdnilo and Bo Persson.

Keeping the compile command Seth suggested (i.e. cl /EHsc /Ox /Ob2 /Oi pgm.cpp):

Bo Persson's solution took 18 seconds on average to run, a really good result considering that the code is close to the style I like.

molbdnilo's solution took 6 seconds on average, really amazing! (Thanks to Constantine also.)

It is never too late to learn, and I learned valuable things from my question.

My best regards.

Comments (5)

半寸时光 2024-12-07 01:49:17

As Constantine suggests, read large chunks at a time using read.

I cut the time from ~25s to ~3s on a 129M file with 5M "entries" (26 bytes each) in 100,000 lines.

#include <algorithm>
#include <fstream>

using namespace std;

int main ()
{
  ifstream input("in.txt");
  ofstream output("out.txt", ios::out);

  const size_t size = 512 * 1024;  // read 512 KB at a time
  char buffer[size];

  while (input) {
    input.read(buffer, size);
    size_t readBytes = input.gcount();  // bytes actually read; fewer than size on the last chunk
    replace(buffer, buffer+readBytes, '|', '#');
    output.write(buffer, readBytes);
  }
  input.close();
  output.close();

  return 0;
}

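
The same chunked approach can be wrapped in a reusable function. The sketch below is a variant, not the answer's exact code: the function name and parameters are made up for illustration, the files are opened in binary mode so no newline translation takes place, and the buffer lives on the heap so a large chunk size does not depend on the available stack space.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Copies in_path to out_path, replacing every `from` byte with `to`.
// Returns false if a file cannot be opened, chunk_size is 0, or a write fails.
// (Hypothetical helper; name and signature are illustrative only.)
bool replace_delimiter_chunked(const std::string& in_path,
                               const std::string& out_path,
                               char from, char to,
                               std::size_t chunk_size = 512 * 1024)
{
    if (chunk_size == 0)
        return false;

    std::ifstream input(in_path.c_str(), std::ios::binary);
    std::ofstream output(out_path.c_str(), std::ios::binary);
    if (!input || !output)
        return false;

    std::vector<char> buffer(chunk_size);  // heap allocation instead of a large stack array

    while (input) {
        input.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const std::streamsize read_bytes = input.gcount();  // short (or 0) at end of file
        std::replace(buffer.begin(), buffer.begin() + read_bytes, from, to);
        output.write(buffer.data(), read_bytes);
    }
    return static_cast<bool>(output);
}
```

A call such as replace_delimiter_chunked("in.txt", "out.txt", '|', '#') would then do the same job as the program above.
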
淑女气质 2024-12-07 01:49:17

How about this for the central loop?

while (getline( input, s1, row_delim ))
{
    for (string::iterator c = s1.begin(); c != s1.end(); ++c)
        if (*c == field_delim)
            *c = '#';

    output << s1 << '\n';
}
左耳近心 2024-12-07 01:49:17

It seems to me that the slow part is inside getline. I don't have precise documentation to support this idea, but that is how it feels to me. You should try using read instead. Because getline takes a delimiter, it has to check every character to see whether it is the delimiter, which looks like many separate input operations: your program accesses one character in the file, then writes it to your program's memory, so the time goes into disk head movement. If you use the read function, you copy a whole block of characters and then work with them in your program's memory, which may reduce the time taken.

PS: Again, I don't have documentation about getline and how it works, but I'm sure about read. I hope this helps.
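
One caveat on the reasoning above: std::ifstream already reads through an internal buffer (std::filebuf), so getline does not normally go to the disk once per character; the per-character delimiter check and the string handling are the more likely costs. The block-oriented pattern itself can be sketched as below, with a made-up function that reads a file in blocks with read and scans each block in memory, here simply counting delimiters. The name and the default block size are assumptions for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Counts occurrences of `delim` by reading the file in blocks with read()
// and scanning each block in memory. Returns 0 if the file cannot be opened.
// (Hypothetical helper; not part of the answer.)
std::size_t count_delimiters_chunked(const std::string& path, char delim,
                                     std::size_t chunk_size = 64 * 1024)
{
    std::ifstream input(path.c_str(), std::ios::binary);
    std::vector<char> buffer(chunk_size > 0 ? chunk_size : 1);
    std::size_t total = 0;

    while (input) {
        input.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const std::streamsize got = input.gcount();  // bytes actually read this round
        total += static_cast<std::size_t>(
            std::count(buffer.begin(), buffer.begin() + got, delim));
    }
    return total;
}
```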

风启觞 2024-12-07 01:49:17

If you know the maximum line length, you can use stdio + fgets and null-terminated strings; it will rock.

For C#, if the file fits in memory (probably not, if it takes 34 seconds), I'd be curious to see how IO.File.WriteAllText("out.txt", IO.File.ReadAllText("in.txt").Replace("|", "#")); performs!

情何以堪。 2024-12-07 01:49:17

I'd be really surprised if this beat @molbdnilo's version, but it's probably the second fastest, and (I would posit) the simplest and cleanest:

#include <fstream>
#include <string>
#include <sstream>
#include <algorithm>

int main() {
    std::ifstream in("in.txt");
    std::ostringstream buffer;
    buffer << in.rdbuf();
    std::string s(buffer.str());
    std::replace(s.begin(), s.end(), '|', '#');
    std::ofstream out("out.txt");
    out << s;
    return 0;
}

Based on past experience with this method, I'd expect it to be no worse than half the speed of what @molbdnilo posted -- which should still be around triple the speed of your C# version, and over ten times as fast as your original version in C++. [Edit: I just wrote a file generator, and on a file a little over 100 megabytes, it's even closer than I expected -- I'm getting 4.4 seconds, versus 3.5 for @molbdnilo's code.] The combination of reasonable speed with really short, simple code is often quite a decent trade-off. Of course, that's all predicated on your having enough physical RAM to hold the entire file content in memory, but that's generally a fairly safe assumption these days.
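
If the "enough RAM" assumption is in doubt, the file size can be checked before choosing the whole-file approach. The sketch below is one possible way to do that; both function names and the idea of a caller-supplied memory budget are hypothetical and not part of the answer. Keep in mind that the program above holds the content in both the ostringstream and the std::string copy returned by str(), so its peak memory use is roughly twice the file size.

```cpp
#include <fstream>
#include <string>

// Returns the size of the file in bytes, or -1 if it cannot be opened.
// (Hypothetical helper for illustration.)
long long file_size_bytes(const std::string& path)
{
    // std::ios::ate positions the stream at the end, so tellg() yields the size.
    std::ifstream in(path.c_str(), std::ios::binary | std::ios::ate);
    if (!in)
        return -1;
    return static_cast<long long>(in.tellg());
}

// memory_budget_bytes is a limit chosen by the caller (an assumption of this
// sketch); returns true only if the file exists and is no larger than it.
bool fits_in_memory_budget(const std::string& path, long long memory_budget_bytes)
{
    const long long size = file_size_bytes(path);
    return size >= 0 && size <= memory_budget_bytes;
}
```

A caller could fall back to a chunked read when fits_in_memory_budget returns false.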
