Why is C++ so slow?

Posted 2024-12-10 11:12:21

I have converted this simple method from C# to C++. It reads a path table and populates a list of lists of ints (or a vector of vectors of ints).

A sample line from the path table would be something like

0 12 5 16 n

I realise there are better ways of doing this in general, but for now I just want to know why my C++ code is taking so much longer: about 10 minutes, as opposed to 10 seconds with the C# version. Here is my C++ code. I'm guessing I've done something drastically wrong.

// Parses the text path vector into the engine
void Level::PopulatePathVectors(string pathTable)
{
    // Read the file line by line.
    ifstream myFile(pathTable);

    for (unsigned int i = 0; i < nodes.size(); i++)
    {
        pathLookupVectors.push_back(vector<vector<int>>());

        for (unsigned int j = 0; j < nodes.size(); j++)
        {
            string line;

            if (getline(myFile, line)) // Enter if a line is read successfully
            {
                stringstream ss(line);
                istream_iterator<int> begin(ss), end;
                pathLookupVectors[i].push_back(vector<int>(begin, end));
            }
        }
    }
    myFile.close();
}

Here is the C# version:

private void PopulatePathLists(string pathList)
{
    // Read the file and display it line by line.
    StreamReader streamReader = new StreamReader(pathList);

    for (int i = 0; i < nodes.Count; i++)
    {
        pathLookupLists.Add(new List<List<int>>());

        for (int j = 0; j < nodes.Count; j++)
        {
            string str = streamReader.ReadLine();
            pathLookupLists[i].Add(new List<int>());

            //For every string (list of ints) - put each one into these lists
            int count = 0;
            string tempString = "";

            while (str[count].ToString() != "n") //While character does not equal null terminator
            {
                if (str[count].ToString() == " ") //Character equals space, set the temp string 
                                                  //as the node index, and move on
                {
                    pathLookupLists[i][j].Add(Convert.ToInt32(tempString));
                    tempString = "";
                }
                else //If characters are adjacent, put them together
                {
                    tempString = tempString + str[count];
                }
                count++;
            }
        }
    }
    streamReader.Close();
}

Sorry this is so specific, but I'm stumped.

EDIT - A lot of people have said they have tested this code, and it takes mere seconds for them. All I know is, if I comment out the call to this function, the program loads in seconds. With the function call it takes 5 minutes. Almost exactly. I'm really stumped. What could the problem be?

Here is the PathTable it's using.

EDIT - I tried running the function in a program on its own, and it took a few seconds, but I'm afraid I don't know enough to be able to know how to fix this problem. Obviously it's not the code. What could it be? I checked where it's being called to see if there were multiple calls, but there aren't. It's in a constructor of the game's level and that is only called once.

EDIT - I understand that the code is not the best it could be, but that isn't the point here. It runs quickly on its own - about 3 seconds and that's fine for me. The problem I'm trying to solve is why it takes so much longer inside the project.

EDIT - I commented out all of the game code apart from the main game loop. I placed the method into the initialize section of the code which is run once on start up. Apart from a few methods setting up a window it's now pretty much the same as the program with ONLY the method in, only it STILL takes about 5 minutes to run. Now I know it has nothing to do with dependencies on the pathLookupVectors. Also, I know it's not a memory thing where the computer starts writing to the hard drive because while the slow program is chugging away running the method, I can open another instance of Visual Studio and run the single method program at the same time which completes in seconds. I realise that the problem might be some basic settings, but I'm not experienced so apologies if this does disappointingly end up being the reason why. I still don't have a clue why it's taking so much longer.

執念 2024-12-17 11:12:21

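I profiled the code with Very Sleepy (Visual C++ 2010, 32-bit Windows XP). I don't know how similar my input data was, but here are the results anyway:

39% of the time was spent in basic_istream::operator>>
12% basic_iostream::basic_iostream
9% operator+
8% _Mutex::Mutex
5% getline
5% basic_stringbuf::_Init
4% locale::_Locimp::_Addfac
4% vector::reserve
4% basic_string::assign
3% operator delete
2% basic_streambuf::basic_streambuf
1% Wcsxfrm
5% other functions

Some of these seem to come from inlined calls, so it is hard to say where the time actually originates. But you can still get the idea. The only thing here that should perform I/O is getline, and it takes only 5%. The rest is overhead from stream and string operations. C++ streams are slow as hell.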
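If the stream overhead shown in the profile matters, one way to sidestep it is to parse each line without a stringstream. The following is a minimal sketch, assuming every line holds whitespace-separated decimal integers followed by a non-numeric terminator such as the trailing n in the sample line. The function name parseIntsFast is hypothetical, and std::strtol is swapped in for operator>>; this is not what the question's code does.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: extracts leading integers from a line using
// std::strtol instead of a stringstream.
std::vector<int> parseIntsFast(const std::string& line) {
    std::vector<int> values;
    const char* cursor = line.c_str();
    char* next = nullptr;
    for (;;) {
        // strtol skips leading whitespace and stops at the first
        // character that cannot be part of a number.
        long parsed = std::strtol(cursor, &next, 10);
        if (next == cursor) {
            break;  // nothing was converted: end of line or the trailing "n"
        }
        values.push_back(static_cast<int>(parsed));
        cursor = next;
    }
    return values;
}
```

Whether this is actually faster for a given input and standard library is something to measure with the profiler rather than assume.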

亚希 2024-12-17 11:12:21

Based on your update, it is pretty clear that the function you posted is not by itself causing the performance problem, so while there are many ways to optimize it, that does not seem likely to help.

I presume you can reproduce this performance problem every time you run your code, correct? Then I would like to suggest that you do the following tests:

If you are compiling your program in debug mode (i.e. no optimizations), then recompile for release (full optimizations, favoring speed) and see if that makes a difference.
To check whether the extra time is spent in this suspected function, you can add printf statements that include timestamps at the start and end of the function. If this is not a console app but a GUI app and the printfs are not going anywhere, then write to a log file. If you are on Windows, you can alternatively use OutputDebugString and capture the output with a debugger. If you are on Linux, you can write to the system log using syslog.
Use a source code profiler to determine where all that time is spent. If the difference between calling this function or not is several minutes, then a profiler will surely give a clue as to what is happening. If you are on Windows, then Very Sleepy is a good choice, and if you are on Linux you can use OProfile.
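The timestamp suggestion above can be sketched as a small scope timer. This is only an illustration, assuming a C++11 compiler with <chrono>; the class name ScopedTimer is hypothetical, and it writes to stderr rather than to OutputDebugString or syslog.

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical helper: records the time at construction and reports the
// elapsed wall-clock time to stderr when it goes out of scope.
class ScopedTimer {
public:
    explicit ScopedTimer(const char* label)
        : label_(label), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        std::fprintf(stderr, "%s took %lld ms\n", label_, elapsedMs());
    }

    // Milliseconds elapsed since construction.
    long long elapsedMs() const {
        return static_cast<long long>(
            std::chrono::duration_cast<std::chrono::milliseconds>(
                std::chrono::steady_clock::now() - start_).count());
    }

private:
    const char* label_;
    std::chrono::steady_clock::time_point start_;
};
```

Declaring ScopedTimer timer("PopulatePathVectors"); as the first statement of the function would log how long each call takes, which makes it easy to compare debug and release builds.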

Update: So you say that a release build is fast. That likely means that the library functions you use in this function have slow debug implementations. The STL is known to be that way.

I'm sure you need to debug other parts of your application and you don't want to wait all those minutes for this function to complete in debug mode. The solution to this problem is to build your project in release mode, but change the release configuration in the following way:

Disable optimizations only for the files you want to debug (make sure optimizations remain enabled at least for the file that has the slow function). To disable optimizations on a file, select the file in the Solution Explorer, right click, select Properties, then go to Configuration Properties | C/C++ | Optimization. Look at how all the items on that page are set for the Debug build, and copy all of those into your Release build. Repeat for all the files that you want to be available to the debugger.
Enable generation of debugging info (the pdb file). To do this, select the Project at the top of the Solution Explorer, right click, select Properties. Then go to Configuration Properties | Linker | Debugging and copy all the settings from the Debug build into the Release build.

With the above changes you will be able to debug the parts of the release binary that were configured as above just like you do it in the debug build.

Once you are done debugging you will need to reset all those settings back, of course.

I hope this helps.

帝王念 2024-12-17 11:12:21

The while loop in your code seems very messy and long, as it does things in a way that is not needed:

A simple and fast equivalent code would be this:

int result;
stringstream ss(line);
while ( ss >> result ) //reads all ints until it encounters a non-int
{
    pathLookupVectors[i][j].push_back(result);
}

In C++, such a loop is idiomatic as well. Or, instead of this manual loop, you could use std::copy ¹:

std::copy(std::istream_iterator<int>( ss ), 
          std::istream_iterator<int>(), 
          std::back_inserter(pathLookupVectors[i][j]));

¹ Taken from @David's comment.

Or, even better, construct the vector directly from the iterators when you push_back it:

 if (getline(myFile, line)) //enter if a line is read successfully
 {
   stringstream ss(line);
   std::istream_iterator<int> begin(ss), end;
   pathLookupVectors[i].push_back(vector<int>(begin, end));
 }

Done!
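For reference, the iterator-range approach above can be isolated into a small free function that is easy to test on its own. This is a sketch only; the name parseLine is hypothetical and not part of the original code.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: builds a vector<int> from all leading integers in a
// line. Extraction stops at the first token that is not an int.
std::vector<int> parseLine(const std::string& line) {
    std::istringstream ss(line);
    std::istream_iterator<int> begin(ss), end;
    return std::vector<int>(begin, end);
}
```

For the sample line 0 12 5 16 n, extraction stops at the trailing n, so the result holds the four integers.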

一束光，穿透我孤独的魂 2024-12-17 11:12:21

I'm not exactly sure what is going on here, but I see a few ways in which you can optimize your code. If this doesn't get you there, then there might be something else going on.

How big are your strings? As you are passing them in your C++ version, you are making copies because you are "passing by value". Try passing it by constant reference:

void Level::PopulatePathVectors(const string &pathTable)

This passes the object by reference, meaning it is not making a copy. Then, it is customary to make it const to ensure that it is not getting modified in your function.

Use .append or += to extend tempString. I believe you are making a new string object, then replacing the old one with just +, while += and .append are going to modify the current one in place:

tempString += line[count]; // std::string has no append(char) overload; append(1, line[count]) also works

You can also tweak out a bit more performance by declaring your variables at the top and then reassigning into them. This will prevent them from getting recreated every time. For example, put string line; before your for-loop, because it's going to get overwritten anyways.

There are a few places you can do this, such as with tempString.
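The two suggestions above (pass by const reference, and declare buffers once outside the loop) can be sketched together as follows. The function names readRows and readRowsFromFile are hypothetical, and the sketch assumes a C++11 compiler and one row of whitespace-separated integers per line.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: reads one vector<int> per input line, reusing the
// same string and stringstream objects for every line.
std::vector<std::vector<int>> readRows(std::istream& in) {
    std::vector<std::vector<int>> rows;
    std::string line;        // declared once, overwritten by each getline
    std::istringstream ss;   // constructed once, re-pointed at each line
    while (std::getline(in, line)) {
        ss.clear();          // reset the fail/eof flags left by the last line
        ss.str(line);        // replace the buffer contents
        std::vector<int> row;
        int value;
        while (ss >> value) {
            row.push_back(value);
        }
        rows.push_back(std::move(row));
    }
    return rows;
}

// The path is taken by const reference, so no copy of the string is made.
std::vector<std::vector<int>> readRowsFromFile(const std::string& path) {
    std::ifstream file(path.c_str());
    return readRows(file);
}
```

Since the path string is short and passed only once, the const reference is more a matter of habit than a measurable gain here; reusing the stream object is the part more likely to show up in a profile.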

绝不服输 2024-12-17 11:12:21

Here are a few things that I haven't seen anyone else mention. They are somewhat vague, but being unable to reproduce things makes it hard to go into specifics on all of it.

Poor man's profiling.

While the code is running, just keep interrupting it. Usually you'll see the same stack frame over and over.

Start commenting stuff out. If you comment out your splitting and it completes instantly, then it's pretty clear where to start.

Some of the code is dependent, but you could read the full file into memory and then do the parsing, to create an obvious separation that shows where it's spending its time. If both finish quickly independently, then it's probably the interaction.

Buffering.

I don't see any buffering on your reads. This becomes especially important if you are writing anything to disk. The arm on your disk will jump back and forth between your read location, then your write location, etc.

While it doesn't look like you are writing here, your main program may be using more memory. It is possible that after you reach your high-water mark, the OS starts paging some of the memory to disk. You'll thrash when you are reading line by line while the paging is happening.

Usually, I'll set up a simple iterator interface to verify everything is working. Then write a decorator around it to read 500 lines at a time. The standard streams have some buffering options built in as well, and those may be better to use. I'm going to guess that their buffering defaults are pretty conservative.
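A minimal sketch of the "read the whole file into memory, then parse" separation mentioned above. The function name readAll is hypothetical, and the sketch assumes the file fits comfortably in memory.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: copies everything remaining in the input stream
// into a single string in one bulk operation.
std::string readAll(std::istream& in) {
    std::ostringstream buffer;
    buffer << in.rdbuf();  // stream-buffer-to-stream-buffer copy
    return buffer.str();
}
```

Passing an std::ifstream to readAll and then running the line parsing over an std::istringstream built from the result lets the two phases be timed separately, which shows whether the time goes to disk I/O or to parsing.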

Reserve.

std::vector::push_back works best when you also use std::vector::reserve. If you can make most of the memory available before entering a tight loop, you win. You don't even have to know exactly how much, just guess.

~~You can actually beat std::vector::resize performance with this as well, because std::vector::resize uses alloc and std::vector::push_back will use realloc~~

That last bit is contested, though I've read otherwise. I may well be wrong, and I will have to do more research to confirm or deny it.

Nevertheless, push_back can run faster if you use reserve with it.
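As an illustration of the reserve advice, here is a sketch that pre-sizes the nested table from the question before any rows are parsed. The alias PathTable and the function makeReservedTable are hypothetical; nodeCount stands in for nodes.size(), and a C++11 compiler is assumed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical alias for the vector-of-vectors-of-ints used in the question.
using PathTable = std::vector<std::vector<std::vector<int>>>;

// Creates nodeCount empty rows, each with capacity for nodeCount paths,
// so later push_back calls on the rows do not have to reallocate.
PathTable makeReservedTable(std::size_t nodeCount) {
    PathTable table;
    table.reserve(nodeCount);  // one allocation for the outer vector
    for (std::size_t i = 0; i < nodeCount; ++i) {
        table.push_back(std::vector<std::vector<int>>());
        table.back().reserve(nodeCount);  // one allocation per row
    }
    return table;
}
```

The innermost vectors are left alone because their lengths are not known in advance; a guessed reserve could be added there too if profiling shows reallocation to be significant.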

String splitting.

I've never seen a C++ iterator solution that was performant when dealing with GB+ files. I haven't tried that one specifically, though. My guess as to why is that they tend to make a lot of small allocations.

Here is a reference with what I usually use.

Split array of chars into two arrays of chars

Advice on std::vector::reserve applies here.

I prefer boost::lexical_cast to stream implementations for maintenance reasons, though I can't say it's more or less performant than stream implementations. I will say it is exceedingly rare to actually see correct error checking on stream usage.

STL shenanigans.

I'm intentionally vague on these, sorry. I usually write code that avoids the conditions, though I do remember some of the trials and tribulations that co-workers have told me about. Using STLPort avoids a good chunk of these entirely.

On some platforms, stream operations have some weird thread safety enabled by default. I've seen minor std::cout usage absolutely destroy an algorithm's performance. You don't have anything like that here, but if you had logging going on in another thread, that could pose problems. I see an 8% _Mutex::Mutex in another answer, which may speak to its existence.

It's plausible that a degenerate STL implementation could even have the above issue with the lexical parsing stream operations.

There are odd performance characteristics on some of the containers. I don't think I ever had problems with vector, but I really have no idea what istream_iterator uses internally. In the past, I've traced through a misbehaving algorithm to find a std::list::size call doing a full traversal of the list with GCC, for instance. I don't know if newer versions are less inane.

The usual stupid SECURE_CRT stupidity should stupidly be taken care of. I wonder if this is what Microsoft thinks we want to spend our time doing?
