可能很长的循环并在内部声明变量

发布于 2024-11-10 17:11:17 字数 732 浏览 4 评论 0原文

我最近编写了一个动态程序,用于计算两个 DNA 链序列(可能很长)之间的相似性(修改后的编辑距离)。

我的代码如下(不是实际代码,因为它是作业):

while(!file.eof){
   string line;
   int sizeY, sizeX;

   //get first strand
   getline(db, line)

   //second strand
   getline(db, line)

   double ** ary = new double[sizeY];
   //loop to initialize array

   for(i to sizeY)
   {
      for(i to sizex)
      {
            pair<string,string> p,d;
            p.first = "A";
            p.second = "T";
            d.first = "G";
            d.second = "C";
            //do some comparisons
      }
   }
}

上面的代码大约需要 40 分钟才能完成大约 2400 行的文件。 如果我将 p、d 和赋值对移到嵌套 for 循环之外并运行完全相同的文件,它将在大约 1 分钟内完成。

我在其他线程中读到性能几乎相同。我还用-O2 编译了它。

为什么上面的代码慢这么多?

I've recently written a dynamic program that calculates the similarity (modified edit distance) between two sequences of DNA strands (can be lengthy).

My code is like (not actual code since its an assignment):

while(!file.eof){
   string line;
   int sizeY, sizeX;

   //get first strand
   getline(db, line)

   //second strand
   getline(db, line)

   double ** ary = new double[sizeY];
   //loop to initialize array

   for(i to sizeY)
   {
      for(i to sizex)
      {
            pair<string,string> p,d;
            p.first = "A";
            p.second = "T";
            d.first = "G";
            d.second = "C";
            //do some comparisons
      }
   }
}

The code above will take approximately 40 minutes to complete on a file with ~2400 lines.
If I move the pair p,d and assignments outside the nested for-loop and run the exact same file, it will complete in about ~1 minute.

I've read in other threads that the performance is pretty much the same. I've also compiled it with -O2.

Why is the code above so much slower?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

国产ˉ祖宗 2024-11-17 17:11:17

考虑内部循环的每次迭代中必须发生的各种分配/解除分配。

  1. 在堆栈上分配pair对象
  2. 分配四个空字符串,每个可能在堆上分配1个字节
  3. 四个字符串分配,可能需要4次堆释放和新分配
  4. 破坏涉及4次堆释放的字符串
  5. 破坏pair对象

忽略堆栈分配(这应该相对便宜)总共 8 次堆分配和另外 8 次释放(或最好的情况是 4/4)。如果这是调试版本,则检查每个堆操作可能会产生额外的开销。

如果您的 sizeX/sizeY 常量为 2400,那么您总共执行了 9200 万 堆操作。如果幸运的话,每个操作将花费大约相同的时间,因为您在每个循环中分配相同大小的对象。如果您不幸运,那么由于堆碎片,某些堆操作可能需要更长的时间才能完成。

正如您所发现的,显而易见的解决方案是将变量定义和赋值放在循环之外。仅当它们在循环中的某个时刻被覆盖时,您才需要重新分配对值。

Consider the various allocations/deallocations that have to happen on each and every iteration of the inner loop.

  1. Allocate a pair object on the stack
  2. Allocate four empty strings, each probably a 1 byte allocation on the heap
  3. Four string assignments which may require 4 heap deallocations and new allocations
  4. Destruction of the strings involving 4 heap deallocations
  5. Destruction of the pair object

Ignoring the stack allocations (which should be relatively cheap) that is a total of 8 heap allocations and another 8 deallocations (or a best case of 4/4). If this is a debug build there may be additional overhead in checking each heap operation.

If your sizeX/sizeY constants are 2400 then you're doing a total of of 92 million heap operations. If you're lucky each of those operations will take around the same time since you're allocating the same sized object each loop. If you're unlucky then some heap operations may take significantly longer to accomplish due to heap fragmentation.

The obvious solution, as you've found, is to put the variable definition and assignment outside the loop. You only need to reassign the pair values if they are being overwritten within the loop at some point.

百思不得你姐 2024-11-17 17:11:17

通用答案:
看来您正在使用 gcc(即 g++);你总是可以执行 g++ -S [stuff] 来查看 G++ 对你的代码做了什么(假设你可以很好地阅读汇编)。

具体回答:
我很惊讶差异是 40 倍,但在你的代码中,每次完成循环时,它都必须调用 create_new_pair 两次(而且我原以为它必须进行一些清理才能“释放”老对,但考虑到它在堆栈上,我想这并不像我想象的那么难,或者至少我没有看到它......从 Gcc 读取代码曾经比读取 C++ 代码容易得多眼下)

Generic answer:
It would appear you're using gcc (that is to say g++); you can always do g++ -S [stuff] to see what G++ is doing to your code (assuming you can read assembly well enough to get by).

Specific answer:
I'm surprised the difference is 40 times, but in your code, every time you finish a loop, it has to call create_new_pair twice (and I would have thought it would have had to do a little bit of cleanup to "free" the old pair, but given that its on the stack, I guess that's not as hard as I would have thought, or at least I'm not seeing it... reading code from Gcc used to be a lot easier than reading C++ code is at the moment)

孤寂小茶 2024-11-17 17:11:17

这可能是因为变量是一个对象。由于 p 和 d 不是原始类型,除非编译器内联它的构造函数和析构函数(如果使用 -O3 而不是 -O2 则可能会发生这种情况),否则它将构造和析构两个 std::pair (并因此产生四个 std ::string) 每次迭代。如果它是一个原始变量(如 int),即使您没有启用内联优化,编译器也可以对其进行优化。

编辑:
请注意,由于 std::string 内部使用堆分配,因此即使内联也不会优化这些分配(但您仍然可以通过内联节省一些开销)。对于 int 的 std::pair,使用 -O3,循环内部或外部的性能应该相同。

It is probably becouse the variable is an object. Since p and d are not a primitive type, unless the compiler inline it's constructor and destructor (which may happen if you use -O3 instead of -O2), it will construct and destruct the two std::pair (and as consequence four std::string) each iteration. If it was a primitive variable (like int), the compiler could optimize this even if you don't have inline optimization enabled.

EDIT:
Note that since std::string internally use heap allocations, not even inline would optimize those allocations (but you still save some overhead with the inline). For a std::pair of int, with -O3, the performance should be the same inside or outside the loop.

把昨日还给我 2024-11-17 17:11:17

在具有良好编译器(至少进行中等优化)的编译语言中,在循环内声明变量永远不会成为“失败者”,并且通常(特别是对于仅进行适度优化的编译器)将是“胜利者”。

不过,对于解释性语言来说,情况可能会有所不同。每次通过循环,解释器都需要分配变量位置,这可能会很昂贵。

您还可能会遇到设计不良的编译环境的“失败者”情况,其中在堆栈上分配变量的成本很高。

尽管对于这些场景中的任何一个,我都无法解释 40:1 的差异。我怀疑省略的代码可能包含一些重要的线索。

[啊,在重新阅读时(可能在海报的重新编辑中)我发现这不仅仅是变量声明,而是被移到循环之外的对象创建。]

In a compiled language with a good compiler (that does at least mediocre optimization) having variables declared inside a loop is never going to be a "loser" and often (especially for compilers with only modest optimization) will be a "winner".

With interpreted languages it's likely to be different, though. Each time through the loop the interpreter will need to allocate the variable locations, and that could be expensive.

You could also have a "loser" situation with a poorly-designed compiled environment where allocating variables on the stack is expensive.

Though with any of these scenarios I'd be at a loss to explain a 40:1 difference. I suspect that the omitted code may contain some important clues.

[Ah, on re-reading (and possibly on the poster's re-editing) I see that it's not simply variable DECLARATION, but OBJECT creation that's being moved outside of the loop.]

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文