A continue in a loop is significantly slowing down the runtime in Clang

Posted on 2025-02-02 11:15:44

Question

I came across a LeetCode question, gas-station.

However, I found that my code is slightly faster if I use if/else rather than if/continue.
Edit: after twiddling with the test case, I found that the problem is continue.
It seems that clang is taking continue seriously. My guess is that clang somehow tries to put the part before continue and the loop check together.

In short: I want to know:

  • How do I "persuade" clang to treat my if/continue implementation like if/else? I don't want to change my coding style.

Edited sidenote: the branch-prediction tag has been removed.

Related question

"Should I use return/continue statement instead of if-else?", which suggests that the two should have nearly no difference.

Investigation

Edit: the demo shows that the problem is continue

I used a 10k data set to test the code and found that the problem is continue itself.
I put continue at the end of the if body, and the runtime increased by 30% to 50%, which is still surprising to me.
Live Demo
Difference in numbers: 11070 versus 15060 (with continue)

Edited on 2022/5/30: Sorry, I found that in my previous naive profiling part of the calculation was mistakenly optimized out, which caused a difference on the order of 100.
Bad demo

Profiling code

#include <iostream>
#include <vector>
#include <chrono>
#include <cstdlib>

using namespace std; // Yeah I know it's bad.
// input , output
class Solution {
public:
    int canCompleteCircuit(vector<int>& gas, vector<int>& cost) {
        if(gas.empty()){
            return -1;
        }
        int len = gas.size();
        // start at the last station; move start backward if we cannot finish the loop
        int start = len - 1;
        int tank = gas[start] - cost[start];
        int cur = 0; // from first station
        while(start != cur){ // other condition?
            if(tank < 0){
                // if not, move start backward, check the value
                // continue...start > 0;
                start--;
                tank += gas[start] - cost[start];
                // continue;
            }
            else{
                tank += gas[cur] - cost[cur];
                cur++;   
            }
        }
        //cout << tank ;
        if(tank < 0){
            return -1;    
        }
        return start;
    }
};
volatile int tryNoToBeOptimizedOut = 0;    
int main() {

    Solution s;
    vector<int> gas{...}; // 10k data here
    vector<int> cost{...}; 
    int count = 100; 
    vector<int> results;
    for(int i = 0; i <count; i++){
        // size_t rIndex = rand()%gas.size();
        // gas[rIndex] += i;
        gas[i] += i;
        auto start = std::chrono::system_clock::now();
        int ret = s.canCompleteCircuit(gas,cost);
        auto end = std::chrono::system_clock::now();
        auto elapsed = end - start;
        std::cout << elapsed.count() << '\n';
        tryNoToBeOptimizedOut = ret;
        results.push_back(ret);
    }   
    tryNoToBeOptimizedOut = results[rand()%results.size()];
   
    return 0;
}
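
Since the earlier profiling run had part of the work optimized away, a slightly stricter harness may help. The sketch below is one possible variation, not the code behind the numbers above: it uses std::chrono::steady_clock (which is monotonic), perturbs the input before each call, and stores every result into a volatile sink. The names timeRuns and benchmarkSink are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sink: a volatile store that the optimizer has to keep.
volatile int benchmarkSink = 0;

// Same algorithm as the if/else version in the question.
int canCompleteCircuit(const std::vector<int>& gas, const std::vector<int>& cost) {
    if (gas.empty()) return -1;
    int start = static_cast<int>(gas.size()) - 1;
    int tank = gas[start] - cost[start];
    int cur = 0;
    while (start != cur) {
        if (tank < 0) {
            --start;
            tank += gas[start] - cost[start];
        } else {
            tank += gas[cur] - cost[cur];
            ++cur;
        }
    }
    return tank < 0 ? -1 : start;
}

// Times `repeats` calls and returns the accumulated time in nanoseconds.
// `gas` is taken by value so the caller's data is not modified.
std::int64_t timeRuns(std::vector<int> gas, const std::vector<int>& cost, int repeats) {
    assert(!gas.empty() && gas.size() == cost.size());
    using clock = std::chrono::steady_clock;
    std::int64_t total = 0;
    for (int i = 0; i < repeats; ++i) {
        // Change the input a little so identical calls cannot be merged.
        gas[static_cast<std::size_t>(i) % gas.size()] += 1;
        const auto t0 = clock::now();
        const int ret = canCompleteCircuit(gas, cost);
        const auto t1 = clock::now();
        benchmarkSink = ret;  // keep the result observable
        total += std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
    }
    return total;
}
```

Calling timeRuns(gas, cost, 100) with the 10k data set would give one total per variant; single-call timings at this scale are likely to be dominated by noise.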

Old tests before the edit

At first I checked the implementation in gcc and found that both versions are nearly identical in assembly.
Live Demo in gcc

I know that LeetCode uses clang 11 with -O2, so I checked the difference in clang 11.
Live Demo in clang 11
Live Demo in clang 14 (it doesn't get better)

The results differ a bit between the two versions.
It seems that cur++; is combined into the while predicate in the if/continue version.
I'm not sure why. Is it related to optimization policies?

My code:

    int canCompleteCircuit(vector<int>& gas, vector<int>& cost) {
        if(gas.empty()){
            return -1;
        }
        int len = gas.size();
        // start at the last station; move start backward if we cannot finish the loop
        int start = len - 1;
        int tank = gas[start] - cost[start];
        int cur = 0; // from first station
        while(start != cur){ // other condition?
            if(tank < 0){
               // if not, move start backward, check the value
                // continue...start > 0;
                start--;
                tank += gas[start] - cost[start];
                continue;
            }
            // else{
                tank += gas[cur] - cost[cur];
                cur++;   
                // continue; <-- If I put another continue here, the runtime is similar to the if/else version. But is there a better way than adding continue manually?
            // }
        }
        if(tank < 0){
            return -1;    
        }
        return start;
    }
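
To make the commented-out hint above concrete, here is a minimal sketch of the variant with a second, explicit continue next to the if/else form. At the source level the two are equivalent, so any timing difference would come from code generation alone. The function names are hypothetical, and whether the second continue actually changes the generated code is the question's observation, not something this sketch verifies.

```cpp
#include <cassert>
#include <vector>

// Reference: the if/else form.
int circuitIfElse(const std::vector<int>& gas, const std::vector<int>& cost) {
    assert(gas.size() == cost.size());
    if (gas.empty()) return -1;
    int start = static_cast<int>(gas.size()) - 1;
    int tank = gas[start] - cost[start];
    int cur = 0;
    while (start != cur) {
        if (tank < 0) {
            --start;
            tank += gas[start] - cost[start];
        } else {
            tank += gas[cur] - cost[cur];
            ++cur;
        }
    }
    return tank < 0 ? -1 : start;
}

// Variant: if/continue with a second continue ending the fall-through path.
int circuitTwoContinues(const std::vector<int>& gas, const std::vector<int>& cost) {
    assert(gas.size() == cost.size());
    if (gas.empty()) return -1;
    int start = static_cast<int>(gas.size()) - 1;
    int tank = gas[start] - cost[start];
    int cur = 0;
    while (start != cur) {
        if (tank < 0) {
            --start;
            tank += gas[start] - cost[start];
            continue;
        }
        tank += gas[cur] - cost[cur];
        ++cur;
        continue;  // redundant at the source level; mirrors the `else` branch
    }
    return tank < 0 ? -1 : start;
}
```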

Assembly in if/else version

.LBB0_5:                                #   in Loop: Header=BB0_3 Depth=1
        movsxd  rcx, ecx                #   `else` part
        mov     edx, dword ptr [r14 + 4*rcx]
        sub     edx, dword ptr [rbx + 4*rcx]
        add     ecx, 1
.LBB0_6:                                #   in Loop: Header=BB0_3 Depth=1
        add     eax, edx
        mov     edx, eax
        shr     edx, 31
        cmp     esi, ecx                # `while(start != cur){`
        je      .LBB0_7
.LBB0_3:                                # =>This Inner Loop Header: Depth=1
        test    dl, 1                   # `if(tank < 0){`
        je      .LBB0_5                 # go to else
        movsxd  rdi, esi
        add     esi, -1
        mov     edx, dword ptr [r14 + 4*rdi - 4]
        sub     edx, dword ptr [rbx + 4*rdi - 4]
        jmp     .LBB0_6

Assembly in if/continue version

.LBB0_3:                                # =>This Loop Header: Depth=1
        mov     r9, rdx
        mov     edx, r10d
        sub     edx, r9d
        add     rdx, 4
        movsxd  rdi, r9d
        lea     rsi, [r12 + 4*rdi]
        lea     rbx, [r14 + 4*rdi]
        xor     edi, edi
.LBB0_4:                                #   Parent Loop BB0_3 Depth=1
        test    cl, 1                   # `if` part
        jne     .LBB0_5
        add     eax, dword ptr [rbx + 4*rdi]  # after `if` statement
        sub     eax, dword ptr [rsi + 4*rdi]
        mov     ecx, eax
        shr     ecx, 31
        add     rdi, 1
        cmp     edx, edi
        jne     .LBB0_4
        jmp     .LBB0_7
.LBB0_5:                                #   in Loop: Header=BB0_3 Depth=1
        add     eax, dword ptr [r14 + 4*r8 - 4]    # Body of `if`
        sub     eax, dword ptr [r12 + 4*r8 - 4]
        add     r8, -1
        mov     esi, r10d
        sub     esi, r9d
        add     esi, 3
        mov     ecx, eax
        shr     ecx, 31
        movsxd  rdx, r9d
        add     rdx, rdi
        add     r10d, -1
        cmp     esi, edi
        jne     .LBB0_3
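
One way to read the if/continue listing: clang appears to have split the loop into an outer loop (.LBB0_3) and an inner loop (.LBB0_4) that keeps running the path after the if while the tank stays non-negative. The sketch below is a source-level interpretation of that shape, written by hand rather than taken from compiler output; the function name is hypothetical. It computes the same result as the original loop.

```cpp
#include <cassert>
#include <vector>

// Hand-written nested-loop form that mirrors the apparent structure of the
// if/continue assembly. This is an interpretation, not decompiled output.
int circuitNested(const std::vector<int>& gas, const std::vector<int>& cost) {
    assert(gas.size() == cost.size());
    if (gas.empty()) return -1;
    int start = static_cast<int>(gas.size()) - 1;
    int tank = gas[start] - cost[start];
    int cur = 0;
    while (start != cur) {
        // Inner loop: the code after the `if` (tank is not negative).
        while (tank >= 0) {
            tank += gas[cur] - cost[cur];
            ++cur;
            if (cur == start) {
                return tank < 0 ? -1 : start;  // loop condition folded in here
            }
        }
        // Body of the `if`: tank went negative, so move start backward.
        --start;
        tank += gas[start] - cost[start];
    }
    return tank < 0 ? -1 : start;
}
```

In this form the increment of cur and the start != cur test sit together at the bottom of the inner loop, which matches the impression that cur++ was combined into the while predicate.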


幸福还没到 2025-02-09 11:15:44


In short: no. continue is not ideal in your code, but it is not inherently slower or faster than if/else. I don't think I can give you a concrete answer, but I suspect it comes down to some kind of ratio of branch size to the chance of the same condition repeating. Running this example, the results I get are wild; they can change by a massive amount from run to run. In the example I tried to mix up the main loop a bit so that the compiler won't try to fold everything into main, but it basically consists of your solution and mine, each with if and with continue, plus two plain loop tests.

I added the <type>_loop() functions to compare how the loop runs without the condition, and with something like Duff's device or loop unrolling (to me they are the same thing written in a slightly different way). Somehow the "faster" version runs slower; I only suspect it is a bandwidth problem, judging from the time spent waiting.

#include <iostream>
#include <iomanip>
#include <vector>
#include <chrono>
#include <cmath>

#include <sample.hpp>

int solution_a_if(std::vector<int>& gas, std::vector<int>& cost) {
    int start = gas.size() - 1;
    int tank = gas[start] - cost[start];
    int cur = 0;
    while(start != cur){ 
        if(tank < 0){
            start--;
            tank += gas[start] - cost[start];
        }
        else{
            tank += gas[cur] - cost[cur];
            cur++;   
        }
    }
    if(tank < 0){
        return -1;    
    }
    return start;
}
int solution_a_continue(std::vector<int>& gas, std::vector<int>& cost) {
    int start = gas.size() - 1;
    int tank = gas[start] - cost[start];
    int cur = 0;
    while(start != cur){ 
        if(tank < 0){
            start--;
            tank += gas[start] - cost[start];
            continue;
        }
        tank += gas[cur] - cost[cur];
        cur++;   
    }
    if(tank < 0){
        return -1;    
    }
    return start;
}

int solution_b_if(std::vector<int>& gas, std::vector<int>& cost) {
    const int * p_cost = cost.data();
    const int * p_gas = gas.data();
    const int size = (int)gas.size();

    int start = -1, step = 0, rem = 0, n = 0;

    for (; n < size; n++){
        step += p_gas[n] - p_cost[n];
        if(step < 0){
            rem  += step; 
            step ^= step;
            start = n;
        }
    }
    if(rem + step < 0){
        return -1;
    }
    return ++start;
}
int solution_b_continue(std::vector<int>& gas, std::vector<int>& cost) {
    const int * p_cost = cost.data();
    const int * p_gas = gas.data();
    const int size = (int)gas.size();

    int start = -1, step = 0, rem = 0, n = 0;

    for (; n < size; n++){
        step += p_gas[n] - p_cost[n];
        if(step >= 0){ 
            continue; 
        }
        rem  += step; 
        step ^= step;
        start = n;
    }
    if(rem + step < 0){
        return -1;
    }
    return ++start;
}

int thin_loop(std::vector<int>& gas, std::vector<int>& cost) {
    const int size = (int)gas.size();
    int sum = 0;
    for (int n = 0; n < size; n++){
        sum += gas[n] - cost[n];
    }
    return sum;
}
int thick_loop(std::vector<int>& gas, std::vector<int>& cost) {
    const int size = (int)gas.size();
    int sum = 0, n = 0;
    switch (size % 4){
        case 3: sum += gas[n] - cost[n]; n++;
        case 2: sum += gas[n] - cost[n]; n++;
        case 1: sum += gas[n] - cost[n]; n++;
    }
    for (; n < size; n += 4){
        sum += gas[n + 0] - cost[n + 0];
        sum += gas[n + 1] - cost[n + 1];
        sum += gas[n + 2] - cost[n + 2];
        sum += gas[n + 3] - cost[n + 3];
    }
    return sum;
}

std::chrono::steady_clock::time_point tmps;
std::chrono::duration<double> duration;
double time_foo(int ((*foo)(std::vector<int>&,std::vector<int>&))){
    tmps = std::chrono::steady_clock::now();
    {
        foo(gas, cost);
    }
    duration = std::chrono::steady_clock::now() - tmps;
    return duration.count();
}

void print_info(const char * str, double avr, double dev){
    int mili = 1000;  // duration<double> counts seconds, so this factor scales to milliseconds
    std::cout << std::left << std::setw(15) << str << "->  (avr) " << 
    std::fixed << std::setprecision(8) << avr * mili << "  (dev) " << 
    std::fixed << std::setprecision(8) << dev * mili << std::endl;

}

int main(){
    int i, loops = 10;
    double * times = new double[loops * 6];

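    // Note: i advances by 6, so only about loops / 6 rounds are timed,
    // while the averages computed below still divide by loops.
    for (i = 0; i < loops; i += 6){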
        times[i + 0] = time_foo(solution_a_if);
        times[i + 1] = time_foo(solution_a_continue);
        times[i + 2] = time_foo(solution_b_if);
        times[i + 3] = time_foo(solution_b_continue);
        times[i + 4] = time_foo(thin_loop);
        times[i + 5] = time_foo(thick_loop);

        std::cout << "\r" << "Loop i -> " << i;
    }

    double avr [6] = { 0, 0, 0, 0, 0, 0 };
    double dev [6] = { 0, 0, 0, 0, 0, 0 };

    for (i = 0; i < loops; i += 6){
        avr[0] += times[i + 0];
        avr[1] += times[i + 1];
        avr[2] += times[i + 2];
        avr[3] += times[i + 3];
        avr[4] += times[i + 4];
        avr[5] += times[i + 5];
    }
    avr[0] /= loops; avr[1] /= loops;
    avr[2] /= loops; avr[3] /= loops;
    avr[4] /= loops; avr[5] /= loops;

    for (i = 0; i < loops; i += 6){
        dev[0] += std::pow(times[i + 0] - avr[0], 2);
        dev[1] += std::pow(times[i + 1] - avr[1], 2);
        dev[2] += std::pow(times[i + 2] - avr[2], 2);
        dev[3] += std::pow(times[i + 3] - avr[3], 2);
        dev[4] += std::pow(times[i + 4] - avr[4], 2);
        dev[5] += std::pow(times[i + 5] - avr[5], 2);
    }
    dev[0] = std::sqrt(dev[0] / (loops - 1));
    dev[1] = std::sqrt(dev[1] / (loops - 1));
    dev[2] = std::sqrt(dev[2] / (loops - 1));
    dev[3] = std::sqrt(dev[3] / (loops - 1));
    dev[4] = std::sqrt(dev[4] / (loops - 1));
    dev[5] = std::sqrt(dev[5] / (loops - 1));

    std::cout << std::endl << "Scale microseconds" << std::endl; 
    print_info("A (if)",       avr[0], dev[0]);
    print_info("A (continue)", avr[1], dev[1]);
    print_info("B (if)",       avr[2], dev[2]);
    print_info("B (continue)", avr[3], dev[3]);
    print_info("Thin loop",    avr[4], dev[4]);
    print_info("Thick loop",   avr[5], dev[5]);

    delete [] times;
    return 0;
}
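
The averaging and deviation steps in main can also be written as a small helper that takes one vector of samples per variant. This is a minimal sketch under that assumption, not the code that produced the numbers below; Stats and summarize are hypothetical names. It uses the same sample standard deviation (division by n - 1) as the code above.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Mean and sample standard deviation of a set of timings.
struct Stats {
    double mean;
    double stddev;
};

Stats summarize(const std::vector<double>& samples) {
    assert(samples.size() >= 2);  // n - 1 in the denominator needs n >= 2
    const double n = static_cast<double>(samples.size());

    double sum = 0.0;
    for (double s : samples) sum += s;
    const double mean = sum / n;

    double squares = 0.0;
    for (double s : samples) squares += (s - mean) * (s - mean);

    return {mean, std::sqrt(squares / (n - 1.0))};
}
```

Dividing by the actual number of samples collected keeps the average independent of how the timing loop is stepped.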

I used the AMD profiler; this is the first time I have used one of these, and at least this one is really nice to use.

Loop i -> 999996
Scale microseconds
A (if)         ->  (avr) 0.00141345  (dev) 0.00629380
A (continue)   ->  (avr) 0.00147324  (dev) 0.00349820
B (if)         ->  (avr) 0.00127486  (dev) 0.00452548
B (continue)   ->  (avr) 0.00102505  (dev) 0.00627695
Thin loop      ->  (avr) 0.00018223  (dev) 0.00130967
Thick loop     ->  (avr) 0.00043947  (dev) 0.00244409

run of 1m

Loop i -> 996
Scale microseconds
A (if)         ->  (avr) 0.00148050  (dev) 0.00339158
A (continue)   ->  (avr) 0.00152250  (dev) 0.00325213
B (if)         ->  (avr) 0.00138230  (dev) 0.00322943
B (continue)   ->  (avr) 0.00114570  (dev) 0.00315594
Thin loop      ->  (avr) 0.00019380  (dev) 0.00043278
Thick loop     ->  (avr) 0.00046620  (dev) 0.00100103

run of 1k

Small rant: the LeetCode testing system is so bad that it makes Overwatch look balanced. Just by rerunning my code the timing changed drastically (min 71 ms, max 155 ms over 26 runs), and the memory usage is just a lie: this example runs with ~10 MB, and it is telling me that the function alone consumes close to 70 MB.

<sample.hpp> defines the gas/cost arrays from that long sample you gave.
