C 中 switch 语句的开销

发布于 2024-07-22 02:48:07 字数 1306 浏览 10 评论 0原文

我是一位相当有能力的 Java 程序员,但对 C 非常陌生。我正在尝试优化具有四种操作模式的例程。

我循环遍历图像中的所有像素,并根据传递的“模式”计算新的像素值。

我的问题是关于两个嵌套 for 循环中 switch 语句的开销。 我对有关基本 C 语句、数学和逻辑运算的相对效率的任何文档链接感兴趣。

代码如下:

for (x = 0; x < width; x++) {
        for (y = 0; y < height; y++) {
             switch (mode)                  /* select the type of calculation */
             {
                case 0:
                weight = dCentre / maxDistanceEdge;
                case 1: 
                    weight = (float)x/width;             
                    break;
                case 2: 
                    weight = (float)y/height;
                    break;
                case 3: 
                    weight = dBottomLeft / maxDistanceCorner;
                    break;
                case 4: 
                    weight = dTopRight / maxDistanceCorner;
                    break;
                default: 
                weight = 1;
                break;
            }
             // Calculate the new pixel value given the weight
             ...
            }             

    }

如果这是超过 5000 x 5000 像素的图像,您会期望看到很多开销吗? 我尝试做一些测试,但我的结果到处都是,因为系统(移动设备)在后台运行着各种可能会扭曲结果的东西。

另一种选择是为每种模式提供单独的方法,每种模式都有自己的四个循环。 这显然会引入冗余代码,但效率才是这里的关键。

提前致谢!

加夫

I'm a fairly competent Java programmer who's very new to C. I am trying to optimize a routine that has four modes of operation.

I loop over all the pixels in an image and compute a new pixel value depending on the 'mode' passed.

My question regards the overhead of a switch statement within two nested for loops. I'd be interested in any links to documentation regarding the relative efficiency of basic C statements, math and logical operations.

The code would go as follows;

for (x = 0; x < width; x++) {
        for (y = 0; y < height; y++) {
             switch (mode)                  /* select the type of calculation */
             {
                case 0:
                weight = dCentre / maxDistanceEdge;
                case 1: 
                    weight = (float)x/width;             
                    break;
                case 2: 
                    weight = (float)y/height;
                    break;
                case 3: 
                    weight = dBottomLeft / maxDistanceCorner;
                    break;
                case 4: 
                    weight = dTopRight / maxDistanceCorner;
                    break;
                default: 
                weight = 1;
                break;
            }
             // Calculate the new pixel value given the weight
             ...
            }             

    }

Would you expect to see much overhead if this was over a 5000 x 5000 pixel image? I've tried to do some testing but my results are all over the place as the system (Mobile Device) has all sorts of stuff running in the background that may skew results.

The other option is to have a separate method for each mode, each with its own four loops. This would obviously introduce redundant code but efficiency is the name of the game here.

Thanks in advance!

Gav

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(16

递刀给你 2024-07-29 02:48:08

Switch 语句编译为连续值的跳转表和稀疏值的一堆 if-else 语句。 无论如何,如果您关心性能,则不希望在图像处理的内部循环中使用 switch 语句。 您想要如下所示。

另请注意,我将权重计算移出了内部循环(并交换了情况 2 的循环以实现此目的)。 这种将内容移出内循环的思维方式将为您带来您想要的 C 性能。

switch (mode)                  /* select the type of calculation */
{
case 0:
    weight = dCentre / maxDistanceEdge;
    for (x = 0; x < width; x++) {
        for (y = 0; y < height; y++) {
             // Calculate the new pixel value given the weight
             ...
        }
    }
    break;
case 1:
    for (x = 0; x < width; x++) {
        weight = (float)x/width;
        for (y = 0; y < height; y++) {
             // Calculate the new pixel value given the weight
             ...
        }
    }
    break;
case 2:
    // note - the loops have been swapped to get the weight calc out of the inner loop
    for (y = 0; y < height; y++) {
        weight = (float)y/height;
        for (x = 0; x < width; x++) {
             // Calculate the new pixel value given the weight
             ...
        }
    }
    break;
case 3:
    weight = dBottomLeft / maxDistanceCorner;
    for (x = 0; x < width; x++) {
        for (y = 0; y < height; y++) {
             // Calculate the new pixel value given the weight
             ...
        }
    }
    break;
case 4:
    weight = dTopRight / maxDistanceCorner;
    for (x = 0; x < width; x++) {
        for (y = 0; y < height; y++) {
             // Calculate the new pixel value given the weight
             ...
        }
    }
    break;
default:
    weight = 1;
    for (x = 0; x < width; x++) {
        for (y = 0; y < height; y++) {
             // Calculate the new pixel value given the weight
             ...
        }
    }
    break;

// etc..
}

Switch statements compile to a jump table for consecutive values and to a bunch of if-else statements for sparse values. In any case, you don't want a switch statement in your inner loop for image processing if you care about performance. You want to as below instead.

Also, note that I moved the weight calculation out of the inner loop (and swapped the loops for case 2 in order to achieve this). This type of thinking, moving stuff out of the inner loop, will get you the performance you want out of C.

switch (mode)                  /* select the type of calculation */
{
case 0:
    weight = dCentre / maxDistanceEdge;
    for (x = 0; x < width; x++) {
        for (y = 0; y < height; y++) {
             // Calculate the new pixel value given the weight
             ...
        }
    }
    break;
case 1:
    for (x = 0; x < width; x++) {
        weight = (float)x/width;
        for (y = 0; y < height; y++) {
             // Calculate the new pixel value given the weight
             ...
        }
    }
    break;
case 2:
    // note - the loops have been swapped to get the weight calc out of the inner loop
    for (y = 0; y < height; y++) {
        weight = (float)y/height;
        for (x = 0; x < width; x++) {
             // Calculate the new pixel value given the weight
             ...
        }
    }
    break;
case 3:
    weight = dBottomLeft / maxDistanceCorner;
    for (x = 0; x < width; x++) {
        for (y = 0; y < height; y++) {
             // Calculate the new pixel value given the weight
             ...
        }
    }
    break;
case 4:
    weight = dTopRight / maxDistanceCorner;
    for (x = 0; x < width; x++) {
        for (y = 0; y < height; y++) {
             // Calculate the new pixel value given the weight
             ...
        }
    }
    break;
default:
    weight = 1;
    for (x = 0; x < width; x++) {
        for (y = 0; y < height; y++) {
             // Calculate the new pixel value given the weight
             ...
        }
    }
    break;

// etc..
}
菩提树下叶撕阳。 2024-07-29 02:48:08

如果效率比代码大小更重要,那么您应该创建冗余例程。 case 语句是您可以在 C 中执行的开销较低的操作之一,但它不是零 - 它必须根据模式进行分支,因此需要时间。 如果您确实想要最大性能,请将案例移出循环,即使以重复循环为代价。

If efficiency is more important than code size, then yes you should create redundant routines. The case statement is one of the lower overhead things you can do in C, but it's not zero - it's going to have to branch based on the mode, and so it's going to take time. If you really want max performance, get the case out of the loop, even at the cost of duplicating the loop.

绝情姑娘 2024-07-29 02:48:08

Switch 语句尽可能高效。 它们被编译成跳转表。 事实上,这就是 switch 受到限制的原因:您只能编写一个可以根据固定值编译跳转表的 switch。

Switch statements are about as efficient as they can possibly be. They're compiled to a jump table. In fact, that's why switch is as limited as it is: you can only write a switch for which you can compile a jump tables based on a fixed value.

幻想少年梦 2024-07-29 02:48:08

与您在循环中进行的数学运算相比,切换的开销可能是最小的。 话虽如此,唯一确定的方法是为两种不同的方法创建不同的版本,并为其计时。

Compared with the maths you are doing in the loop, the overhead of the switch will probably be minimal. having said that, the only way to be sure is to create different versions for the two different approaches, and time them.

孤独岁月 2024-07-29 02:48:08

与 if/else 的等效项相比,switch/case 速度极快:它通常作为跳转表实现。 然而它仍然有成本。

当您优化事物时:

1)尝试在行上循环,而不是在列上循环(切换x和y“for”循环),由于缓存内存管理,一种解决方案可能比另一种解决方案快得多。

2)用(预先计算的)逆的乘法替换所有除法将为您带来显着的增益,并且可能会带来可接受的精度损失。

Switch/case is extremely fast compared to the equivalent with if/else: it is typically implemented as a jump table. However it still has a cost.

While you are optimizing things:

1) Try to loop over lines, not over columns (switch x and y "for" loops), one solution may be incredibly faster than the other, due to cache memory management.

2) Replacing all divisions by multiplications of the (pre-calculated) inverse will give you significant gain, and probably an acceptable precision loss.

夜雨飘雪 2024-07-29 02:48:08

为了提高效率,您最好将 switch 移到循环之外。

我会使用这样的函数指针:

double fun0(void) { return dCentre/maxDistanceEdge; }
double fun1(void) { return (float)x/width; }
/* and so on ... */

double (*fun)(void);

switch (mode)                  /* select the type of calculation */
{
    case 0: fun = fun0;
            break;
    case 1: fun = fun1;
            break;
    case 2: fun = fun2;
            break;
    case 3: fun = fun3;
            break;
    case 4: fun = fun3;
            break;
    default : fun = fun_default;
            break;
}

for (x = 0; x < width; x++) {
        for (y = 0; y < height; y++) {
             weight = fun();
             // Calculate the new pixel value given the weight
             ...
        }
}

它增加了函数调用开销,但它不应该太大,因为您不向函数传递任何参数。 我认为这是性能和可读性之间的良好权衡。

编辑:如果您使用GCC,要摆脱函数调用,您可以使用goto标签作为值: 在开关中找到正确的标签,然后每次都跳转到它。 我认为它应该可以节省更多的周期。

For efficiency sake you better move switch outside the loop.

I'd use function pointers like this:

double fun0(void) { return dCentre/maxDistanceEdge; }
double fun1(void) { return (float)x/width; }
/* and so on ... */

double (*fun)(void);

switch (mode)                  /* select the type of calculation */
{
    case 0: fun = fun0;
            break;
    case 1: fun = fun1;
            break;
    case 2: fun = fun2;
            break;
    case 3: fun = fun3;
            break;
    case 4: fun = fun3;
            break;
    default : fun = fun_default;
            break;
}

for (x = 0; x < width; x++) {
        for (y = 0; y < height; y++) {
             weight = fun();
             // Calculate the new pixel value given the weight
             ...
        }
}

It adds function call overhead but it shouldn't be too big as you pass no params to the function. I think it is good trade-off between performance and readability.

EDIT: If you use GCC, to get rid of function call you can use goto and labels as values: find the right label within the switch and then just jump to it every time. I think it should save few more cycles.

绅士风度i 2024-07-29 02:48:08

开关不应该产生任何显着的开销,它们被编译成低端的一种指针数组,那么这是有效的情况:

jmp {baseaddress}+switchcasenum

Switches shouldnt produce any significant overhead, they get compiled into a sort of array of pointers at the low end, then it's a case of effectively:

jmp {baseaddress}+switchcasenum

属性 2024-07-29 02:48:08

这可能取决于 CPU 的分支预测器有多好,以及编译器如何生成切换代码。 对于如此少量的情况,它可能会生成决策树,在这种情况下,正常的 CPU 分支预测应该能够消除大部分开销。 如果它生成一个切换表,情况可能会更糟……

也就是说,找出答案的最佳方法是分析它并查看。

This would probably depend on how good your CPU's branch predictor is, and how your compiler generates the code for the switch. For such a small number of cases, it might generate a decision tree, in which case normal CPU branch prediction should be able to remove most of the overhead. Things might be a bit worse if it generates a switch table...

That said, the best way to find out is to profile it and see.

薔薇婲 2024-07-29 02:48:08

除了吉姆的建议之外,还可以尝试交换循环的顺序。 循环交换是否适合情况 1 需要测试,但我怀疑确实如此。 您几乎总是希望您的 x 坐标位于内部循环中,以提高分页性能,因为这会导致您的函数在每次迭代时更倾向于保留在相同的通用内存区域中。 资源有限的移动设备可能具有足够低的内存,因此这种差异将被强调。

In addition to Jim's advice, try swapping the order of the loops. Whether loop-swapping is ideal for case 1 would require testing, but I suspect it is. You almost always want your x coordinate inside your inner loop in order to improve paging performance, as this causes your function to have a better tendency to stay in the same general memory area each iteration. And a mobile device with limitted resources might have low enough ram that this difference will be emphasized.

明天过后 2024-07-29 02:48:08

很抱歉打扰这个帖子,但在我看来,这个开关离问题还很远。

在这种情况下,效率的真正问题是部门。 在我看来,除法运算的所有分母都是常数(宽度、高度、最大值...),并且这些在整个图像过程中不会改变。 如果我的猜测是正确的,那么这些都是简单的变量,可以根据加载的图像进行更改,以便在运行时可以使用任何大小的图像,现在这允许加载任何图像大小,但这也意味着编译器无法优化它们如果它们被声明为“const”,它就可以进行更简单的乘法运算。 我的建议是预先计算这些常数的倒数并相乘。 据我记得,乘法运算大约需要 10 个时钟周期,而除法大约需要 70 个时钟周期。这意味着每个像素增加了 60 个周期,对于上述 5000x5000,预计速度会增加 1.5 秒。 1 GHz CPU。

Sorry to bump this thread, but it seems to me that the switch is FAR from the problem.

The real problem with the efficiency in that case is the divisions. It seems to me that all of denominators of the division operations are constants (width, height, max...) and these will not change throughout the course of the image. If my guess is right, then these are simple variables that can change based on the image loaded so that any size image can be used at runtime, now this allows for any image size to be loaded, but this also means the compiler cannot optimize them into the much simpler multiplication operation which it could do if they were declared "const". My suggestion would be to pre-calculate the inverses of these constants and multiply. As far as I can remember, the multiplication operation takes about 10 clock cycles, where as the division takes around 70. That's an increase of 60 cycles per pixel, and with the above mentioned 5000x5000, that's an estimated speed increase of 1.5 seconds on a 1 GHz CPU.

遥远的绿洲 2024-07-29 02:48:08

取决于芯片和编译器以及代码的细节,并且......但这通常会作为跳转表实现,这应该非常快。

顺便说一句——理解这种事情是在你职业生涯的某个阶段花几周时间学习一些汇编的一个很好的论据......

Depends on the chip and the compiler and the details of the code, and...but this will often be implemented as a jump table, which should be pretty fast.

BTW-- Understanding this kind of thing is a pretty good argument for spending a couple of weeks learning some assembly at some point in your career...

第七度阳光i 2024-07-29 02:48:08

使用开关对于速度和编程时间来说可能更好。 您正在编写更少的冗余代码,并且可能不需要新的堆栈框架。

开关非常高效,以至于它们可以用于非常奇怪和令人困惑的黑魔法

Using a switch is probably better both for speed and programmer time. You're making less redundant code and it probably won't require a new stack frame.

Switches are so efficient that they can used for really weird and confusing black magic.

就此别过 2024-07-29 02:48:08

但效率才是这里的关键。

迭代图像缓冲区以计算新的像素值听起来像是一个典型的令人尴尬的并行问题,从这个意义上说,您可能需要考虑将一些工作推入工作线程,这应该比微优化更显着地加快您的操作速度例如 switch/case 问题。

此外,您可以从函数指针数组中调用函数指针,而不是每次都执行分支指令,其中索引用作模式标识符。

因此,您最终会得到如下调用:

computeWeight[mode](pixel);

对于 5000x5000 像素,还可以通过针对一系列像素(而不是单个像素)调用函数来减少函数调用开销。

您还可以使用循环展开和通过引用/指针传递参数,以便进一步优化。

but efficiency is the name of the game here.

iterating over an image buffer in order to compute new pixel values sounds like a typical embarrassingly-parallel problem, in this sense you might want to consider pushing some of the work into worker threads, this should speed up your operation more notably than micro-optimizations such as switch/case concerns.

Also, instead of doing the branching instructions every time, you could invoke a function pointer from an array of function pointers, where the index serves as your mode identifier.

So that you end up with calls such as:

computeWeight[mode](pixel);

With 5000x5000 pixels, the function call overhead could also be reduced by calling the function for a range of pixels, rather than individual pixels.

You could also use loop unrolling and parameter passing by reference/pointer, in order to optimize this further.

末が日狂欢 2024-07-29 02:48:08

已经给出了许多优点。 我唯一能想到的就是将开关中最常见的情况向上移动,将最不常见的情况向下移动。

因此,如果情况 4 比情况 1 发生得更频繁,则它应该位于其之上:

switch (mode) {
    case 4:
        // ..
        break;
    case 1:
        // ..
        break;
}

太糟糕了,您没有使用 c++,因为这样可以用多态性来替换 switch 语句。

干杯!

Many good points are already given. Only thing I could think of to add to this, is to move the most frequent cases up in the switch and the least frequent down.

So if case 4 happens more often than case 1, it should be above it:

switch (mode) {
    case 4:
        // ..
        break;
    case 1:
        // ..
        break;
}

Too bad you weren't using c++, because then the switch statement could be replaced with polymorphism.

Cheers !

秋心╮凉 2024-07-29 02:48:08

在这个线程中有很多创造性的建议,不需要编写 5 个单独的函数。

除非您从文件或键入的输入中读取“模式”,否则可以在编译时确定计算方法。 作为一般规则,您不希望将计算从编译时移动到运行时。

无论哪种方式,代码都会更容易阅读,并且不会有人对您是否打算在第一种情况下放入break语句感到困惑。

此外,当您在周围代码中遇到错误时,您不必查找枚举是否设置为错误的值。

There are a lot of creative suggestions in this thread of ways not having to write 5 separate functions.

Unless you read 'mode' from a file or from typed input the calculation method can be determined at compile time. As a general rule you don't want to move calculations from compile time to run time.

Either way the code would be easier to read and no-one would be confused as to whether or not you meant to put in the break statement in the first case or not.

Also when you get bugs in the surrounding code you will not have to look up if the enum was set to the wrong value or not.

十年不长 2024-07-29 02:48:08

关于内部循环... 0->var 最好是 var->0,因为 var-- 会触发零标志(6502 天)。 这种方法还意味着“宽度”被加载到 x 中并且可以被忘记,“高度”也是如此。 另外,内存中的像素通常是左->右、上->下,所以肯定有 x 作为你的内循环。

for (y = height; y--;) {
    for (x = width; x--;) {
         weight = fun();
         // Calculate the new pixel value given the weight
         ...
    }
}

另外...非常重要的是你的 switch 语句只有 2 个使用 x 或 y 的情况。 其余的都是常数。

 switch (mode)                  /* select the type of calculation */
 {
     case 0:
        weight = dCentre / maxDistanceEdge;
        break;
     //case 1: 
     //  weight = (float)x/width;             
     // break;
     //case 2: 
     //     weight = (float)y/height;
     //     break;
     case 3: 
          weight = dBottomLeft / maxDistanceCorner;
          break;
      case 4: 
           weight = dTopRight / maxDistanceCorner;
           break;
      default: 
           weight = 1;
           break;
 }

所以基本上除非模式 1 或 2 权重是在循环之前计算的。

... Y loop code here

    if (mode == 2) { weight = (float)y/height; } // calc only once per Y loop

    ... X loop here

        if (mode == 1) { weight = (float)x/width; } // after this all cases have filled weight
        calc_pixel_using_weight(weight);

我发现如果数据稀疏, switch 语句非常不友善。 对于< 我会选择 if-then-else 的 4 个元素,并确保最常见的情况位于顶部。 如果第一个条件适用于 90% 的情况,那么您基本上就打出了全垒打。 同样,如果某个其他条件< 1% 把它放在最后。

With regard to the inner loops... 0->var is better done var->0 as var-- triggers the zero flag (6502 days). This approach also means "width" is loaded into x and can be forgotten about, same goes for "height". Also pixels in memory are usually left->right, top->bottom so definitely have x as your inner loop.

for (y = height; y--;) {
    for (x = width; x--;) {
         weight = fun();
         // Calculate the new pixel value given the weight
         ...
    }
}

Also... and very important is your switch statements only have 2 cases that use x or y. The rest are constants.

 switch (mode)                  /* select the type of calculation */
 {
     case 0:
        weight = dCentre / maxDistanceEdge;
        break;
     //case 1: 
     //  weight = (float)x/width;             
     // break;
     //case 2: 
     //     weight = (float)y/height;
     //     break;
     case 3: 
          weight = dBottomLeft / maxDistanceCorner;
          break;
      case 4: 
           weight = dTopRight / maxDistanceCorner;
           break;
      default: 
           weight = 1;
           break;
 }

So basically unless mode 1 or 2 weight is calculated before the loop.

... Y loop code here

    if (mode == 2) { weight = (float)y/height; } // calc only once per Y loop

    ... X loop here

        if (mode == 1) { weight = (float)x/width; } // after this all cases have filled weight
        calc_pixel_using_weight(weight);

I have found switch statements to be very unkind if data is sparse. For < 4 elements I'd go for if-then-else and make sure the most common cases are up the top. If the first condition catches 90% of cases you've basically hit a home run. Likewise if some other condition is < 1% put it last.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文