Will a bit shift by zero bits work correctly?

Posted on 2024-07-25 04:09:29

Say I have a function like this:

inline int shift( int what, int bitCount )
{
    return what >> bitCount;
}

It will be called from different sites, and each time bitCount will be non-negative and within the number of bits in int. I'm particularly concerned about calls with bitCount equal to zero. Will it work correctly then?

Also, is there a chance that a compiler that sees the whole code of the function when compiling a call site will reduce calls with bitCount equal to zero to a no-op?

缱倦旧时光 2024-08-01 04:09:30

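According to K&R, "The result is undefined if the right operand is negative, or greater than or equal to the number of bits in the left expression's type." (A.7.8) Therefore >> 0 is an identity right shift and perfectly legal.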
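The quoted rule can be turned into a precondition check. The following is a minimal sketch, not part of the original answer: the names kIntBits, is_valid_shift_count and checked_shift are made up for illustration, and the bit count is computed as sizeof(int) * CHAR_BIT, which assumes int has no padding bits.

```cpp
#include <cassert>
#include <climits>

// Assumption: int has no padding bits, so this is the number of value+sign bits.
const int kIntBits = static_cast<int>(sizeof(int) * CHAR_BIT);

// Hypothetical helper: true when bitCount satisfies the rule quoted above,
// i.e. it is non-negative and strictly less than the width of int.
inline bool is_valid_shift_count(int bitCount)
{
    return bitCount >= 0 && bitCount < kIntBits;
}

// Hypothetical checked variant of the question's shift().
inline int checked_shift(int what, int bitCount)
{
    assert(is_valid_shift_count(bitCount)); // 0 passes; kIntBits does not
    return what >> bitCount;
}
```

With this check in place, a count of zero passes the assertion and returns the value unchanged, while a negative count or a count equal to the width of int trips the assertion in debug builds.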

泪痕残 2024-08-01 04:09:30

It is certain that at least one C++ compiler will recognize the situation (when the 0 is known at compile time) and make it a no-op:

Source

inline int shift( int what, int bitcount)
{
  return what >> bitcount ;
}

int f() {
  return shift(42,0);
}

Compiler switches

icpc -S -O3 -mssse3 -fp-model fast=2 bitsh.C

Intel C++ 11.0 assembly

# -- Begin  _Z1fv
# mark_begin;
       .align    16,0x90
        .globl _Z1fv
_Z1fv:
..B1.1:                         # Preds ..B1.0
        movl      $42, %eax                                     #7.10
        ret                                                     #7.10
        .align    16,0x90
                                # LOE
# mark_end;
        .type   _Z1fv,@function
        .size   _Z1fv,.-_Z1fv
        .data
# -- End  _Z1fv
        .data
        .section .note.GNU-stack, ""
# End

As you can see at ..B1.1, Intel compiles "return shift(42,0)" to "return 42."

Intel 11 also culls the shift for these two variations:

int g() {
  int a = 5;
  int b = 5;
  return shift(42,a-b);
}

int h(int k) {
  return shift(42,k*0);
}
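One way to confirm this kind of folding without reading assembly, assuming a C++11 or later compiler, is to mark the function constexpr and let the compiler check the results itself. This is only a sketch: shift_cx and f_cx are names invented here, and the example shows that the language permits compile-time evaluation, not what a particular optimizer does with a plain inline function.

```cpp
// Assumption: C++11 or later; constexpr does not appear in the original code.
constexpr int shift_cx(int what, int bitCount)
{
    return what >> bitCount;
}

// Both checks are evaluated by the compiler; a failure would be a compile error.
static_assert(shift_cx(42, 0) == 42, "shift by zero leaves the value unchanged");
static_assert(shift_cx(42, 5 - 5) == 42, "a count that folds to zero behaves the same");

// Mirrors f() above, using the constexpr variant.
int f_cx()
{
    return shift_cx(42, 0);
}
```

If either static_assert were false the translation unit would not compile, so a successful build already shows that the zero-count shift was evaluated at compile time.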

For the case when the shift value is unknowable at compile time ...

int egad(int m, int n) {
  return shift(42,m-n);
}

... the shift cannot be avoided ...

# -- Begin  _Z4egadii
# mark_begin;
       .align    16,0x90
        .globl _Z4egadii
_Z4egadii:
# parameter 1: 4 + %esp
# parameter 2: 8 + %esp
..B1.1:                         # Preds ..B1.0
        movl      4(%esp), %ecx                                 #20.5
        subl      8(%esp), %ecx                                 #21.21
        movl      $42, %eax                                     #21.10
        shrl      %cl, %eax                                     #21.10
        ret                                                     #21.10
        .align    16,0x90
                                # LOE
# mark_end;

... but at least it's inlined so there's no call overhead.

Bonus assembly: volatile is expensive. The source ...

int g() {
  int a = 5;
  volatile int b = 5;
  return shift(42,a-b);
}

... instead of a no-op, compiles to ...

..B3.1:                         # Preds ..B3.0
        pushl     %esi                                          #10.9
        movl      $5, (%esp)                                    #12.18
        movl      (%esp), %ecx                                  #13.21
        negl      %ecx                                          #13.21
        addl      $5, %ecx                                      #13.21
        movl      $42, %eax                                     #13.10
        shrl      %cl, %eax                                     #13.10
        popl      %ecx                                          #13.10
        ret                                                     #13.10
        .align    16,0x90
                                # LOE
# mark_end;

... so if you're working on a machine where values you push on the stack might not be the same when you pop them, well, this missed optimization is likely the least of your troubles.

南街九尾狐 2024-08-01 04:09:30

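It will work correctly on any widely used architecture (I can vouch for x86, PPC, ARM). The compiler will not be able to reduce it to a no-op unless the function is inlined.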

爱人如己 2024-08-01 04:09:30

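About the correctness of arg << 0 or arg >> 0: no problem, absolutely fine.

About the eventual optimizations:
This will not be reduced to a no-op when called with a constant what=0 and/or bitcount=0, unless you declare it inline and enable optimizations (and your compiler of choice understands what inline is).

So, bottom line: optimize this code by calling the function conditionally, only if the OR of the arguments is non-zero (about the fastest test I can think of, although it only establishes that at least one of the arguments is non-zero, not both).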
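The conditional call could be sketched as follows. The function names are hypothetical and the second variant is an addition for comparison, not something the answer proposes. Whether the extra branch is cheaper than the shift it avoids is not something this sketch establishes.

```cpp
// The function from the question.
inline int shift(int what, int bitCount)
{
    return what >> bitCount;
}

// The OR test as described: (what | bitCount) is zero only when BOTH
// arguments are zero, so the shift is skipped only in that case.
inline int shift_if_or_nonzero(int what, int bitCount)
{
    return (what | bitCount) != 0 ? shift(what, bitCount) : what;
}

// Stricter variant: skips the shift when EITHER argument is zero,
// since shifting zero, or shifting by zero, yields `what` unchanged.
inline int shift_if_both_nonzero(int what, int bitCount)
{
    return (what != 0 && bitCount != 0) ? shift(what, bitCount) : what;
}
```

Both variants return the same results as the unguarded shift for valid shift counts; they differ only in how often the shift instruction is actually executed.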

情定在深秋 2024-08-01 04:09:30

The compiler could only perform this optimisation if it knew at compile time that the bitCount value was zero. That would mean that the passed parameter would have to be a constant:

const int N = 0;
int x = shift( 123, N );

C++ certainly allows such an optimisation to be performed, but I'm not aware of any compilers that do so. The alternative approach the compiler could take: