C++ 中循环移位(旋转)操作的最佳实践

发布于 2024-07-17 20:43:23 字数 366 浏览 11 评论 0原文

C++ 中已提供左移和右移运算符(<< 和 >>)。 但是,我不知道如何执行循环移位或旋转操作。

如何进行“向左旋转”、“向右旋转”等操作?

在这里向右旋转两次

Initial --> 1000 0011 0100 0010

应该会导致:

Final   --> 1010 0000 1101 0000

一个例子会很有帮助。

(编者注:如果旋转计数为零,或者编译为不仅仅是单个旋转机器指令,则在 C 中表达旋转的许多常见方法都会出现未定义的行为。这个问题的答案应该记录最佳实践。)

Left and right shift operators (<< and >>) are already available in C++.
However, I couldn't find out how I could perform circular shift or rotate operations.

How can operations like "Rotate Left" and "Rotate Right" be performed?

Rotating right twice here

Initial --> 1000 0011 0100 0010

should result in:

Final   --> 1010 0000 1101 0000

An example would be helpful.

(editor's note: Many common ways of expressing rotates in C suffer from undefined behaviour if the rotate count is zero, or compile to more than just a single rotate machine instruction. This question's answer should document best practices.)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(15

笑看君怀她人 2024-07-24 20:43:23

另请参阅这个关于另一个轮换问题的答案的早期版本,其中包含有关 asm gcc/clang 为 x86 生成的内容的更多详细信息。

在 C 和 C++ 中表达轮换以避免任何未定义行为的对编译器最友好的方式似乎是 John Regehr 的实施。 我已将其调整为按类型的宽度进行旋转(使用像 uint32_t 这样的固定宽度类型)。

#include <stdint.h>   // for uint32_t
#include <limits.h>   // for CHAR_BIT
// #define NDEBUG
#include <assert.h>

static inline uint32_t rotl32 (uint32_t n, unsigned int c)
{
  const unsigned int mask = (CHAR_BIT*sizeof(n) - 1);  // assumes width is a power of 2.

  // assert ( (c<=mask) &&"rotate by type width or more");
  c &= mask;
  return (n<<c) | (n>>( (-c)&mask ));
}

static inline uint32_t rotr32 (uint32_t n, unsigned int c)
{
  const unsigned int mask = (CHAR_BIT*sizeof(n) - 1);

  // assert ( (c<=mask) &&"rotate by type width or more");
  c &= mask;
  return (n>>c) | (n<<( (-c)&mask ));
}

适用于任何无符号整数类型,而不仅仅是 uint32_t,因此您可以制作其他大小的版本。

请参阅还有一个 C++11 模板版本,其中包含大量安全检查(包括static_assert 类型宽度是 2) 的幂,例如,在某些 24 位 DSP 或 36 位大型机上并非如此。

我建议仅使用模板作为名称明确包含旋转宽度的包装器的后端。 整数提升规则意味着 rotl_template(u16 & 0x11UL, 7) 将执行 32 或 64 位旋转,而不是 16(取决于 无符号长)。 甚至 uint16_t & uint16_t 根据 C++ 的整数提升规则提升为 signed int,除非在 int 不宽于 uint16_t 的平台上。


On x86, this version 内联到单个 rol r32, cl (或 rol r32, imm8),编译器能够理解它,因为编译器知道 x86 旋转和移位指令 屏蔽了移位计数与 C 源代码的做法相同。

编译器在 x86 上支持这种避免 UB 的习惯用法,对于用于变量计数移位的 uint32_t xunsigned int n

  • clang:从 clang3.5 开始识别变量计数旋转,在此之前多次轮班+或insn。
  • gcc:自 gcc4.9 起可识别可变计数循环,多个之前的转变+或insns。 gcc5 及更高版本也优化了维基百科版本中的分支和掩码,仅使用 rorrol 指令进行变量计数。
  • icc:支持可变计数轮换自 ICC13 或更早以来。 常量计数循环使用 shld edi,edi,7 ,在某些 CPU(尤其是 AMD,但也有一些 Intel)上,它比 rol edi,7 更慢并且占用更多字节,当 BMI2 无法用于 rorx eax,edi,25 保存 MOV 时。
  • MSVC:x86-64 CL19:仅识别恒定计数循环。 (维基百科习语已被识别,但分支和 AND 并未被优化掉)。 在 x86(包括 x86-64)上使用 中的 _rotl / _rotr 内在函数。

ARM 的 gcc 使用和 r1, r1, #31 进行可变计数循环,但仍然使用单个指令进行实际循环ror r0, r0 ,r1。 所以 gcc 没有意识到旋转计数本质上是模块化的。 正如 ARM 文档所说, "移位长度 n 大于 32 的 ROR 与移位长度 n-32" 的 ROR 相同。 我认为 gcc 在这里会感到困惑,因为 ARM 上的左/右移位使计数饱和,因此移位 32 或更多将清除寄存器。 (与 x86 不同,x86 中的移位屏蔽计数与旋转相同)。 由于非循环移位在该目标上的工作方式,它可能决定在识别旋转习惯用法之前需要 AND 指令。

当前的 x86 编译器仍然使用额外的指令来屏蔽 8 位和 16 位循环的变量计数,可能与它们不避免 ARM 上的 AND 的原因相同。 这是一个错过的优化,因为性能不依赖于任何 x86-64 CPU 上的循环计数。 (出于性能原因,在 286 中引入了计数掩码,因为它迭代地处理移位,而不是像现代 CPU 那样具有恒定延迟。)

顺便说一句,对于可变计数旋转,更喜欢向右旋转,以避免使编译器执行 32- n 在仅提供右旋转的 ARM 和 MIPS 等架构上实现左旋转。 (这可以优化编译时间常数计数。)

有趣的事实:ARM 并没有真正专用的移位/旋转指令,它只是带有 源操作数在 ROR 模式下通过桶形移位器mov r0, r0, ror r1。 因此,循环可以折叠到 EOR 指令或其他指令的寄存器源操作数中。


确保 n 和返回值使用无符号类型,否则它不会是旋转。 (x86 目标的 gcc 进行算术右移,移入符号位的副本而不是零,当您将两个移位的值进行“或”运算时,会导致出现问题。负符号整数的右移是C 中实现定义的行为。)

此外,确保移位计数是无符号类型,因为带符号类型的 (-n)&31 可能是补码或符号/数值,与使用无符号或二进制补码得到的模 2^n 不同。 (请参阅雷格尔博客文章的评论)。 unsigned int 在我见过的每个编译器上,对于 x 的每个宽度都表现良好。 某些其他类型实际上无法识别某些编译器的惯用语,因此不要仅使用与 x 相同的类型。


某些编译器提供旋转的内在函数,如果可移植版本不能在您的目标编译器上生成良好的代码,则这比内联汇编要好得多。 据我所知,任何编译器都没有跨平台的内在函数。 以下是一些 x86 选项:

  • 英特尔文档 提供 _rotl_rotl64 内在函数,右移相同。 MSVC 需要 ,而 gcc 需要 #ifdef 负责处理 gcc 与 icc。 Clang 9.0 也有它,但在此之前它似乎没有在任何地方提供它们,除非在带有 -fms-extensions -fms-compatibility -fms-compatibility-version=17.00 的 MSVC 兼容模式下。 它为它们发出的 asm 很糟糕(额外的掩蔽和 CMOV)。
  • MSVC:_rotr8_rotr16
  • gcc 和 icc(不是 clang): 还提供 __rolb/__rorb 用于 8 位左/右旋转,<代码>__rolw/__rorw(16位),__rold/__rord(32位),__rolq< /code>/__rorq (64 位,仅为 64 位目标定义)。 对于窄循环,实现使用 __builtin_ia32_rolhi...qi,但 32 和 64 位循环是使用 shift/or 定义的(没有针对 UB 的保护,因为ia32intrin 中的代码。 h 只需要在 x86 的 gcc 上工作)。 GNU C 似乎没有像 __builtin_popcount 那样具有任何跨平台的 __builtin_rotate 函数(即使它不是单个指令,它也会扩展到目标平台上的最佳值) )。 大多数时候,您可以通过习语识别获得好的代码。
  • Clang 有 __builtin_rotateleft8__builtin_rotateright8,其他宽度也类似。 请参阅 Clang 的语言扩展文档。 还有其他位操作的内置函数/内在函数,例如 __builtin_bitreverse32 (可以在 ARM / AArch64 上编译为 rbit)。 您可以使用 __has_builtin 来测试它们。
// For real use, probably use a rotate intrinsic for MSVC, or this idiom for other compilers.  This pattern of #ifdefs may be helpful
#if defined(__x86_64__) || defined(__i386__)

#ifdef _MSC_VER
#include <intrin.h>
#else
#include <x86intrin.h>  // Not just <immintrin.h> for compilers other than icc
#endif

uint32_t rotl32_x86_intrinsic(rotwidth_t x, unsigned n) {
  //return __builtin_ia32_rorhi(x, 7);  // 16-bit rotate, GNU C
  return _rotl(x, n);  // gcc, icc, msvc.  Intel-defined.
  //return __rold(x, n);  // gcc, icc.
  // can't find anything for clang
}
#endif

想必某些非 x86 编译器也具有内在函数,但我们不要扩展此社区 wiki 答案以包含所有这些编译器。 (也许可以在有关内在函数的现有答案中做到这一点)。


(此答案的旧版本建议使用 MSVC 特定的内联汇编(仅适用于 32 位 x86 代码),或 http://www.devx.com/tips/Tip/14043 用于 C 版本。评论正在对此进行回复。)

内联汇编击败了许多优化特别是MSVC风格,因为它强制存储/重新加载输入。 精心编写的 GNU C 内联汇编循环将允许计数成为编译时常量移位计数的立即操作数,但如果要移位的值也是编译时常量,它仍然无法完全优化掉内联后。 https://gcc.gnu.org/wiki/DontUseInlineAsm

See also an earlier version of this answer on another rotate question with some more details about what asm gcc/clang produce for x86.

The most compiler-friendly way to express a rotate in C and C++ that avoids any Undefined Behaviour seems to be John Regehr's implementation. I've adapted it to rotate by the width of the type (using fixed-width types like uint32_t).

#include <stdint.h>   // for uint32_t
#include <limits.h>   // for CHAR_BIT
// #define NDEBUG
#include <assert.h>

static inline uint32_t rotl32 (uint32_t n, unsigned int c)
{
  const unsigned int mask = (CHAR_BIT*sizeof(n) - 1);  // assumes width is a power of 2.

  // assert ( (c<=mask) &&"rotate by type width or more");
  c &= mask;
  return (n<<c) | (n>>( (-c)&mask ));
}

static inline uint32_t rotr32 (uint32_t n, unsigned int c)
{
  const unsigned int mask = (CHAR_BIT*sizeof(n) - 1);

  // assert ( (c<=mask) &&"rotate by type width or more");
  c &= mask;
  return (n>>c) | (n<<( (-c)&mask ));
}

Works for any unsigned integer type, not just uint32_t, so you could make versions for other sizes.

See also a C++11 template version with lots of safety checks (including a static_assert that the type width is a power of 2), which isn't the case on some 24-bit DSPs or 36-bit mainframes, for example.

I'd recommend only using the template as a back-end for wrappers with names that include the rotate width explicitly. Integer-promotion rules mean that rotl_template(u16 & 0x11UL, 7) would do a 32 or 64-bit rotate, not 16 (depending on the width of unsigned long). Even uint16_t & uint16_t is promoted to signed int by C++'s integer-promotion rules, except on platforms where int is no wider than uint16_t.


On x86, this version inlines to a single rol r32, cl (or rol r32, imm8) with compilers that grok it, because the compiler knows that x86 rotate and shift instructions mask the shift-count the same way the C source does.

Compiler support for this UB-avoiding idiom on x86, for uint32_t x and unsigned int n for variable-count shifts:

  • clang: recognized for variable-count rotates since clang3.5, multiple shifts+or insns before that.
  • gcc: recognized for variable-count rotates since gcc4.9, multiple shifts+or insns before that. gcc5 and later optimize away the branch and mask in the wikipedia version, too, using just a ror or rol instruction for variable counts.
  • icc: supported for variable-count rotates since ICC13 or earlier. Constant-count rotates use shld edi,edi,7 which is slower and takes more bytes than rol edi,7 on some CPUs (especially AMD, but also some Intel), when BMI2 isn't available for rorx eax,edi,25 to save a MOV.
  • MSVC: x86-64 CL19: Only recognized for constant-count rotates. (The wikipedia idiom is recognized, but the branch and AND aren't optimized away). Use the _rotl / _rotr intrinsics from <intrin.h> on x86 (including x86-64).

gcc for ARM uses an and r1, r1, #31 for variable-count rotates, but still does the actual rotate with a single instruction: ror r0, r0, r1. So gcc doesn't realize that rotate-counts are inherently modular. As the ARM docs say, "ROR with shift length, n, more than 32 is the same as ROR with shift length n-32". I think gcc gets confused here because left/right shifts on ARM saturate the count, so a shift by 32 or more will clear the register. (Unlike x86, where shifts mask the count the same as rotates). It probably decides it needs an AND instruction before recognizing the rotate idiom, because of how non-circular shifts work on that target.

Current x86 compilers still use an extra instruction to mask a variable count for 8 and 16-bit rotates, probably for the same reason they don't avoid the AND on ARM. This is a missed optimization, because performance doesn't depend on the rotate count on any x86-64 CPU. (Masking of counts was introduced with 286 for performance reasons because it handled shifts iteratively, not with constant-latency like modern CPUs.)

BTW, prefer rotate-right for variable-count rotates, to avoid making the compiler do 32-n to implement a left rotate on architectures like ARM and MIPS that only provide a rotate-right. (This optimizes away with compile-time-constant counts.)

Fun fact: ARM doesn't really have dedicated shift/rotate instructions, it's just MOV with the source operand going through the barrel-shifter in ROR mode: mov r0, r0, ror r1. So a rotate can fold into a register-source operand for an EOR instruction or something.


Make sure you use unsigned types for n and the return value, or else it won't be a rotate. (gcc for x86 targets does arithmetic right shifts, shifting in copies of the sign-bit rather than zeroes, leading to a problem when you OR the two shifted values together. Right-shifts of negative signed integers is implementation-defined behaviour in C.)

Also, make sure the shift count is an unsigned type, because (-n)&31 with a signed type could be one's complement or sign/magnitude, and not the same as the modular 2^n you get with unsigned or two's complement. (See comments on Regehr's blog post). unsigned int does well on every compiler I've looked at, for every width of x. Some other types actually defeat the idiom-recognition for some compilers, so don't just use the same type as x.


Some compilers provide intrinsics for rotates, which is far better than inline-asm if the portable version doesn't generate good code on the compiler you're targeting. There aren't cross-platform intrinsics for any compilers that I know of. These are some of the x86 options:

  • Intel documents that <immintrin.h> provides _rotl and _rotl64 intrinsics, and same for right shift. MSVC requires <intrin.h>, while gcc require <x86intrin.h>. An #ifdef takes care of gcc vs. icc. Clang 9.0 also has it, but before that it doesn't seem to provide them anywhere, except in MSVC compatibility mode with -fms-extensions -fms-compatibility -fms-compatibility-version=17.00. And the asm it emits for them sucks (extra masking and a CMOV).
  • MSVC: _rotr8 and _rotr16.
  • gcc and icc (not clang): <x86intrin.h> also provides __rolb/__rorb for 8-bit rotate left/right, __rolw/__rorw (16-bit), __rold/__rord (32-bit), __rolq/__rorq (64-bit, only defined for 64-bit targets). For narrow rotates, the implementation uses __builtin_ia32_rolhi or ...qi, but the 32 and 64-bit rotates are defined using shift/or (with no protection against UB, because the code in ia32intrin.h only has to work on gcc for x86). GNU C appears not to have any cross-platform __builtin_rotate functions the way it does for __builtin_popcount (which expands to whatever's optimal on the target platform, even if it's not a single instruction). Most of the time you get good code from idiom-recognition.
  • Clang has __builtin_rotateleft8, __builtin_rotateright8, and similar for other widths. See Clang's language extensions documentation. There are also builtins / intrinsics for other bit-manipulation things like __builtin_bitreverse32 (which can compile to rbit on ARM / AArch64). You can test for them with __has_builtin.
// For real use, probably use a rotate intrinsic for MSVC, or this idiom for other compilers.  This pattern of #ifdefs may be helpful
#if defined(__x86_64__) || defined(__i386__)

#ifdef _MSC_VER
#include <intrin.h>
#else
#include <x86intrin.h>  // Not just <immintrin.h> for compilers other than icc
#endif

uint32_t rotl32_x86_intrinsic(rotwidth_t x, unsigned n) {
  //return __builtin_ia32_rorhi(x, 7);  // 16-bit rotate, GNU C
  return _rotl(x, n);  // gcc, icc, msvc.  Intel-defined.
  //return __rold(x, n);  // gcc, icc.
  // can't find anything for clang
}
#endif

Presumably some non-x86 compilers have intrinsics, too, but let's not expand this community-wiki answer to include them all. (Maybe do that in the existing answer about intrinsics).


(The old version of this answer suggested MSVC-specific inline asm (which only works for 32bit x86 code), or http://www.devx.com/tips/Tip/14043 for a C version. The comments are replying to that.)

Inline asm defeats many optimizations, especially MSVC-style because it forces inputs to be stored/reloaded. A carefully-written GNU C inline-asm rotate would allow the count to be an immediate operand for compile-time-constant shift counts, but it still couldn't optimize away entirely if the value to be shifted is also a compile-time constant after inlining. https://gcc.gnu.org/wiki/DontUseInlineAsm.

时光倒影 2024-07-24 20:43:23

C++20 std::rotlstd::rotr

已经到来! http://www.open-std.org /jtc1/sc22/wg21/docs/papers/2019/p0553r4.html 并应将其添加到 标头中。

cppreference 说 用法如下:

#include <bit>
#include <bitset>
#include <cstdint>
#include <iostream>

int main()
{
    std::uint8_t i = 0b00011101;
    std::cout << "i          = " << std::bitset<8>(i) << '\n';
    std::cout << "rotl(i,0)  = " << std::bitset<8>(std::rotl(i,0)) << '\n';
    std::cout << "rotl(i,1)  = " << std::bitset<8>(std::rotl(i,1)) << '\n';
    std::cout << "rotl(i,4)  = " << std::bitset<8>(std::rotl(i,4)) << '\n';
    std::cout << "rotl(i,9)  = " << std::bitset<8>(std::rotl(i,9)) << '\n';
    std::cout << "rotl(i,-1) = " << std::bitset<8>(std::rotl(i,-1)) << '\n';
}

给出输出:

i          = 00011101
rotl(i,0)  = 00011101
rotl(i,1)  = 00111010
rotl(i,4)  = 11010001
rotl(i,9)  = 00111010
rotl(i,-1) = 10001110

我会给它尝试一下,当 GCC 支持到达时,带有 g++-9 -std=c++2a 的 GCC 9.1.0 仍然不支持它。

该提案称:

标题:

命名空间 std { 
    // 25.5.5,旋转    
    模板<类T> 
      [[nodiscard]] constexpr T rotl(T x, int s) noexcept; 
    模板<类T> 
      [[nodiscard]] constexpr T rotr(T x, int s) noexcept; 
  

和:

25.5.5 旋转 [bitops.rot]

在以下描述中,让 N 表示 std::numeric_limits::digits

模板 
    [[nodiscard]] constexpr T rotl(T x, int s) noexcept; 
  

约束:T 是无符号整数类型(3.9.1 [basic.fundamental])。

设 r 为 s % N。

返回:若r为0,则x; 如果 r 为正,(x << r) | (x>>(N-r)); 如果 r 为负数,则 rotr(x, -r)

模板 
    [[nodiscard]] constexpr T rotr(T x, int s) noexcept; 
  

约束:T 是无符号整数类型(3.9.1 [basic.fundamental])。
设 r 为 s % N。

返回:若r为0,则x; 如果 r 为正,(x >> r) | (x << (N - r)); 如果 r 为负数,则 rotl(x, -r)

还添加了 std::popcount 来计算 1 位的数量:如何计算 32 位整数中设置的位数?

C++20 std::rotl and std::rotr

It has arrived! http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0553r4.html and should add it to the <bit> header.

cppreference says that the usage will be like:

#include <bit>
#include <bitset>
#include <cstdint>
#include <iostream>

int main()
{
    std::uint8_t i = 0b00011101;
    std::cout << "i          = " << std::bitset<8>(i) << '\n';
    std::cout << "rotl(i,0)  = " << std::bitset<8>(std::rotl(i,0)) << '\n';
    std::cout << "rotl(i,1)  = " << std::bitset<8>(std::rotl(i,1)) << '\n';
    std::cout << "rotl(i,4)  = " << std::bitset<8>(std::rotl(i,4)) << '\n';
    std::cout << "rotl(i,9)  = " << std::bitset<8>(std::rotl(i,9)) << '\n';
    std::cout << "rotl(i,-1) = " << std::bitset<8>(std::rotl(i,-1)) << '\n';
}

giving output:

i          = 00011101
rotl(i,0)  = 00011101
rotl(i,1)  = 00111010
rotl(i,4)  = 11010001
rotl(i,9)  = 00111010
rotl(i,-1) = 10001110

I'll give it a try when support arrives to GCC, GCC 9.1.0 with g++-9 -std=c++2a still doesn't support it.

The proposal says:

Header:

namespace std {
  // 25.5.5, rotating   
  template<class T>
    [[nodiscard]] constexpr T rotl(T x, int s) noexcept;
  template<class T>
    [[nodiscard]] constexpr T rotr(T x, int s) noexcept;

and:

25.5.5 Rotating [bitops.rot]

In the following descriptions, let N denote std::numeric_limits<T>::digits.

template<class T>
  [[nodiscard]] constexpr T rotl(T x, int s) noexcept;

Constraints: T is an unsigned integer type (3.9.1 [basic.fundamental]).

Let r be s % N.

Returns: If r is 0, x; if r is positive, (x << r) | (x >> (N - r)); if r is negative, rotr(x, -r).

template<class T>
  [[nodiscard]] constexpr T rotr(T x, int s) noexcept;

Constraints: T is an unsigned integer type (3.9.1 [basic.fundamental]).
Let r be s % N.

Returns: If r is 0, x; if r is positive, (x >> r) | (x << (N - r)); if r is negative, rotl(x, -r).

A std::popcount was also added to count the number of 1 bits: How to count the number of set bits in a 32-bit integer?

南巷近海 2024-07-24 20:43:23

由于它是 C++,因此使用内联函数:

template <typename INT> 
INT rol(INT val) {
    return (val << 1) | (val >> (sizeof(INT)*CHAR_BIT-1));
}

C++11 变体:

template <typename INT> 
constexpr INT rol(INT val) {
    static_assert(std::is_unsigned<INT>::value,
                  "Rotate Left only makes sense for unsigned types");
    return (val << 1) | (val >> (sizeof(INT)*CHAR_BIT-1));
}

Since it's C++, use an inline function:

template <typename INT> 
INT rol(INT val) {
    return (val << 1) | (val >> (sizeof(INT)*CHAR_BIT-1));
}

C++11 variant:

template <typename INT> 
constexpr INT rol(INT val) {
    static_assert(std::is_unsigned<INT>::value,
                  "Rotate Left only makes sense for unsigned types");
    return (val << 1) | (val >> (sizeof(INT)*CHAR_BIT-1));
}
注定孤独终老 2024-07-24 20:43:23

大多数编译器都有相应的内在函数。
Visual Studio 例如 _rotr8、_rotr16

Most compilers have intrinsics for that.
Visual Studio for example _rotr8, _rotr16

花开柳相依 2024-07-24 20:43:23

明确地说:

template<class T>
T ror(T x, unsigned int moves)
{
  return (x >> moves) | (x << sizeof(T)*8 - moves);
}

Definitively:

template<class T>
T ror(T x, unsigned int moves)
{
  return (x >> moves) | (x << sizeof(T)*8 - moves);
}
獨角戲 2024-07-24 20:43:23

如果 x 是 8 位值,则可以使用:

x=(x>>1 | x<<7);

If x is an 8 bit value, you can use this:

x=(x>>1 | x<<7);
瀟灑尐姊 2024-07-24 20:43:23

使用标准位集怎么可能是这样的……

#include <bitset> 
#include <iostream> 

template <std::size_t N> 
inline void 
rotate(std::bitset<N>& b, unsigned m) 
{ 
   b = b << m | b >> (N-m); 
} 

int main() 
{ 
   std::bitset<8> b(15); 
   std::cout << b << '\n'; 
   rotate(b, 2); 
   std::cout << b << '\n'; 

   return 0;
}

HTH,

How abt something like this, using the standard bitset ...

#include <bitset> 
#include <iostream> 

template <std::size_t N> 
inline void 
rotate(std::bitset<N>& b, unsigned m) 
{ 
   b = b << m | b >> (N-m); 
} 

int main() 
{ 
   std::bitset<8> b(15); 
   std::cout << b << '\n'; 
   rotate(b, 2); 
   std::cout << b << '\n'; 

   return 0;
}

HTH,

一页 2024-07-24 20:43:23

详细来说,您可以应用以下逻辑。

如果位模式为整数 33602

1000 0011 0100 0010

,并且您需要使用 2 个右移进行翻转,则:
首先复制位模式,然后左移它:长度 - 右移
即长度为16右移值为2
16 - 2 = 14

左移 14 次后得到。

1000 0000 0000 0000

现在根据需要右移值 33602 2 次。
现在,您可以

0010 0000 1101 0000

在 14 次左移值和 2 次右移值之间进行“或”运算。

1000 0000 0000 0000
0010 0000 1101 0000
===================
1010 0000 1101 0000
===================

然后您就得到了转移的展期价值。 请记住,按位运算速度更快,甚至不需要任何循环。

In details you can apply the following logic.

If Bit Pattern is 33602 in Integer

1000 0011 0100 0010

and you need to Roll over with 2 right shifs then:
first make a copy of bit pattern and then left shift it: Length - RightShift
i.e. length is 16 right shift value is 2
16 - 2 = 14

After 14 times left shifting you get.

1000 0000 0000 0000

Now right shift the value 33602, 2 times as required.
You get

0010 0000 1101 0000

Now take an OR between 14 time left shifted value and 2 times right shifted value.

1000 0000 0000 0000
0010 0000 1101 0000
===================
1010 0000 1101 0000
===================

And you get your shifted rollover value. Remember bit wise operations are faster and this don't even required any loop.

花想c 2024-07-24 20:43:23

假设您要右移 L 位,并且输入 x 是一个 N 位的数字:

unsigned ror(unsigned x, int L, int N) 
{
    unsigned lsbs = x & ((1 << L) - 1);
    return (x >> L) | (lsbs << (N-L));
}

Assuming you want to shift right by L bits, and the input x is a number with N bits:

unsigned ror(unsigned x, int L, int N) 
{
    unsigned lsbs = x & ((1 << L) - 1);
    return (x >> L) | (lsbs << (N-L));
}
烟─花易冷 2024-07-24 20:43:23

正确答案如下:

#define BitsCount( val ) ( sizeof( val ) * CHAR_BIT )
#define Shift( val, steps ) ( steps % BitsCount( val ) )
#define ROL( val, steps ) ( ( val << Shift( val, steps ) ) | ( val >> ( BitsCount( val ) - Shift( val, steps ) ) ) )
#define ROR( val, steps ) ( ( val >> Shift( val, steps ) ) | ( val << ( BitsCount( val ) - Shift( val, steps ) ) ) )

The correct answer is following:

#define BitsCount( val ) ( sizeof( val ) * CHAR_BIT )
#define Shift( val, steps ) ( steps % BitsCount( val ) )
#define ROL( val, steps ) ( ( val << Shift( val, steps ) ) | ( val >> ( BitsCount( val ) - Shift( val, steps ) ) ) )
#define ROR( val, steps ) ( ( val >> Shift( val, steps ) ) | ( val << ( BitsCount( val ) - Shift( val, steps ) ) ) )
心如荒岛 2024-07-24 20:43:23

源代码
x 位数

int x =8;
data =15; //input
unsigned char tmp;
for(int i =0;i<x;i++)
{
printf("Data & 1    %d\n",data&1);
printf("Data Shifted value %d\n",data>>1^(data&1)<<(x-1));
tmp = data>>1|(data&1)<<(x-1);
data = tmp;  
}

Source Code
x bit number

int x =8;
data =15; //input
unsigned char tmp;
for(int i =0;i<x;i++)
{
printf("Data & 1    %d\n",data&1);
printf("Data Shifted value %d\n",data>>1^(data&1)<<(x-1));
tmp = data>>1|(data&1)<<(x-1);
data = tmp;  
}
深居我梦 2024-07-24 20:43:23

另一个建议

template<class T>
inline T rotl(T x, unsigned char moves){
    unsigned char temp;
    __asm{
        mov temp, CL
        mov CL, moves
        rol x, CL
        mov CL, temp
    };
    return x;
}

another suggestion

template<class T>
inline T rotl(T x, unsigned char moves){
    unsigned char temp;
    __asm{
        mov temp, CL
        mov CL, moves
        rol x, CL
        mov CL, temp
    };
    return x;
}
泅人 2024-07-24 20:43:23

下面是 Dídac Pérez 的答案的稍微改进版本,有两个方向实现,以及使用 unsigned char 和 unsigned long long 值演示这些函数的用法。 几个注意事项:

  1. 这些函数是为了编译器优化而内联的,
  2. 我使用了 cout << 我在这里找到了以数字方式简洁输出无符号字符的 +value 技巧: https://stackoverflow.com/a/28414758 /1599699
  3. 为了清晰和安全,我建议使用显式的 语法。
  4. 我使用 unsigned char 作为 shiftNum 参数,因为我在“其他详细信息”部分中找到了 这里

如果加法表达式为,则移位运算的结果未定义
负数或者如果additive-expression大于或等于
(升级的)shift-表达式中的位数。

这是我正在使用的代码:

#include <iostream>

using namespace std;

template <typename T>
inline T rotateAndCarryLeft(T rotateMe, unsigned char shiftNum)
{
    static const unsigned char TBitCount = sizeof(T) * 8U;

    return (rotateMe << shiftNum) | (rotateMe >> (TBitCount - shiftNum));
}

template <typename T>
inline T rotateAndCarryRight(T rotateMe, unsigned char shiftNum)
{
    static const unsigned char TBitCount = sizeof(T) * 8U;

    return (rotateMe >> shiftNum) | (rotateMe << (TBitCount - shiftNum));
}

void main()
{
    //00010100 == (unsigned char)20U
    //00000101 == (unsigned char)5U == rotateAndCarryLeft(20U, 6U)
    //01010000 == (unsigned char)80U == rotateAndCarryRight(20U, 6U)

    cout << "unsigned char " << 20U << " rotated left by 6 bits == " << +rotateAndCarryLeft<unsigned char>(20U, 6U) << "\n";
    cout << "unsigned char " << 20U << " rotated right by 6 bits == " << +rotateAndCarryRight<unsigned char>(20U, 6U) << "\n";

    cout << "\n";


    for (unsigned char shiftNum = 0U; shiftNum <= sizeof(unsigned char) * 8U; ++shiftNum)
    {
        cout << "unsigned char " << 21U << " rotated left by " << +shiftNum << " bit(s) == " << +rotateAndCarryLeft<unsigned char>(21U, shiftNum) << "\n";
    }

    cout << "\n";

    for (unsigned char shiftNum = 0U; shiftNum <= sizeof(unsigned char) * 8U; ++shiftNum)
    {
        cout << "unsigned char " << 21U << " rotated right by " << +shiftNum << " bit(s) == " << +rotateAndCarryRight<unsigned char>(21U, shiftNum) << "\n";
    }


    cout << "\n";

    for (unsigned char shiftNum = 0U; shiftNum <= sizeof(unsigned long long) * 8U; ++shiftNum)
    {
        cout << "unsigned long long " << 3457347ULL << " rotated left by " << +shiftNum << " bit(s) == " << rotateAndCarryLeft<unsigned long long>(3457347ULL, shiftNum) << "\n";
    }

    cout << "\n";

    for (unsigned char shiftNum = 0U; shiftNum <= sizeof(unsigned long long) * 8U; ++shiftNum)
    {
        cout << "unsigned long long " << 3457347ULL << " rotated right by " << +shiftNum << " bit(s) == " << rotateAndCarryRight<unsigned long long>(3457347ULL, shiftNum) << "\n";
    }

    cout << "\n\n";
    system("pause");
}

Below is a slightly improved version of Dídac Pérez's answer, with both directions implemented, along with a demo of these functions' usages using unsigned char and unsigned long long values. Several notes:

  1. The functions are inlined for compiler optimizations
  2. I used a cout << +value trick for tersely outputting an unsigned char numerically that I found here: https://stackoverflow.com/a/28414758/1599699
  3. I recommend using the explicit <put the type here> syntax for clarity and safety.
  4. I used unsigned char for the shiftNum parameter because of what I found in the Additional Details section here:

The result of a shift operation is undefined if additive-expression is
negative or if additive-expression is greater than or equal to the
number of bits in the (promoted) shift-expression.

Here's the code I'm using:

#include <iostream>

using namespace std;

template <typename T>
inline T rotateAndCarryLeft(T rotateMe, unsigned char shiftNum)
{
    static const unsigned char TBitCount = sizeof(T) * 8U;

    return (rotateMe << shiftNum) | (rotateMe >> (TBitCount - shiftNum));
}

template <typename T>
inline T rotateAndCarryRight(T rotateMe, unsigned char shiftNum)
{
    static const unsigned char TBitCount = sizeof(T) * 8U;

    return (rotateMe >> shiftNum) | (rotateMe << (TBitCount - shiftNum));
}

void main()
{
    //00010100 == (unsigned char)20U
    //00000101 == (unsigned char)5U == rotateAndCarryLeft(20U, 6U)
    //01010000 == (unsigned char)80U == rotateAndCarryRight(20U, 6U)

    cout << "unsigned char " << 20U << " rotated left by 6 bits == " << +rotateAndCarryLeft<unsigned char>(20U, 6U) << "\n";
    cout << "unsigned char " << 20U << " rotated right by 6 bits == " << +rotateAndCarryRight<unsigned char>(20U, 6U) << "\n";

    cout << "\n";


    for (unsigned char shiftNum = 0U; shiftNum <= sizeof(unsigned char) * 8U; ++shiftNum)
    {
        cout << "unsigned char " << 21U << " rotated left by " << +shiftNum << " bit(s) == " << +rotateAndCarryLeft<unsigned char>(21U, shiftNum) << "\n";
    }

    cout << "\n";

    for (unsigned char shiftNum = 0U; shiftNum <= sizeof(unsigned char) * 8U; ++shiftNum)
    {
        cout << "unsigned char " << 21U << " rotated right by " << +shiftNum << " bit(s) == " << +rotateAndCarryRight<unsigned char>(21U, shiftNum) << "\n";
    }


    cout << "\n";

    for (unsigned char shiftNum = 0U; shiftNum <= sizeof(unsigned long long) * 8U; ++shiftNum)
    {
        cout << "unsigned long long " << 3457347ULL << " rotated left by " << +shiftNum << " bit(s) == " << rotateAndCarryLeft<unsigned long long>(3457347ULL, shiftNum) << "\n";
    }

    cout << "\n";

    for (unsigned char shiftNum = 0U; shiftNum <= sizeof(unsigned long long) * 8U; ++shiftNum)
    {
        cout << "unsigned long long " << 3457347ULL << " rotated right by " << +shiftNum << " bit(s) == " << rotateAndCarryRight<unsigned long long>(3457347ULL, shiftNum) << "\n";
    }

    cout << "\n\n";
    system("pause");
}
追风人 2024-07-24 20:43:23

重载一个函数:

unsigned int rotate_right(unsigned int x)
{
 return (x>>1 | (x&1?0x80000000:0))
}

unsigned short rotate_right(unsigned short x) { /* etc. */ }

Overload a function:

unsigned int rotate_right(unsigned int x)
{
 return (x>>1 | (x&1?0x80000000:0))
}

unsigned short rotate_right(unsigned short x) { /* etc. */ }
渡你暖光 2024-07-24 20:43:23
--- Substituting RLC in 8051 C for speed --- Rotate left carry
Here is an example using RLC to update a serial 8 bit DAC msb first:
                               (r=DACVAL, P1.4= SDO, P1.5= SCLK)
MOV     A, r
?1:
MOV     B, #8
RLC     A
MOV     P1.4, C
CLR     P1.5
SETB    P1.5
DJNZ    B, ?1

Here is the code in 8051 C at its fastest:
sbit ACC_7  = ACC ^ 7 ; //define this at the top to access bit 7 of ACC
ACC     =   r;
B       =   8;  
do  {
P1_4    =   ACC_7;  // this assembles into mov c, acc.7  mov P1.4, c 
ACC     <<= 1;
P1_5    =   0;
P1_5    =   1;
B       --  ; 
    } while ( B!=0 );
The keil compiler will use DJNZ when a loop is written this way.
I am cheating here by using registers ACC and B in c code.
If you cannot cheat then substitute with:
P1_4    =   ( r & 128 ) ? 1 : 0 ;
r     <<=   1;
This only takes a few extra instructions.
Also, changing B for a local var char n is the same.
Keil does rotate ACC left by ADD A, ACC which is the same as multiply 2.
It only takes one extra opcode i think.
Keeping code entirely in C keeps things simpler sometimes.
--- Substituting RLC in 8051 C for speed --- Rotate left carry
Here is an example using RLC to update a serial 8 bit DAC msb first:
                               (r=DACVAL, P1.4= SDO, P1.5= SCLK)
MOV     A, r
?1:
MOV     B, #8
RLC     A
MOV     P1.4, C
CLR     P1.5
SETB    P1.5
DJNZ    B, ?1

Here is the code in 8051 C at its fastest:
sbit ACC_7  = ACC ^ 7 ; //define this at the top to access bit 7 of ACC
ACC     =   r;
B       =   8;  
do  {
P1_4    =   ACC_7;  // this assembles into mov c, acc.7  mov P1.4, c 
ACC     <<= 1;
P1_5    =   0;
P1_5    =   1;
B       --  ; 
    } while ( B!=0 );
The keil compiler will use DJNZ when a loop is written this way.
I am cheating here by using registers ACC and B in c code.
If you cannot cheat then substitute with:
P1_4    =   ( r & 128 ) ? 1 : 0 ;
r     <<=   1;
This only takes a few extra instructions.
Also, changing B for a local var char n is the same.
Keil does rotate ACC left by ADD A, ACC which is the same as multiply 2.
It only takes one extra opcode i think.
Keeping code entirely in C keeps things simpler sometimes.
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文