In-place bit-reversed shuffle on an array

Posted 2024-07-23 06:43:06

For an FFT function I need to permute (shuffle) the elements of an array in bit-reversed order. This is a common task with FFTs, because most power-of-two-sized FFT functions either expect their input or return their output in bit-reversed order.

For example, assume the array has 256 elements; I'd like to swap each element with the element at its bit-reversed index. Here are two examples (in binary):

Element 00000001b should be swapped with element 10000000b
Element 00010111b should be swapped with element 11101000b

and so on.

Any idea how to do this fast and, more importantly, in place?

I already have a function that does this swap; it's not hard to write one. Since this is such a common operation in DSP, I have the feeling that there are cleverer ways to do it than my very naive loop.

The language in question is C, but any language is fine.

风吹过旳痕迹 2024-07-30 06:43:06

To swap in place in a single pass, iterate once over all elements in increasing index order. Perform a swap only if the index is less than its bit-reversed index. This avoids the double-swap problem (swapping a pair twice would restore the original order) and also skips the palindrome cases (elements 00000000b, 10000001b, 10100101b), which reverse to themselves and need no swap.

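// Let data[256] be your element array
for (i = 0; i < 256; i++)
{
    j = bit_reverse(i);          // reverse the 8 index bits
    if (i < j)                   // visit each pair once, skip palindromes
    {
        swap(data[i], data[j]);  // swap() stands for a macro or inline exchange
    }
}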

bit_reverse() can use Nathaneil's bit-operations trick.
bit_reverse() will be called 256 times, but swap() will be called only 120 times: 16 of the 256 indices are palindromes, and the remaining 240 form 120 pairs.
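
As a rough illustration of the single-pass idea, here is a minimal sketch for arrays of 2^bits doubles. The names bit_reverse_permute and the two-argument bit_reverse are my own, not from the answer, and the plain-loop bit reversal is only a stand-in for whichever faster method you pick.

```c
#include <assert.h>

/* Reverse the lowest `bits` bits of x with a plain loop
   (hypothetical helper; any faster bit reversal can replace it). */
unsigned bit_reverse(unsigned x, unsigned bits)
{
    unsigned r = 0;
    for (unsigned k = 0; k < bits; k++) {
        r = (r << 1) | (x & 1u);  /* append the lowest remaining bit of x to r */
        x >>= 1;
    }
    return r;
}

/* Permute data[0 .. 2^bits - 1] into bit-reversed order, in place. */
void bit_reverse_permute(double *data, unsigned bits)
{
    assert(bits < sizeof(unsigned) * 8);  /* keeps 1u << bits well defined */
    unsigned n = 1u << bits;
    for (unsigned i = 0; i < n; i++) {
        unsigned j = bit_reverse(i, bits);
        if (i < j) {                      /* each pair once, palindromes never */
            double tmp = data[i];
            data[i] = data[j];
            data[j] = tmp;
        }
    }
}
```

For an 8-element array holding 0..7, calling bit_reverse_permute(data, 3) should leave it as 0, 4, 2, 6, 1, 5, 3, 7.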

猫九 2024-07-30 06:43:06

A quick way to do this is to swap every pair of adjacent single bits, then adjacent 2-bit fields, and so on. For an 8-bit value:

x = (x & 0x55) << 1 | (x & 0xAA) >> 1; //swaps adjacent bits
x = (x & 0x33) << 2 | (x & 0xCC) >> 2; //swaps adjacent 2-bit fields
x = (x & 0x0F) << 4 | (x & 0xF0) >> 4; //swaps the two 4-bit halves

While hard to read, if this is something that needs to be optimized you may want to do it this way.
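
The three steps can be wrapped in a function, and an index narrower than 8 bits can be handled by shifting the result down afterwards. The sketch below assumes that approach; reverse8 and reverse_low_bits are hypothetical names, not from the answer.

```c
#include <assert.h>

/* Reverse all 8 bits of x (x must fit in 8 bits). */
unsigned reverse8(unsigned x)
{
    x = (x & 0x55u) << 1 | (x & 0xAAu) >> 1;  /* swap adjacent bits */
    x = (x & 0x33u) << 2 | (x & 0xCCu) >> 2;  /* swap adjacent 2-bit fields */
    x = (x & 0x0Fu) << 4 | (x & 0xF0u) >> 4;  /* swap the two 4-bit halves */
    return x;
}

/* Reverse only the lowest `bits` bits of x, for 1 <= bits <= 8.
   Reversing all 8 bits leaves the result in the top `bits` bits,
   so it is shifted back down by the unused width. */
unsigned reverse_low_bits(unsigned x, unsigned bits)
{
    assert(bits >= 1 && bits <= 8);
    assert(x < (1u << bits));
    return reverse8(x) >> (8 - bits);
}
```

With these definitions, reverse8(0x17) gives 0xE8, matching the second example in the question.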

蹲在坟头点根烟 2024-07-30 06:43:06

This code uses a lookup table to reverse 64-bit numbers very quickly. For your C-language example, I also included versions for 32-, 16-, and 8-bit numbers (assumes int is 32 bits). In an object-oriented language (C++, C#, etc), I would have just overloaded the function.

I don't have a C-compiler handy at the moment so, hopefully, I didn't miss anything.

unsigned char ReverseBits[] = 
{
  0x00, 0x80, 0x40, 0xC0, 0x20, 0xA0, 0x60, 0xE0, 0x10, 0x90, 0x50, 0xD0, 0x30, 0xB0, 0x70, 0xF0, 
  0x08, 0x88, 0x48, 0xC8, 0x28, 0xA8, 0x68, 0xE8, 0x18, 0x98, 0x58, 0xD8, 0x38, 0xB8, 0x78, 0xF8, 
  0x04, 0x84, 0x44, 0xC4, 0x24, 0xA4, 0x64, 0xE4, 0x14, 0x94, 0x54, 0xD4, 0x34, 0xB4, 0x74, 0xF4, 
  0x0C, 0x8C, 0x4C, 0xCC, 0x2C, 0xAC, 0x6C, 0xEC, 0x1C, 0x9C, 0x5C, 0xDC, 0x3C, 0xBC, 0x7C, 0xFC, 
  0x02, 0x82, 0x42, 0xC2, 0x22, 0xA2, 0x62, 0xE2, 0x12, 0x92, 0x52, 0xD2, 0x32, 0xB2, 0x72, 0xF2, 
  0x0A, 0x8A, 0x4A, 0xCA, 0x2A, 0xAA, 0x6A, 0xEA, 0x1A, 0x9A, 0x5A, 0xDA, 0x3A, 0xBA, 0x7A, 0xFA,
  0x06, 0x86, 0x46, 0xC6, 0x26, 0xA6, 0x66, 0xE6, 0x16, 0x96, 0x56, 0xD6, 0x36, 0xB6, 0x76, 0xF6, 
  0x0E, 0x8E, 0x4E, 0xCE, 0x2E, 0xAE, 0x6E, 0xEE, 0x1E, 0x9E, 0x5E, 0xDE, 0x3E, 0xBE, 0x7E, 0xFE,
  0x01, 0x81, 0x41, 0xC1, 0x21, 0xA1, 0x61, 0xE1, 0x11, 0x91, 0x51, 0xD1, 0x31, 0xB1, 0x71, 0xF1,
  0x09, 0x89, 0x49, 0xC9, 0x29, 0xA9, 0x69, 0xE9, 0x19, 0x99, 0x59, 0xD9, 0x39, 0xB9, 0x79, 0xF9, 
  0x05, 0x85, 0x45, 0xC5, 0x25, 0xA5, 0x65, 0xE5, 0x15, 0x95, 0x55, 0xD5, 0x35, 0xB5, 0x75, 0xF5,
  0x0D, 0x8D, 0x4D, 0xCD, 0x2D, 0xAD, 0x6D, 0xED, 0x1D, 0x9D, 0x5D, 0xDD, 0x3D, 0xBD, 0x7D, 0xFD,
  0x03, 0x83, 0x43, 0xC3, 0x23, 0xA3, 0x63, 0xE3, 0x13, 0x93, 0x53, 0xD3, 0x33, 0xB3, 0x73, 0xF3, 
  0x0B, 0x8B, 0x4B, 0xCB, 0x2B, 0xAB, 0x6B, 0xEB, 0x1B, 0x9B, 0x5B, 0xDB, 0x3B, 0xBB, 0x7B, 0xFB,
  0x07, 0x87, 0x47, 0xC7, 0x27, 0xA7, 0x67, 0xE7, 0x17, 0x97, 0x57, 0xD7, 0x37, 0xB7, 0x77, 0xF7, 
  0x0F, 0x8F, 0x4F, 0xCF, 0x2F, 0xAF, 0x6F, 0xEF, 0x1F, 0x9F, 0x5F, 0xDF, 0x3F, 0xBF, 0x7F, 0xFF
};


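unsigned long long Reverse64Bits(unsigned long long number)
{    
    unsigned long long result;

    /* unsigned long long is at least 64 bits; unsigned long may be only 32.
       Each table entry is cast before shifting, because an unsigned char
       promotes to int and shifting an int by 32 or more is undefined. */
    result = 
        ((unsigned long long)ReverseBits[ number        & 0xff] << 56) |
        ((unsigned long long)ReverseBits[(number >>  8) & 0xff] << 48) | 
        ((unsigned long long)ReverseBits[(number >> 16) & 0xff] << 40) | 
        ((unsigned long long)ReverseBits[(number >> 24) & 0xff] << 32) | 
        ((unsigned long long)ReverseBits[(number >> 32) & 0xff] << 24) |
        ((unsigned long long)ReverseBits[(number >> 40) & 0xff] << 16) | 
        ((unsigned long long)ReverseBits[(number >> 48) & 0xff] <<  8) | 
        ((unsigned long long)ReverseBits[(number >> 56) & 0xff]);

    return result;
}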

unsigned int Reverse32Bits(unsigned int number)
{
    unsigned int result;

    /* Cast to unsigned int so that << 24 cannot overflow a signed int. */
    result = 
        ((unsigned int)ReverseBits[ number        & 0xff] << 24) |
        ((unsigned int)ReverseBits[(number >>  8) & 0xff] << 16) | 
        ((unsigned int)ReverseBits[(number >> 16) & 0xff] <<  8) | 
        ((unsigned int)ReverseBits[(number >> 24) & 0xff]);

    return result;
}

unsigned short Reverse16Bits(unsigned short number)
{
    unsigned short result;

    result = 
        (ReverseBits[ number       & 0xff] <<  8) | 
        (ReverseBits[(number >> 8) & 0xff]);

    return result;
}

unsigned char Reverse8Bits(unsigned char number)
{
    unsigned char result;

    result = (ReverseBits[number]);

    return result;
}
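
If typing out the 256-entry table is unappealing, it can also be filled at startup. The sketch below is one possible way to do that, together with a two-lookup reversal for FFT indices of up to 16 bits; rev_table, init_rev_table and reverse_index are hypothetical names, not part of the answer.

```c
#include <assert.h>

unsigned char rev_table[256];  /* filled by init_rev_table() */

/* Build the byte-reversal table from smaller entries: the reverse of i is
   the reverse of (i >> 1) shifted down one place, with the lowest bit of i
   moved to the top bit. */
void init_rev_table(void)
{
    rev_table[0] = 0;
    for (unsigned i = 1; i < 256; i++) {
        rev_table[i] = (unsigned char)((rev_table[i >> 1] >> 1) | ((i & 1u) << 7));
    }
}

/* Reverse the lowest `bits` bits of x, for 1 <= bits <= 16.
   The two bytes are reversed and exchanged to get a 16-bit reversal,
   which is then shifted down by the unused width. */
unsigned reverse_index(unsigned x, unsigned bits)
{
    assert(bits >= 1 && bits <= 16);
    unsigned r16 = ((unsigned)rev_table[x & 0xffu] << 8)
                 | rev_table[(x >> 8) & 0xffu];
    return r16 >> (16 - bits);
}
```

init_rev_table() must be called once before reverse_index() is used. For a 1024-point FFT, reverse_index(1, 10) would then give 512.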

在梵高的星空下 2024-07-30 06:43:06

If you think about what's happening to the bit-swapped index, it's being counted up in the same way as the non-bit-swapped index, just with the bits used in the reverse order from conventional counting.

Rather than bit-swapping the index every time through the loop, you can manually implement a '++' equivalent that uses the bits in reverse order, and drive a doubly indexed for loop with it. I've verified that gcc at -O3 inlines the increment function, but whether it's any faster than bit-swapping the number via a lookup every time is for the profiler to say.

Here's an illustrative test program.

#include <stdio.h>

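/* Increment *n as a bit-reversed counter. `bit` is the array size (a power
   of two, at least 2); the carry runs from the top bit downwards. */
void RevBitIncr( int *n, int bit )
{
    do
    {
        bit >>= 1;      /* next lower bit, starting at the top one */
        *n ^= bit;      /* flip it */
    } while( (*n & bit) == 0 && bit != 1 );  /* it went 1 -> 0: carry on */
}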

int main(void)
{
    int max = 0x100;
    int i, j;

    for( i = 0, j = 0; i != max; ++i, RevBitIncr( &j, max ) )
    {
        if( i < j )
            printf( "%02x <-> %02x\n", i, j );
    }

    return 0;
}
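
The test program only prints the index pairs. To shuffle real data, the printf can be replaced by an exchange, roughly as in the sketch below. BitReverseShuffle is a hypothetical name, the element type double is an assumption, and RevBitIncr is repeated so the block stands alone.

```c
#include <assert.h>

/* Reversed increment, as in the test program: carry from the top bit down. */
void RevBitIncr(int *n, int bit)
{
    do {
        bit >>= 1;
        *n ^= bit;
    } while ((*n & bit) == 0 && bit != 1);
}

/* Permute a[0 .. max-1] into bit-reversed order, in place.
   max must be a power of two and at least 2; with max == 1 the
   increment above would never terminate. */
void BitReverseShuffle(double *a, int max)
{
    assert(max >= 2 && (max & (max - 1)) == 0);
    int i, j;
    for (i = 0, j = 0; i != max; ++i, RevBitIncr(&j, max)) {
        if (i < j) {          /* swap each pair exactly once */
            double tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }
    }
}
```

No per-index bit reversal is computed here; j simply tracks the reversed value of i as the loop advances.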

凉城 2024-07-30 06:43:06

The following approach computes the next bit-reversed index from the previous one, as in Charles Bailey's answer, but in a more optimized way. Note that incrementing a number simply flips a run of least-significant bits, for example from 0111 to 1000. So to compute the next bit-reversed index, you flip the same number of most-significant bits. If your target platform has a CTZ ("count trailing zeros") instruction, this can be done efficiently. Here n, the array length, is assumed to be a power of two.

Example using GCC's __builtin_ctz:

void brswap(double *a, unsigned n) {
    for (unsigned i = 0, j = 0; i < n; i++) {
        if (i < j) {
            double tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }

        // Length of the mask.
        unsigned len = __builtin_ctz(i + 1) + 1;
        // XOR with mask.
        j ^= n - (n >> len);
    }
}

Without a CTZ instruction, you can also use integer division:

void brswap(double *a, unsigned n) {
    for (unsigned i = 0, j = 0; i < n; i++) {
        if (i < j) {
            double tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }

        // Compute a mask of LSBs.
        unsigned mask = i ^ (i + 1);
        // Using division to bit-reverse a single bit.
        unsigned rev = n / (mask + 1);
        // XOR with mask.
        j ^= n - rev;
    }
}
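
To see the XOR-mask update in isolation, the sketch below records the sequence of j values instead of swapping. It is only an illustration: bitrev_sequence is a hypothetical name, and a small portable loop stands in for the CTZ instruction or the division.

```c
#include <assert.h>

/* Fill out[0 .. n-1] with the bit-reversed counting sequence.
   n must be a power of two. */
void bitrev_sequence(unsigned *out, unsigned n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    unsigned j = 0;
    for (unsigned i = 0; i < n; i++) {
        out[i] = j;
        if (i + 1 == n)
            break;                    /* no further index to compute */
        unsigned low = i ^ (i + 1);   /* bits that change from i to i + 1 */
        unsigned len = 0;
        while (low) {                 /* count them: equals ctz(i + 1) + 1 */
            len++;
            low >>= 1;
        }
        j ^= n - (n >> len);          /* flip the top len bits of j */
    }
}
```

For n = 8 the masks applied are 4, 6, 4, 7, 4, 6, 4, which takes j through 0, 4, 2, 6, 1, 5, 3, 7.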
