Is this the most optimal way? C bit fields

Posted 2024-10-13 21:22:35


I made a function to set or clear a specific number of bits in a DWORD. My function works. I don't need help making it work. However, I am wondering if the method I've chosen to do it is the fastest possible way.

It's rather hard for me to explain how this works. There are two lookup arrays of DWORDs. Entry k of one has its k leftmost bits set to 1, and entry k of the other has its k rightmost bits set to 1. The function combines them into a mask that covers every bit except the ones I want to set or clear, then uses bitwise operators with that mask to set or clear them. It seems rather complicated for such a simple task, but it's the fastest way I could come up with, and it's much faster than setting the bits one by one.

#include <windows.h>  // DWORD, BOOL

static const DWORD __dwFilledBitsRight[] = {
    0x0, 0x1, 0x3, 0x7, 0xF, 0x1F, 0x3F, 0x7F,
    0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF, 0x1FFF, 0x3FFF, 0x7FFF,
    0xFFFF, 0x1FFFF, 0x3FFFF, 0x7FFFF, 0xFFFFF, 0x1FFFFF, 0x3FFFFF, 0x7FFFFF,
    0xFFFFFF, 0x1FFFFFF, 0x3FFFFFF, 0x7FFFFFF, 0xFFFFFFF, 0x1FFFFFFF, 0x3FFFFFFF, 0x7FFFFFFF,
    0xFFFFFFFF
};

static const DWORD __dwFilledBitsLeft[] = {
    0x0, 0x80000000, 0xC0000000, 0xE0000000, 0xF0000000, 0xF8000000, 0xFC000000, 0xFE000000,
    0xFF000000, 0xFF800000, 0xFFC00000, 0xFFE00000, 0xFFF00000, 0xFFF80000, 0xFFFC0000, 0xFFFE0000,
    0xFFFF0000, 0xFFFF8000, 0xFFFFC000, 0xFFFFE000, 0xFFFFF000, 0xFFFFF800, 0xFFFFFC00, 0xFFFFFE00,
    0xFFFFFF00, 0xFFFFFF80, 0xFFFFFFC0, 0xFFFFFFE0, 0xFFFFFFF0, 0xFFFFFFF8, 0xFFFFFFFC, 0xFFFFFFFE,
    0xFFFFFFFF
};

// nStartBitFromLeft must be between 1 and 32...
// 1 is the bit farthest to the left (actual bit 31)
// 32 is the bit farthest to the right (actual bit 0)
// nStartBitFromLeft + nBits must not exceed 33, or the right-mask index goes negative
static inline void __FillDWORDBits(DWORD *p, int nStartBitFromLeft, int nBits, BOOL bSet)
{
    DWORD dwLeftMask = __dwFilledBitsLeft[nStartBitFromLeft - 1]; // Mask for data on the left of the bits we want
    DWORD dwRightMask = __dwFilledBitsRight[33 - (nStartBitFromLeft + nBits)]; // Mask for data on the right of the bits we want
    DWORD dwBitMask = ~(dwLeftMask | dwRightMask); // Mask for the bits we want
    DWORD dwOriginal = *p;
    if(bSet) *p = (dwOriginal & dwLeftMask) | (dwOriginal & dwRightMask) | (0xFFFFFFFF & dwBitMask);
    else *p = (dwOriginal & dwLeftMask) | (dwOriginal & dwRightMask) | 0;
}
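The two tables don't have to be typed out by hand. Entry k of each is just k one-bits packed against one edge of the word, so both can be generated with shifts. Below is a minimal sketch, assuming `uint32_t` stands in for `DWORD`. The names `ones_right`, `init_filled_tables` and `table_mask` are hypothetical; `table_mask` reproduces the question's mask computation from the generated tables.

```c
#include <assert.h>
#include <stdint.h>

/* uint32_t stands in for DWORD in this sketch. */
static uint32_t filled_right[33]; /* entry k: k one-bits on the right */
static uint32_t filled_left[33];  /* entry k: k one-bits on the left  */

/* k one-bits packed against the right edge, 0 <= k <= 32.
   Shifting a 32-bit value by 32 is undefined, so k == 32 is handled separately. */
static uint32_t ones_right(int k)
{
    assert(k >= 0 && k <= 32);
    return (k == 32) ? UINT32_MAX : (((uint32_t)1 << k) - 1u);
}

static void init_filled_tables(void)
{
    for (int k = 0; k <= 32; ++k) {
        filled_right[k] = ones_right(k);
        /* k ones on the left are the complement of (32 - k) ones on the right */
        filled_left[k] = ~ones_right(32 - k);
    }
}

/* Same indexing as the question's __FillDWORDBits: returns the mask of the
   nBits bits starting at nStartBitFromLeft (1 = bit 31, 32 = bit 0). */
static uint32_t table_mask(int nStartBitFromLeft, int nBits)
{
    assert(nStartBitFromLeft >= 1 && nStartBitFromLeft <= 32);
    assert(nBits >= 0 && nStartBitFromLeft + nBits <= 33);
    uint32_t left = filled_left[nStartBitFromLeft - 1];
    uint32_t right = filled_right[33 - (nStartBitFromLeft + nBits)];
    return ~(left | right);
}
```

Generating the tables only saves typing. Each call still performs two table reads.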


小梨窩很甜 2024-10-20 21:22:35

How about:

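// Create mask of correct length, and shift to the correct position
// (pos = index of the lowest bit to change, counting from bit 0 on the right;
// in the question's numbering, pos = 33 - (nStartBitFromLeft + nBits))
DWORD mask = (DWORD)(((1ULL << nBits) - 1) << pos);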
// Apply mask (or its inverse)
if (bSet)
{
    *p |= mask;
}
else
{
    *p &= ~mask;
}

It's pretty likely that simple bitwise operations will be faster than table lookup on any modern processor.

Note: Depending on the relationship between DWORD and long long on this platform, you may need special handling for the case where nBits == sizeof(DWORD)*8. Or if nBits==0 is not a possibility, you could just do DWORD mask = ((2ULL << (nBits - 1)) - 1) << pos;.
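To make the note concrete: `unsigned long long` is at least 64 bits wide, so the `1ULL` formula covers every length from 0 to 32 when DWORD is 32 bits. The `2 << (nBits - 1)` form also works in plain 32-bit unsigned arithmetic, because `2u << 31` wraps to 0 and 0 - 1 wraps to all ones. It does require nBits >= 1. Below is a sketch that assumes a 32-bit `unsigned int` and uses `uint32_t` in place of `DWORD`; `mask_64` and `mask_32` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* The answer's formula: nBits ones shifted up to bit `pos` (bit 0 = rightmost).
   The 64-bit intermediate means nBits == 32 never shifts a 32-bit value by 32. */
static uint32_t mask_64(int nBits, int pos)
{
    assert(nBits >= 0 && pos >= 0 && nBits + pos <= 32);
    return (uint32_t)(((1ULL << nBits) - 1) << pos);
}

/* The alternative from the note, done entirely in 32-bit unsigned arithmetic:
   for nBits == 32, 2u << 31 wraps to 0 and 0 - 1 wraps to 0xFFFFFFFF.
   Requires nBits >= 1, because a negative shift count is undefined. */
static uint32_t mask_32(int nBits, int pos)
{
    assert(nBits >= 1 && pos >= 0 && nBits + pos <= 32);
    return (((uint32_t)2 << (nBits - 1)) - 1u) << pos;
}
```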

Update: It's been mentioned that the if statement could potentially be slow (a mispredicted branch stalls the pipeline), which is true. Here's a replacement for it, but you'd need to measure to see whether it's actually any faster in practice.

// A bit hacky, but the aim is to get 0x00000000 (set) or 0xFFFFFFFF (clear);
// assumes bSet is exactly 0 or 1 (the int -1 converts to 0xFFFFFFFF)
DWORD blanket = bSet - 1;
// Use the blanket to override one or other masking operation
*p |= (~blanket & mask);
*p &= ~(blanket & mask);


苍白女子 2024-10-20 21:22:35

This is the way I'd do it. I'd break it into two functions, setbits() and clearbits(). I've broken the steps out for clarity, and I'm sure it can be optimized much further.

As it stands, this version assumes a 32-bit DWORD. Also, in my world, bit 0 is the rightmost bit. Your mileage may vary.

void setbits( DWORD *p , int offset , int len )
{
  // offset must be 0-31, len must be 0-32, len+offset must not exceed 32
  if ( len == 0 ) return ;  // nothing to set; also avoids a shift by 32 below
  int   right_shift = 32 - (len+offset) ;  // 0-31, since len+offset >= 1
  int   left_shift  = offset ;
  DWORD right_mask  = 0xFFFFFFFF >> right_shift  ;
  DWORD left_mask   = 0xFFFFFFFF << left_shift   ;
  DWORD mask        = left_mask & right_mask     ;

  *p |= mask ;
}

void clearbits( DWORD *p , int offset , int len )
{
  // offset must be 0-31, len must be 0-32, len+offset must not exceed 32
  if ( len == 0 ) return ;  // nothing to clear; also avoids a shift by 32 below
  int   right_shift = 32 - (len+offset) ;  // 0-31, since len+offset >= 1
  int   left_shift  = offset ;
  DWORD right_mask  = 0xFFFFFFFF >> right_shift   ;
  DWORD left_mask   = 0xFFFFFFFF << left_shift    ;
  DWORD mask        = ~( left_mask & right_mask ) ;

  *p &= mask ;
}
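This answer counts bits from the right (bit 0 = rightmost), while the question counts from the left (1 = bit 31). The sketch below converts the question's arguments into this answer's offset. The helper, `offset_from_left_start`, is hypothetical and not part of either post.

```c
#include <assert.h>

/* Converts the question's (nStartBitFromLeft, nBits), where 1 means bit 31 and
   32 means bit 0, into the index of the lowest bit of the run (bit 0 = rightmost). */
static int offset_from_left_start(int nStartBitFromLeft, int nBits)
{
    assert(nStartBitFromLeft >= 1 && nStartBitFromLeft <= 32);
    assert(nBits >= 0 && nStartBitFromLeft + nBits <= 33);
    /* The highest bit of the run is 32 - nStartBitFromLeft; the run extends
       nBits - 1 positions below it. */
    return 33 - (nStartBitFromLeft + nBits);
}
```

For example, the question's `__FillDWORDBits(p, 1, 4, TRUE)` touches the top four bits, which corresponds to `setbits(p, 28, 4)`. The returned value is also the index the question uses for its right-hand table.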

I stumbled across this improved version whilst looking for something else today. Courtesy of Sean Anderson's Bit Twiddling Hacks at Stanford University:

// uncomment #define to get the super scalar CPU version.
// #define SUPER_SCALAR_CPU
void setbits( unsigned int *p , int offset , int len , int flag )
{
  // flag must be 0 (clear) or 1 (set); len must be 0-31 so the shift is defined,
  // and 1u rather than 1 keeps len == 31 from shifting into the sign bit of an int
  unsigned int mask = ( ( 1u << len ) - 1 ) << offset ;

#if !defined( SUPER_SCALAR_CPU )
  *p ^= ( - flag ^ *p ) & mask ;
#else
  // supposed to be some 16% faster on an Intel Core 2 Duo than the non-super-scalar version above
  *p = (*p & ~ mask ) | ( - flag & mask ) ;
#endif
}

Much depends on your compiler, though.
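The two Bit Twiddling Hacks expressions should produce identical results. The sketch below writes them as two pure functions so they can be checked against each other. It assumes `uint32_t` for the word and `flag` already 0 or 1, and the names `update_xor` and `update_superscalar` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* w ^= (-f ^ w) & m: inside m, w is XORed with (-f ^ w), which leaves -f's bits;
   outside m, w is XORed with 0 and stays unchanged. */
static uint32_t update_xor(uint32_t w, uint32_t m, int flag)
{
    assert(flag == 0 || flag == 1);
    uint32_t fill = (uint32_t)-flag; /* 0x00000000 or 0xFFFFFFFF */
    return w ^ ((fill ^ w) & m);
}

/* w = (w & ~m) | (-f & m): keep w outside m, take -f's bits inside m. */
static uint32_t update_superscalar(uint32_t w, uint32_t m, int flag)
{
    assert(flag == 0 || flag == 1);
    uint32_t fill = (uint32_t)-flag; /* 0x00000000 or 0xFFFFFFFF */
    return (w & ~m) | (fill & m);
}
```

The second form has two independent sub-expressions that can be evaluated in parallel, which is presumably why it is labelled for superscalar CPUs.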

