What is the fastest way to loop through a large chunk of data bit by bit?

Posted on 2024-07-11 12:11:32

I am running through a memory block of binary data byte-wise.

Currently I am doing something like this:

for (i = 0; i < data->Count; i++)
{
    byte = &data->Data[i];
    /* Test each of the eight bits of the current byte in turn. */
    if (*byte & Masks[0]) Stats.FreqOf1++; else Stats.FreqOf0++;
    if (*byte & Masks[1]) Stats.FreqOf1++; else Stats.FreqOf0++;
    if (*byte & Masks[2]) Stats.FreqOf1++; else Stats.FreqOf0++;
    if (*byte & Masks[3]) Stats.FreqOf1++; else Stats.FreqOf0++;
    if (*byte & Masks[4]) Stats.FreqOf1++; else Stats.FreqOf0++;
    if (*byte & Masks[5]) Stats.FreqOf1++; else Stats.FreqOf0++;
    if (*byte & Masks[6]) Stats.FreqOf1++; else Stats.FreqOf0++;
    if (*byte & Masks[7]) Stats.FreqOf1++; else Stats.FreqOf0++;
}

Where Masks is:

for (i = 0; i < 8; i++)
{
    Masks[i] = 1 << i;
}

(I somehow did not manage to do it as fast in a loop or in an inlined function, so I wrote it out.)

Does anyone have any suggestions on how to improve this first loop? I am rather inexperienced with getting down to bits.

This may seem like a stupid thing to do. But I am in the process of implementing a compression algorithm. I just want to have the bit accessing part down right.

Thanks!

PS: This is on the Visual Studio 2008 compiler. So it would be nice if the suggestions applied to that compiler.

PPS: I just realized that I don't need to increment two counts. One would be enough; the other follows from the difference to the total number of bits at the end.
But that would be specific to just counting. What I really want done fast is the bit extraction.

EDIT:
The lookup table idea that was brought forward is nice.
I realize though that I posed the question wrong in the title.
Because in the end what I want to do is not count the bits, but access each bit as fast as possible.

ANOTHER EDIT:
Is it possible to advance a pointer by just one bit in the data?

ANOTHER EDIT:
Thank you for all your answers so far.

What I want to implement in the next steps is an unsophisticated binary arithmetic coder that does not analyze the context. So I am only interested in single bits for now. Eventually it will become a context-adaptive BAC, but I will leave that for later.

Processing 4 bytes instead of 1 byte could be an option. But a loop over 32 bits is costly as well, isn't it?
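On the question of advancing a pointer by a single bit: C pointers address whole bytes, so there is no native bit pointer. A common workaround is to pair a byte pointer with a bit offset. The following is only a minimal sketch of that idea; `BitCursor`, `bit_cursor_init` and `bit_cursor_next` are hypothetical names, and the least-significant-bit-first order is an assumption chosen to match `Masks[i] = 1 << i`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "bit pointer": a byte pointer plus a bit offset (0..7). */
typedef struct {
    const unsigned char *p;   /* current byte */
    unsigned bit;             /* next bit to read in *p, 0 = least significant */
} BitCursor;

static void bit_cursor_init(BitCursor *c, const unsigned char *data)
{
    assert(c != NULL && data != NULL);
    c->p = data;
    c->bit = 0;
}

/* Returns the current bit (0 or 1) and advances the cursor by one bit.
   The caller is responsible for not reading past the end of the buffer. */
static int bit_cursor_next(BitCursor *c)
{
    int value = (*c->p >> c->bit) & 1;
    if (++c->bit == 8) {      /* crossed a byte boundary */
        c->bit = 0;
        ++c->p;
    }
    return value;
}
```

A coder that consumes one bit at a time could call `bit_cursor_next` in its main loop. Whether this beats the unrolled version depends on how well the compiler inlines it, so it is worth measuring.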

关于从前 2024-07-18 12:11:33

Use a table that maps each of the 256 possible byte values to the number of 1's in it. (The # of 0's is just (8 - # of 1's)). Then iterate over the bytes and perform a single lookup for each byte, instead of multiple lookups and comparisons. For example:

int onesCount = 0;
for (i = 0; i < data->Count; i++)
{
    byte = &data->Data[i];
    onesCount += NumOnes[*byte];   /* index with the byte value, not the pointer */
}
Stats.FreqOf1 += onesCount;
Stats.FreqOf0 += (data->Count * 8) - onesCount;
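The answer does not show how `NumOnes` is filled. One possible way, given here only as a sketch, is to compute it once at start-up from the recurrence "ones in v = lowest bit of v + ones in v / 2". The helper names `init_num_ones` and `count_ones` are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

static unsigned char NumOnes[256];   /* NumOnes[v] = number of 1 bits in v */

/* Fill the table once at start-up. Each entry reuses the entry for v / 2,
   which has already been computed, and adds the lowest bit of v. */
static void init_num_ones(void)
{
    int v;
    NumOnes[0] = 0;
    for (v = 1; v < 256; v++)
        NumOnes[v] = (unsigned char)((v & 1) + NumOnes[v / 2]);
}

/* Count the 1 bits in a buffer with one table lookup per byte. */
static unsigned long count_ones(const unsigned char *data, size_t count)
{
    unsigned long ones = 0;
    size_t i;
    assert(data != NULL || count == 0);
    for (i = 0; i < count; i++)
        ones += NumOnes[data[i]];
    return ones;
}
```

The number of 0 bits is then `count * 8 - count_ones(data, count)`, as in the answer.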

薄荷梦 2024-07-18 12:11:33

I did not really understand what you're trying to do. But if you just want to get access to the bits of a bitmap, you can use these (untested!!!) functions:

#include <stddef.h>

_Bool isbitset(unsigned char * bitmap, size_t idx)
{
    return bitmap[idx / 8] & (1 << (idx % 8)) ? 1 : 0;
}

void setbit(unsigned char * bitmap, size_t idx)
{
    bitmap[idx / 8] |= (1 << (idx % 8));
}

void unsetbit(unsigned char * bitmap, size_t idx)
{
    bitmap[idx / 8] &= ~(1 << (idx % 8));
}

void togglebit(unsigned char * bitmap, size_t idx)
{
    bitmap[idx / 8] ^= (1 << (idx % 8));
}

Edit: Ok, I think I understand what you want to do: Fast iteration over a sequence of bits. Therefore, we don't want to use the random access functions from above, but read a whole word of data at once.

You might use any unsigned integer type you like, but you should choose one which is likely to correspond to the word size of your architecture. I'll go with uint_fast32_t from stdint.h:

uint_fast32_t * data = __data_source__;
for(; __condition__; ++data)
{
    uint_fast32_t mask = 1;
    uint_fast32_t current = *data;
    for(; mask; mask <<= 1)
    {
        if(current & mask)
        {
            // bit is set
        }
        else
        {
            // bit is not set
        }
    }
}

From the inner loop, you can set the bit with

*data |= mask;

unset the bit with

*data &= ~mask;

and toggle the bit with

*data ^= mask;

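Warning: if the buffer is really a stream of bytes, the order in which this loop visits the bits depends on the byte order of the machine, so the code might behave unexpectedly on big-endian architectures!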
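One way to sidestep the byte-order concern is to assemble each word from individual bytes, so that the bit order no longer depends on the host. This is a sketch under stated assumptions: `load_le32` and `visit_bits32` are hypothetical helpers, the buffer length is assumed to be a multiple of 4, and `<stdint.h>` is assumed to be available (older Microsoft compilers may need a substitute header).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build a 32-bit word from 4 bytes so that bit k of the word is bit k % 8
   of byte k / 8, regardless of the host byte order. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Walk all bits of `nwords` 32-bit words, lowest bit first, and return how
   many were set. Replace the counting with whatever per-bit work is needed. */
static unsigned long visit_bits32(const unsigned char *data, size_t nwords)
{
    unsigned long set = 0;
    size_t w;
    assert(data != NULL || nwords == 0);
    for (w = 0; w < nwords; w++) {
        uint32_t current = load_le32(data + 4 * w);
        uint32_t mask;
        for (mask = 1; mask != 0; mask <<= 1) {
            if (current & mask)
                set++;        /* bit is set */
        }
    }
    return set;
}
```

With this loading scheme the bits come out in the same order as a byte-wise loop using `1 << i` masks, which may matter once the bits feed an arithmetic coder.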

老街孤人 2024-07-18 12:11:33

You could use a precomputed lookup table, for example:

static int bitcount_lookup[256] = { ..... } ; /* or make it a global and compute the values in code */

...

for( ... ) 
   byte = ... 
   Stats.FreqOf1 += bitcount_lookup[byte];
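The initializer is elided in the answer, and the following does not try to reconstruct it. It is only a sketch of one well-known way to let the preprocessor generate a 256-entry bit-count table at compile time. The macro names `B2`, `B4`, `B6`, the `unsigned char` element type and the `check_bitcount_lookup` helper are choices made for this illustration.

```c
#include <assert.h>

/* Each macro expands to the bit counts of 4, 16 or 64 consecutive values,
   given the count n contributed by the higher bits. */
#define B2(n) (n), (n) + 1, (n) + 1, (n) + 2
#define B4(n) B2(n), B2((n) + 1), B2((n) + 1), B2((n) + 2)
#define B6(n) B4(n), B4((n) + 1), B4((n) + 1), B4((n) + 2)

/* 4 * 64 = 256 entries: bitcount_lookup[v] = number of 1 bits in v. */
static const unsigned char bitcount_lookup[256] = {
    B6(0), B6(1), B6(1), B6(2)
};

/* Cheap sanity check for debug builds. */
static void check_bitcount_lookup(void)
{
    assert(bitcount_lookup[0] == 0);
    assert(bitcount_lookup[255] == 8);
}
```

A table built this way needs no start-up code, at the price of slightly cryptic macros.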

瀞厅☆埖开 2024-07-18 12:11:33

Here is a way to count the 1 bits of a 32-bit integer (based on Java's Integer.bitCount(i) method):

unsigned bitCount(unsigned i) {
    i = i - ((i >> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
    i = (i + (i >> 4)) & 0x0f0f0f0f;
    i = i + (i >> 8);
    i = i + (i >> 16);
    return i & 0x3f;
}

So you can treat your data as 32-bit integers and move forward in 4-byte steps.
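Stepping through a byte buffer four bytes at a time could look like the sketch below. It assumes `unsigned` is 32 bits wide, copies each word with `memcpy` instead of casting the pointer (an assumption made here to stay safe with unaligned buffers), and handles leftover bytes separately. `count_ones_by_words` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same routine as in the answer above; assumes unsigned is 32 bits wide. */
static unsigned bitCount(unsigned i)
{
    i = i - ((i >> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
    i = (i + (i >> 4)) & 0x0f0f0f0f;
    i = i + (i >> 8);
    i = i + (i >> 16);
    return i & 0x3f;
}

/* Count the 1 bits of a byte buffer four bytes at a time. */
static unsigned long count_ones_by_words(const unsigned char *data, size_t count)
{
    unsigned long ones = 0;
    size_t i = 0;
    assert(data != NULL || count == 0);
    assert(sizeof(unsigned) == 4);
    for (; i + 4 <= count; i += 4) {
        unsigned word;
        memcpy(&word, data + i, 4);
        ones += bitCount(word);
    }
    if (i < count) {              /* 1 to 3 leftover bytes */
        unsigned tail = 0;        /* unused bytes stay zero and add nothing */
        memcpy(&tail, data + i, count - i);
        ones += bitCount(tail);
    }
    return ones;
}
```

Because only the total is needed here, the byte order inside each word does not affect the result.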

<逆流佳人身旁 2024-07-18 12:11:33

Here is a simple one I whipped up for just a single 32-bit value, but you can see it wouldn't be hard to adapt it to any number of bits:

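#include <stdio.h>

int main(void)
{
    unsigned int x = 0xdeadbeef;
    unsigned int v = x;   /* shift a copy so that x can still be printed */
    int ones = 0;
    for (int y = 0; y < 32; y++)
    {
        if ((v & 0x1) == 0x1) ones++;
        v = (v >> 1);
    }

    printf("%x contains %d ones and %d zeros.\n", x, ones, 32 - ones);
    return 0;
}

Notice, however, that the loop consumes the value it shifts, which is why it works on the copy v rather than on x itself. If you are doing this on data you need to keep, always shift a copy.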

Doing this in __asm might be a better, perhaps faster way, but it is hard to say given how well the compiler may already optimize this loop.

Every approach you consider will have drawbacks; that holds for a lookup table as much as for a bit shifter like mine.

Larry

寒冷纷飞旳雪 2024-07-18 12:11:33

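ttobiass - Keep in mind that inlining functions matters in applications like the one you are describing, and you can gain performance from inlined code, but there are a few things to remember.

Inlining does not happen in debug builds (unless you force it).
The compiler inlines functions as it sees fit. If you tell it to inline a function, it may not do so at all, even if you use __forceinline. See MSDN for more information on inlining.
Only certain functions can be inlined at all. For example, you cannot inline a recursive function.

You will get the best performance from your project settings for the C/C++ language and from the way you structure your code. At this point it is important to understand heap versus stack operations, calling conventions, memory alignment, and so on.

I know this does not exactly answer your question, but you mentioned performance and how to get the most of it, and these things are key.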

老子叫无熙 2024-07-18 12:11:33

To join the link wagon:
counting bits

哥，最终变帅啦 2024-07-18 12:11:33

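If this is not a case of premature optimization and you truly need to squeeze out every last femtosecond, then you are probably better off with a 256-element static array that you populate once with the bit count of each byte value, then

Stats.FreqOf1 += bitCountTable[byte];

and when the loop is finished:

Stats.FreqOf0 = ((data->Count * 8) - Stats.FreqOf1);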

橘虞初梦 2024-07-18 12:11:33

There is a whole chapter on the different techniques for this in the book Beautiful Code. You can read (most of) it on Google Books, starting here.

作死小能手 2024-07-18 12:11:33

A faster way to extract bits is to use:

bitmask = data->Data[i];

while (bitmask)
{
    /* isolate the lowest set bit, e.g. 0x28 -> 0x08 */
    bit_set_as_power_of_two = bitmask & -bitmask;
    /* clear that bit so the next iteration finds the next one */
    bitmask &= bitmask - 1;
}

If you just want to count the bits set, a LUT that stays in cache would be fast, but you can also do it in constant time with the interleaved bit counting method in the link in this answer.
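As a sketch of how the lowest-set-bit trick might be wrapped up for one byte, the function below records the positions of the set bits. `set_bit_positions` is a hypothetical helper, and the inner loop that turns a power of two into a bit index is a simple placeholder, not a tuned implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Stores the bit positions (0..7) of the set bits of `byte` in `positions`,
   lowest first, and returns how many there were. Only set bits cost an
   iteration of the outer loop. */
static int set_bit_positions(unsigned char byte, int positions[8])
{
    unsigned bitmask = byte;
    int n = 0;
    assert(positions != NULL);
    while (bitmask) {
        unsigned lowest = bitmask & (0u - bitmask);  /* isolate lowest set bit */
        int pos = 0;
        while ((lowest >> pos) != 1u)                /* convert 2^pos to pos */
            pos++;
        positions[n++] = pos;
        bitmask &= bitmask - 1;                      /* clear that bit */
    }
    return n;
}
```

Note that this visits only the 1 bits. A coder that must process every bit in order, zeros included, would still need a per-bit loop or would have to infer the zeros from the gaps between positions.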
