What is the fastest way to loop through a large chunk of data bit by bit?
I am running through a memory block of binary data byte-wise.
Currently I am doing something like this:
for (i = 0; i < data->Count; i++)
{
    byte = &data->Data[i];
    /* Test each bit of the current byte; a zero bit would bump the other counter. */
    if (*byte & Masks[0]) Stats.FreqOf1++;
    if (*byte & Masks[1]) Stats.FreqOf1++;
    if (*byte & Masks[2]) Stats.FreqOf1++;
    if (*byte & Masks[3]) Stats.FreqOf1++;
    if (*byte & Masks[4]) Stats.FreqOf1++;
    if (*byte & Masks[5]) Stats.FreqOf1++;
    if (*byte & Masks[6]) Stats.FreqOf1++;
    if (*byte & Masks[7]) Stats.FreqOf1++;
}
Where Masks is:
for (i = 0; i < 8; i++)
{
    Masks[i] = 1 << i;
}
(I somehow did not manage to do it as fast in a loop or in an inlined function, so I wrote it out.)
Does anyone have any suggestions on how to improve this first loop? I am rather inexperienced with getting down to bits.
This may seem like a stupid thing to do. But I am in the process of implementing a compression algorithm. I just want to have the bit accessing part down right.
Thanks!
PS: This is on the Visual Studio 2008 compiler, so it would be nice if the suggestions applied to that compiler.
PPS: I just realized that I don't need to increment two counts. One would be enough; the other is the difference to the total number of bits, computed at the end.
But that would be specific to just counting. What I really want done fast is the bit extraction.
EDIT:
The lookup table idea that was brought forward is nice.
I realize though that I posed the question wrong in the title.
Because in the end what I want to do is not count the bits, but access each bit as fast as possible.
ANOTHER EDIT:
Is it possible to advance a pointer by just one bit in the data?
ANOTHER EDIT:
Thank you for all your answers so far.
What I want to implement in the next steps is an unsophisticated binary arithmetic coder that does not analyze the context. So I am only interested in single bits for now. Eventually it will become a context-adaptive BAC, but I will leave that for later.
Processing 4 bytes instead of 1 byte could be an option. But a loop over 32 bits is costly as well, isn't it?
Comments (11)
Use a table that maps each byte value (256) to the number of 1's in it. (The # of 0's is just (8 - # of 1's)). Then iterate over the bytes and perform a single lookup for each byte, instead of multiple lookups and comparisons. For example:
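A minimal sketch of that table approach in C, in case it helps. The names ones_in_byte, init_ones_table and count_ones are made up for illustration, and the buffer is assumed to be a plain array of 8-bit bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table: entry v holds the number of 1 bits in byte value v. */
static uint8_t ones_in_byte[256];

/* Fill the table once at startup. */
static void init_ones_table(void)
{
    int v, b;
    for (v = 0; v < 256; v++) {
        uint8_t n = 0;
        for (b = 0; b < 8; b++)
            n += (uint8_t)((v >> b) & 1);
        ones_in_byte[v] = n;
    }
}

/* One table lookup per byte instead of eight mask tests. */
static size_t count_ones(const uint8_t *data, size_t count)
{
    size_t i, ones = 0;
    for (i = 0; i < count; i++)
        ones += ones_in_byte[data[i]];
    return ones;
}
```

The number of 0 bits is then count * 8 minus the returned value, so no second counter is needed inside the loop.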
I did not really understand what you're trying to do. But if you just want to get access to the bits of a bitmap, you can use these (untested!!!) functions:
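Random-access helpers of this kind could look roughly like the sketch below. The names get_bit, set_bit and clear_bit are hypothetical, and the numbering (bit 0 is the least significant bit of byte 0) is an assumption, not something the question specifies.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers; bit 0 is the least significant bit of byte 0. */

/* Return the bit at position index (0 or 1). */
static int get_bit(const uint8_t *data, size_t index)
{
    return (data[index / 8] >> (index % 8)) & 1;
}

/* Set the bit at position index to 1. */
static void set_bit(uint8_t *data, size_t index)
{
    data[index / 8] |= (uint8_t)(1u << (index % 8));
}

/* Clear the bit at position index to 0. */
static void clear_bit(uint8_t *data, size_t index)
{
    data[index / 8] &= (uint8_t)~(1u << (index % 8));
}
```

A pointer in C cannot address a single bit, so a plain bit index like this is the usual stand-in for "advancing a pointer by one bit".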
Edit: Ok, I think I understand what you want to do: Fast iteration over a sequence of bits. Therefore, we don't want to use the random access functions from above, but read a whole word of data at once.
You might use any unsigned integer type you like, but you should choose one which is likely to correspond to the word size of your architecture. I'll go with uint_fast32_t from stdint.h. From the inner loop, you can set, unset, or toggle the current bit in place with a bitwise OR, AND-NOT, or XOR of the word with the current mask.
Warning: The code might behave unexpectedly on big-endian architectures!
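As a rough sketch of the word-at-a-time loop described above: a mask walks over each word from the lowest bit to the highest. The function name visit_bits is made up, and uint32_t is used here instead of uint_fast32_t so that the word is exactly 32 bits wide (uint_fast32_t may be wider on some platforms).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: count the 1 bits, then toggle every bit in place. */
static size_t visit_bits(uint32_t *words, size_t word_count)
{
    size_t ones = 0;
    size_t w;
    for (w = 0; w < word_count; w++) {
        uint32_t mask = 1;
        do {
            if (words[w] & mask)   /* read the current bit */
                ones++;
            words[w] ^= mask;      /* toggle; |= mask sets, &= ~mask clears */
            mask <<= 1;
        } while (mask != 0);       /* mask becomes 0 after bit 31 */
    }
    return ones;
}
```

The inner loop has no index variable at all; the mask shifting out of the word is the termination condition.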
You could use a precomputed lookup table.
You can count the 1 bits of a 32-bit integer with the method used by Java's Integer.bitCount(i), then cast your data to int and move forward in 4-byte steps.
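A C rendering of that bit-count routine might look like the following. The function names popcount32 and count_ones_words are hypothetical. The sketch copies each group of 4 bytes with memcpy rather than casting the pointer, which is an assumption made to sidestep alignment problems, and it handles leftover bytes one at a time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Parallel bit count: sums bits in pairs, nibbles, bytes, then the whole word. */
static uint32_t popcount32(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    x = x + (x >> 8);
    x = x + (x >> 16);
    return x & 0x3Fu;
}

/* Count 1 bits in a buffer, 4 bytes per step, then the remaining tail. */
static size_t count_ones_words(const unsigned char *data, size_t count)
{
    size_t i = 0, ones = 0;
    for (; i + 4 <= count; i += 4) {
        uint32_t w;
        memcpy(&w, data + i, 4);   /* safe for unaligned data */
        ones += popcount32(w);
    }
    for (; i < count; i++)
        ones += popcount32(data[i]);
    return ones;
}
```

Since only the total is needed, the byte order inside each word does not matter here.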
Here is a simple one I whipped up on just a single 32 bit value, but you can see it wouldn't be hard to adapt it to any number of bits....
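A shift-based loop in this spirit could be sketched as follows. The function name drain_bits and the output array are made up for illustration; the value is passed by pointer and shifted in place until it reaches zero.

```c
#include <stdint.h>

/* Hypothetical helper: emits the bits of *value, lowest first, into out[].
   Stops once the remaining bits are all zero and returns how many were
   emitted. *value is shifted in place and ends up as 0. */
static int drain_bits(uint32_t *value, unsigned char out[32])
{
    int n = 0;
    while (*value != 0) {
        out[n++] = (unsigned char)(*value & 1u);  /* lowest bit */
        *value >>= 1;                             /* drop it */
    }
    return n;
}
```

Stopping when the value hits zero skips the leading zero bits, which helps for small values but means the caller has to know that any bit not emitted is 0.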
Notice however, that it modifies the value in the process. If you are doing this on data you need to keep, then you need to make a copy of it first.
Doing this in __asm would probably be a better, maybe faster way, but it's hard to say given how well the compiler can optimize.
Every solution you consider will have drawbacks, whether it is a lookup table or a bit shifter like mine.
Larry
ttobiass - Keep in mind that inline functions matter in applications like the one you are describing. You CAN get the performance out of inline code, provided you keep a couple of things in mind.
You'll get your best performance out of your project settings for the C/C++ language, and how you construct your code. At this point, it's important to understand Heap vs. Stack operations, calling conventions, memory alignment, etc.
I know this does not answer your question exactly, but you mention performance, and how to get the best performance, and these things are key.
To join the link wagon:
counting bits
If this is not a case of premature optimization and you truly need to squeeze out every last femtosecond, then you're probably better off with a 256-element static array that you populate once with the bit-count of each byte value, then
and when the loop is done:
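A sketch of that shape, with a local accumulator inside the loop and the statistics updated once afterwards. The struct BitStats, its FreqOf0 field, the NumOnes table and both function names are assumptions for illustration; only FreqOf1 appears in the question. Bytes are assumed to be 8 bits wide.

```c
#include <stddef.h>

/* Assumed layout; the question only shows Stats.FreqOf1. */
struct BitStats { size_t FreqOf0; size_t FreqOf1; };

static unsigned char NumOnes[256];  /* hypothetical static table */

/* Populate once: bits(v) = bits(v / 2) + lowest bit of v. */
static void fill_num_ones(void)
{
    int v;
    NumOnes[0] = 0;
    for (v = 1; v < 256; v++)
        NumOnes[v] = (unsigned char)(NumOnes[v >> 1] + (v & 1));
}

/* Accumulate locally, then update both counters once after the loop. */
static void add_bit_stats(struct BitStats *stats,
                          const unsigned char *data, size_t count)
{
    size_t i, ones = 0;
    for (i = 0; i < count; i++)
        ones += NumOnes[data[i]];
    stats->FreqOf1 += ones;
    stats->FreqOf0 += count * 8 - ones;
}
```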
There's a whole chapter on the different techniques for this in the book Beautiful Code. You can read (most of) it on Google books starting here.
A faster way to extract bits is to use:
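One well-known trick in this direction, offered as a sketch and not necessarily what this answer had in mind, is to isolate the lowest set bit and then clear it, so the loop runs once per 1 bit instead of once per bit position. The function name set_bit_masks and the output array are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: writes the mask of each set bit of `byte` into out[],
   lowest bit first, and returns how many set bits were found. */
static int set_bit_masks(uint8_t byte, uint8_t out[8])
{
    unsigned m = byte;
    int n = 0;
    while (m != 0) {
        unsigned lowest = m & (0u - m);  /* isolate the lowest set bit */
        out[n++] = (uint8_t)lowest;
        m &= m - 1;                      /* clear that bit */
    }
    return n;
}
```

This only visits the 1 bits, so it suits counting or sparse data; a coder that must see every bit in order would still need the positions of the zeros in between.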
If you just want to count the set bits, a LUT that stays in cache would be fast, but you can also do it in constant time with the interleaved bit counting method in the link in this answer.
The fastest way is probably to build a lookup table of byte values versus the number of bits set in that byte. At least that was the answer when I interviewed at Google.