在matlab中有效计算汉明权
假设 MATLAB uint32 被解释为位字符串,那么计算字符串中有多少个非零位的有效且简洁的方法是什么?
我有一种可行的、天真的方法,可以循环这些位,但这对于我的需求来说太慢了。 (使用 std::bitset count() 的 C++ 实现几乎立即运行)。
我发现了一个非常好的页面,列出了各种位计数技术,但我希望有一种简单的 MATLAB 式方法。
http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive
更新 #1
刚刚实现了 Brian Kernighan 算法,如下所示:
w = 0;
while ( bits > 0 )
bits = bitand( bits, bits-1 );
w = w + 1;
end
性能仍然很差,超过 10 秒只计算 4096^2 权重计算。 我的 C++ 代码使用 std::bitset 中的 count() 在亚秒内完成此操作。
更新 #2
这是我迄今为止尝试过的技术的运行时间表。 当我得到更多的想法/建议时,我会更新它。
Vectorized Scheiner algorithm => 2.243511 sec Vectorized Naive bitget loop => 7.553345 sec Kernighan algorithm => 17.154692 sec length( find( bitget( val, 1:32 ) ) ) => 67.368278 sec nnz( bitget( val, 1:32 ) ) => 349.620259 sec Justin Scheiner's algorithm, unrolled loops => 370.846031 sec Justin Scheiner's algorithm => 398.786320 sec Naive bitget loop => 456.016731 sec sum(dec2bin(val) == '1') => 1069.851993 sec
评论:MATLAB 中的 dec2bin() 函数似乎实现得很差。 它运行得非常慢。
注释:“Naive bitget循环”算法的实现如下:
w=0;
for i=1:32
if bitget( val, i ) == 1
w = w + 1;
end
end
注释: Scheiner 算法的循环展开版本如下所示:
function w=computeWeight( val )
w = val;
w = bitand(bitshift(w, -1), uint32(1431655765)) + ...
bitand(w, uint32(1431655765));
w = bitand(bitshift(w, -2), uint32(858993459)) + ...
bitand(w, uint32(858993459));
w = bitand(bitshift(w, -4), uint32(252645135)) + ...
bitand(w, uint32(252645135));
w = bitand(bitshift(w, -8), uint32(16711935)) + ...
bitand(w, uint32(16711935));
w = bitand(bitshift(w, -16), uint32(65535)) + ...
bitand(w, uint32(65535));
Given a MATLAB uint32 to be interpreted as a bit string, what is an efficient and concise way of counting how many nonzero bits are in the string?
I have a working, naive approach which loops over the bits, but that's too slow for my needs. (A C++ implementation using std::bitset count() runs almost instantly).
I've found a pretty nice page listing various bit counting techniques, but I'm hoping there is an easy MATLAB-esque way.
http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive
Update #1
Just implemented the Brian Kernighan algorithm as follows:
w = 0;
while ( bits > 0 )
bits = bitand( bits, bits-1 );
w = w + 1;
end
Performance is still crappy, over 10 seconds to compute just 4096^2 weight calculations. My C++ code using count() from std::bitset does this in subsecond time.
Update #2
Here is a table of run times for the techniques I've tried so far. I will update it as I get additional ideas/suggestions.
Vectorized Scheiner algorithm => 2.243511 sec
Vectorized Naive bitget loop => 7.553345 sec
Kernighan algorithm => 17.154692 sec
length( find( bitget( val, 1:32 ) ) ) => 67.368278 sec
nnz( bitget( val, 1:32 ) ) => 349.620259 sec
Justin Scheiner's algorithm, unrolled loops => 370.846031 sec
Justin Scheiner's algorithm => 398.786320 sec
Naive bitget loop => 456.016731 sec
sum(dec2bin(val) == '1') => 1069.851993 sec
Comment: The dec2bin() function in MATLAB seems to be very poorly implemented. It runs extremely slow.
Comment: The "Naive bitget loop" algorithm is implemented as follows:
w=0;
for i=1:32
if bitget( val, i ) == 1
w = w + 1;
end
end
Comment:
The loop unrolled version of Scheiner's algorithm looks as follows:
function w=computeWeight( val )
w = val;
w = bitand(bitshift(w, -1), uint32(1431655765)) + ...
bitand(w, uint32(1431655765));
w = bitand(bitshift(w, -2), uint32(858993459)) + ...
bitand(w, uint32(858993459));
w = bitand(bitshift(w, -4), uint32(252645135)) + ...
bitand(w, uint32(252645135));
w = bitand(bitshift(w, -8), uint32(16711935)) + ...
bitand(w, uint32(16711935));
w = bitand(bitshift(w, -16), uint32(65535)) + ...
bitand(w, uint32(65535));
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(9)
我很想知道这个解决方案有多快:
回头看,我发现这是 bithacks 页面上给出的“并行”解决方案。
I'd be interested to see how fast this solution is:
Going back, I see that this is the 'parallel' solution given on the bithacks page.
除非这是 MATLAB 实现练习,否则您可能只想采用快速 C++ 实现并将其编译为 mex 函数,每个目标平台一次。
Unless this is a MATLAB implementation exercise, you might want to just take your fast C++ implementation and compile it as a mex function, once per target platform.
编辑:新解决方案
看来您想要对 4096×4096 UINT32 值数组中的每个元素重复计算。 如果这就是您正在做的事情,我认为在 MATLAB 中执行此操作的最快方法是使用 BITGET 旨在对值矩阵进行操作。 代码如下所示:
如果您想制作其他一些算法的矢量化版本,我相信BITAND 也被设计用于矩阵操作。
旧的解决方案...
我能想到的最简单的方法是使用DEC2BIN 函数,它为您提供非负整数的二进制表示形式(作为字符串):
它很慢,但很简单。 =)
EDIT: NEW SOLUTION
It appears that you want to repeat the calculation for every element in a 4096-by-4096 array of UINT32 values. If this is what you are doing, I think the fastest way to do it in MATLAB is to use the fact that BITGET is designed to operate on matrices of values. The code would look like this:
If you want to make vectorized versions of some of the other algorithms, I believe BITAND is also designed to operate on matrices.
The old solution...
The easiest way I can think of is to use the DEC2BIN function, which gives you the binary representation (as a string) of a non-negative integer:
It's slow, but it's easy. =)
实现了顶部斯坦福链接中的“最佳 32 位算法”。
改进后的算法将处理时间减少了 6%。
还优化了段大小,发现32K稳定并且比4K提高了15%的时间。
预计 4Kx4K 时间是矢量化 Scheiner 算法的 40%。
Implemented the "Best 32 bit Algorithm" from the Stanford link at the top.
The improved algorithm reduced processing time by 6%.
Also optimized the segment size and found that 32K is stable and improves time by 15% over 4K.
Expect 4Kx4K time to be 40% of Vectorized Scheiner Algorithm.
在 Matlab Cody 上做了一些时序比较。
确定分段改进的矢量化 Scheiner 可提供最佳性能。
对于 L=4096*4096 向量,基于 Cody 1.30 秒到 0.60 秒的变化,时间减少了 50% 以上。
Did some timing comparisons on Matlab Cody.
Determined a Segmented Modified Vectorized Scheiner gives optimimum performance.
Have >50% time reduction based on Cody 1.30 sec to 0.60 sec change for an L=4096*4096 vector.
一种快速方法是使用查找表计算每个字节中的位,然后将这些值相加; 事实上,这是问题中给出的网页上建议的方法之一。 这种方法的优点在于,查找和求和都是 MATLAB 中的可向量化运算,因此您可以对此方法进行向量化,并非常快速地同时计算大量位串的汉明权重/设置位数。 此方法在 MATLAB 文件交换上的 bitcount 提交中实现。
A fast approach is counting the bits in each byte using a lookup table, then summing these values; indeed, it's one of the approaches suggested on the web page given in the question. The nice thing about this approach is that both lookup and sum are vectorizable operations in MATLAB, so you can vectorize this approach and compute the hamming weight / number of set bits of a large number of bit strings simultaneously, very quickly. This approach is implemented in the bitcount submission on the MATLAB File Exchange.
尝试将工作分成更小的部分。 我的猜测是,如果您想一次处理所有数据,matlab 会尝试在执行连续步骤之前对所有整数执行每个操作,并且处理器的缓存会随着每个步骤而失效。
Try splitting the job into smaller parts. My guess is that if you want to process all data at once, matlab is trying to do each operation on all integers before taking successive steps and the processor's cache is invalidated with each step.
我在这里恢复一个旧线程,但我遇到了这个问题,我为它编写了一点代码:
看起来很简洁,但我担心
bitget
是在 O(n )位移
操作。 该代码适用于我要做的事情,但我的问题集不依赖于汉明权重。I'm reviving an old thread here, but I ran across this problem and I wrote this little bit of code for it:
Looks pretty concise, but I'm scared that
bitget
is implemented in O(n)bitshift
operations. The code works for what I'm going, but my problem set doesn't rely on hamming weight.