当前位置：文江博客话题详情

在matlab中有效计算汉明权

发布于 2024-07-24 09:31:50 字数 2128 浏览 13 评论 0原文

假设 MATLAB uint32 被解释为位字符串，那么计算字符串中有多少个非零位的有效且简洁的方法是什么？

我有一种可行的、天真的方法，可以循环这些位，但这对于我的需求来说太慢了。（使用 std::bitset count() 的 C++ 实现几乎立即运行）。

我发现了一个非常好的页面，列出了各种位计数技术，但我希望有一种简单的 MATLAB 式方法。

http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive

更新 #1

刚刚实现了 Brian Kernighan 算法，如下所示：

w = 0;
while ( bits > 0 )
    bits = bitand( bits, bits-1 );
    w = w + 1;
end

性能仍然很差，超过 10 秒只计算 4096^2 权重计算。我的 C++ 代码使用 std::bitset 中的 count() 在亚秒内完成此操作。

更新 #2

这是我迄今为止尝试过的技术的运行时间表。当我得到更多的想法/建议时，我会更新它。

Vectorized Scheiner algorithm                =>    2.243511 sec
Vectorized Naive bitget loop                 =>    7.553345 sec
Kernighan algorithm                          =>   17.154692 sec
length( find( bitget( val, 1:32 ) ) )        =>   67.368278 sec
nnz( bitget( val, 1:32 ) )                   =>  349.620259 sec
Justin Scheiner's algorithm, unrolled loops  =>  370.846031 sec
Justin Scheiner's algorithm                  =>  398.786320 sec
Naive bitget loop                            =>  456.016731 sec
sum(dec2bin(val) == '1')                     => 1069.851993 sec

评论：MATLAB 中的 dec2bin() 函数似乎实现得很差。它运行得非常慢。

注释：“Naive bitget循环”算法的实现如下：

w=0;
for i=1:32
   if bitget( val, i ) == 1
       w = w + 1;
   end
end

注释： Scheiner 算法的循环展开版本如下所示：

function w=computeWeight( val )
w = val;
w = bitand(bitshift(w, -1), uint32(1431655765)) + ...
    bitand(w, uint32(1431655765));

w = bitand(bitshift(w, -2), uint32(858993459)) + ...
    bitand(w, uint32(858993459));

w = bitand(bitshift(w, -4), uint32(252645135)) + ...
    bitand(w, uint32(252645135));

w = bitand(bitshift(w, -8), uint32(16711935)) + ...
    bitand(w, uint32(16711935));

w = bitand(bitshift(w, -16), uint32(65535)) + ...
    bitand(w, uint32(65535));

原文

Given a MATLAB uint32 to be interpreted as a bit string, what is an efficient and concise way of counting how many nonzero bits are in the string?

I have a working, naive approach which loops over the bits, but that's too slow for my needs. (A C++ implementation using std::bitset count() runs almost instantly).

I've found a pretty nice page listing various bit counting techniques, but I'm hoping there is an easy MATLAB-esque way.

http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive

Update #1

Just implemented the Brian Kernighan algorithm as follows:

w = 0;
while ( bits > 0 )
    bits = bitand( bits, bits-1 );
    w = w + 1;
end

Performance is still crappy, over 10 seconds to compute just 4096^2 weight calculations. My C++ code using count() from std::bitset does this in subsecond time.

Update #2

Here is a table of run times for the techniques I've tried so far. I will update it as I get additional ideas/suggestions.



Vectorized Scheiner algorithm                =>    2.243511 sec

Vectorized Naive bitget loop                 =>    7.553345 sec

Kernighan algorithm                          =>   17.154692 sec

length( find( bitget( val, 1:32 ) ) )        =>   67.368278 sec

nnz( bitget( val, 1:32 ) )                   =>  349.620259 sec

Justin Scheiner's algorithm, unrolled loops  =>  370.846031 sec

Justin Scheiner's algorithm                  =>  398.786320 sec

Naive bitget loop                            =>  456.016731 sec

sum(dec2bin(val) == '1')                     => 1069.851993 sec

Comment: The dec2bin() function in MATLAB seems to be very poorly implemented. It runs extremely slow.

Comment: The "Naive bitget loop" algorithm is implemented as follows:

w=0;
for i=1:32
   if bitget( val, i ) == 1
       w = w + 1;
   end
end

Comment:
The loop unrolled version of Scheiner's algorithm looks as follows:

function w=computeWeight( val )
w = val;
w = bitand(bitshift(w, -1), uint32(1431655765)) + ...
    bitand(w, uint32(1431655765));

w = bitand(bitshift(w, -2), uint32(858993459)) + ...
    bitand(w, uint32(858993459));

w = bitand(bitshift(w, -4), uint32(252645135)) + ...
    bitand(w, uint32(252645135));

w = bitand(bitshift(w, -8), uint32(16711935)) + ...
    bitand(w, uint32(16711935));

w = bitand(bitshift(w, -16), uint32(65535)) + ...
    bitand(w, uint32(65535));

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

執念 2024-07-31 09:31:50

我很想知道这个解决方案有多快：

function r = count_bits(n)

shifts = [-1, -2, -4, -8, -16];
masks = [1431655765, 858993459, 252645135, 16711935, 65535];

r = n;
for i=1:5
   r = bitand(bitshift(r, shifts(i)), masks(i)) + ...
      bitand(r, masks(i));
end

回头看，我发现这是 bithacks 页面上给出的“并行”解决方案。

I'd be interested to see how fast this solution is:

function r = count_bits(n)

shifts = [-1, -2, -4, -8, -16];
masks = [1431655765, 858993459, 252645135, 16711935, 65535];

r = n;
for i=1:5
   r = bitand(bitshift(r, shifts(i)), masks(i)) + ...
      bitand(r, masks(i));
end

Going back, I see that this is the 'parallel' solution given on the bithacks page.

回复收藏 0 原文

鲸落 2024-07-31 09:31:50

除非这是 MATLAB 实现练习，否则您可能只想采用快速 C++ 实现并将其编译为 mex 函数，每个目标平台一次。

回复收藏 0 原文

拥抱我好吗 2024-07-31 09:31:50

编辑：新解决方案

看来您想要对 4096×4096 UINT32 值数组中的每个元素重复计算。如果这就是您正在做的事情，我认为在 MATLAB 中执行此操作的最快方法是使用 BITGET 旨在对值矩阵进行操作。代码如下所示：

numArray = ...your 4096-by-4096 matrix of uint32 values...
w = zeros(4096,4096,'uint32');
for iBit = 1:32,
  w = w+bitget(numArray,iBit);
end

如果您想制作其他一些算法的矢量化版本，我相信BITAND 也被设计用于矩阵操作。

旧的解决方案...

我能想到的最简单的方法是使用DEC2BIN 函数，它为您提供非负整数的二进制表示形式（作为字符串）：

w = sum(dec2bin(num) == '1');  % Sums up the ones in the string

它很慢，但很简单。 =)

EDIT: NEW SOLUTION

It appears that you want to repeat the calculation for every element in a 4096-by-4096 array of UINT32 values. If this is what you are doing, I think the fastest way to do it in MATLAB is to use the fact that BITGET is designed to operate on matrices of values. The code would look like this:

numArray = ...your 4096-by-4096 matrix of uint32 values...
w = zeros(4096,4096,'uint32');
for iBit = 1:32,
  w = w+bitget(numArray,iBit);
end

If you want to make vectorized versions of some of the other algorithms, I believe BITAND is also designed to operate on matrices.

The old solution...

The easiest way I can think of is to use the DEC2BIN function, which gives you the binary representation (as a string) of a non-negative integer:

w = sum(dec2bin(num) == '1');  % Sums up the ones in the string

It's slow, but it's easy. =)

回复收藏 0 原文

暗恋未遂 2024-07-31 09:31:50

实现了顶部斯坦福链接中的“最佳 32 位算法”。
改进后的算法将处理时间减少了 6%。
还优化了段大小，发现32K稳定并且比4K提高了15%的时间。
预计 4Kx4K 时间是矢量化 Scheiner 算法的 40%。

function w = Ham(w)
% Input uint32
% Output vector of Ham wts
 for i=1:32768:length(w)
  w(i:i+32767)=Ham_seg(w(i:i+32767));
 end
end

% Segmentation gave reduced time by 50%

function w=Ham_seg(w)
 %speed
 b1=uint32(1431655765); 
 b2=uint32(858993459);
 b3=uint32(252645135);
 b7=uint32(63); % working orig binary mask

 w = bitand(bitshift(w, -1), b1) + bitand(w, b1);
 w = bitand(bitshift(w, -2), b2) + bitand(w, b2);
 w =bitand(w+bitshift(w, -4),b3);
 w =bitand(bitshift(w,-24)+bitshift(w,-16)+bitshift(w,-8)+w,b7);

end

Implemented the "Best 32 bit Algorithm" from the Stanford link at the top.
The improved algorithm reduced processing time by 6%.
Also optimized the segment size and found that 32K is stable and improves time by 15% over 4K.
Expect 4Kx4K time to be 40% of Vectorized Scheiner Algorithm.

function w = Ham(w)
% Input uint32
% Output vector of Ham wts
 for i=1:32768:length(w)
  w(i:i+32767)=Ham_seg(w(i:i+32767));
 end
end

% Segmentation gave reduced time by 50%

function w=Ham_seg(w)
 %speed
 b1=uint32(1431655765); 
 b2=uint32(858993459);
 b3=uint32(252645135);
 b7=uint32(63); % working orig binary mask

 w = bitand(bitshift(w, -1), b1) + bitand(w, b1);
 w = bitand(bitshift(w, -2), b2) + bitand(w, b2);
 w =bitand(w+bitshift(w, -4),b3);
 w =bitand(bitshift(w,-24)+bitshift(w,-16)+bitshift(w,-8)+w,b7);

end

回复收藏 0 原文

无可置疑 2024-07-31 09:31:50

在 Matlab Cody 上做了一些时序比较。
确定分段改进的矢量化 Scheiner 可提供最佳性能。

对于 L=4096*4096 向量，基于 Cody 1.30 秒到 0.60 秒的变化，时间减少了 50% 以上。

function w = Ham(w)
% Input uint32
% Output vector of Ham wts

 b1=uint32(1431655765); % evaluating saves 15% of time 1.30 to 1.1 sec
 b2=uint32(858993459);
 b3=uint32(252645135);
 b4=uint32(16711935);
 b5=uint32(65535);

 for i=1:4096:length(w)
  w(i:i+4095)=Ham_seg(w(i:i+4095),b1,b2,b3,b4,b5);
 end
end

% Segmentation reduced time by 50%

function w=Ham_seg(w,b1,b2,b3,b4,b5)
 % Passing variables or could evaluate b1:b5 here


 w = bitand(bitshift(w, -1), b1) + bitand(w, b1);
 w = bitand(bitshift(w, -2), b2) + bitand(w, b2);
 w = bitand(bitshift(w, -4), b3) + bitand(w, b3);
 w = bitand(bitshift(w, -8), b4) + bitand(w, b4);
 w = bitand(bitshift(w, -16), b5) + bitand(w, b5);

end





vt=randi(2^32,[4096*4096,1])-1;
% for vt being uint32 the floor function gives unexpected values
tic
v=num_ones(mod(vt,65536)+1)+num_ones(floor(vt/65536)+1); % 0.85 sec
toc
% a corrected method is
v=num_ones(mod(vt,65536)+1)+num_ones(floor(double(vt)/65536)+1);
toc

Did some timing comparisons on Matlab Cody.
Determined a Segmented Modified Vectorized Scheiner gives optimimum performance.

Have >50% time reduction based on Cody 1.30 sec to 0.60 sec change for an L=4096*4096 vector.

function w = Ham(w)
% Input uint32
% Output vector of Ham wts

 b1=uint32(1431655765); % evaluating saves 15% of time 1.30 to 1.1 sec
 b2=uint32(858993459);
 b3=uint32(252645135);
 b4=uint32(16711935);
 b5=uint32(65535);

 for i=1:4096:length(w)
  w(i:i+4095)=Ham_seg(w(i:i+4095),b1,b2,b3,b4,b5);
 end
end

% Segmentation reduced time by 50%

function w=Ham_seg(w,b1,b2,b3,b4,b5)
 % Passing variables or could evaluate b1:b5 here


 w = bitand(bitshift(w, -1), b1) + bitand(w, b1);
 w = bitand(bitshift(w, -2), b2) + bitand(w, b2);
 w = bitand(bitshift(w, -4), b3) + bitand(w, b3);
 w = bitand(bitshift(w, -8), b4) + bitand(w, b4);
 w = bitand(bitshift(w, -16), b5) + bitand(w, b5);

end





vt=randi(2^32,[4096*4096,1])-1;
% for vt being uint32 the floor function gives unexpected values
tic
v=num_ones(mod(vt,65536)+1)+num_ones(floor(vt/65536)+1); % 0.85 sec
toc
% a corrected method is
v=num_ones(mod(vt,65536)+1)+num_ones(floor(double(vt)/65536)+1);
toc

回复收藏 0 原文

鱼窥荷 2024-07-31 09:31:50

一种快速方法是使用查找表计算每个字节中的位，然后将这些值相加；事实上，这是问题中给出的网页上建议的方法之一。这种方法的优点在于，查找和求和都是 MATLAB 中的可向量化运算，因此您可以对此方法进行向量化，并非常快速地同时计算大量位串的汉明权重/设置位数。此方法在 MATLAB 文件交换上的 bitcount 提交中实现。

回复收藏 0 原文

蹲墙角沉默 2024-07-31 09:31:50

尝试将工作分成更小的部分。我的猜测是，如果您想一次处理所有数据，matlab 会尝试在执行连续步骤之前对所有整数执行每个操作，并且处理器的缓存会随着每个步骤而失效。

for i=1:4096,
    «process bits(i,:)»
end

Try splitting the job into smaller parts. My guess is that if you want to process all data at once, matlab is trying to do each operation on all integers before taking successive steps and the processor's cache is invalidated with each step.

for i=1:4096,
    «process bits(i,:)»
end

回复收藏 0 原文

開玄 2024-07-31 09:31:50

我在这里恢复一个旧线程，但我遇到了这个问题，我为它编写了一点代码：

distance = sum(bitget(bits, 1:32));

看起来很简洁，但我担心 bitget 是在 O(n ) 位移 操作。该代码适用于我要做的事情，但我的问题集不依赖于汉明权重。

I'm reviving an old thread here, but I ran across this problem and I wrote this little bit of code for it:

distance = sum(bitget(bits, 1:32));

Looks pretty concise, but I'm scared that bitget is implemented in O(n) bitshift operations. The code works for what I'm going, but my problem set doesn't rely on hamming weight.

回复收藏 0 原文

北风几吹夏 2024-07-31 09:31:50

num_ones=uint8(zeros(intmax('uint32')/2^6,1));
% one time load of array not implemented here
tic
for i=1:4096*4096
 %v=num_ones(rem(i,64)+1)+num_ones(floor(i/64)+1); % 1.24 sec
 v=num_ones(mod(i,64)+1)+num_ones(floor(i/64)+1); % 1.20 sec
end
toc
tic
num_ones=uint8(zeros(65536,1));
for i=0:65535
 num_ones(i+1)=length( find( bitget( i, 1:32 ) ) ) ;
end
toc
% 0.43 sec to load
% smaller array to initialize
% one time load of array
tic
for i=1:4096*4096
 v=num_ones(mod(i,65536)+1)+num_ones(floor(i/65536)+1); %  0.95 sec
 %v=num_ones(mod(i,65536)+1)+num_ones(bitshift(i,-16)+1); % 16 sec for 4K*1K
end
toc
%vectorized
tic
num_ones=uint8(zeros(65536,1));
for i=0:65535
 num_ones(i+1)=length( find( bitget( i, 1:32 ) ) ) ;
end % 0.43 sec
toc
vt=randi(2^32,[4096*4096,1])-1;
tic
v=num_ones(mod(vt,65536)+1)+num_ones(floor(vt/65536)+1); % 0.85 sec
toc

num_ones=uint8(zeros(intmax('uint32')/2^6,1));
% one time load of array not implemented here
tic
for i=1:4096*4096
 %v=num_ones(rem(i,64)+1)+num_ones(floor(i/64)+1); % 1.24 sec
 v=num_ones(mod(i,64)+1)+num_ones(floor(i/64)+1); % 1.20 sec
end
toc
tic
num_ones=uint8(zeros(65536,1));
for i=0:65535
 num_ones(i+1)=length( find( bitget( i, 1:32 ) ) ) ;
end
toc
% 0.43 sec to load
% smaller array to initialize
% one time load of array
tic
for i=1:4096*4096
 v=num_ones(mod(i,65536)+1)+num_ones(floor(i/65536)+1); %  0.95 sec
 %v=num_ones(mod(i,65536)+1)+num_ones(bitshift(i,-16)+1); % 16 sec for 4K*1K
end
toc
%vectorized
tic
num_ones=uint8(zeros(65536,1));
for i=0:65535
 num_ones(i+1)=length( find( bitget( i, 1:32 ) ) ) ;
end % 0.43 sec
toc
vt=randi(2^32,[4096*4096,1])-1;
tic
v=num_ones(mod(vt,65536)+1)+num_ones(floor(vt/65536)+1); % 0.85 sec
toc

回复收藏 0 原文

~没有更多了~