MATLAB：如何对两个矩阵数组进行向量乘法？

发布于 2024-11-18 15:18:47 字数 486 浏览 4 评论 0原文

我有两个 3 维数组，前两个维度表示矩阵，最后一个维度通过参数空间进行计数，作为一个简单的示例

A = repmat([1,2; 3,4], [1 1 4]);

（但假设 A(:,:,j) 是不同的对于每个j）。如何轻松地执行两个这样的矩阵数组 A 和 B 的每j 矩阵乘法？

C = A; % pre-allocate, nan(size(A,1), size(B,2)) would be better but slower
for jj = 1:size(A, 3)
  C(:,:,jj) = A(:,:,jj) * B(:,:,jj);
end

当然可以完成这项工作，但如果第三维更像 1e3 个元素，则速度会非常慢，因为它不使用 MATLAB 的矢量化。那么，有没有更快的方法呢？

原文

I have two 3-dimensional arrays, the first two dimensions of which represent matrices and the last one counts through a parameterspace, as a simple example take

A = repmat([1,2; 3,4], [1 1 4]);

(but assume A(:,:,j) is different for each j). How can one easily perform a per-j matrix multiplication of two such matrix-arrays A and B?

C = A; % pre-allocate, nan(size(A,1), size(B,2)) would be better but slower
for jj = 1:size(A, 3)
  C(:,:,jj) = A(:,:,jj) * B(:,:,jj);
end

certainly does the job, but if the third dimension is more like 1e3 elements this is very slow since it doesn't use MATLAB's vectorization. So, is there a faster way?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

故事与诗 2024-11-25 15:18:47

我现在做了一些计时测试，2x2xN 的最快方法是计算矩阵元素：

C = A;
C(1,1,:) = A(1,1,:).*B(1,1,:) + A(1,2,:).*B(2,1,:);
C(1,2,:) = A(1,1,:).*B(1,2,:) + A(1,2,:).*B(2,2,:);
C(2,1,:) = A(2,1,:).*B(1,1,:) + A(2,2,:).*B(2,1,:);
C(2,2,:) = A(2,1,:).*B(1,2,:) + A(2,2,:).*B(2,2,:);

在一般情况下，for 循环实际上是最快的（不过不要忘记预先分配 C！）。

如果已经将结果作为矩阵元胞数组，请使用 cellfun 是最快的选择，它也比循环单元格元素更快：

C = cellfun(@mtimes, A, B, 'UniformOutput', false);

但是，必须调用 num2cell 首先 (Ac = num2cell(A, [1 2])) 和 cell2mat 对于 3d 数组情况浪费太多时间。

以下是我对 2 x 2 x 1e4 的随机集所做的一些计时：

 array-for: 0.057112
 arrayfun : 0.14206
 num2cell : 0.079468
 cell-for : 0.033173
 cellfun  : 0.025223
 cell2mat : 0.010213
 explicit : 0.0021338

显式是指使用 2 x 2 矩阵元素的直接计算，请参见下文。
对于新的随机数组，结果类似，如果之前不需要 num2cell 并且没有 2x2xN 的限制，则 cellfun 是最快的。对于一般的 3d 数组来说，在第三维上循环确实是最快的选择。这是计时代码：

n = 2;
m = 2;
l = 1e4;

A = rand(n,m,l);
B = rand(m,n,l);

% naive for-loop:
tic
%Cf = nan(n,n,l);
Cf = A;
for jl = 1:l
    Cf(:,:,jl) = A(:,:,jl) * B(:,:,jl);
end;
disp([' array-for: ' num2str(toc)]);

% using arrayfun:
tic
Ca = arrayfun(@(k) A(:,:,k)*B(:,:,k), 1:size(A,3), 'UniformOutput',false);
Ca = cat(3,Ca{:});
disp([' arrayfun : ' num2str(toc)]);

tic
Ac = num2cell(A, [1 2]);
Bc = num2cell(B, [1 2]);
disp([' num2cell : ' num2str(toc)]);

% cell for-loop:
tic
Cfc = Ac;
for jl = 1:l
    Cfc{jl} = Ac{jl} * Bc{jl};
end;
disp([' cell-for : ' num2str(toc)]);

% using cellfun:
tic
Cc = cellfun(@mtimes, Ac, Bc, 'UniformOutput', false);
disp([' cellfun  : ' num2str(toc)]);

tic
Cc = cell2mat(Cc);
disp([' cell2mat : ' num2str(toc)]);

tic
Cm = A;
Cm(1,1,:) = A(1,1,:).*B(1,1,:) + A(1,2,:).*B(2,1,:);
Cm(1,2,:) = A(1,1,:).*B(1,2,:) + A(1,2,:).*B(2,2,:);
Cm(2,1,:) = A(2,1,:).*B(1,1,:) + A(2,2,:).*B(2,1,:);
Cm(2,2,:) = A(2,1,:).*B(1,2,:) + A(2,2,:).*B(2,2,:);
disp([' explicit : ' num2str(toc)]);

disp(' ');

I did some timing tests now, the fastest way for 2x2xN turns out to be calculating the matrix elements:

C = A;
C(1,1,:) = A(1,1,:).*B(1,1,:) + A(1,2,:).*B(2,1,:);
C(1,2,:) = A(1,1,:).*B(1,2,:) + A(1,2,:).*B(2,2,:);
C(2,1,:) = A(2,1,:).*B(1,1,:) + A(2,2,:).*B(2,1,:);
C(2,2,:) = A(2,1,:).*B(1,2,:) + A(2,2,:).*B(2,2,:);

In the general case it turns out the for loop is actually the fastest (don't forget to pre-allocate C though!).

Should one already have the result as cell-array of matrices though, using cellfun is the fastest choice, it is also faster than looping over the cell elements:

C = cellfun(@mtimes, A, B, 'UniformOutput', false);

However, having to call num2cell first (Ac = num2cell(A, [1 2])) and cell2mat for the 3d-array case wastes too much time.

Here's some timing I did for a random set of 2 x 2 x 1e4:

 array-for: 0.057112
 arrayfun : 0.14206
 num2cell : 0.079468
 cell-for : 0.033173
 cellfun  : 0.025223
 cell2mat : 0.010213
 explicit : 0.0021338

Explicit refers to using direct calculation of the 2 x 2 matrix elements, see bellow.
The result is similar for new random arrays, cellfun is the fastest if no num2cell is required before and there is no restriction to 2x2xN. For general 3d-arrays looping over the third dimension is indeed the fastest choice already. Here's the timing code:

n = 2;
m = 2;
l = 1e4;

A = rand(n,m,l);
B = rand(m,n,l);

% naive for-loop:
tic
%Cf = nan(n,n,l);
Cf = A;
for jl = 1:l
    Cf(:,:,jl) = A(:,:,jl) * B(:,:,jl);
end;
disp([' array-for: ' num2str(toc)]);

% using arrayfun:
tic
Ca = arrayfun(@(k) A(:,:,k)*B(:,:,k), 1:size(A,3), 'UniformOutput',false);
Ca = cat(3,Ca{:});
disp([' arrayfun : ' num2str(toc)]);

tic
Ac = num2cell(A, [1 2]);
Bc = num2cell(B, [1 2]);
disp([' num2cell : ' num2str(toc)]);

% cell for-loop:
tic
Cfc = Ac;
for jl = 1:l
    Cfc{jl} = Ac{jl} * Bc{jl};
end;
disp([' cell-for : ' num2str(toc)]);

% using cellfun:
tic
Cc = cellfun(@mtimes, Ac, Bc, 'UniformOutput', false);
disp([' cellfun  : ' num2str(toc)]);

tic
Cc = cell2mat(Cc);
disp([' cell2mat : ' num2str(toc)]);

tic
Cm = A;
Cm(1,1,:) = A(1,1,:).*B(1,1,:) + A(1,2,:).*B(2,1,:);
Cm(1,2,:) = A(1,1,:).*B(1,2,:) + A(1,2,:).*B(2,2,:);
Cm(2,1,:) = A(2,1,:).*B(1,1,:) + A(2,2,:).*B(2,1,:);
Cm(2,2,:) = A(2,1,:).*B(1,2,:) + A(2,2,:).*B(2,2,:);
disp([' explicit : ' num2str(toc)]);

disp(' ');

回复收藏 0 原文

看轻我的陪伴 2024-11-25 15:18:47

这是我的基准测试，比较中提到的方法@TobiasKienzler 回答。我正在使用 TIMEIT 函数来获得更准确的计时。

function [t,v] = matrixMultTest()
    n = 2; m = 2; p = 1e5;
    A = rand(n,m,p);
    B = rand(m,n,p);

    %# time functions
    t = zeros(5,1);
    t(1) = timeit( @() func1(A,B,n,m,p) );
    t(2) = timeit( @() func2(A,B,n,m,p) );
    t(3) = timeit( @() func3(A,B,n,m,p) );
    t(4) = timeit( @() func4(A,B,n,m,p) );
    t(5) = timeit( @() func5(A,B,n,m,p) );

    %# check the results
    v = cell(5,1);
    v{1} = func1(A,B,n,m,p);
    v{2} = func2(A,B,n,m,p);
    v{3} = func3(A,B,n,m,p);
    v{4} = func4(A,B,n,m,p);
    v{5} = func5(A,B,n,m,p);
    assert( isequal(v{:}) )
end

%# simple FOR-loop
function C = func1(A,B,n,m,p)
    C = zeros(n,n,p);
    for k=1:p
        C(:,:,k) = A(:,:,k) * B(:,:,k);
    end
end

%# ARRAYFUN
function C = func2(A,B,n,m,p)
    C = arrayfun(@(k) A(:,:,k)*B(:,:,k), 1:p, 'UniformOutput',false);
    C = cat(3, C{:});
end

%# NUM2CELL/FOR-loop/CELL2MAT
function C = func3(A,B,n,m,p)
    Ac = num2cell(A, [1 2]);
    Bc = num2cell(B, [1 2]);
    C = cell(1,1,p);
    for k=1:p
        C{k} = Ac{k} * Bc{k};
    end;
    C = cell2mat(C);
end

%# NUM2CELL/CELLFUN/CELL2MAT
function C = func4(A,B,n,m,p)
    Ac = num2cell(A, [1 2]);
    Bc = num2cell(B, [1 2]);
    C = cellfun(@mtimes, Ac, Bc, 'UniformOutput', false);
    C = cell2mat(C);
end

%# Loop Unrolling
function C = func5(A,B,n,m,p)
    C = zeros(n,n,p);
    C(1,1,:) = A(1,1,:).*B(1,1,:) + A(1,2,:).*B(2,1,:);
    C(1,2,:) = A(1,1,:).*B(1,2,:) + A(1,2,:).*B(2,2,:);
    C(2,1,:) = A(2,1,:).*B(1,1,:) + A(2,2,:).*B(2,1,:);
    C(2,2,:) = A(2,1,:).*B(1,2,:) + A(2,2,:).*B(2,2,:);
end

结果：

>> [t,v] = matrixMultTest();
>> t
t =
      0.63633      # FOR-loop
      1.5902       # ARRAYFUN
      1.1257       # NUM2CELL/FOR-loop/CELL2MAT
      1.0759       # NUM2CELL/CELLFUN/CELL2MAT
      0.05712      # Loop Unrolling

正如我在评论中解释的那样，简单的 FOR 循环是最好的解决方案（缺少循环展开在最后一种情况下，这只适用于这些小的 2×2 矩阵）。

Here is my benchmark test comparing the methods mentioned in @TobiasKienzler answer. I am using the TIMEIT function to get more accurate timings.

function [t,v] = matrixMultTest()
    n = 2; m = 2; p = 1e5;
    A = rand(n,m,p);
    B = rand(m,n,p);

    %# time functions
    t = zeros(5,1);
    t(1) = timeit( @() func1(A,B,n,m,p) );
    t(2) = timeit( @() func2(A,B,n,m,p) );
    t(3) = timeit( @() func3(A,B,n,m,p) );
    t(4) = timeit( @() func4(A,B,n,m,p) );
    t(5) = timeit( @() func5(A,B,n,m,p) );

    %# check the results
    v = cell(5,1);
    v{1} = func1(A,B,n,m,p);
    v{2} = func2(A,B,n,m,p);
    v{3} = func3(A,B,n,m,p);
    v{4} = func4(A,B,n,m,p);
    v{5} = func5(A,B,n,m,p);
    assert( isequal(v{:}) )
end

%# simple FOR-loop
function C = func1(A,B,n,m,p)
    C = zeros(n,n,p);
    for k=1:p
        C(:,:,k) = A(:,:,k) * B(:,:,k);
    end
end

%# ARRAYFUN
function C = func2(A,B,n,m,p)
    C = arrayfun(@(k) A(:,:,k)*B(:,:,k), 1:p, 'UniformOutput',false);
    C = cat(3, C{:});
end

%# NUM2CELL/FOR-loop/CELL2MAT
function C = func3(A,B,n,m,p)
    Ac = num2cell(A, [1 2]);
    Bc = num2cell(B, [1 2]);
    C = cell(1,1,p);
    for k=1:p
        C{k} = Ac{k} * Bc{k};
    end;
    C = cell2mat(C);
end

%# NUM2CELL/CELLFUN/CELL2MAT
function C = func4(A,B,n,m,p)
    Ac = num2cell(A, [1 2]);
    Bc = num2cell(B, [1 2]);
    C = cellfun(@mtimes, Ac, Bc, 'UniformOutput', false);
    C = cell2mat(C);
end

%# Loop Unrolling
function C = func5(A,B,n,m,p)
    C = zeros(n,n,p);
    C(1,1,:) = A(1,1,:).*B(1,1,:) + A(1,2,:).*B(2,1,:);
    C(1,2,:) = A(1,1,:).*B(1,2,:) + A(1,2,:).*B(2,2,:);
    C(2,1,:) = A(2,1,:).*B(1,1,:) + A(2,2,:).*B(2,1,:);
    C(2,2,:) = A(2,1,:).*B(1,2,:) + A(2,2,:).*B(2,2,:);
end

The results:

>> [t,v] = matrixMultTest();
>> t
t =
      0.63633      # FOR-loop
      1.5902       # ARRAYFUN
      1.1257       # NUM2CELL/FOR-loop/CELL2MAT
      1.0759       # NUM2CELL/CELLFUN/CELL2MAT
      0.05712      # Loop Unrolling

As I explained in the comments, a simple FOR-loop is the best solution (short of loop unwinding in the last case, which is only feasible for these small 2-by-2 matrices).

回复收藏 0 原文

酒几许 2024-11-25 15:18:47

我强烈建议您使用 MMX 工具箱< /a> 的 matlab.它可以尽可能快地乘以n维矩阵。

MMX的优点是：

使用简单。
乘以n维矩阵（实际上它可以乘以二维矩阵数组）
它执行其他矩阵运算（转置、二次乘法、Chol分解等等）
它使用< strong>C 编译器和多线程计算以提高速度。

对于这个问题，你只需要编写这个命令：

C=mmx('mul',A,B);

我在@Amro的答案中添加了以下函数

%# mmx toolbox
function C=func6(A,B,n,m,p)
    C=mmx('mul',A,B);
end

我得到了n=2,m=2,p=1e5的结果：

    1.6571 # FOR-loop
    4.3110 # ARRAYFUN
    3.3731 # NUM2CELL/FOR-loop/CELL2MAT
    2.9820 # NUM2CELL/CELLFUN/CELL2MAT
    0.0244 # Loop Unrolling
    0.0221 # MMX toolbox  <===================

我使用了@Amro的代码运行基准测试。

I highly recommend you use the MMX toolbox of matlab. It can multiply n-dimensional matrices as fast as possible.

The advantages of MMX are:

It is easy to use.
Multiply n-dimensional matrices (actually it can multiply arrays of 2-D matrices)
It performs other matrix operations (transpose, Quadratic Multiply, Chol decomposition and more)
It uses C compiler and multi-thread computation for speed up.

For this problem, you just need to write this command:

C=mmx('mul',A,B);

I added the following function to @Amro's answer

%# mmx toolbox
function C=func6(A,B,n,m,p)
    C=mmx('mul',A,B);
end

I got this result for n=2,m=2,p=1e5:

    1.6571 # FOR-loop
    4.3110 # ARRAYFUN
    3.3731 # NUM2CELL/FOR-loop/CELL2MAT
    2.9820 # NUM2CELL/CELLFUN/CELL2MAT
    0.0244 # Loop Unrolling
    0.0221 # MMX toolbox  <===================

I used @Amro's code to run the benchmark.

回复收藏 0 原文

拥抱影子 2024-11-25 15:18:47

一种技术是创建一个 2Nx2N 稀疏矩阵，并在 A 和 B 的对角线上嵌入 2x2 矩阵。使用稀疏矩阵进行乘积，并通过稍微巧妙的索引获取结果，并将其重塑为 2x2xN。

但我怀疑这会比简单的循环更快。

回复收藏 0 原文

停滞 2024-11-25 15:18:47

根据我的经验，一种更快的方法是对三维矩阵使用点乘和求和。以下函数 z_matmultiply(A,B) 将两个具有相同深度的三维矩阵相乘。点乘以尽可能并行的方式完成，因此您可能需要检查此函数的速度，并将其与其他函数进行大量重复比较。

function C = z_matmultiply(A,B)

[ma,na,oa] = size(A);
[mb,nb,ob] = size(B);

%preallocate the output as we will do a loop soon
C = zeros(ma,nb,oa);

%error message if the dimensions are not appropriate
if na ~= mb || oa ~= ob
    fprintf('\n z_matmultiply warning: Matrix Dimmensions Inconsistent \n')
else

% if statement minimizes for loops by looping the smallest matrix dimension 
if ma > nb
    for j = 1:nb
        Bp(j,:,:) = B(:,j,:);
        C(:,j,:) = sum(A.*repmat(Bp(j,:,:),[ma,1]),2);
    end
else
    for i = 1:ma
        Ap(:,i,:) = A(i,:,:);
        C(i,:,:) = sum(repmat(Ap(:,i,:),[1,nb]).*B,1);
    end 
end

end

An even faster method, in my experience, is to use dot multiplication and summation over the three-dimensional matrix. The following function, function z_matmultiply(A,B) multiplies two three dimensional matrices which have the same depth. Dot multiplication is done in a manner that is as parallel as possible, thus you might want to check the speed of this function and compare it to others over a large number of repetitions.

function C = z_matmultiply(A,B)

[ma,na,oa] = size(A);
[mb,nb,ob] = size(B);

%preallocate the output as we will do a loop soon
C = zeros(ma,nb,oa);

%error message if the dimensions are not appropriate
if na ~= mb || oa ~= ob
    fprintf('\n z_matmultiply warning: Matrix Dimmensions Inconsistent \n')
else

% if statement minimizes for loops by looping the smallest matrix dimension 
if ma > nb
    for j = 1:nb
        Bp(j,:,:) = B(:,j,:);
        C(:,j,:) = sum(A.*repmat(Bp(j,:,:),[ma,1]),2);
    end
else
    for i = 1:ma
        Ap(:,i,:) = A(i,:,:);
        C(i,:,:) = sum(repmat(Ap(:,i,:),[1,nb]).*B,1);
    end 
end

end

回复收藏 0 原文

~没有更多了~