MATLAB: How to vector-multiply two arrays of matrices?
I have two 3-dimensional arrays, the first two dimensions of which represent matrices and the last one counts through a parameter space. As a simple example, take
A = repmat([1,2; 3,4], [1 1 4]);
(but assume A(:,:,j) is different for each j). How can one easily perform a per-j matrix multiplication of two such matrix arrays A and B?
C = A; % pre-allocate, nan(size(A,1), size(B,2), size(A,3)) would be better but slower
for jj = 1:size(A, 3)
    C(:,:,jj) = A(:,:,jj) * B(:,:,jj);
end
certainly does the job, but if the third dimension is more like 1e3 elements this is very slow since it doesn't use MATLAB's vectorization. So, is there a faster way?
I did some timing tests now; the fastest way for 2x2xN turns out to be calculating the matrix elements explicitly.
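The explicit calculation mentioned above could look roughly like the following. This is a minimal sketch rather than the original answer's code; the random test arrays and the size N are assumptions chosen for illustration.

```matlab
% Assumed test data: N independent 2-by-2 matrices in each array.
N = 1e4;
A = rand(2, 2, N);
B = rand(2, 2, N);

% Write out c_ik = a_i1*b_1k + a_i2*b_2k for each of the four elements,
% evaluated for all N pages at once with element-wise products.
C = zeros(2, 2, N);
C(1,1,:) = A(1,1,:).*B(1,1,:) + A(1,2,:).*B(2,1,:);
C(1,2,:) = A(1,1,:).*B(1,2,:) + A(1,2,:).*B(2,2,:);
C(2,1,:) = A(2,1,:).*B(1,1,:) + A(2,2,:).*B(2,1,:);
C(2,2,:) = A(2,1,:).*B(1,2,:) + A(2,2,:).*B(2,2,:);
```

This only stays manageable for very small matrices, since an n-by-n product needs n^2 hand-written lines of this kind.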
In the general case it turns out the for loop is actually the fastest (don't forget to pre-allocate C though!).
Should one already have the matrices stored as a cell array, though, cellfun is the fastest choice; it is also faster than looping over the cell elements.
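A sketch of the cellfun variant, including the num2cell and cell2mat conversions needed when the data starts out as 3d arrays. The test data and variable names are assumptions for illustration, not the original answer's code.

```matlab
% Assumed test data: N pages of 2-by-2 matrices.
N = 1000;
A = rand(2, 2, N);
B = rand(2, 2, N);

% Split each 3d array into a 1-by-1-by-N cell array of 2-by-2 matrices.
Ac = num2cell(A, [1 2]);
Bc = num2cell(B, [1 2]);

% Multiply matching cells; the outputs are matrices, so they must be
% collected in a cell array ('UniformOutput', false).
Cc = cellfun(@mtimes, Ac, Bc, 'UniformOutput', false);

% Convert back to a 2-by-2-by-N array.
C = cell2mat(Cc);
```

If the matrices already live in cell arrays, only the cellfun line is needed.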
However, having to call num2cell first (Ac = num2cell(A, [1 2])) and cell2mat for the 3d-array case wastes too much time.
Here's some timing I did for a random set of 2 x 2 x 1e4:
Explicit refers to using direct calculation of the 2 x 2 matrix elements, see below.
The result is similar for new random arrays; cellfun is the fastest if no num2cell is required beforehand and there is no restriction to 2x2xN. For general 3d-arrays, looping over the third dimension is indeed the fastest choice already. Here's the timing code:
Here is my benchmark test comparing the methods mentioned in @TobiasKienzler's answer. I am using the TIMEIT function to get more accurate timings.
The results:
As I explained in the comments, a simple FOR-loop is the best solution (short of loop unwinding in the last case, which is only feasible for these small 2-by-2 matrices).
I highly recommend you use the MMX toolbox for MATLAB. It can multiply n-dimensional matrices as fast as possible.
The advantages of MMX are:
For this problem, you just need to write this command:
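The call in question is presumably along the lines of the sketch below. It assumes the third-party MMX MEX package is compiled and on the MATLAB path and that it exposes the 'mult' command as mmx('mult', A, B); treat that signature as an assumption to check against the package's own help text. The fallback loop and the test data are added here only so the snippet also runs without MMX.

```matlab
% Assumed test data.
A = rand(2, 2, 1000);
B = rand(2, 2, 1000);

if exist('mmx', 'file')
    % Assumed MMX interface: page-wise matrix product of A and B.
    C = mmx('mult', A, B);
else
    % Fallback when MMX is not installed: plain pre-allocated loop.
    C = zeros(size(A, 1), size(B, 2), size(A, 3));
    for jj = 1:size(A, 3)
        C(:,:,jj) = A(:,:,jj) * B(:,:,jj);
    end
end
```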
I added the following function to @Amro's answer
I got this result for n=2,m=2,p=1e5:
I used @Amro's code to run the benchmark.
One technique would be to create a 2Nx2N sparse matrix and embed on the diagonal the 2x2 matrices, for both A and B. Do the product with sparse matrices and take the result with slightly clever indexing and reshape it to 2x2xN.
But I doubt this will be faster than simple looping.
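For illustration, the block-diagonal idea could be sketched as follows. The test data, the index construction with ndgrid, and all variable names are assumptions made for this example, not code from the answer.

```matlab
% Assumed test data: N pages of 2-by-2 matrices.
N = 500;
A = rand(2, 2, N);
B = rand(2, 2, N);

% Element (r, c) of page j goes to row 2*(j-1)+r and column 2*(j-1)+c
% of the 2N-by-2N block-diagonal matrix.
[r, c, j] = ndgrid(1:2, 1:2, 1:N);
rows = 2*(j - 1) + r;
cols = 2*(j - 1) + c;

% Embed the pages of A and B on the diagonals of sparse matrices.
SA = sparse(rows(:), cols(:), A(:), 2*N, 2*N);
SB = sparse(rows(:), cols(:), B(:), 2*N, 2*N);

% A product of block-diagonal matrices is block-diagonal, with the
% per-page products as its blocks.
SC = SA * SB;

% Read the diagonal blocks back in page order and reshape to 2x2xN.
idx = sub2ind([2*N, 2*N], rows(:), cols(:));
C = reshape(full(SC(idx)), 2, 2, N);
```

The index bookkeeping and the sparse indexing add overhead of their own, which is consistent with the doubt expressed above that this beats a simple loop.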
An even faster method, in my experience, is to use dot multiplication and summation over the three-dimensional matrix. The following function, z_matmultiply(A,B), multiplies two three-dimensional matrices that have the same depth. Dot multiplication is done in a manner that is as parallel as possible, so you might want to check the speed of this function and compare it to others over a large number of repetitions.
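The body of z_matmultiply is not shown here, so the following is only a sketch of how element-wise multiplication plus summation can produce the page-wise product; it is not the answer's function. The sizes n, m, q, p and the random inputs are assumptions for illustration.

```matlab
% Assumed test data: p pages, each an n-by-m times m-by-q product.
n = 3; m = 4; q = 2; p = 100;
A = rand(n, m, p);
B = rand(m, q, p);

% Arrange both arrays so the shared dimension m lines up in dimension 2:
% A becomes n x m x 1 x p, B becomes 1 x m x q x p.
A4 = reshape(A, n, m, 1, p);
B4 = permute(B, [4 1 2 3]);

% Element-wise products a_im * b_mk for every i, m, k and page,
% expanded by bsxfun to an n x m x q x p array.
P = bsxfun(@times, A4, B4);

% Summing over the shared dimension gives c_ik = sum_m a_im * b_mk.
C = reshape(sum(P, 2), n, q, p);
```

The intermediate array P holds n*m*q*p elements, so memory use grows quickly with matrix size; that trade-off is worth measuring alongside the speed.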