How to handle large matrices and matrix multiplication in Python

Posted on 2025-01-18 01:54:41


I'm trying to execute a matrix multiplication which has the following scheme:

C = np.dot(np.dot(sparse.csr_matrix(np.double(A).transpose()),sparse.spdiags(B,0,Ngrid,Ngrid)), sparse.csr_matrix(np.double(A)))

Thus, I want to transpose matrix A, which leads to an N x M matrix with M >> N, and multiply it with the diagonal matrix, which is an M x M matrix. B is the main diagonal. The resulting matrix (N x M) should be multiplied with matrix A (M x N), yielding the N x N matrix C.

The "error" that appears is the following:

<2000x921600 sparse matrix of type '<class 'numpy.float64'>'
    with 1843066024 stored elements in Compressed Sparse Row format>

As the final matrix is N x N, I want to have this matrix as a numpy array. As you can see, I make the matrix in the middle a sparse diagonal matrix, which works well. But I can't understand why Python needs this insanely large matrix with 1843066024 elements to conduct the multiplication.

Do you have some ideas and/or explanation why this problem appears?
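For context, 2000 × 921600 = 1,843,200,000, so the reported 1,843,066,024 stored elements mean the intermediate product is almost completely dense. A small sketch (with hypothetical tiny dimensions) of why wrapping a dense array in `csr_matrix` does not compress anything — CSR stores every nonzero explicitly:

```python
import numpy as np
from scipy import sparse

# CSR only saves memory when most entries are zero; a dense random
# array has (essentially) no zeros, so every element is stored.
rng = np.random.default_rng(0)
A = rng.random((6, 4))            # tiny stand-in for the real M x N array
A_csr = sparse.csr_matrix(A)

print(A_csr.nnz, A.size)          # both 24: nothing was compressed
```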


Comments (2)

懷念過去 2025-01-25 01:54:41


If B is diagonal, you don't need to use sparse to save memory. You can do the calculation with a 1d array, just the diagonal values.

I'll demonstrate with small dimensions, where making a full B doesn't cause problems. Others can test this with large dimensions.

In [5]: A = np.arange(24).reshape(3,8)
In [7]: B = np.diag(np.arange(8))
In [8]: B.shape
Out[8]: (8, 8)

The double matmul:

In [10]: A@B@A.T
Out[10]: 
array([[  784,  1904,  3024],
       [ 1904,  4816,  7728],
       [ 3024,  7728, 12432]])

The equivalent einsum:

In [12]: np.einsum('ij,jk,lk',A,B,A)
Out[12]: 
array([[  784,  1904,  3024],
       [ 1904,  4816,  7728],
       [ 3024,  7728, 12432]])

The 1d diagonal of B:

In [15]: b = np.diag(B)

A broadcasted multiplication does the same thing as the matmul:

In [17]: np.allclose(A*b,A@B)
Out[17]: True
In [18]: np.allclose(A*b@A.T,A@B@A.T)
Out[18]: True

Expressing that with einsum:

In [19]: np.einsum('ij,j,lj',A,b,A)
Out[19]: 
array([[  784,  1904,  3024],
       [ 1904,  4816,  7728],
       [ 3024,  7728, 12432]])

Some comparative times:

In [20]: timeit np.einsum('ij,j,lj',A,b,A)
10.6 µs ± 22.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [21]: timeit np.einsum('ij,jk,lk',A,B,A)
15.2 µs ± 13.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [22]: timeit A@B@A.T
7.26 µs ± 20.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [23]: timeit A*b@A.T
8.17 µs ± 12.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

For einsum the 'condensed' version is faster. With matmul the full diagonal is marginally faster.

But with large arrays, where creating a full B might be a problem, using b might be faster. Also, it's been observed in other SO answers that iterating over smaller arrays can be faster, due to better memory handling.

np.array([A[i,:]*b@A.T for i in range(3)])
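As a self-contained sanity check (a minimal sketch using the same small A and b as above), the row-wise iteration gives the same result as the full broadcasted product:

```python
import numpy as np

A = np.arange(24).reshape(3, 8)
b = np.arange(8)                       # 1d diagonal; no full B needed

full = A * b @ A.T                     # broadcasted multiply, then matmul
rowwise = np.array([A[i, :] * b @ A.T for i in range(A.shape[0])])

print(np.allclose(full, rowwise))      # True
print(full[0, 0])                      # 784, matching Out[10] above
```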
椒妓 2025-01-25 01:54:41


You're doing this in an overly complicated way. Here's a straightforward path for M >> N (you're inconsistent on that).

import numpy as np
from scipy import sparse

B = sparse.spdiags(B, 0, Ngrid, Ngrid)  # [M x M] sparse
A = np.ndarray(..., dtype=float)        # [M x N] dense

C = A.T @ B   # [N x M] dense
C = C @ A     # [N x N] dense

C is then the array that you want. None of your intermediate products are M x M. If you still have memory problems, you either need to get more memory, or you need to chunk your problem into smaller pieces on your m axis and calculate them piecewise.
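The chunking suggestion can be sketched as follows; `sandwich_chunked` is a hypothetical helper name and the chunk size is illustrative. It accumulates C = A.T @ diag(b) @ A one block of rows at a time, so no M-sized intermediate is ever materialized:

```python
import numpy as np

def sandwich_chunked(A, b, chunk=100_000):
    """Accumulate C = A.T @ diag(b) @ A over row chunks of the M axis."""
    M, N = A.shape
    C = np.zeros((N, N))
    for start in range(0, M, chunk):
        Ak = A[start:start + chunk]        # [chunk x N] block of rows
        bk = b[start:start + chunk]        # matching diagonal entries
        C += Ak.T @ (bk[:, None] * Ak)     # [N x N] partial sum
    return C

# check against the direct (unchunked) computation on a small case
rng = np.random.default_rng(1)
A = rng.random((500, 4))
b = rng.random(500)
print(np.allclose(sandwich_chunked(A, b, chunk=64),
                  A.T @ (b[:, None] * A)))   # True
```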
