How to handle large matrices and matrix multiplication in Python

Posted on 2025-01-18 01:54:41


I'm trying to execute a matrix multiplication which has the following scheme:

C = np.dot(np.dot(sparse.csr_matrix(np.double(A).transpose()),sparse.spdiags(B,0,Ngrid,Ngrid)), sparse.csr_matrix(np.double(A)))

Thus, I want to transpose matrix A, which leads to an N x M matrix with M >> N, and multiply it with the diagonal matrix, which is an M x M matrix. B is the main diagonal. The resulting matrix (N x M) should be multiplied with matrix A (M x N), yielding the N x N matrix C.

The "error" that appears is the following:

<2000x921600 sparse matrix of type '<class 'numpy.float64'>'
    with 1843066024 stored elements in Compressed Sparse Row format>

As the final matrix is N x N, I want to have this matrix as a numpy array. As you can see, I make the matrix in the middle a sparse diagonal matrix, which works well. But I can't understand why Python needs this insanely large matrix with 1843066024 elements to conduct the multiplication.

Do you have some ideas and/or explanation why this problem appears?
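For context, 2000 × 921600 = 1,843,200,000, so the reported 1,843,066,024 stored elements mean the intermediate product is almost completely dense. A small sketch (with hypothetical tiny dimensions) of why wrapping a dense array in `csr_matrix` does not compress anything — CSR stores every nonzero explicitly:

```python
import numpy as np
from scipy import sparse

# CSR only saves memory when most entries are zero; a dense random
# array has (essentially) no zeros, so every element is stored.
rng = np.random.default_rng(0)
A = rng.random((6, 4))            # tiny stand-in for the real M x N array
A_csr = sparse.csr_matrix(A)

print(A_csr.nnz, A.size)          # both 24: nothing was compressed
```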


Comments (2)

懷念過去 2025-01-25 01:54:41


If B is diagonal, you don't need to use sparse to save memory. You can do the calculation with a 1d array, just the diagonal values.

I'll demonstrate with small dimensions, where making a full B doesn't cause problems. Others can test this with large dimensions.

In [5]: A = np.arange(24).reshape(3,8)
In [7]: B = np.diag(np.arange(8))
In [8]: B.shape
Out[8]: (8, 8)

The double matmul:

In [10]: A@B@A.T
Out[10]: 
array([[  784,  1904,  3024],
       [ 1904,  4816,  7728],
       [ 3024,  7728, 12432]])

The equivalent einsum:

In [12]: np.einsum('ij,jk,lk',A,B,A)
Out[12]: 
array([[  784,  1904,  3024],
       [ 1904,  4816,  7728],
       [ 3024,  7728, 12432]])

The 1d diagonal of B:

In [15]: b = np.diag(B)

A broadcasted multiplication does the same thing as the matmul:

In [17]: np.allclose(A*b,A@B)
Out[17]: True
In [18]: np.allclose(A*b@A.T,A@B@A.T)
Out[18]: True

Expressing that with einsum:

In [19]: np.einsum('ij,j,lj',A,b,A)
Out[19]: 
array([[  784,  1904,  3024],
       [ 1904,  4816,  7728],
       [ 3024,  7728, 12432]])

Some comparative times:

In [20]: timeit np.einsum('ij,j,lj',A,b,A)
10.6 µs ± 22.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [21]: timeit np.einsum('ij,jk,lk',A,B,A)
15.2 µs ± 13.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [22]: timeit A@B@A.T
7.26 µs ± 20.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [23]: timeit A*b@A.T
8.17 µs ± 12.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

For einsum the 'condensed' version is faster. With matmul the full diagonal is marginally faster.

But with large arrays, where creating a full B might be a problem, using b might be faster. Also, it's been observed in other SO answers that iterating over smaller arrays can be faster, due to better memory handling.

np.array([A[i,:]*b@A.T for i in range(3)])
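As a self-contained sanity check (a minimal sketch using the same small A and b as above), the row-wise iteration gives the same result as the full broadcasted product:

```python
import numpy as np

A = np.arange(24).reshape(3, 8)
b = np.arange(8)                       # 1d diagonal; no full B needed

full = A * b @ A.T                     # broadcasted multiply, then matmul
rowwise = np.array([A[i, :] * b @ A.T for i in range(A.shape[0])])

print(np.allclose(full, rowwise))      # True
print(full[0, 0])                      # 784, matching Out[10] above
```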
椒妓 2025-01-25 01:54:41


You're doing this in an overly complicated way. Here's a straightforward path for M >> N (you're inconsistent on that).

import numpy as np
from scipy import sparse

B = sparse.spdiags(B, 0, Ngrid, Ngrid)  # [M x M] sparse
A = np.ndarray(..., dtype=float)        # [M x N] dense

C = A.T @ B   # [N x M] dense
C = C @ A     # [N x N] dense

C is then the array that you want. None of your intermediate products are M x M. If you still have memory problems, you either need to get more memory, or you need to chunk your problem into smaller pieces on your m axis and calculate them piecewise.
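The chunking suggestion can be sketched as follows; `sandwich_chunked` is a hypothetical helper name and the chunk size is illustrative. It accumulates C = A.T @ diag(b) @ A one block of rows at a time, so no M-sized intermediate is ever materialized:

```python
import numpy as np

def sandwich_chunked(A, b, chunk=100_000):
    """Accumulate C = A.T @ diag(b) @ A over row chunks of the M axis."""
    M, N = A.shape
    C = np.zeros((N, N))
    for start in range(0, M, chunk):
        Ak = A[start:start + chunk]        # [chunk x N] block of rows
        bk = b[start:start + chunk]        # matching diagonal entries
        C += Ak.T @ (bk[:, None] * Ak)     # [N x N] partial sum
    return C

# check against the direct (unchunked) computation on a small case
rng = np.random.default_rng(1)
A = rng.random((500, 4))
b = rng.random(500)
print(np.allclose(sandwich_chunked(A, b, chunk=64),
                  A.T @ (b[:, None] * A)))   # True
```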
