SciPy: memory error in sparse matrix multiplication

Posted on 2025-01-10 08:12:54

I want to perform matrix multiplication between a sparse matrix and its transpose (they are large matrices). Specifically, I have:

C = csc_matrix(...)
Ct = csc_matrix.transpose(C)
L = Ct*C

with shapes:

C.shape
(1791489, 28508141)
Ct.shape
(28508141, 1791489)

I am getting the following error:

Traceback (most recent call last):

  File "C:\...\modularity.py", line 373, in <module>
    L = Ct*C

  File "C:\...\anaconda3\lib\site-packages\scipy\sparse\base.py", line 480, in __mul__
    return self._mul_sparse_matrix(other)

  File "C:\...\anaconda3\lib\site-packages\scipy\sparse\compressed.py", line 518, in _mul_sparse_matrix
    indices = np.empty(nnz, dtype=idx_dtype)

MemoryError: Unable to allocate 1.11 TiB for an array with shape (152087117507,) and data type int64

I cannot figure out why. Why does it try to allocate memory for such a huge array?
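The size in the error message can be checked by hand: the array has one int64 (8 bytes) per stored entry of the result, and the result of Ct*C is a 28508141 x 28508141 matrix that can be far denser than C itself. The sketch below reproduces that arithmetic and, on a small random stand-in matrix, compares the stored entries of the product with a simple upper bound. `gram_nnz_upper_bound` and `C_small` are illustrative names, not part of SciPy or of the original code.

```python
import numpy as np
from scipy.sparse import csc_matrix

# Size of the failed allocation: one int64 (8 bytes) per stored entry of the result.
nnz_reported = 152_087_117_507
tib = nnz_reported * 8 / 2**40
print(f"{tib:.2f} TiB")  # 1.11 TiB, matching the error message


def gram_nnz_upper_bound(C):
    """Upper bound on the stored entries of C.T @ C (hypothetical helper)."""
    # A row of C with k entries can touch up to k * k cells of C.T @ C.
    row_counts = np.diff(C.tocsr().indptr).astype(np.int64)
    return int(np.sum(row_counts * row_counts))


# Small random stand-in for C; the real matrix is far larger.
rng = np.random.default_rng(0)
rows = rng.integers(0, 200, size=2000)
cols = rng.integers(0, 1000, size=2000)
C_small = csc_matrix((np.ones(2000), (rows, cols)), shape=(200, 1000))

L_small = C_small.T @ C_small
bound = gram_nnz_upper_bound(C_small)
print(C_small.nnz, L_small.nnz, bound)
```

Running the bound on the real C (it only needs the row counts, not the product) gives a rough idea of whether the full result can fit in memory before attempting the multiplication.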

Update: I am currently trying to do the multiplication in chunks, like this:

chunksize=1000
numiter = Ct.shape[0]//chunksize
blocks=[]
for i in range(numiter):
    A = Ct[i*chunksize:(i+1)*chunksize].dot(C)
    blocks.append(A)

But I get:

MemoryError: Unable to allocate 217. MiB for an array with shape (57012620,) and data type int32
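Two things about the loop above are worth noting. First, `Ct.shape[0]//chunksize` drops the final partial chunk (here the last 141 rows). Second, every block is kept in the `blocks` list, so memory use still grows towards the size of the full result. A minimal sketch of an alternative follows, assuming it is acceptable to convert both operands to CSR once up front and to consume each block as soon as it is produced; `gram_row_blocks` is a hypothetical helper and `C_small` is a small stand-in for the real matrix.

```python
import numpy as np
from scipy.sparse import csc_matrix


def gram_row_blocks(C, chunksize):
    """Yield (start_row, block) pairs whose row-wise concatenation is C.T @ C."""
    Ct = C.T.tocsr()    # row slicing is cheap in CSR
    C_csr = C.tocsr()   # convert once, outside the loop
    n = Ct.shape[0]
    for start in range(0, n, chunksize):
        stop = min(start + chunksize, n)  # includes the final partial chunk
        yield start, Ct[start:stop] @ C_csr


# Small random stand-in for C.
rng = np.random.default_rng(1)
rows = rng.integers(0, 30, size=120)
cols = rng.integers(0, 50, size=120)
C_small = csc_matrix((np.ones(120), (rows, cols)), shape=(30, 50))

total_nnz = 0
for start, block in gram_row_blocks(C_small, chunksize=7):
    # Consume the block here (reduce it, or write it to disk) instead of keeping it.
    total_nnz += block.nnz
print(total_nnz)
```

This does not shrink the result itself. If the full product is needed, each block still has to be written somewhere that is not RAM.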

方觉久 2025-01-17 08:12:54

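For future readers who want to multiply huge sparse matrices: I solved my problem by using PyTables to save the result of the multiplication in chunks. It still creates a large file, but at least the file is compressed. The code I used looks like this: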

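import numpy as np
import tables as tb

# Raw string so the backslash in the Windows path is not read as an escape.
f = tb.open_file(r'D:\dot.h5', 'w')
l, m, n = Ct.shape[0], Ct.shape[1], C.shape[1]
filters = tb.Filters(complevel=8, complib='blosc')
# Int32Atom for the values assumes C holds integers; match the atom to C.dtype otherwise.
out_data = f.create_earray(f.root, 'data', tb.Int32Atom(), shape=(0,), filters=filters)
out_indices = f.create_earray(f.root, 'indices', tb.Int32Atom(), shape=(0,), filters=filters)
# indptr stores running totals of nonzeros, which can exceed the int32 range, so use int64.
out_indptr = f.create_earray(f.root, 'indptr', tb.Int64Atom(), shape=(0,), filters=filters)
out_indptr.append(np.array([0], dtype=np.int64))  # the first indptr entry is always 0
max_indptr = 0
# buffer size: number of rows of Ct multiplied per iteration
bl = 10000
for i in range(0, l, bl):
    res = Ct[i:min(i+bl, l), :].dot(C)
    out_data.append(res.data)
    indices = res.indices
    indptr = res.indptr
    out_indices.append(indices)
    # shift this block's row pointers by the number of nonzeros already written
    out_indptr.append(max_indptr + indptr[1:].astype(np.int64))
    max_indptr += indices.shape[0]
f.close()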

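So if, for example, you want to access the row with index 2 (the third row) of the final matrix, you can simply do the following, with `csr_matrix` imported from `scipy.sparse`:

lo, hi = a.indptr[2], a.indptr[2+1]  # a is assumed to be the root group of the opened file (e.g. f.root)
L2 = csr_matrix((a.data[lo:hi], a.indices[lo:hi], np.array([0, hi - lo])), shape=(1, n))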
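The bookkeeping behind this (shifting each block's `indptr` by the number of nonzeros already written, then slicing one row back out) can be tried without PyTables. The sketch below uses in-memory NumPy arrays as a stand-in for the three EArrays; `stitch_csr_blocks`, `read_row` and `C_small` are hypothetical names used only for illustration.

```python
import numpy as np
from scipy.sparse import csc_matrix, csr_matrix


def stitch_csr_blocks(blocks):
    """Concatenate the CSR arrays of consecutive row blocks into one CSR triple."""
    data, indices = [], []
    indptr = [np.zeros(1, dtype=np.int64)]  # leading 0, as in the PyTables version
    offset = 0
    for block in blocks:
        data.append(block.data)
        indices.append(block.indices)
        # Row pointers of this block, shifted by the nonzeros already stored.
        indptr.append(offset + block.indptr[1:].astype(np.int64))
        offset += block.indices.shape[0]
    return np.concatenate(data), np.concatenate(indices), np.concatenate(indptr)


def read_row(data, indices, indptr, i, n_cols):
    """Rebuild row i as a 1 x n_cols CSR matrix from the stitched arrays."""
    lo, hi = indptr[i], indptr[i + 1]
    return csr_matrix((data[lo:hi], indices[lo:hi], np.array([0, hi - lo])),
                      shape=(1, n_cols))


# Small random stand-in for C.
rng = np.random.default_rng(2)
rows = rng.integers(0, 30, size=120)
cols = rng.integers(0, 50, size=120)
C_small = csc_matrix((np.ones(120), (rows, cols)), shape=(30, 50))
Ct_small = C_small.T.tocsr()

bl = 7  # rows per block
blocks = [Ct_small[i:min(i + bl, 50)].dot(C_small) for i in range(0, 50, bl)]
data, indices, indptr = stitch_csr_blocks(blocks)
row2 = read_row(data, indices, indptr, 2, C_small.shape[1])
print(row2.shape, row2.nnz)
```

With the HDF5 file, the same slicing applies to the stored `data`, `indices` and `indptr` nodes, so only the requested row is read from disk.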
