SciPy: MemoryError in sparse matrix multiplication

Posted on 2025-01-10 08:12:54

I want to perform matrix multiplication between a sparse matrix and its transpose (they are big matrices). Specifically, I have:

from scipy.sparse import csc_matrix

C = csc_matrix(...)
Ct = C.transpose()  # transposing a CSC matrix yields a CSR matrix
L = Ct*C

and shapes:

C.shape
(1791489, 28508141)
Ct.shape
(28508141, 1791489)

And I am getting the following error:

Traceback (most recent call last):

  File "C:\...\modularity.py", line 373, in <module>
    L = Ct*C

  File "C:\...\anaconda3\lib\site-packages\scipy\sparse\base.py", line 480, in __mul__
    return self._mul_sparse_matrix(other)

  File "C:\...\anaconda3\lib\site-packages\scipy\sparse\compressed.py", line 518, in _mul_sparse_matrix
    indices = np.empty(nnz, dtype=idx_dtype)

MemoryError: Unable to allocate 1.11 TiB for an array with shape (152087117507,) and data type int64

I cannot figure out why it tries to allocate memory for such a huge array.
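
For reference, the figure in the error message is consistent with the index array of the product itself: the traceback shows SciPy allocating `indices` with one slot per entry it expects the result to store (`nnz`). A quick arithmetic check, using only the numbers from the traceback (the helper name `array_size_tib` is made up for illustration):

```python
# Values copied from the MemoryError message in the traceback.
nnz = 152_087_117_507      # length of the `indices` array SciPy tried to allocate
itemsize = 8               # bytes per int64 index


def array_size_tib(n_items, bytes_per_item):
    """Size of a flat array in TiB (1 TiB = 2**40 bytes)."""
    return n_items * bytes_per_item / 2**40


indices_tib = array_size_tib(nnz, itemsize)
print(f"indices alone: {indices_tib:.2f} TiB")

# Fraction of the 28508141 x 28508141 result that would hold a stored entry.
n = 28_508_141
density = nnz / (n * n)
print(f"density of the product: {density:.6f}")
```

So the product would still be sparse as a fraction of its shape, but with roughly 152 billion stored entries its index array alone does not fit in memory, and the `data` array would come on top of that.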

Update: I am currently trying to do the multiplication in chunks, like this:

chunksize = 1000
blocks = []
# step through all rows of Ct, including a final partial chunk
for start in range(0, Ct.shape[0], chunksize):
    A = Ct[start:start + chunksize].dot(C)
    blocks.append(A)

But I get:

MemoryError: Unable to allocate 217. MiB for an array with shape (57012620,) and data type int32
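
Before multiplying, it may help to bound how many entries the product can store. A row of C with k stored entries contributes at most k² entries to Ct*C, so summing k² over all rows gives an upper bound. The sketch below is only an illustration on a small random matrix; `gram_nnz_upper_bound` and `C_small` are hypothetical names, not part of SciPy or of the code above.

```python
import numpy as np
from scipy import sparse


def gram_nnz_upper_bound(C):
    """Upper bound on the stored entries of C.T @ C.

    Each row of C with k stored entries can touch at most k * k
    positions of the product, so the bound is the sum of k**2 over rows.
    """
    C = C.tocsc()
    # In CSC format, `indices` holds the row index of every stored entry.
    row_counts = np.bincount(C.indices, minlength=C.shape[0]).astype(np.int64)
    return int(np.sum(row_counts ** 2))


# Small stand-in for the real matrix (assumption: random sparsity pattern).
C_small = sparse.random(200, 500, density=0.01, format="csc", random_state=0)

bound = gram_nnz_upper_bound(C_small)
actual = (C_small.T @ C_small).nnz
print("upper bound:", bound, "actual:", actual)
```

If the bound is already in the hundreds of billions, keeping all blocks in a Python list will run out of memory no matter how small each chunk is, and the result has to go to disk or be consumed block by block.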

Comments (1)

方觉久 2025-01-17 08:12:54

For future viewers who want to multiply huge sparse matrices: I solved my problem using PyTables, saving the result of the multiplication in chunks. It still creates a big file, but at least it is compressed. The code I used goes like this:

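import numpy as np
import tables as tb

f = tb.open_file(r'D:\dot.h5', 'w')  # raw string keeps the backslash literal
l, m, n = Ct.shape[0], Ct.shape[1], C.shape[1]
filters = tb.Filters(complevel=8, complib='blosc')
# Int32Atom for data assumes the product holds int32 values; match it to your dtype
out_data = f.create_earray(f.root, 'data', tb.Int32Atom(), shape=(0,), filters=filters)
out_indices = f.create_earray(f.root, 'indices', tb.Int32Atom(), shape=(0,), filters=filters)
# indptr accumulates the total number of stored entries, which exceeds the int32 range here
out_indptr = f.create_earray(f.root, 'indptr', tb.Int64Atom(), shape=(0,), filters=filters)
out_indptr.append(np.array([0], dtype=np.int64))  # this is needed as a first indptr
max_indptr = 0
bl = 10000  # buffer size: number of rows of Ct multiplied per block
for i in range(0, l, bl):
    res = Ct[i:min(i+bl, l), :].dot(C)
    out_data.append(res.data)
    # column indices are below n, so they fit in int32
    out_indices.append(res.indices.astype(np.int32))
    # shift the block's row pointers by the number of entries written so far
    out_indptr.append(max_indptr + res.indptr[1:].astype(np.int64))
    max_indptr += res.indices.shape[0]
f.flush()  # make sure everything is written; call f.close() when done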
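
The offset bookkeeping on `indptr` is the part that is easy to get wrong, so here is a small in-memory check of the same idea. It is a sketch under assumptions: plain Python lists stand in for the PyTables arrays, and `C_small`, `Ct_small` and `stitched` are made-up names on a toy random matrix.

```python
import numpy as np
from scipy import sparse

# Toy stand-ins for C and Ct (assumption: small random matrix).
C_small = sparse.random(60, 40, density=0.1, format="csc", random_state=1)
Ct_small = C_small.T.tocsr()
l, n = Ct_small.shape[0], C_small.shape[1]

data_parts, indices_parts = [], []
indptr_parts = [np.array([0], dtype=np.int64)]  # leading 0 of the CSR indptr
offset = 0
bl = 7  # block size, deliberately not a divisor of l

for i in range(0, l, bl):
    res = Ct_small[i:min(i + bl, l), :].dot(C_small).tocsr()
    data_parts.append(res.data)
    indices_parts.append(res.indices)
    # Shift this block's row pointers by the entries written so far.
    indptr_parts.append(offset + res.indptr[1:].astype(np.int64))
    offset += res.indices.shape[0]

# Reassemble one CSR matrix from the concatenated pieces.
stitched = sparse.csr_matrix(
    (np.concatenate(data_parts),
     np.concatenate(indices_parts),
     np.concatenate(indptr_parts)),
    shape=(l, n),
)

# Compare against the product computed in one go.
full = Ct_small.dot(C_small)
matches = np.allclose(stitched.toarray(), full.toarray())
print("blockwise result matches full product:", matches)
```

The three concatenated arrays are exactly what ends up in the `data`, `indices` and `indptr` nodes of the HDF5 file, only held in memory here.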

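So if, for example, you want to access the row with index 2 (the third row) of your final matrix, you can simply do the following. Here `a` is not defined in the code above; it is presumably the root group of the open HDF5 file (`a = f.root`), and `csr_matrix` is imported from `scipy.sparse`:

start, stop = a.indptr[2], a.indptr[2+1]  # extent of row 2 in data/indices
L2 = csr_matrix((a.data[start:stop], a.indices[start:stop], np.array([0, stop - start])), shape=(1, n))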
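
The same row lookup can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: `read_row` is a hypothetical name, and it is exercised on NumPy arrays from a tiny matrix, on the assumption that the PyTables arrays support the same integer indexing and slicing.

```python
import numpy as np
from scipy import sparse


def read_row(data, indices, indptr, row, n_cols):
    """Build a 1 x n_cols CSR matrix holding one row of a stored CSR matrix."""
    start, stop = int(indptr[row]), int(indptr[row + 1])
    return sparse.csr_matrix(
        (data[start:stop], indices[start:stop], np.array([0, stop - start])),
        shape=(1, n_cols),
    )


# Tiny example matrix standing in for the arrays stored on disk.
M = sparse.csr_matrix(np.array([[0, 1, 0],
                                [2, 0, 3],
                                [0, 0, 4]]))
row1 = read_row(M.data, M.indices, M.indptr, 1, M.shape[1])
print(row1.toarray())
```

With the HDF5 file open, the call would look like `read_row(a.data, a.indices, a.indptr, 2, n)`, so only the slice belonging to that row is read from disk.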