Performing a sum of outer products on sparse matrices

Posted on 2024-11-28 02:45:36


I am trying to implement the following equation using scipy's sparse package:

W = x[:,0] * y[:,0].T + x[:,1] * y[:,1].T + ... + x[:,m-1] * y[:,m-1].T

where x and y are n x m csc_matrix objects. Basically, I'm trying to multiply each column of x by the transpose of the corresponding column of y (an outer product) and sum the resulting n x n matrices. I then want to set all non-zero elements to 1.
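The equation is a sum of m outer products. That sum equals the single sparse product x @ y.T, because entry (j, k) of both is the sum over i of x[j, i] * y[k, i]. The minimal sketch below checks this numerically and then sets every stored non-zero to 1. It uses small hypothetical random test matrices in place of the real n = 10,000, m = 100,000 ones.

```python
import numpy as np
from scipy import sparse

# Small hypothetical test matrices (the post uses n = 10,000, m = 100,000)
n, m = 6, 4
x = sparse.random(n, m, density=0.4, format="csc", random_state=0)
y = sparse.random(n, m, density=0.4, format="csc", random_state=1)

# Literal translation of the equation: sum of m outer products, each n x n
w_loop = sparse.csc_matrix((n, n))
for i in range(m):
    w_loop = w_loop + x[:, i] @ y[:, i].T

# The same sum written as one sparse matrix product
w_prod = (x @ y.T).tocsc()

# Both agree up to floating-point rounding
assert abs(w_loop - w_prod).max() < 1e-12

# Set every stored non-zero to exactly 1
w_bin = w_prod.copy()
w_bin.eliminate_zeros()   # drop any explicitly stored zeros first
w_bin.data[:] = 1
```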

This is my current implementation:

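    import numpy as np
    from scipy import sparse

    # x = bam.id2sym_thal, y = bam.id2sym_cort (both n x m csc_matrix)
    c = sparse.csc_matrix((n, n))
    for i in range(m):
        # outer product of column i of x with column i of y: an n x n sparse matrix
        tmp = bam.id2sym_thal[:, i] @ bam.id2sym_cort[:, i].T
        # clamp every stored value to at most 1, then to at least 1 (i.e. exactly 1)
        np.minimum(tmp.data, 1, out=tmp.data)
        np.maximum(tmp.data, 1, out=tmp.data)
        c = c + tmp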
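In this loop, the np.minimum / np.maximum pair clamps every stored value of tmp to exactly 1, which has the same effect as tmp.data[:] = 1. This clamping happens before the sum. As a result, c ends up counting how many outer products touch each position rather than holding 0/1 values. The tiny sketch below illustrates this with hypothetical 2 x 2 inputs. The final binarization of c is an assumption about the intended result, based on the stated goal of making all non-zero elements 1.

```python
import numpy as np
from scipy import sparse

# Hypothetical tiny inputs: n = 2, m = 2
x = sparse.csc_matrix(np.array([[1.0, 1.0], [0.0, 1.0]]))
y = sparse.csc_matrix(np.array([[2.0, 3.0], [0.0, 0.0]]))

n, m = x.shape
c = sparse.csc_matrix((n, n))
for i in range(m):
    tmp = x[:, i] @ y[:, i].T              # outer product of column i
    np.minimum(tmp.data, 1, out=tmp.data)  # clamp stored values to <= 1
    np.maximum(tmp.data, 1, out=tmp.data)  # then to >= 1, so all become 1
    c = c + tmp

# Both outer products touch position (0, 0), so c holds 2 there, not 1
print(c.toarray())   # [[2. 0.] [1. 0.]]

# One binarization after the loop gives the 0/1 pattern
w = c.copy()
w.data[:] = 1
```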

This implementation has the following problems:

1. Memory usage seems to explode. As I understand it, memory should only increase as c becomes less sparse, but the loop starts eating up more than 20 GB of memory with n = 10,000 and m = 100,000 (each row of x and y has only around 60 non-zero elements).
2. I'm using a Python loop, which is not very efficient.

My question: Is there a better way to do this? Controlling memory usage is my first concern, but it would be great to make it faster!
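One possible way to bound memory is sketched below. It is not from the original post and is not benchmarked on the real data. The idea is to compute W = x @ y.T one slab of rows at a time, binarize each slab immediately, and store it as int8. The function name binary_outer_sum and its block_rows parameter are hypothetical. Smaller slabs should lower the peak size of the intermediate float64 products. With n = 10,000 and block_rows=1,000, the Python loop runs 10 times rather than m = 100,000 times. Whether the result fits in memory still depends on how dense W itself is.

```python
import numpy as np
from scipy import sparse

def binary_outer_sum(x, y, block_rows=1000):
    """Return the 0/1 pattern of x @ y.T (hypothetical helper).

    x, y: n x m sparse matrices. block_rows (hypothetical tuning knob)
    sets how many rows of W are computed per step; smaller values lower
    the size of each intermediate product at the cost of more iterations.
    """
    x = sparse.csr_matrix(x)      # CSR makes row slicing cheap
    yt = sparse.csc_matrix(y).T   # transpose of CSC is CSR, shape m x n
    blocks = []
    for start in range(0, x.shape[0], block_rows):
        # rows start .. start + block_rows - 1 of W = x @ y.T
        block = sparse.csr_matrix(x[start:start + block_rows] @ yt)
        block.eliminate_zeros()                # drop explicitly stored zeros
        block.data[:] = 1                      # binarize before keeping it
        blocks.append(block.astype(np.int8))   # 1 byte per stored value
    return sparse.vstack(blocks, format="csr")

# Usage on small hypothetical random inputs
x = sparse.random(50, 80, density=0.05, format="csc", random_state=0)
y = sparse.random(50, 80, density=0.05, format="csc", random_state=1)
w = binary_outer_sum(x, y, block_rows=7)
```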

Thank you!
