Expanding (adding a row or column to) a scipy.sparse matrix

Posted 2024-10-11 23:19:24

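Suppose I have an NxN matrix M from scipy.sparse (lil_matrix or csr_matrix), and I want to make it (N+1)xN, where M_modified[i,j] = M[i,j] for 0 <= i < N (and all j), and M_modified[N,j] = 0 for all j. Basically, I want to add a row of zeros to the bottom of M and preserve the rest of the matrix. Is there a way to do this without copying the data?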


鸠书 2024-10-18 23:19:24

SciPy doesn't have a way to do this without copying the data, but you can do it yourself by changing the attributes that define the sparse matrix.

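There are 4 attributes that make up a csr_matrix:

data: An array containing the actual stored values of the matrix, row by row

indices: An array containing the column index corresponding to each value in data

indptr: An array of row offsets into data and indices. The entries of row i are data[indptr[i]:indptr[i+1]], so indptr has one more element than the matrix has rows, and its last value equals the number of stored values. If row i is empty, indptr[i+1] equals indptr[i].

shape: A tuple containing the shape of the matrix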
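To make the layout concrete, here is a small sketch that prints the three arrays for a 3x3 matrix with an empty middle row. The matrix values and the variable names (dense, m, row1_is_empty) are made up for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[1, 0, 2],
                  [0, 0, 0],
                  [0, 3, 0]])
m = csr_matrix(dense)

print(m.data)     # stored nonzero values, row by row
print(m.indices)  # column index of each stored value
print(m.indptr)   # row i occupies data[indptr[i]:indptr[i + 1]]

# Row 1 is empty, so its start and end offsets coincide.
row1_is_empty = m.indptr[1] == m.indptr[2]
```

Here data holds 1, 2, 3, indices holds 0, 2, 1, and indptr holds 0, 2, 2, 3: the repeated 2 is what marks the empty row.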

If you are simply adding a row of zeros to the bottom, all you have to do is change the shape and indptr of your matrix.

import numpy as np
from scipy.sparse import csr_matrix

x = csr_matrix(np.ones((3, 5)))
x.toarray()
# array([[1., 1., 1., 1., 1.],
#        [1., 1., 1., 1., 1.],
#        [1., 1., 1., 1., 1.]])

# reshape cannot add rows (it must preserve the number of elements), so
# cheat and set the private _shape attribute yourself.
x._shape = (4, 5)
# Update indptr to let it know we added a row with nothing in it: just
# append the last value in indptr to the end.
# Note that you are still copying the indptr array.
x.indptr = np.hstack((x.indptr, x.indptr[-1]))
x.toarray()
# array([[1., 1., 1., 1., 1.],
#        [1., 1., 1., 1., 1.],
#        [1., 1., 1., 1., 1.],
#        [0., 0., 0., 0., 0.]])
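As an alternative to touching private attributes, recent SciPy releases expose a resize method on sparse matrices. The following is a minimal sketch assuming your installed version provides it; it changes the matrix in place, and whether it avoids copying indptr internally is not something this example checks.

```python
import numpy as np
from scipy.sparse import csr_matrix

x = csr_matrix(np.ones((3, 5)))
x.resize((4, 5))       # in place; the new last row has no stored values
result = x.toarray()   # dense copy, only used here to inspect the result
```

The data and indices arrays are untouched by adding empty rows; only indptr gains one entry per new row.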

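Here is a function to handle the more general case of vstacking any 2 csr_matrices. You still end up copying the underlying numpy arrays, but it is still significantly faster than the scipy.sparse.vstack method.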

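import numpy as np

def csr_vappend(a, b):
    """Append csr_matrix b to the bottom of csr_matrix a, in place.

    Much faster than scipy.sparse.vstack, but assumes both inputs are CSR
    with the same number of columns and overwrites a instead of copying it.
    The data, indices, and indptr arrays still get copied.
    """
    if a.shape[1] != b.shape[1]:
        raise ValueError("a and b must have the same number of columns")
    offset = a.indptr[-1]  # nnz of a, read before any attribute changes
    a.data = np.hstack((a.data, b.data))
    a.indices = np.hstack((a.indices, b.indices))
    # Shift b's row offsets past a's entries and drop b's leading 0.
    a.indptr = np.hstack((a.indptr, b.indptr[1:] + offset))
    a._shape = (a.shape[0] + b.shape[0], b.shape[1])
    return a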
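A quick way to sanity-check the function is to compare its result with scipy.sparse.vstack on a small input. The sketch below restates csr_vappend (with the offset captured up front) so it runs on its own; the sample matrices and the names expected, appended, and matches are illustrative only. It relies on the private _shape attribute, as the answer does, so treat it as version-dependent.

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

def csr_vappend(a, b):
    # Restated from the answer so this snippet is self-contained.
    offset = a.indptr[-1]  # number of stored values in a
    a.data = np.hstack((a.data, b.data))
    a.indices = np.hstack((a.indices, b.indices))
    a.indptr = np.hstack((a.indptr, b.indptr[1:] + offset))
    a._shape = (a.shape[0] + b.shape[0], b.shape[1])
    return a

a = csr_matrix(np.array([[1., 0., 2.], [0., 0., 0.]]))
b = csr_matrix(np.array([[0., 3., 0.]]))

expected = vstack([a, b]).toarray()  # computed before a is overwritten
appended = csr_vappend(a, b)         # a itself is now the 3x3 result
matches = np.array_equal(appended.toarray(), expected)
```

Because a is modified in place, the reference result has to be computed first; afterwards a and appended are the same object.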



征棹 2024-10-18 23:19:24

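Not sure if you are still looking for a solution, but maybe others can look into hstack and vstack: http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.hstack.html. I think we can define a csr_matrix for the single additional row and then vstack it with the original matrix.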
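A minimal sketch of that suggestion, assuming a small identity matrix as the input (the names m, zero_row, and m_ext are illustrative). Note that vstack builds a new matrix, so the data is copied and the original is left unchanged.

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

m = csr_matrix(np.eye(3))
# An empty 1xN CSR matrix: no stored values, same column count and dtype.
zero_row = csr_matrix((1, m.shape[1]), dtype=m.dtype)
m_ext = vstack([m, zero_row], format="csr")
```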


慈悲佛祖 2024-10-18 23:19:24

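I don't think there is any way to really escape from doing the copying. Both of those types of sparse matrices store their data as Numpy arrays (in the data and indices attributes for csr, and in the data and rows attributes for lil), and Numpy arrays can't be extended in place.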

Update with more information:

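LIL does stand for linked list (the current SciPy documentation describes lil_matrix as a row-based list of lists), but the current implementation doesn't quite live up to the name. The Numpy arrays used for data and rows are both of object dtype. Each object in these arrays is actually a Python list (an empty list when all values in a row are zero). Python lists aren't exactly linked lists, but they are fairly close, and frankly a better choice because of their O(1) look-up. Personally, I don't immediately see the point of using a Numpy array of objects here rather than just a Python list. You could fairly easily change the current lil implementation to use Python lists instead, which would allow you to add a row without copying the whole matrix.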
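The storage described above can be inspected directly. This is a small sketch on a made-up 3x4 matrix; the names m and rows_dtype are illustrative, and the internal layout it prints is an implementation detail that may differ between SciPy versions.

```python
from scipy.sparse import lil_matrix

m = lil_matrix((3, 4))
m[0, 1] = 5.0
m[2, 3] = 7.0

rows_dtype = m.rows.dtype    # object dtype: each element is a Python list
print(m.rows[0], m.data[0])  # column indices and values stored for row 0
print(m.rows[1], m.data[1])  # row 1 has no stored values: two empty lists
```

The outer containers (m.rows and m.data) are the fixed-size Numpy arrays, which is why adding a row still requires allocating new ones, even though each per-row list can grow freely.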
