Extending (adding rows or columns to) a scipy.sparse matrix

Posted on 2024-10-11 23:19:24


Suppose I have an NxN matrix M (lil_matrix or csr_matrix) from scipy.sparse, and I want to make it (N+1)xN, where M_modified[i,j] = M[i,j] for 0 <= i < N (and all j) and M_modified[N,j] = 0 for all j. Basically, I want to add a row of zeros to the bottom of M and preserve the rest of the matrix. Is there a way to do this without copying the data?


鸠书 2024-10-18 23:19:24

SciPy doesn't have a way to do this without copying the data, but you can do it yourself by changing the attributes that define the sparse matrix.

There are 4 attributes that make up the csr_matrix:

data: An array containing the actual values in the matrix

indices: An array containing the column index corresponding to each value in data

indptr: An array with one entry per row plus one, where indptr[i] is the position in data (and indices) of the first value of row i, so row i occupies data[indptr[i]:indptr[i+1]]. If a row is empty, its entry is the same as the previous row's.

shape: A tuple containing the shape of the matrix
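To make that layout concrete, here is a small sketch on a made-up 3x4 matrix whose middle row is empty. It prints the arrays and recovers each row as the slice data[indptr[i]:indptr[i+1]]:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Made-up example matrix; row 1 has no stored values.
dense = np.array([[1.0, 0.0, 2.0, 0.0],
                  [0.0, 0.0, 0.0, 0.0],
                  [0.0, 3.0, 0.0, 4.0]])
m = csr_matrix(dense)

print(m.data)     # [1. 2. 3. 4.]
print(m.indices)  # [0 2 1 3]
print(m.indptr)   # [0 2 2 4]  (empty row 1 repeats the value 2)
print(m.shape)    # (3, 4)

# Row i's values are data[indptr[i]:indptr[i+1]],
# and their column indices are indices[indptr[i]:indptr[i+1]].
for i in range(m.shape[0]):
    start, end = m.indptr[i], m.indptr[i + 1]
    print(i, m.indices[start:end], m.data[start:end])
```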

If you are simply adding a row of zeros to the bottom, all you have to do is change the shape and indptr of your matrix.

```python
import numpy as np
from scipy.sparse import csr_matrix

x = csr_matrix(np.ones((3, 5)))
print(x.toarray())
# [[1. 1. 1. 1. 1.]
#  [1. 1. 1. 1. 1.]
#  [1. 1. 1. 1. 1.]]

# reshape() can't add a row (it must keep the total number of elements), but you can
# cheat and set the private _shape attribute yourself. This relies on SciPy internals.
x._shape = (4, 5)
# Update indptr to let it know we added a row with nothing in it: just append the
# last value of indptr to the end.
# Note that you are still copying the indptr array (but not data or indices).
x.indptr = np.hstack((x.indptr, x.indptr[-1]))
print(x.toarray())
# [[1. 1. 1. 1. 1.]
#  [1. 1. 1. 1. 1.]
#  [1. 1. 1. 1. 1.]
#  [0. 0. 0. 0. 0.]]
```
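If you would rather not touch the private `_shape` attribute, the same idea can be sketched through the public `csr_matrix((data, indices, indptr), shape=...)` constructor. `add_empty_rows` is a hypothetical helper name. Only a new, longer `indptr` is built; the existing `data` and `indices` arrays are passed as-is, and with the constructor's default `copy=False` SciPy may reuse them rather than copy them, though that is an implementation detail and should be treated as an assumption.

```python
import numpy as np
from scipy.sparse import csr_matrix

def add_empty_rows(m, k=1):
    """Hypothetical helper: return m (as CSR) with k all-zero rows appended at the bottom.

    Only indptr is rebuilt; data and indices are handed to the constructor unchanged.
    """
    m = m.tocsr()
    # Each new, empty row starts and ends where the last existing row ends.
    tail = np.full(k, m.indptr[-1], dtype=m.indptr.dtype)
    indptr = np.concatenate((m.indptr, tail))
    return csr_matrix((m.data, m.indices, indptr),
                      shape=(m.shape[0] + k, m.shape[1]))

x = csr_matrix(np.ones((3, 5)))
y = add_empty_rows(x)
print(y.shape)  # (4, 5)
```

Unlike the in-place version, this returns a new matrix object, which avoids depending on SciPy internals at the cost of rebinding the name.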

Here is a function to handle the more general case of vstacking any two csr_matrices. You still end up copying the underlying NumPy arrays, but at the time of writing it was significantly faster than scipy.sparse.vstack.

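```python
import numpy as np

def csr_vappend(a, b):
    """Take in 2 csr_matrices and append the second one to the bottom of the first one.

    Can be much faster than scipy.sparse.vstack, but assumes both inputs are CSR and
    overwrites the first matrix instead of copying it. The data, indices, and indptr
    arrays still get copied by np.hstack.
    """
    if a.shape[1] != b.shape[1]:
        raise ValueError("a and b must have the same number of columns")
    offset = a.indptr[-1]  # number of stored values in a, read before modifying a
    a.data = np.hstack((a.data, b.data))
    a.indices = np.hstack((a.indices, b.indices))
    # Shift b's row pointers past a's values and drop b's leading 0.
    a.indptr = np.hstack((a.indptr, (b.indptr + offset)[1:]))
    a._shape = (a.shape[0] + b.shape[0], b.shape[1])
    return a
```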
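The only subtle line in `csr_vappend` is the `indptr` update. As a sanity check, this sketch performs the same array arithmetic on two small made-up matrices without mutating either, builds the result with the public `csr_matrix((data, indices, indptr), shape=...)` constructor, and compares it with `scipy.sparse.vstack`:

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

a = csr_matrix(np.array([[1.0, 2.0, 0.0],
                         [0.0, 0.0, 3.0]]))   # indptr [0 2 3], 3 stored values
b = csr_matrix(np.array([[0.0, 4.0, 0.0],
                         [5.0, 0.0, 6.0]]))   # indptr [0 1 3], 3 stored values

offset = a.indptr[-1]  # number of stored values in a
data = np.hstack((a.data, b.data))
indices = np.hstack((a.indices, b.indices))
# Shift b's row pointers by offset and drop b's leading 0 (it would duplicate a's last entry).
indptr = np.hstack((a.indptr, (b.indptr + offset)[1:]))
print(indptr)  # [0 2 3 4 6]

stacked = csr_matrix((data, indices, indptr),
                     shape=(a.shape[0] + b.shape[0], a.shape[1]))
print(np.array_equal(stacked.toarray(), vstack([a, b]).toarray()))  # True
```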
征棹 2024-10-18 23:19:24

Not sure if you're still looking for a solution, but others may want to look into scipy.sparse.hstack and scipy.sparse.vstack: http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.hstack.html. I think we can define a csr_matrix for the single additional row and then vstack it with the previous matrix.
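A minimal sketch of that suggestion, using an arbitrary 3x3 identity matrix as `M`: `csr_matrix((1, N))` creates an empty 1xN sparse matrix with no stored values, and `scipy.sparse.vstack` stacks it under `M`. Keep in mind that `vstack` builds a new matrix, so this approach does copy the data.

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

M = csr_matrix(np.eye(3))               # example NxN matrix (N = 3)
zero_row = csr_matrix((1, M.shape[1]))  # empty 1xN sparse matrix
M_modified = vstack([M, zero_row], format="csr")
print(M_modified.shape)  # (4, 3)
```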

慈悲佛祖 2024-10-18 23:19:24

I don't think there is any way to really avoid the copying. Both of those sparse matrix types store their data internally as NumPy arrays (in the data and indices attributes for csr, and in the data and rows attributes for lil), and NumPy arrays can't be extended.

Update with more information:

LIL stands for LIst of Lists, but the current implementation only partly matches the name. The NumPy arrays used for data and rows are both of dtype object, and each element of these arrays is actually a Python list (an empty list when a row has no nonzero values). Python lists aren't linked lists, but quite frankly they are a better choice here due to O(1) indexing. Personally, I don't immediately see the point of using a NumPy array of objects here rather than just a Python list. You could fairly easily change the current lil implementation to use Python lists instead, which would allow you to add a row without copying the whole matrix.
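The structure described above can be inspected directly on a small made-up `lil_matrix`. The sketch then appends an empty row with the public in-place `resize` method that SciPy sparse matrices provide; how much of the outer object arrays it copies internally is an implementation detail not covered here.

```python
import numpy as np
from scipy.sparse import lil_matrix

L = lil_matrix((3, 4))
L[0, 1] = 5.0
L[2, 3] = 7.0

# rows and data are NumPy arrays of dtype object; each element is a Python list.
print(L.rows.dtype, L.data.dtype)  # object object
print(type(L.rows[0]))             # <class 'list'>
print(L.rows[0], L.data[0])        # [1] [5.0]
print(L.rows[1], L.data[1])        # [] []  (row 1 has no nonzero values)

# Append one all-zero row in place.
L.resize((4, 4))
print(L.shape)                     # (4, 4)
```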
