Expanding (adding a row or column to) a scipy.sparse matrix
Suppose I have an NxN matrix M (lil_matrix or csr_matrix) from scipy.sparse, and I want to make it (N+1)xN, where M_modified[i,j] = M[i,j] for 0 <= i < N (and all j) and M_modified[N,j] = 0 for all j. Basically, I want to add a row of zeros to the bottom of M and preserve the rest of the matrix. Is there a way to do this without copying the data?
Comments (3)
Scipy doesn't have a way to do this without copying the data but you can do it yourself by changing the attributes that define the sparse matrix.
There are 4 attributes that make up the csr_matrix:
data: An array containing the actual values in the matrix
indices: An array containing the column index corresponding to each value in data
indptr: An array with one entry per row plus one, giving the position in data where each row's values start, so row i occupies data[indptr[i]:indptr[i+1]]. If a row is empty, its entry is the same as the next row's.
shape: A tuple containing the shape of the matrix
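To make the four attributes concrete, here is a small illustration on a 3x3 matrix whose middle row is empty. The matrix values are made up for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[1, 0, 2],
                  [0, 0, 0],
                  [0, 3, 0]])
M = csr_matrix(dense)

print(M.data)     # stored values, row by row
print(M.indices)  # column index of each stored value
print(M.indptr)   # row i occupies data[indptr[i]:indptr[i+1]]
print(M.shape)    # (rows, columns)

# Row 1 stores nothing, so its start and end offsets coincide.
row1_is_empty = M.indptr[1] == M.indptr[2]
```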
If you are simply adding a row of zeros to the bottom all you have to do is change the shape and indptr for your matrix.
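A minimal sketch of that idea, assuming the input is a csr_matrix in canonical form. Instead of mutating the matrix in place, this version wraps the existing data and indices arrays in a new csr_matrix with a longer indptr and a taller shape. The helper name append_zero_row is hypothetical, and whether the constructor really reuses the arrays without copying depends on their dtypes.

```python
import numpy as np
from scipy.sparse import csr_matrix

def append_zero_row(m):
    """Return a CSR matrix equal to m with one all-zero row appended."""
    n_rows, n_cols = m.shape
    # The new row stores nothing, so it starts and ends at the old end offset.
    new_indptr = np.append(m.indptr, m.indptr[-1])
    # data and indices are passed through unchanged; only indptr is rebuilt.
    return csr_matrix((m.data, m.indices, new_indptr),
                      shape=(n_rows + 1, n_cols), copy=False)

M = csr_matrix(np.array([[1, 0], [0, 2]]))
M2 = append_zero_row(M)
print(M2.toarray())
```

Only the small indptr array (one entry per row) is reallocated here, which is usually much cheaper than copying the stored values.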
Here is a function to handle the more general case of vstacking any 2 csr_matrices. You still end up copying the underlying numpy arrays but it is still significantly faster than the scipy vstack method.
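A sketch of what such a function could look like, assuming both inputs are canonical csr_matrix objects with the same number of columns. The name csr_vappend and the choice to return a new matrix rather than modify the first argument are assumptions made for this illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

def csr_vappend(a, b):
    """Stack CSR matrix b below CSR matrix a by concatenating their arrays."""
    if a.shape[1] != b.shape[1]:
        raise ValueError("a and b must have the same number of columns")
    data = np.concatenate((a.data, b.data))
    indices = np.concatenate((a.indices, b.indices))
    # Shift b's row offsets by the number of values stored in a.
    # b.indptr[0] is dropped because it coincides with a.indptr[-1].
    indptr = np.concatenate((a.indptr, b.indptr[1:] + a.indptr[-1]))
    return csr_matrix((data, indices, indptr),
                      shape=(a.shape[0] + b.shape[0], a.shape[1]))

a = csr_matrix(np.array([[1, 0, 2], [0, 0, 0]]))
b = csr_matrix(np.array([[0, 3, 0], [4, 0, 5]]))
stacked = csr_vappend(a, b)

# Cross-check against scipy's own vstack.
same = (stacked.toarray() == vstack([a, b]).toarray()).all()
```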
Not sure if you're still looking for a solution, but maybe others can look into hstack and vstack - http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.hstack.html. I think we can define a csr_matrix for the single additional row and then vstack it with the previous matrix.
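For illustration, appending a zero row with scipy.sparse.vstack might look like the following. The matrix values are made up, and this approach builds a new matrix, so the stored values are copied.

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

M = csr_matrix(np.array([[1, 0, 2],
                         [0, 3, 0],
                         [4, 0, 0]]))

# An empty 1 x N CSR matrix serves as the extra row of zeros.
zero_row = csr_matrix((1, M.shape[1]), dtype=M.dtype)

M_ext = vstack([M, zero_row], format="csr")
print(M_ext.shape)
```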
I don't think there is any way to really avoid the copying. Both of those sparse matrix types store their data internally as Numpy arrays (in the data and indices attributes for csr, and in the data and rows attributes for lil), and Numpy arrays can't be extended in place.
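A quick way to see the copying: np.append does not grow an array in place but returns a newly allocated one. The values below are arbitrary.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, 0)  # allocates a new array of length 4

# The original is untouched and the two arrays do not share a buffer.
shares = np.shares_memory(a, b)
print(a, b, shares)
```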
Update with more information:
LIL does stand for LInked List, but the current implementation doesn't quite live up to the name. The Numpy arrays used for data and rows are both of type object. Each object in these arrays is actually a Python list (an empty list when all values in a row are zero). Python lists aren't exactly linked lists, but they are fairly close and, quite frankly, a better choice because of their O(1) look-up. Personally, I don't immediately see the point of using a Numpy array of objects here rather than just a Python list. You could fairly easily change the current lil implementation to use Python lists instead, which would allow you to add a row without copying the whole matrix.
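The layout described above can be inspected directly. This is a small illustration with made-up values, and the exact internals may differ between scipy versions.

```python
from scipy.sparse import lil_matrix

M = lil_matrix((3, 3))
M[0, 1] = 5.0
M[2, 0] = 7.0
M[2, 2] = 9.0

# rows and data are object arrays holding one Python list per matrix row.
print(M.rows.dtype, M.data.dtype)
for i in range(M.shape[0]):
    # rows[i] holds the column indices, data[i] the matching values.
    print(i, M.rows[i], M.data[i])
```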