Iterating through a scipy.sparse vector (or matrix)
I'm wondering what the best way is to iterate nonzero entries of sparse matrices with scipy.sparse. For example, if I do the following:
from scipy.sparse import lil_matrix
x = lil_matrix( (20,1) )
x[13,0] = 1
x[15,0] = 2
c = 0
for i in x:
    print c, i
    c = c+1
the output is
0
1
2
3
4
5
6
7
8
9
10
11
12
13 (0, 0) 1.0
14
15 (0, 0) 2.0
16
17
18
19
so it appears the iterator is touching every element, not just the nonzero entries. I've had a look at the API
http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.lil_matrix.html
and searched around a bit, but I can't seem to find a solution that works.
Comments (6)
Try filter(lambda x:x, x) instead of x.
Edit: bbtrb's method (using coo_matrix) is much faster than my original suggestion, which used nonzero. Sven Marnach's suggestion to use itertools.izip also improves the speed. The current fastest is using_tocoo_izip:
which yields these timeit results:
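The code and timings for this answer did not survive in the text above. As a rough stand-in, here is a Python 3 sketch of the two approaches it compares. The function bodies are reconstructions, not the answerer's original code, and `using_tocoo_zip` is a hypothetical name chosen because Python 3's built-in `zip` is already lazy and replaces `itertools.izip`. The timings it prints depend on your machine and SciPy version.

```python
import timeit

from scipy.sparse import lil_matrix


def using_nonzero(x):
    """Collect (row, col, value) via nonzero(), indexing the matrix once per entry."""
    rows, cols = x.nonzero()
    return [(r, c, x[r, c]) for r, c in zip(rows, cols)]


def using_tocoo_zip(x):
    """Collect (row, col, value) by zipping the coordinate arrays of a COO copy."""
    cx = x.tocoo()
    return list(zip(cx.row, cx.col, cx.data))


# Same small column vector as in the question.
x = lil_matrix((20, 1))
x[13, 0] = 1
x[15, 0] = 2

for fn in (using_nonzero, using_tocoo_zip):
    elapsed = timeit.timeit(lambda: fn(x), number=200)
    print(fn.__name__, elapsed)
```

The nonzero-based version pays for a sparse lookup (`x[r, c]`) on every entry, whereas the COO version reads values straight from `cx.data`, which is the likely source of the speed difference the answer describes.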
The fastest way should be to convert to a coo_matrix:
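A minimal sketch of that conversion, written for Python 3 and reusing the matrix from the question. It relies only on the `row`, `col` and `data` attributes of the COO format. The `entries` list is an illustrative addition, not part of the original answer.

```python
from scipy.sparse import lil_matrix

x = lil_matrix((20, 1))
x[13, 0] = 1
x[15, 0] = 2

cx = x.tocoo()  # COO stores three parallel arrays: row, col, data

entries = []
for i, j, v in zip(cx.row, cx.col, cx.data):
    # Only stored entries are visited, not all 20 rows.
    entries.append((int(i), int(j), float(v)))
    print(f"({i}, {j}) {v}")
```

Note that this visits every stored entry. If the matrix may hold explicitly stored zeros, calling `cx.eliminate_zeros()` before the loop should drop them.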
To loop over a variety of sparse matrices from the scipy.sparse code section I would use this small wrapper function (note that for Python 2 you are encouraged to use xrange and izip for better performance on large matrices):
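The wrapper itself is missing from the text above, so the following is only a sketch of what such a function could look like in Python 3. The name `iter_spmatrix`, the set of formats handled natively, and the fallback to `tocoo()` for everything else are all assumptions made for illustration.

```python
from scipy.sparse import lil_matrix


def iter_spmatrix(matrix):
    """Yield (row, col, value) for each stored entry of a scipy.sparse matrix.

    Hypothetical helper: which formats get a native branch is an assumption.
    """
    if matrix.format == "coo":
        # COO already exposes parallel coordinate arrays.
        yield from zip(matrix.row, matrix.col, matrix.data)
    elif matrix.format == "lil":
        # LIL keeps, for each row, a list of column indices and a list of values.
        for r, (cols, vals) in enumerate(zip(matrix.rows, matrix.data)):
            for c, v in zip(cols, vals):
                yield r, c, v
    else:
        # Fallback for other formats: one conversion to COO (an extra copy).
        cx = matrix.tocoo()
        yield from zip(cx.row, cx.col, cx.data)


x = lil_matrix((3, 4))
x[0, 1] = 5
x[2, 3] = 7

for m in (x, x.tocoo(), x.tocsr()):
    print(m.format, [(int(r), int(c), float(v)) for r, c, v in iter_spmatrix(m)])
```

Because it is a generator, the caller sees the same `(row, col, value)` tuples whatever the storage format, and the LIL and COO branches avoid any conversion.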
tocoo() materializes the entire matrix into a different structure, which is not the preferred MO for Python 3. You can also consider this iterator, which is especially useful for large matrices.
I have to admit that I'm using a lot of Python constructs which possibly should be replaced by NumPy constructs (especially enumerate).
NB:
So yes, enumerate is somewhat slow(ish)
For the iterator:
So you decide whether this overhead is acceptable; in my case tocoo caused MemoryOverflows.
IMHO: such an iterator should be part of the csr_matrix interface, similar to items() in a dict() :)
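The iterator this answer refers to is not reproduced above. One way such an iterator could be written, assuming a CSR input and using `enumerate` as the answer mentions, is sketched below. The name `iter_csr_nonzero` is hypothetical, and this is not the answerer's original code.

```python
from scipy.sparse import csr_matrix


def iter_csr_nonzero(m):
    """Lazily yield (row, col, value) from a CSR matrix without calling tocoo()."""
    indptr, indices, data = m.indptr, m.indices, m.data
    # indptr[r]:indptr[r + 1] is the slice of indices/data belonging to row r.
    for row, (start, end) in enumerate(zip(indptr[:-1], indptr[1:])):
        for k in range(start, end):
            yield row, indices[k], data[k]


m = csr_matrix([[0, 3, 0],
                [0, 0, 0],
                [4, 0, 5]])

for r, c, v in iter_csr_nonzero(m):
    print(r, c, v)
```

The generator reads the existing `indptr`, `indices` and `data` arrays in place, so no second copy of the matrix is built. The trade-off is a pure-Python inner loop, and explicitly stored zeros, if any, are yielded as well.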
I had the same problem, and if your only concern is speed, the fastest way (more than one order of magnitude faster) is to convert the sparse matrix to a dense one (x.todense()) and iterate over the nonzero elements of the dense matrix. (Of course, this approach requires a lot more memory.)
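A short sketch of the dense route, using the matrix from the question. It calls `toarray()` rather than `todense()`, which is a substitution on my part: `toarray()` returns a plain `ndarray`, while `todense()` returns a `numpy.matrix`. Whether this is actually faster will depend on the matrix size and density.

```python
import numpy as np
from scipy.sparse import lil_matrix

x = lil_matrix((20, 1))
x[13, 0] = 1
x[15, 0] = 2

dense = x.toarray()             # allocates all shape[0] * shape[1] values
rows, cols = np.nonzero(dense)  # index arrays of the nonzero positions

entries = [(int(r), int(c), float(dense[r, c])) for r, c in zip(rows, cols)]
print(entries)
```

The memory cost is the full dense array, so this only makes sense when the matrix comfortably fits in RAM.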