scipy.sparse 矩阵索引运算的向量化
尽管所有内容似乎都已矢量化,但以下代码运行速度还是太慢。
from numpy import *
from scipy.sparse import *
n = 100000;
i = xrange(n); j = xrange(n);
data = ones(n);
A=csr_matrix((data,(i,j)));
x = A[i,j]
问题似乎是索引操作是作为 python 函数实现的,调用 A[i,j]
会产生以下分析输出
500033 function calls in 8.718 CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
100000 7.933 0.000 8.156 0.000 csr.py:265(_get_single_element)
1 0.271 0.271 8.705 8.705 csr.py:177(__getitem__)
(...)
,即 python 函数 _get_single_element
被调用 100000 次,效率很低。为什么不在纯 C 中实现?有谁知道绕过这个限制并加速上述代码的方法? 我应该使用不同的稀疏矩阵类型吗?
The following code runs too slowly even though everything seems to be vectorized.
from numpy import *
from scipy.sparse import *
n = 100000;
i = xrange(n); j = xrange(n);
data = ones(n);
A=csr_matrix((data,(i,j)));
x = A[i,j]
The problem seems to be that the indexing operation is implemented as a python function, and invoking A[i,j]
results in the following profiling output
500033 function calls in 8.718 CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
100000 7.933 0.000 8.156 0.000 csr.py:265(_get_single_element)
1 0.271 0.271 8.705 8.705 csr.py:177(__getitem__)
(...)
Namely, the python function _get_single_element
gets called 100000 times which is really inefficient. Why isn't this implemented in pure C? Does anybody know of a way of getting around this limitation, and speeding up the above code?
Should I be using a different sparse matrix type?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
您可以使用
A.diagonal()
更快地检索对角线(在我的机器上为 0.0009 秒 vs. 3.8 秒)。但是,如果您想进行任意索引,那么这是一个更复杂的问题,因为您使用的不是切片而是索引列表。 _get_single_element 函数被调用 100000 次,因为它只是迭代您传递给它的迭代器(i 和 j)。切片将是 A[30:60,10] 或类似的东西。另外,为了简单起见,我将使用 csr_matrix(eye(n,n)) 来制作与使用迭代器制作的相同的矩阵。
更新:
好的,由于您的问题确实是关于能够快速访问大量随机元素,所以我会尽力回答您的问题。
答案很简单:没有人抽出时间来解决这个问题。据我所知,Scipy 的稀疏矩阵模块领域仍有大量工作要做。用 C 实现的一部分是不同稀疏矩阵格式之间的转换。
您可以尝试实际深入研究稀疏矩阵模块并尝试加快它们的速度。我这样做了,并且在使用 csr 矩阵尝试上面的随机访问代码时,能够将时间减少到原始时间的三分之一以下。我必须直接访问 _get_single_element 并大幅削减代码才能做到这一点,包括取出绑定检查。
然而,使用 lil_matrix 仍然更快(尽管初始化矩阵速度较慢),但我必须使用列表理解进行访问,因为 lil 矩阵没有为您正在执行的索引类型设置。顺便说一句,对 csr_matrix 使用列表理解仍然使 lil 矩阵方法遥遥领先。最终,lil 矩阵访问随机元素的速度更快,因为它没有被压缩。
使用原始形式的 lil_matrix 的运行时间大约是上面列出的代码的五分之一。如果我去掉一些绑定检查并直接调用 lil_matrix 的 _get1() 方法,我可以将时间进一步缩短到原始时间的 7% 左右。为清楚起见,速度从 3.4-3.8 秒加速到约 0.261 秒。
最后,我尝试创建自己的函数来直接访问 lil 矩阵的数据并避免重复的函数调用。此时间约为 0.136 秒。这没有利用正在排序的数据,这是另一个潜在的优化(特别是如果您正在访问同一行上的大量元素)。
如果您想要比这更快,那么您可能必须编写自己的 C 代码稀疏矩阵实现。
好吧,如果您的目的是访问大量元素,我建议使用 lil 矩阵,但这完全取决于您需要做什么。例如,您还需要矩阵相乘吗?请记住,矩阵之间的更改至少有时(在某些情况下)会非常快,因此不排除更改为不同的矩阵格式来执行不同的操作。
如果您不需要在矩阵上实际执行任何代数运算,那么也许您应该只使用 defaultdict 或类似的东西。 defaultdicts 的危险在于,每当请求一个不在字典中的元素时,它都会将该项目设置为默认值并存储它,这样可能会出现问题。
You can use
A.diagonal()
to retrieve the diagonal much more quickly (0.0009 seconds vs. 3.8 seconds on my machine) . However, if you want to do arbitary indexing then that is a more complicated question because you aren't using slices so much as a list of indices. The _get_single_element function is being called 100000 times because it just iterating through the iterators (i and j) that you passed to it. A slice would be A[30:60,10] or something similar to that.Also, I would use
csr_matrix(eye(n,n))
to make the same matrix that you made with iterators just for simplicity.Update:
Ok, since your question is truly about being able to access lots of random elements quickly, I will answer your questions as best as I can.
The answer is simple: no one has gotten around to it. There is still plenty of work to be done in the sparse matrix modules area of Scipy from what I have seen. One part that is implemented in C is the conversions between different sparse matrix formats.
You can try actually diving into the sparse matrix modules and trying to speed them up. I did so and was able to get the time down to less than a third of the original when trying out your code above for random accesses using csr matrices. I had to directly access _get_single_element and pare down the code significantly to do that including taking out bound checks.
However, it was faster still to use a lil_matrix (though slower to initialize the matrix), but I had to do the accessing with a list comprehension because lil matrices aren't setup for the type of indexing you are doing. Using a list comprehension for the csr_matrix still leaves the lil matrix method way ahead by the way. Ultimately, the lil matrix is faster for accessing random elements because it isn't compressed.
Using the lil_matrix in its original form runs in about a fifth of the time of the code you have listed above. If I take out some bound checks and call lil_matrix's _get1() method directly, I can bring the time down further about 7% of the original time. For clarity that's a speed-up from 3.4-3.8 seconds to about 0.261 seconds.
Lastly, I tried making my own function that directly accesses the lil matrix's data and avoids the repeated function calls. The time for this was about 0.136 seconds. This didn't take advantage of the data being sorted which is another potential optimization (in particular if you are accessing a lot of elements that are on the same rows).
If you want faster than that then you will have to write your own C code sparse matrix implementation probably.
Well, I suggest the lil matrix if your intent is to access a whole lot of elements, but it all depends on what you need to do. Do you also need to multiply matrices for instance? Just remember that changing between matrices can at least sometimes (in certain circumstances) be quite fast so don't rule out changing to a different matrix format to do different operations.
If you don't need to do actually do any algebra operations on your matrix, then maybe you should just use a defaultdict or something similar. The danger with defaultdicts is that whenever an element is asked for that isn't in the dict, it sets that item to the default and stores it so that could be problematic.
我认为仅当使用“object”的默认数据类型时才会调用 _get_single_element 。您是否尝试过提供 dtype,例如
csr_matrix((data, (i,j)), dtype=int32)
I think _get_single_element is only called when the default dtype of 'object' is used. Have you tried providing a dtype, such as
csr_matrix((data, (i,j)), dtype=int32)