How to speed up multidimensional array access in scipy.weave?
I'm weaving my C code into Python to speed up a loop:
from scipy import weave
from numpy import *
#1) create the array
a=zeros((200,300,400),int)
for i in range(200):
    for j in range(300):
        for k in range(400):
            a[i,j,k]=i*300*400+j*400+k
#2) test on c code to access the array
code="""
for(int i=0;i<200;++i){
    for(int j=0;j<300;++j){
        for(int k=0;k<400;++k){
            printf("%ld,",a[i*300*400+j*400+k]);
        }
        printf("\\n");
    }
    printf("\\n\\n");
}
"""
test = weave.inline(code, ['a'])
It works well, but it is still costly when the array is big.
Someone suggested that I use a.strides instead of the nasty "a[i*300*400+j*400+k]",
but I can't understand the documentation about .strides.
Any ideas?
Thanks in advance.
Comments (4)
You could replace the 3 for-loops with
The following suggests this may result in a ~68x (or better? see below) speedup:
test.py:
With n1,n2,n3 = 200,300,400,
took 182 ms on my machine, and
has yet to finish.
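As a rough illustration of replacing the three loops with array operations, here is a minimal sketch. It assumes NumPy broadcasting via np.ogrid, the function names fill_with_loops and fill_vectorized are made up for this example, and it is not necessarily the snippet the answer originally contained.

```python
import numpy as np


def fill_with_loops(n1, n2, n3):
    """Reference version: the triple loop from the question."""
    a = np.zeros((n1, n2, n3), dtype=np.int64)
    for i in range(n1):
        for j in range(n2):
            for k in range(n3):
                a[i, j, k] = i * n2 * n3 + j * n3 + k
    return a


def fill_vectorized(n1, n2, n3):
    """Same values, computed by broadcasting three open index grids."""
    # ogrid yields arrays of shape (n1,1,1), (1,n2,1) and (1,1,n3);
    # combining them broadcasts to the full (n1,n2,n3) shape.
    i, j, k = np.ogrid[0:n1, 0:n2, 0:n3]
    return i * n2 * n3 + j * n3 + k
```

Both functions should return the same array; on the full 200 x 300 x 400 shape only the vectorized one is practical to call repeatedly.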
The problem is that you are printing out 24 million numbers (200*300*400) to the screen in your C code. This is of course going to take a while, because the numbers have to be converted into strings and then printed. Do you really need to print them all to the screen? What is your end goal here?
For comparison, I tried just copying each element of a into another array. That took about 0.05 seconds in weave. I gave up on timing the printing of all elements to the screen after 30 seconds or so.
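To see the difference between touching every element and formatting every element, a small comparison can be sketched in plain Python. This is only an illustration under assumptions: it uses NumPy and an in-memory buffer instead of weave and the real screen, the helper names copy_elements and format_elements are hypothetical, and the array is kept small so it runs quickly.

```python
import io
import timeit

import numpy as np

# A deliberately small array so the comparison finishes quickly.
a = np.arange(20 * 30 * 40, dtype=np.int64).reshape(20, 30, 40)


def copy_elements(src):
    """Copy every element into a second array (no text conversion)."""
    dst = np.empty_like(src)
    dst[...] = src
    return dst


def format_elements(src):
    """Convert every element to text, as the printf loop has to do."""
    buf = io.StringIO()
    for value in src.ravel():
        buf.write("%d," % value)
    return buf.getvalue()


if __name__ == "__main__":
    print("copy  :", timeit.timeit(lambda: copy_elements(a), number=10))
    print("format:", timeit.timeit(lambda: format_elements(a), number=10))
```

The absolute numbers depend on the machine; the point is only that the formatting path does far more work per element than the copy.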
There is no way to speed up access to a multidimensional array in C. You have to calculate the array index and dereference it; that is as simple as it gets.
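To make the index calculation concrete, and to connect it with the a.strides attribute mentioned in the question, here is a minimal sketch. It assumes a C-contiguous int64 NumPy array, and flat_offset is a hypothetical helper written for this example. The strides are the number of bytes to step along each axis; dividing by the item size gives the same multipliers that appear in i*300*400+j*400+k.

```python
import numpy as np

# Small C-contiguous array whose values equal their own flat index.
a = np.arange(2 * 3 * 4, dtype=np.int64).reshape(2, 3, 4)

# Byte steps per axis: (3*4*8, 4*8, 8) for this shape and dtype.
byte_strides = a.strides

# Element steps per axis: (12, 4, 1).
elem_strides = tuple(s // a.itemsize for s in byte_strides)


def flat_offset(i, j, k):
    """Element offset of a[i, j, k] inside the flat data buffer."""
    return i * elem_strides[0] + j * elem_strides[1] + k * elem_strides[2]


flat = a.ravel()  # 1-D view of the same memory
```

With these definitions, flat[flat_offset(i, j, k)] and a[i, j, k] refer to the same element, so using strides does not remove the multiplication; it only avoids hard-coding the shape.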
I really hope you didn't run the loop with all the print statements, as Justin already noted. Besides that:
Gives me:
That seems pretty reasonable. If you want to speed it up further, you probably want to start using your GPU, which is well suited to number crunching like this.
In this special case, you could even do:
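One possibility, sketched here as an assumption rather than as the answer's original snippet, relies on the fact that each value in this particular array equals its own flat index, so np.arange followed by reshape produces it directly. The function name build_index_array is made up for this example.

```python
import numpy as np


def build_index_array(n1, n2, n3):
    """Array with a[i, j, k] == i*n2*n3 + j*n3 + k, built without loops."""
    # arange produces 0 .. n1*n2*n3-1 in memory order; reshape only
    # reinterprets that buffer as a 3-D C-contiguous array.
    return np.arange(n1 * n2 * n3, dtype=np.int64).reshape(n1, n2, n3)
```

For the shape in the question this would be called as build_index_array(200, 300, 400).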
But this does not improve things much, since np.zeros() already takes most of the time: