How to speed up multidimensional array access in scipy.weave?

Posted on 2024-09-11 11:42:02

I'm embedding C code in Python with scipy.weave to speed up a loop:

from scipy import weave
from numpy import *

#1) create the array
a=zeros((200,300,400),int)
for i in range(200):
    for j in range(300):
        for k in range(400):    
            a[i,j,k]=i*300*400+j*400+k
#2) test on c code to access the array
code="""
for(int i=0;i<200;++i){
for(int j=0;j<300;++j){
for(int k=0;k<400;++k){
printf("%ld,",a[i*300*400+j*400+k]);    
}
printf("\\n");
}
printf("\\n\\n");
}
"""
test =weave.inline(code, ['a'])

It works, but it is still slow when the array is large.
Someone suggested that I use a.strides instead of the unwieldy "a[i*300*400+j*400+k]", but I can't understand the documentation for .strides.

Any ideas?

Thanks in advance.
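
For readers puzzled by the same thing: NumPy's a.strides is a tuple giving, for each axis, how many bytes to step to reach the next element along that axis. The sketch below reproduces that arithmetic in plain Python for a C-contiguous (row-major) array. The helper names row_major_strides and byte_offset are hypothetical, and the 8-byte item size is an assumption about the integer type.

```python
def row_major_strides(shape, itemsize):
    """Byte step along each axis of a C-contiguous array (hypothetical helper)."""
    strides = []
    step = itemsize
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def byte_offset(index, strides):
    """Byte offset of one element: the sum of index[d] * strides[d]."""
    return sum(i * s for i, s in zip(index, strides))


shape = (200, 300, 400)
itemsize = 8  # assumption: 8-byte integers

strides = row_major_strides(shape, itemsize)
print(strides)  # (960000, 3200, 8)

# The hand-written flat index from the question is the same offset,
# counted in elements instead of bytes.
i, j, k = 3, 7, 11
flat = i * 300 * 400 + j * 400 + k
assert byte_offset((i, j, k), strides) == flat * itemsize
```

So for a contiguous array, strides express the same index arithmetic as the hand-written expression; they matter mostly when the array is a slice or a transposed view, where the hand-written formula would no longer match the memory layout.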

Comments (4)

玩物 2024-09-18 11:42:02

You could replace the 3 for-loops with

grid=np.ogrid[0:200,0:300,0:400]
a=grid[0]*300*400+grid[1]*400+grid[2]

The following suggests this may result in a roughly 70x speedup (17.5 ms vs. 247 µs), or possibly better (see below):

% python -mtimeit -s"import test" "test.m1()"
100 loops, best of 3: 17.5 msec per loop
% python -mtimeit -s"import test" "test.m2()"
1000 loops, best of 3: 247 usec per loop

test.py:

import numpy as np

n1,n2,n3=20,30,40
def m1():
    a=np.zeros((n1,n2,n3),int)
    for i in range(n1):
        for j in range(n2):
            for k in range(n3):    
                a[i,j,k]=i*300*400+j*400+k
    return a

def m2():    
    grid=np.ogrid[0:n1,0:n2,0:n3]
    b=grid[0]*300*400+grid[1]*400+grid[2]
    return b 

if __name__=='__main__':
    assert(np.all(m1()==m2()))

With n1,n2,n3 = 200,300,400,

python -mtimeit -s"import test" "test.m2()"

took 182 ms on my machine, and

python -mtimeit -s"import test" "test.m1()"

has yet to finish.
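
As a side note on why the ogrid version works, here is a minimal sketch, assuming NumPy is installed and using small made-up sizes so it runs instantly. np.ogrid returns one thin index array per axis, and broadcasting expands their combination to the full shape. For a row-major layout the result is simply the flat index of each element, so reshaping a running counter gives the same array.

```python
import numpy as np

n1, n2, n3 = 4, 5, 6  # small, hypothetical sizes for illustration

# One "open" index array per axis, shaped so that they broadcast together.
gi, gj, gk = np.ogrid[0:n1, 0:n2, 0:n3]
print(gi.shape, gj.shape, gk.shape)  # (4, 1, 1) (1, 5, 1) (1, 1, 6)

# Broadcasting expands the three thin arrays into the full (n1, n2, n3) result.
b = gi * (n2 * n3) + gj * n3 + gk

# The same values, built by reshaping a running counter.
c = np.arange(n1 * n2 * n3).reshape(n1, n2, n3)

assert b.shape == (n1, n2, n3)
assert np.array_equal(b, c)
```

Note that this sketch derives the multipliers from n2 and n3, whereas test.py above hard-codes 300*400 and 400 regardless of the sizes being timed.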

战皆罪 2024-09-18 11:42:02

The problem is that you are printing 24 million numbers (200 × 300 × 400) to the screen in your C code. That is going to take a while, because every number has to be converted to a string and then written out. Do you really need to print them all to the screen? What is your end goal here?

For comparison, I tried simply copying each element of a into another array. That took about 0.05 seconds in weave. I gave up timing the version that prints every element to the screen after 30 seconds or so.
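
The same effect can be seen without weave. The sketch below is only an illustration under stated assumptions: it uses plain Python, a much smaller made-up list, and an in-memory io.StringIO buffer standing in for the terminal (a real terminal is typically slower still). It times touching every element against formatting and writing every element.

```python
import io
import time

values = list(range(200_000))  # hypothetical, far smaller than the question's array


def copy_only(src):
    """Touch every element without formatting it."""
    return [v for v in src]


def format_and_write(src, stream):
    """Convert every element to text and write it, as the printf loop does."""
    for v in src:
        stream.write("%d," % v)


start = time.perf_counter()
copied = copy_only(values)
copy_seconds = time.perf_counter() - start

sink = io.StringIO()  # in-memory stand-in for the screen
start = time.perf_counter()
format_and_write(values, sink)
write_seconds = time.perf_counter() - start

print("copy: %.4f s, format+write: %.4f s" % (copy_seconds, write_seconds))
```

The exact numbers depend on the machine, so none are quoted here; the point is to time the array access and the output separately before blaming the indexing.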

橘和柠 2024-09-18 11:42:02

There is no way to speed up accessing a multidimensional array in C. You have to calculate the array index and you have to dereference it; this is as simple as it gets.
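
The index calculation itself can be written with fewer multiplications, though. The sketch below is a plain-Python illustration with small made-up sizes and a hypothetical helper name. It checks that the nested form (i*cols + j)*depth + k equals the expanded form from the question, and that hoisting the row base out of the inner loop visits the flat indices 0, 1, 2, ... in order when the array is contiguous.

```python
def nested_index_order(rows, cols, depth):
    """Flat indices visited by the triple loop, with the row base hoisted."""
    visited = []
    for i in range(rows):
        for j in range(cols):
            base = (i * cols + j) * depth  # computed once per inner loop
            for k in range(depth):
                visited.append(base + k)
    return visited


rows, cols, depth = 3, 4, 5  # small, hypothetical sizes
order = nested_index_order(rows, cols, depth)

# The nested (Horner) form matches the expanded form used in the question.
assert all(
    (i * cols + j) * depth + k == i * cols * depth + j * depth + k
    for i in range(rows)
    for j in range(cols)
    for k in range(depth)
)

# For a contiguous array the triple loop walks the flat indices in order.
assert order == list(range(rows * cols * depth))
```

Whether this makes a measurable difference in compiled C is a separate question, since an optimizing compiler may already hoist the multiplication on its own.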

可可 2024-09-18 11:42:02

I really hope you didn't run the loop with all the print statements, as Justin already noted. Besides that:

import numpy as np
from scipy import weave
n1, n2, n3 = 200, 300, 400

def m1():
    a = np.zeros((n1,n2,n3), int)
    for i in xrange(n1):
        for j in xrange(n2):
            for k in xrange(n3):
                a[i,j,k] = i*300*400 + j*400 + k
    return a

def m2():    
    grid = np.ogrid[0:n1,0:n2,0:n3]
    b = grid[0]*300*400 + grid[1]*400 + grid[2]
    return b 

def m3():
    a = np.zeros((n1,n2,n3), int)
    code = """
    int rows = Na[0];
    int cols = Na[1];
    int depth = Na[2];
    int val = 0;      
    for (int i=0; i<rows; i++) {
        for (int j=0; j<cols; j++) {
            for (int k=0; k<depth; k++) {
                val = (i*cols + j)*depth + k;
                a[val] = val;
            }
        }
    }"""
    weave.inline(code, ['a'])
    return a

%timeit m1()
%timeit m2()
%timeit m3()
np.all(m1() == m2())
np.all(m2() == m3())

This gives me:

1 loops, best of 3: 19.6 s per loop
1 loops, best of 3: 248 ms per loop
10 loops, best of 3: 144 ms per loop

That seems pretty reasonable. If you want to speed it up further, you could look at using your GPU, which is well suited to this kind of number crunching.

In this special case, you could even do:

def m4():
    a = np.zeros((n1,n2,n3), int)
    code = """
    int rows = Na[0];
    int cols = Na[1];
    int depth = Na[2];
    for (int i=0; i<rows*cols*depth; i++) {
        a[i] = i;
    }"""
    weave.inline(code, ['a'])
    return a

But this doesn't help much, since np.zeros() already takes most of the time:

%timeit np.zeros((n1,n2,n3), int)
10 loops, best of 3: 113 ms per loop