How can I speed up multi-dimensional array access in scipy.weave?

Posted on 2024-09-11 11:42:02


I'm embedding C code in Python with scipy.weave to speed up a loop:

from scipy import weave
from numpy import *

#1) create the array
a=zeros((200,300,400),int)
for i in range(200):
    for j in range(300):
        for k in range(400):    
            a[i,j,k]=i*300*400+j*400+k
#2) test on c code to access the array
code="""
for(int i=0;i<200;++i){
for(int j=0;j<300;++j){
for(int k=0;k<400;++k){
printf("%ld,",a[i*300*400+j*400+k]);    
}
printf("\\n");
}
printf("\\n\\n");
}
"""
test =weave.inline(code, ['a'])

This works, but it is still slow when the array is big.
Someone suggested that I use a.strides instead of the unwieldy "a[i*300*400+j*400+k]",
but I can't make sense of the documentation on .strides.

Any ideas?

Thanks in advance.


玩物 2024-09-18 11:42:02

You could replace the 3 for-loops with

grid=np.ogrid[0:200,0:300,0:400]
a=grid[0]*300*400+grid[1]*400+grid[2]

The timings below suggest a speedup of roughly 70x (17.5 ms vs. 247 usec), and possibly more for larger arrays (see below):

% python -mtimeit -s"import test" "test.m1()"
100 loops, best of 3: 17.5 msec per loop
% python -mtimeit -s"import test" "test.m2()"
1000 loops, best of 3: 247 usec per loop

test.py:

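import numpy as np

# Small sizes so that the pure-Python loop in m1 finishes quickly.
n1,n2,n3=20,30,40

def m1():
    # Baseline: fill the array one element at a time from Python.
    a=np.zeros((n1,n2,n3),int)
    for i in range(n1):
        for j in range(n2):
            for k in range(n3):
                a[i,j,k]=i*300*400+j*400+k
    return a

def m2():
    # Vectorized: ogrid yields three index arrays that broadcast together.
    grid=np.ogrid[0:n1,0:n2,0:n3]
    b=grid[0]*300*400+grid[1]*400+grid[2]
    return b

if __name__=='__main__':
    # Both versions must produce identical arrays.
    assert(np.all(m1()==m2()))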

With n1,n2,n3 = 200,300,400,

python -mtimeit -s"import test" "test.m2()"

took 182 ms on my machine, and

python -mtimeit -s"import test" "test.m1()"

has yet to finish.
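
To see why the ogrid expression reproduces the triple loop, it can help to look at the shapes involved. The following is only a small sketch in Python 3 with NumPy; the tiny sizes and the names gi, gj, gk and expected are chosen here for illustration, and the multipliers are written as n2*n3 and n3 rather than the hard-coded 300*400 and 400 used above.

```python
import numpy as np

n1, n2, n3 = 2, 3, 4  # tiny sizes, chosen only for illustration

# ogrid returns one "open" index array per axis.
gi, gj, gk = np.ogrid[0:n1, 0:n2, 0:n3]
print(gi.shape, gj.shape, gk.shape)

# Broadcasting expands the three arrays to the full (n1, n2, n3) shape,
# so this single expression evaluates i*n2*n3 + j*n3 + k for every (i, j, k).
a = gi * n2 * n3 + gj * n3 + gk

# Reference result built with explicit Python loops.
expected = np.zeros((n1, n2, n3), int)
for i in range(n1):
    for j in range(n2):
        for k in range(n3):
            expected[i, j, k] = i * n2 * n3 + j * n3 + k

print(np.array_equal(a, expected))
```

The three index arrays hold only n1 + n2 + n3 numbers in total; the full-size array is created only once, as the result of the arithmetic.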


战皆罪 2024-09-18 11:42:02

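The problem is that your C code prints all 24 million numbers (200 x 300 x 400) to the screen. Of course that takes a while, because each number has to be converted to a string and then written to the screen. Do you really need to print all of them? What is your end goal?

For comparison, I tried simply assigning each element of a to another array. That took about 0.05 seconds in weave. I gave up timing the version that prints every element to the screen after 30 seconds or so.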
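
A rough way to check this on your own machine is to time the two kinds of work separately. The sketch below is plain Python 3, not weave; copy_all and format_all are hypothetical helpers written for this illustration, and the text is written to an in-memory buffer, so the cost of the terminal itself is not included.

```python
import io
import time

def copy_all(values):
    """Copy every element into a second list, one at a time."""
    out = [0] * len(values)
    for idx, v in enumerate(values):
        out[idx] = v
    return out

def format_all(values, stream):
    """Convert every element to text and write it, like printf("%ld,")."""
    for v in values:
        stream.write("%d," % v)

values = list(range(200000))  # much smaller than the 24 million in the question

t0 = time.perf_counter()
copied = copy_all(values)
t1 = time.perf_counter()

buf = io.StringIO()
format_all(values, buf)
t2 = time.perf_counter()

print("copy:   %.4f s" % (t1 - t0))
print("format: %.4f s" % (t2 - t1))
```

The absolute numbers depend on the machine and say nothing about the C version directly; the point is only that formatting and output are work done on top of the array access itself.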


橘和柠 2024-09-18 11:42:02

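There is no way to speed up access to a multi-dimensional array in C. You have to compute the array index and you have to dereference it; that is as simple as it gets.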
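
Since the question asks what .strides means, here is a minimal sketch of how strides relate to that index calculation. It uses Python 3 with NumPy on a small C-contiguous int64 array; flat_offset and elem_strides are names made up for this illustration.

```python
import numpy as np

# Small C-contiguous array whose values equal their flat position.
a = np.arange(2 * 3 * 4, dtype=np.int64).reshape(2, 3, 4)

# a.strides gives, per axis, how many BYTES to step to reach the next
# element along that axis. For this array it is (96, 32, 8).
print(a.strides)

# Dividing by the item size turns byte steps into element steps: (12, 4, 1).
elem_strides = tuple(s // a.itemsize for s in a.strides)
print(elem_strides)

def flat_offset(index, strides):
    """Flat position of a multi-dimensional index: sum of index * stride."""
    return sum(i * s for i, s in zip(index, strides))

flat = a.ravel()  # 1-D view of the same data
i, j, k = 1, 2, 3
print(flat[flat_offset((i, j, k), elem_strides)] == a[i, j, k])
```

For a C-contiguous array of shape (200, 300, 400) the element strides are (300*400, 400, 1), so the stride-based offset is the same expression as i*300*400+j*400+k. Strides spell out that arithmetic and keep it correct for non-contiguous arrays; they do not remove it.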


可可 2024-09-18 11:42:02

I really hope you didn't run the loop with all the print statements, as Justin already noted. Besides that:

import numpy as np
from scipy import weave
n1, n2, n3 = 200, 300, 400

def m1():
    a = np.zeros((n1,n2,n3), int)
    for i in xrange(n1):
        for j in xrange(n2):
            for k in xrange(n3):
                a[i,j,k] = i*300*400 + j*400 + k
    return a

def m2():    
    grid = np.ogrid[0:n1,0:n2,0:n3]
    b = grid[0]*300*400 + grid[1]*400 + grid[2]
    return b 

def m3():
    a = np.zeros((n1,n2,n3), int)
    code = """
    int rows = Na[0];
    int cols = Na[1];
    int depth = Na[2];
    int val = 0;      
    for (int i=0; i<rows; i++) {
        for (int j=0; j<cols; j++) {
            for (int k=0; k<depth; k++) {
                val = (i*cols + j)*depth + k;
                a[val] = val;
            }
        }
    }"""
    weave.inline(code, ['a'])
    return a

%timeit m1()
%timeit m2()
%timeit m3()
np.all(m1() == m2())
np.all(m2() == m3())

Gives me:

1 loops, best of 3: 19.6 s per loop
1 loops, best of 3: 248 ms per loop
10 loops, best of 3: 144 ms per loop

That seems pretty reasonable. If you want to speed it up further, you would probably need to move the work to the GPU, which is well suited to this kind of number crunching.

In this special case, you could even do:

def m4():
    a = np.zeros((n1,n2,n3), int)
    code = """
    int rows = Na[0];
    int cols = Na[1];
    int depth = Na[2];
    for (int i=0; i<rows*cols*depth; i++) {
        a[i] = i;
    }"""
    weave.inline(code, ['a'])
    return a

But this does not improve things much, since np.zeros() already takes most of the time:

%timeit np.zeros((n1,n2,n3), int)
10 loops, best of 3: 113 ms per loop
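
Because each value in this special case is just the element's flat position, the same array can also be built without a separate np.zeros() call. This is a sketch in Python 3 with NumPy, not something that was timed in this thread; m5 is a hypothetical name, and it matches the loops above only when the hard-coded multipliers 300*400 and 400 equal n2*n3 and n3.

```python
import numpy as np

def m5(n1, n2, n3):
    """Hypothetical variant: create the values and the shape in one step.

    np.arange produces 0 .. n1*n2*n3 - 1 in order, and reshape views
    them as an (n1, n2, n3) array, so a[i, j, k] == i*n2*n3 + j*n3 + k.
    """
    return np.arange(n1 * n2 * n3).reshape(n1, n2, n3)

a = m5(2, 3, 4)   # small sizes for a quick check
print(a[1, 2, 3])  # prints 23
```

Whether this is faster than m3 or m4 on a given machine would have to be measured, for example with %timeit m5(200, 300, 400).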
