Improving Numpy Performance

Posted on 2024-08-20 17:10:10, 814 words, 11 views, 0 comments

I'd like to improve the performance of convolution using python, and was hoping for some insight on how to best go about improving performance.

I am currently using scipy to perform the convolution, using code somewhat like the snippet below:

import numpy
import scipy.signal
import timeit

# 1000x1000 test image; reshape returns a new array, so keep the result
a = numpy.arange(1000000).reshape(1000, 1000)
filt = numpy.array([[1, 1, 1], [1, -8, 1], [1, 1, 1]])

def convolve():
    # a and filt are only read here, so no global statement is needed
    return scipy.signal.convolve2d(a, filt, mode="same")

t = timeit.Timer("convolve()", "from __main__ import convolve")
# average seconds per call over 10 calls
print("%.2f sec/pass" % (t.timeit(number=10) / 10))

I am processing image data, using grayscale (integer values between 0 and 255), and I currently get about a quarter of a second per convolution. My thinking was to do one of the following:

Use corepy, preferably with some optimizations
Recompile numpy with icc & ikml.
Use python-cuda.

I was wondering if anyone had any experience with any of these approaches (what sort of gain would be typical, and whether it is worth the time), or if anyone is aware of a better library to perform convolution with Numpy.

Thanks!

EDIT:

Rewriting the python loop in C gave a speed-up of about 10x over using Numpy.

Comments (5)

怪我闹别瞎闹 2024-08-27 17:10:10

The code in scipy for doing 2d convolutions is a bit messy and unoptimized. See http://svn.scipy.org/svn/scipy/trunk/scipy/signal/firfilter.c if you want a glimpse into the low-level functioning of scipy.

If all you want is to process with a small, constant kernel like the one you showed, a function like this might work:

def specialconvolve(a):
    # sorry, you must pad the input yourself
    rowconvol = a[1:-1,:] + a[:-2,:] + a[2:,:]
    colconvol = rowconvol[:,1:-1] + rowconvol[:,:-2] + rowconvol[:,2:] - 9*a[1:-1,1:-1]
    return colconvol

This function takes advantage of the separability of the kernel like DarenW suggested above, as well as taking advantage of the more optimized numpy arithmetic routines. It's over 1000 times faster than the convolve2d function by my measurements.
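As a rough check of the function above, the sketch below pads the input with zeros (np.pad is my choice here, not something the answer prescribes) and compares the result against scipy.signal.convolve2d with mode="same". The array size and the names padded, fast and reference are only illustrative.

```python
import numpy as np
from scipy.signal import convolve2d

def specialconvolve(a):
    # expects an input that has already been padded by one pixel on each side
    rowconvol = a[1:-1, :] + a[:-2, :] + a[2:, :]
    colconvol = (rowconvol[:, 1:-1] + rowconvol[:, :-2] + rowconvol[:, 2:]
                 - 9 * a[1:-1, 1:-1])
    return colconvol

filt = np.array([[1, 1, 1], [1, -8, 1], [1, 1, 1]])
a = np.arange(1000000).reshape(1000, 1000)

# Zero padding matches convolve2d's default boundary handling (fill with 0).
padded = np.pad(a, 1, mode="constant")
fast = specialconvolve(padded)
reference = convolve2d(a, filt, mode="same")

print(fast.shape == a.shape)
print(np.array_equal(fast, reference))
```

The comparison only covers correctness for this one kernel; the speed ratio will depend on the machine and the scipy version.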

醉生梦死 2024-08-27 17:10:10

For the particular example 3x3 kernel, I'd observe that

1  1  1
1 -8  1
1  1  1

  1  1  1     0  0  0
= 1  1  1  +  0 -9  0
  1  1  1     0  0  0

and that the first of these is factorable: it can be applied by convolving with (1 1 1) along each row, and then again along each column. Then subtract nine times the original data. This may or may not be faster, depending on whether the scipy programmers made it smart enough to do this automatically. (I haven't checked in a while.)

You probably want to do more interesting convolutions, where factoring may or may not be possible.
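One way to tell whether a given kernel factors is to look at its matrix rank: a 2D kernel is an outer product of a column and a row vector exactly when the rank is 1. The sketch below (the helper name is_separable and the test image are my own, for illustration) checks the two kernels from this answer and confirms that the two-pass version gives the same result as the direct convolution.

```python
import numpy as np
from scipy.signal import convolve2d

def is_separable(kernel):
    # rank 1 means kernel == outer(column_vector, row_vector)
    return np.linalg.matrix_rank(kernel) == 1

box = np.ones((3, 3), dtype=int)
delta = np.zeros((3, 3), dtype=int)
delta[1, 1] = -9
filt = box + delta  # the 3x3 kernel from the question

print(is_separable(box), is_separable(filt))

img = np.arange(10000).reshape(100, 100)

# Two 1D passes for the box part, then subtract nine times the original.
row_pass = convolve2d(img, np.ones((1, 3), dtype=int), mode="same")
box_pass = convolve2d(row_pass, np.ones((3, 1), dtype=int), mode="same")
two_pass = box_pass - 9 * img

direct = convolve2d(img, filt, mode="same")
print(np.array_equal(two_pass, direct))
```

The full kernel itself has rank 2, so it is not separable on its own; the trick works here only because it splits into a separable box plus a scaled identity.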

淡紫姑娘! 2024-08-27 17:10:10

Before going to, say, C with ctypes, I'd suggest running a standalone convolve in C, to see where the limit is.
Similarly for CUDA, cython, scipy.weave ...

Added 7 Feb: convolve33 on 8-bit data with clipping takes ~ 20 clock cycles per point,
2 clock cycles per mem access, on my mac g4 ppc with gcc 4.2. Your mileage will vary.

A couple of subtleties:

  • Do you care about correct clipping to 0..255? np.clip() is slow;
    for cython etc. I don't know.
  • Numpy/scipy may need memory for temporaries the size of A (so keep 2*sizeof(A) < cache size).
    If your C code does a running update in place, though, that's half the memory but a different algorithm.
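To make the clipping point concrete, here is a minimal numpy sketch, assuming the 3x3 kernel from the question and zero padding at the borders. The function name laplace33_u8 is hypothetical. The data is widened to int16 before any arithmetic because uint8 sums wrap around silently, and the result is clipped back to 0..255 at the end.

```python
import numpy as np

def laplace33_u8(img):
    """Apply [[1,1,1],[1,-8,1],[1,1,1]] to uint8 data, clipped to 0..255."""
    # Widen first: uint8 arithmetic would wrap around (e.g. 200 + 100 -> 44).
    a = np.pad(img.astype(np.int16), 1, mode="constant")
    rows = a[1:-1, :] + a[:-2, :] + a[2:, :]
    out = rows[:, 1:-1] + rows[:, :-2] + rows[:, 2:] - 9 * a[1:-1, 1:-1]
    # All intermediates stay within about +/-2300, which fits in int16.
    return np.clip(out, 0, 255).astype(np.uint8)

img = np.zeros((3, 3), dtype=np.uint8)
img[0, 0] = 255
out = laplace33_u8(img)
print(out)
```

This allocates several temporaries the size of the image, which is exactly the memory cost the second bullet warns about; an in-place C loop would avoid them.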

By the way, google theano convolve =>
"A convolution op that should mimic scipy.signal.convolve2d, but faster! In development"

过气美图社 2024-08-27 17:10:10

A typical optimization for convolution is to use the FFT of your signal. The reason is: the convolution in real space is a product in FFT space. It is often faster to compute the FFT, then the product, and the iFFT of the result than convolve the usual way.
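A small sketch of that idea using numpy's FFT routines: both arrays are zero-padded to the full output size so the circular convolution computed by the FFT matches the linear one. The function name fft_convolve2d and the test sizes are my own; scipy.signal.fftconvolve is the ready-made version of the same approach.

```python
import numpy as np
from scipy.signal import convolve2d

def fft_convolve2d(img, kernel):
    # Size of the full linear convolution; padding to it avoids wrap-around.
    s0 = img.shape[0] + kernel.shape[0] - 1
    s1 = img.shape[1] + kernel.shape[1] - 1
    img_f = np.fft.rfft2(img, s=(s0, s1))
    ker_f = np.fft.rfft2(kernel, s=(s0, s1))
    # Pointwise product in frequency space == convolution in real space.
    return np.fft.irfft2(img_f * ker_f, s=(s0, s1))

rng = np.random.default_rng(0)
img = rng.random((64, 64))
kernel = np.array([[1.0, 1.0, 1.0], [1.0, -8.0, 1.0], [1.0, 1.0, 1.0]])

via_fft = fft_convolve2d(img, kernel)
direct = convolve2d(img, kernel, mode="full")
print(np.allclose(via_fft, direct))
```

The FFT route tends to pay off for large kernels; for a 3x3 kernel the direct method may well be faster, so it is worth timing both on your own data.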

横笛休吹塞上声 2024-08-27 17:10:10

As of 2018, the SciPy/Numpy combination seems to have been sped up a lot. This is what I saw on my laptop (Dell Inspiron 13, i5).
OpenCV was the fastest, but it gives you no control over the mode.

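>>> import time
>>> import cv2
>>> import numpy as np
>>> from scipy import signal
>>> img = np.random.rand(1000,1000)
>>> kernel = np.ones((3,3), dtype=float)/9.0  # np.float was removed in NumPy 1.24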
>>> t1= time.time();dst1 = cv2.filter2D(img,-1,kernel);print(time.time()-t1)
0.0235188007355
>>> t1= time.time();dst2 = signal.correlate(img,kernel,mode='valid',method='fft');print(time.time()-t1)
0.140458106995
>>> t1= time.time();dst3 = signal.convolve2d(img,kernel,mode='valid');print(time.time()-t1)
0.0548939704895
>>> t1= time.time();dst4 = signal.correlate2d(img,kernel,mode='valid');print(time.time()-t1)
0.0518119335175
>>> t1= time.time();dst5 = signal.fftconvolve(img,kernel,mode='valid');print(time.time()-t1)
0.13204407692
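Single time.time() measurements like the ones above are noisy, so a repeatable comparison might use timeit instead. The sketch below is limited to the two SciPy calls (OpenCV is left out so that it runs with numpy and scipy alone); the helper best_time and the dictionary candidates are illustrative names, and the numbers it prints will differ from machine to machine.

```python
import timeit
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
img = rng.random((1000, 1000))
kernel = np.ones((3, 3)) / 9.0

candidates = {
    "convolve2d": lambda: signal.convolve2d(img, kernel, mode="valid"),
    "fftconvolve": lambda: signal.fftconvolve(img, kernel, mode="valid"),
}

def best_time(func, repeat=3, number=1):
    # The minimum over several repeats is the least disturbed measurement.
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number

for name, func in candidates.items():
    print("%s: %.4f s" % (name, best_time(func)))
```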