Does the extra typing and cimport of numpy arrays in Cython reduce performance?
Below are two simple Cython methods I wrote. In the g_cython() method I added typing for the numpy arrays a and b, but surprisingly g_cython() is twice as slow as g_less_cython(). I wonder why this is happening? I thought the added typing would make indexing into a and b much faster.
PS. I understand both functions can be vectorized in numpy -- I am just exploring Cython optimization tricks.
import numpy as np
cimport numpy as np

def g_cython(np.ndarray[np.int_t, ndim=1] a, percentile):
    cdef int i
    cdef int n = len(a)
    cdef np.ndarray[np.int_t, ndim=1] b = np.zeros(n, dtype='int')
    for i in xrange(n):
        b[i] = np.searchsorted(percentile, a[i])
    return b

def g_less_cython(a, percentile):
    cdef int i
    b = np.zeros_like(a)
    for i in xrange(len(a)):
        b[i] = np.searchsorted(percentile, a[i])
    return b
My test case is when len(a) == 1000000 and len(percentile) == 100:
import time

def main3():
    n = 100000
    a = np.random.random_integers(0, 10000000, n)
    per = np.linspace(0, 10000000, 101)

    q = time.time()
    b = g_cython(a, per)
    q = time.time() - q
    print q

    q = time.time()
    bb = g_less_cython(a, per)
    q = time.time() - q
    print q
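For reference, the .pyx file has to be compiled before main3 can call the two functions. A minimal build script might look like this (the file name g_funcs.pyx is just a placeholder, not part of the question):

# setup.py -- minimal build script for the Cython module above
# (g_funcs.pyx is a hypothetical file name holding the two functions)
from distutils.core import setup
from Cython.Build import cythonize
import numpy as np

setup(
    ext_modules=cythonize("g_funcs.pyx"),
    include_dirs=[np.get_include()],  # needed because of "cimport numpy"
)

Build with "python setup.py build_ext --inplace" and import the module from the test script.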
1 comment
I tested your code, and g_cython is slightly faster than g_less_cython.
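A driver along these lines shows the comparison (a sketch only: essentially the question's main3 with timeit doing the timing, and with the placeholder module name g_funcs from above):

# benchmark sketch -- times both functions on the same input
# (assumes the compiled module is importable as g_funcs; the name is hypothetical)
import timeit
import numpy as np
import g_funcs

n = 100000
a = np.random.random_integers(0, 10000000, n)
per = np.linspace(0, 10000000, 101)

print timeit.timeit(lambda: g_funcs.g_cython(a, per), number=3)
print timeit.timeit(lambda: g_funcs.g_less_cython(a, per), number=3)

The exact figures will vary by machine; what matters is that the two timings stay close.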
I also turned off the boundscheck and wraparound flags.
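Those directives can be applied per function with decorators; a sketch of what that looks like on g_cython:

# Sketch: disable bounds checking and negative-index wraparound
# for the typed-buffer accesses inside g_cython.
cimport cython
cimport numpy as np
import numpy as np

@cython.boundscheck(False)
@cython.wraparound(False)
def g_cython(np.ndarray[np.int_t, ndim=1] a, percentile):
    cdef int i
    cdef int n = len(a)
    cdef np.ndarray[np.int_t, ndim=1] b = np.zeros(n, dtype='int')
    for i in xrange(n):
        b[i] = np.searchsorted(percentile, a[i])
    return b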
The difference is not significant, because the call to np.searchsorted(percentile, a[i]) is the critical part and uses most of the CPU time; the array indexing that the extra type declarations speed up is only a small fraction of each iteration.
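As the question already notes, this particular loop can be vectorized; a sketch of that variant, which replaces the per-element Python-level calls with a single searchsorted call over the whole array:

# Sketch: vectorized variant -- one searchsorted call for all of a,
# so the per-element call overhead disappears entirely.
import numpy as np

def g_vectorized(a, percentile):
    return np.searchsorted(percentile, a)

Once that call overhead is gone, whether a and b are typed buffers no longer matters for this workload.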