Does extra typing and cimport-ing of numpy arrays in Cython hurt performance?

Posted on 2024-12-24 03:13:21

Below are two simple Cython functions I wrote. In g_cython() I added static types for the numpy arrays a and b, but surprisingly g_cython() runs about twice as slow as g_less_cython(). Why is that? I thought typing the arrays would make indexing into a and b much faster.

PS. I know both functions can be vectorized in numpy -- I am just exploring Cython optimization tricks.

import numpy as np
cimport numpy as np

def g_cython(np.ndarray[np.int_t, ndim = 1] a, percentile):
    cdef int i
    cdef int n = len(a)
    cdef np.ndarray[np.int_t, ndim = 1] b = np.zeros(n, dtype = 'int')
    for i in xrange(n):
        b[i] = np.searchsorted(percentile, a[i])
    return b


def g_less_cython(a, percentile):
    cdef int i
    b = np.zeros_like(a)
    for i in xrange(len(a)):
        b[i] = np.searchsorted(percentile, a[i])
    return b

My test case is when len(a) == 1000000 and len(percentile) == 100:

import time

def main3():
    n = 100000
    a = np.random.random_integers(0, 10000000, n)
    per = np.linspace(0, 10000000, 101)

    q = time.time()
    b = g_cython(a, per)
    q = time.time() - q
    print q

    q = time.time()
    bb = g_less_cython(a, per)
    q = time.time() - q
    print q
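As the PS notes, the per-element loop can be replaced by a single vectorized call, since np.searchsorted accepts an array of needles. A minimal sketch in plain NumPy (no Cython needed; `g_vectorized` is a hypothetical name, not from the question):

```python
import numpy as np

def g_vectorized(a, percentile):
    # One searchsorted call over the whole array replaces the
    # per-element Python-level loop in both functions above.
    return np.searchsorted(percentile, a)

a = np.array([5, 42, 99, 7])
per = np.array([0, 10, 50, 100])
print(g_vectorized(a, per))  # same values as looping np.searchsorted over a
```

This removes the Python-call overhead per element entirely, which is why it typically beats either Cython variant of the loop.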



Comments (1)

客…行舟 2024-12-31 03:13:21

I tested your code; g_cython is slightly faster than g_less_cython.

Here is the test code:

import pyximport; pyximport.install()
import search_sorted
import numpy as np
import time
x = np.arange(100000, dtype=np.int32)
y = np.random.randint(0, 100000, 100000)

start = time.clock()
search_sorted.g_cython(y, x)
print time.clock() - start

start = time.clock()
search_sorted.g_less_cython(y, x)
print time.clock() - start

The output is:

0.215430514708
0.259622599945

I turned off the boundscheck and wraparound flags:

@cython.boundscheck(False)
@cython.wraparound(False)
def g_cython(np.ndarray[np.int_t, ndim = 1] a, percentile):
    ....

The difference is not significant because the call to np.searchsorted(percentile, a[i]) is the critical part that consumes most of the CPU time.
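A quick way to confirm that the np.searchsorted call, not array indexing, dominates: compare the per-element loop against a single vectorized call in plain Python. A rough sketch (absolute timings vary by machine; the point is the ratio):

```python
import time
import numpy as np

x = np.arange(100000)                      # sorted haystack, as in the test above
y = np.random.randint(0, 100000, 100000)   # needles

start = time.time()
b_loop = np.array([np.searchsorted(x, v) for v in y])  # one Python-level call per element
t_loop = time.time() - start

start = time.time()
b_vec = np.searchsorted(x, y)              # one call for the whole array
t_vec = time.time() - start

assert (b_loop == b_vec).all()
print("loop: %.3fs  vectorized: %.3fs" % (t_loop, t_vec))
```

The per-call overhead of entering np.searchsorted 100000 times dwarfs whatever the typed-vs-untyped indexing costs, which is why boundscheck/wraparound made so little difference.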

