C 和 python 中插入排序的性能差异

发布于 2024-08-07 10:01:54 字数 961 浏览 2 评论 0原文

我对使用 C 和 python 进行插入排序的性能感到好奇，但我得到的结果让我思考我是否做错了什么。我怀疑 C 会更快，但也没有那么快。

我已经分析了这两个代码，插入排序函数是花费时间最多的地方。

这是 C 函数：

void
insert_sort (vec_t * vec)
{
    int j;
    for (j = 1 ; j < vec->n ; j++){
        int key = vec->v[j];
        int i = j - 1;
        while (i >= 0 && vec->v[i] > key){
            vec->v[i+1] = vec->v[i];
            i--;
        }
        vec->v[i+1] = key;
    }
}

这是 python 函数：

def insert_sort (ln):
    for j in range(1, len(ln)):
        key = ln[j]
        i = j-1
        while i >= 0 and ln[i] > key:
            ln[i+1] = ln[i]
            i-=1
        ln[i+1] = key

使用 10000 个整数进行测试，每个整数在 0 到 10000 之间随机生成。

每个函数花费的时间结果为：

C 时间：0.13 秒
python 时间：8.104 秒

我在这里做错了什么吗？就像我说的，我期望看到 C 代码更快，但并没有那么快。

我不想使用内置函数或任何东西。我想实现该算法。有没有一种可以在插入排序中使用的pythonic方法？

原文

I was curious about the performance of insert-sort using C and python but the results I've got just make me think if I've done something wrong. I suspected that C would be faster, but not that much.

I've profiled both codes and the insert-sort function is the place where the time is most spent.

Here is the C function:

void
insert_sort (vec_t * vec)
{
    int j;
    for (j = 1 ; j < vec->n ; j++){
        int key = vec->v[j];
        int i = j - 1;
        while (i >= 0 && vec->v[i] > key){
            vec->v[i+1] = vec->v[i];
            i--;
        }
        vec->v[i+1] = key;
    }
}

Here is the python function:

def insert_sort (ln):
    for j in range(1, len(ln)):
        key = ln[j]
        i = j-1
        while i >= 0 and ln[i] > key:
            ln[i+1] = ln[i]
            i-=1
        ln[i+1] = key

The test was made with 10000 integers, each one randomly generated between 0 and 10000.

The results for the time spent in each function was:

C time: 0.13 seconds
python time: 8.104 seconds

Am I doing something wrong here? Like I said, I expected to see the C code being faster, but not that faster.

I don't want to use built-in functions or whatsoever. I would like to implement the algorithm. Is there a pythonic way to doing things that I could use in the insert-sort?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

独守阴晴ぅ圆缺 2024-08-14 10:01:54

Python 是一种动态语言，标准实现使用解释器来评估代码。这意味着编译后的 C 代码可以使用单个机器指令进行转义，例如分配给 vec->v[i+1]，Python 的解释器必须从本地范围查找序列变量，查找其类，找到类上的项目设置方法，调用该方法。比较、加法也是如此。更不用说执行几乎每个字节码都会导致 CPU 中的间接分支错误预测，从而导致管道泡沫。

这种代码将从 JIT 编译到本机代码和运行时类型专门化中受益匪浅，就像 unladen-swallow 和 PyPy 开始做的那样。

否则，代码几乎是Pythonic的，因为如果需要实现插入排序，这就是在Python中实现的方式。它也非常不Pythonic，因为你应该使用非常有效的内置排序。

回复收藏 0 原文

夜无邪 2024-08-14 10:01:54

我的第一个想法是，我现在手头的笔记本电脑，Macbook Pro，必须与你的机器相当，但略好一些——我没有足够的周围代码来尝试你的 C 示例（什么是 vec_t，等等），但是运行你编码的 Python 给我的结果是：

$ python -mtimeit -s'import inso' 'inso.insort(inso.li)'
10 loops, best of 3: 7.21 msec per loop

vs 你的 8.1 秒。这就是你将代码放入 insort.py 中，前面加上：

import random
li = [random.randrange(10000) for _ in xrange(10000)]

array 没有帮助——实际上会减慢速度。然后我安装了 psyco，Python JIT 帮助程序（仅限 x86，仅限 32 位），进一步添加：

import psyco
psyco.full()

结果是：

$ python -mtimeit -s'import inso' 'inso.insort(inso.li)'
10 loops, best of 3: 207 usec per loop

加速大约为 7.21 / 0.000207 = 34830 倍——而 8.04 / 0.13 = 62 倍让你大吃一惊;-)。

当然，问题是在第一次之后，列表已经排序了，所以 insort 变得必须更快。您没有向我们提供足够的周围测试工具来准确了解您测量的内容。一个更现实的例子（其中实际列表没有被触及，所以它保持无序，只有副本被排序......），没有psyco：

$ python -mtimeit -s'import inso' 'inso.insort(list(inso.li))'
10 loops, best of 3: 13.8 sec per loop

哎呀——所以你的机器比 Macbook Pro 快得多（记住，核心不算数：我们在这里只使用一个；-）——哇……否则，你测量错误了。无论如何，使用 psyco：

$ python -mtimeit -s'import inso' 'inso.insort(list(inso.li))'
10 loops, best of 3: 456 msec per loop

所以 psyco 的加速仅为 13.8 / 0.456，30 倍——大约是纯 C 编码的 60 多倍的一半。 IOW，你会期望 python + psyco 比纯 C 慢两倍。这是一个更现实和典型的评估。

如果我们编写相当高级的代码，psyco 的加速会从（比如说）30 倍降到更低——但 C 相对于 Python 的优势也会降低。例如，

$ python -mtimeit -s'import inso' 'sorted(inso.li)'
100 loops, best of 3: 8.72 msec per loop

如果没有 psyco（在这种情况下，psyco 实际上 - 略微 - 减慢执行速度；-），因此这是比 psyco 多 52 的另一个因数，总体比非 psyco 多 1582 倍。

但是，当由于某种原因您必须用 python 编写极低级的算法，而不是使用内置函数和 stdlib 的丰富支持时，psyco 可以帮助减轻痛苦。

另一点是，当您进行基准测试时，请发布所有代码，以便其他人可以准确看到您在做什么（并可能发现陷阱） - 您的“脚手架”很棘手并且可能隐藏陷阱，作为您认为正在测量的代码！-)

My first thought was that the laptop I have at hand right now, a Macbook Pro, must be comparable to but slightly better than your machine -- I don't have enough of your surrounding code to try your C example (what's a vec_t, etc, etc), but running the Python you coded gives me:

$ python -mtimeit -s'import inso' 'inso.insort(inso.li)'
10 loops, best of 3: 7.21 msec per loop

vs your 8.1 seconds. That's with you code put in insort.py, preceded by:

import random
li = [random.randrange(10000) for _ in xrange(10000)]

array doesn't help -- actually slows things down a bit. Then I installed psyco, the Python JIT helper (x86-only, 32-bit only), further added:

import psyco
psyco.full()

and got:

$ python -mtimeit -s'import inso' 'inso.insort(inso.li)'
10 loops, best of 3: 207 usec per loop

so a speedup of about 7.21 / 0.000207 = 34830 times -- vs the 8.04 / 0.13 = 62 times that surprised you so much;-).

Of course, the problem is that after the first time, the list is already sorted, so insort becomes must faster. You didn't give us enough of the surrounding test harness to know exactly what you measured. A more realisting example (where the actual list isn't touched so it stays disordered, only a copy is sorted...), without psyco:

$ python -mtimeit -s'import inso' 'inso.insort(list(inso.li))'
10 loops, best of 3: 13.8 sec per loop

Oops -- so your machine's WAY faster than a Macbook Pro (remembers, core don't count: we're using only one here;-) -- wow... or else, you're mismeasuring. Anyway, WITH psyco:

$ python -mtimeit -s'import inso' 'inso.insort(list(inso.li))'
10 loops, best of 3: 456 msec per loop

So psyco's speedup is only 13.8 / 0.456, 30 times -- about half as much as the 60+ times you get with pure-C coding. IOW, you'd expect python + psyco to be twice as slow as pure C. That's a more realistic and typical assessment.

If you we writing reasonably high-level code, psyco's speedup of it would degrade from (say) 30 times down to much less -- but so would C's advantage over Python. For example,

$ python -mtimeit -s'import inso' 'sorted(inso.li)'
100 loops, best of 3: 8.72 msec per loop

without psyco (in this case, psyco actually -- marginally -- slows down the execution;-), so that's another factor of 52 over psyco, 1582 overall over non-psyco insort.

But, when for some reason or other you have to write extremely low-level algorithms in python, rather than using the wealth of support from the builtins and stdlib, psyco can help reduce the pain.

Another point is, when you benchmark, please post ALL code so others can see exactly what you're doing (and possibly spot gotchas) -- your "scaffolding" is as tricky and likely to hide traps, as the code you think you're measuring!-)

回复收藏 0 原文

夏日浅笑〃 2024-08-14 10:01:54

因此，您应该从中吸取一些教训：

解释型 Python 速度较慢。不要尝试用 Python 编写自己的 FFT、MPEG 编码器等。
即使是缓慢的解释型 Python 对于小问题来说也可能足够快。 8 秒的运行时间并不可怕，编写和调试 C 语言比 Python 需要更长的时间，因此，如果您编写的东西只运行一次，Python 会获胜。
为了提高 Python 的速度，请尝试依赖内置功能和 C 模块。让别人的 C 代码来完成繁重的工作。我在一个嵌入式设备上工作，我们用 Python 完成工作；尽管嵌入式处理器速度较慢，但性能还是不错的，因为 C 库模块完成了大部分工作。

为了娱乐和教育目的，请重复您的 Python 测试，这次使用列表上的内置 .sort() 方法；它可能不会像C那么快，但也很接近。（尽管对于非常大的数据集，它会击败 C，因为插入排序很糟糕。如果您重写 C 以使用 C 库 qsort() 函数，那将是速度冠军。

）常见的 Python 设计“模式”是：首先，用 Python 编写应用程序。如果速度足够快，就停下来；你完成了。其次，尝试重写，提高速度；例如，看看是否有可以使用的 C 模块。如果还是不够快，可以考虑编写自己的C模块；或者，使用 Python 原型代码作为设计的基础编写 C 程序。

回复收藏 0 原文

友欢 2024-08-14 10:01:54

你用什么方法测量时间？
做这种事，我发现python比C至少慢30倍
C 编译器可能能够使用 Python 甚至不会尝试的一些优化。

如果尝试 psyco 可能会很有趣，那么这种类型的代码非常适合它。

根据亚历克斯的回答，我尝试了 cython。在他的例子中，cython 将 for 循环和所有内容都变成了纯 C，所以现在我可以比较 C、python 和 psyco

现在我有这个 insort.py


import psyco
import random
li = [random.randrange(10000) for _ in xrange(10000)]

def insort (ln):
    for j in range(1, len(ln)):
        key = ln[j]
        i = j-1
        while i >= 0 and ln[i] > key:
            ln[i+1] = ln[i]
            i-=1
        ln[i+1] = key

#psyco.bind(insort)

import pyximport; pyximport.install()
import pyxinsort

def pyx_setup():
    pyxinsort.setup(li)

def pyx_insort():
    pyxinsort.insort(li)

和这个 pyxinsort.pyx


cdef int ln[10000]

def insort(li):
    cdef int i,j,key
    for j in range(1, len(li)):
        key = ln[j]
        i = j-1
        while i >= 0 and ln[i] > key:
            ln[i+1] = ln[i]
            i-=1
        ln[i+1] = key

def setup(li):
    cdef int i
    for i in range(1, len(li)):
        ln[i]=li[i]

insort 的代码实际上是相同的。 li 传入其长度。 ln 是已排序并通过设置预填充的数组，因此我可以将构建列表与排序分开

python

$ python2.5 -mtimeit -s'import inso' 'list(inso.li)'
10000 loops, best of 3: 84.5 usec per loop
$ python2.5 -mtimeit -s'import inso' 'inso.insort(list(inso.li))'
10 loops, best of 3: 21.9 sec per loop

psyco

$ python2.5 -mtimeit -s'import inso' 'list(inso.li)'
10000 loops, best of 3: 85.6 usec per loop
$ python2.5 -mtimeit -s'import inso' 'inso.insort(list(inso.li))'
10 loops, best of 3: 578 msec per loop

cython （这与转换为 C 并编译的算法完全相同）

$ python2.5 -mtimeit -s'import inso' 'inso.pyx_setup()'
10000 loops, best of 3: 141 usec per loop
$ python2.5 -mtimeit -s'import inso' 'inso.pyx_setup();inso.pyx_insort()'
10 loops, best of 3: 46.6 msec per loop

cython 击败 psyco是 16 倍，Python 是 470 倍！

为了完整起见，我包含了 cython 生成的相应 C 代码片段


  for (__pyx_v_j = 1; __pyx_v_j < __pyx_1; __pyx_v_j+=1) {
    __pyx_v_key = (__pyx_v_9pyxinsort_ln[__pyx_v_j]);
    __pyx_v_i = (__pyx_v_j - 1);
    while (1) {
      __pyx_2 = (__pyx_v_i >= 0);
      if (__pyx_2) {
        __pyx_2 = ((__pyx_v_9pyxinsort_ln[__pyx_v_i]) > __pyx_v_key);
      }
      if (!__pyx_2) break;
      (__pyx_v_9pyxinsort_ln[(__pyx_v_i + 1)]) = (__pyx_v_9pyxinsort_ln[__pyx_v_i]);
      __pyx_v_i -= 1;
    }
    (__pyx_v_9pyxinsort_ln[(__pyx_v_i + 1)]) = __pyx_v_key;
  }

What method did you use to measure the time?
Doing this sort of thing, I find python is at least 30 times slower than C
The C compiler may be able to use some optimisations that Python doesn't even attempt

If might be interesting to try psyco, that type of code is well suited to it.

building on Alex's answer, I tried cython. In his case cython turns the for loop and everything into pure C, so now I can compare C, python and psyco

now i have this insort.py


import psyco
import random
li = [random.randrange(10000) for _ in xrange(10000)]

def insort (ln):
    for j in range(1, len(ln)):
        key = ln[j]
        i = j-1
        while i >= 0 and ln[i] > key:
            ln[i+1] = ln[i]
            i-=1
        ln[i+1] = key

#psyco.bind(insort)

import pyximport; pyximport.install()
import pyxinsort

def pyx_setup():
    pyxinsort.setup(li)

def pyx_insort():
    pyxinsort.insort(li)

and this pyxinsort.pyx


cdef int ln[10000]

def insort(li):
    cdef int i,j,key
    for j in range(1, len(li)):
        key = ln[j]
        i = j-1
        while i >= 0 and ln[i] > key:
            ln[i+1] = ln[i]
            i-=1
        ln[i+1] = key

def setup(li):
    cdef int i
    for i in range(1, len(li)):
        ln[i]=li[i]

The code for insort is virtually identical. li is passed in for its length. ln is the array that is sorted and is prepopulated by setup, so I can isolate building the list from the sort

python

$ python2.5 -mtimeit -s'import inso' 'list(inso.li)'
10000 loops, best of 3: 84.5 usec per loop
$ python2.5 -mtimeit -s'import inso' 'inso.insort(list(inso.li))'
10 loops, best of 3: 21.9 sec per loop

psyco

$ python2.5 -mtimeit -s'import inso' 'list(inso.li)'
10000 loops, best of 3: 85.6 usec per loop
$ python2.5 -mtimeit -s'import inso' 'inso.insort(list(inso.li))'
10 loops, best of 3: 578 msec per loop

cython ( this is running exactly the same algorithm converted to C and compiled )

$ python2.5 -mtimeit -s'import inso' 'inso.pyx_setup()'
10000 loops, best of 3: 141 usec per loop
$ python2.5 -mtimeit -s'import inso' 'inso.pyx_setup();inso.pyx_insort()'
10 loops, best of 3: 46.6 msec per loop

cython beats psyco by a factor of 16 and Python by a factor of 470!

For completeness, i've included the corresponding piece of C code generated by cython


  for (__pyx_v_j = 1; __pyx_v_j < __pyx_1; __pyx_v_j+=1) {
    __pyx_v_key = (__pyx_v_9pyxinsort_ln[__pyx_v_j]);
    __pyx_v_i = (__pyx_v_j - 1);
    while (1) {
      __pyx_2 = (__pyx_v_i >= 0);
      if (__pyx_2) {
        __pyx_2 = ((__pyx_v_9pyxinsort_ln[__pyx_v_i]) > __pyx_v_key);
      }
      if (!__pyx_2) break;
      (__pyx_v_9pyxinsort_ln[(__pyx_v_i + 1)]) = (__pyx_v_9pyxinsort_ln[__pyx_v_i]);
      __pyx_v_i -= 1;
    }
    (__pyx_v_9pyxinsort_ln[(__pyx_v_i + 1)]) = __pyx_v_key;
  }

回复收藏 0 原文