当前位置：文江博客话题详情

性能比较：插入与构建 Python 集合操作

发布于 2024-11-04 10:41:37 字数 232 浏览 3 评论 0原文

在Python中，是不是更快 a) 从 n 个项目的列表中构建一个集合 b) 将 n 个项目插入到一个集合中？

我找到了这个页面（http://wiki.python.org/moin/TimeComplexity），但它没有足够的信息来断定哪个更快。

看来，在最坏的情况下一次插入一个项目可能需要 O(n*n) 时间（假设它使用字典），而在平均情况下则需要 O(n*1) 时间。使用列表初始化集合是否可以提高性能？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

囍孤女 2024-11-11 10:41:37

就 O() 复杂度而言 - 绝对是相同的，因为两种方法的作用完全相同 - 将 n 项插入到集合中。

差异来自于实现：可迭代初始化的一个明显优点是可以节省大量 Python 级别的函数调用 - 可迭代的初始化完全在 C 级别完成 (**)。

事实上，对 5,000,000 个随机整数的列表进行的一些测试表明，逐一相加会更慢：

lst = [random.random() for i in xrange(5000000)]
set1 = set(lst)    # takes 2.4 seconds

set2 = set()       # takes 3.37 seconds
for item in lst:
    set2.add(item)

(**) 查看集合的代码 (Objects/setobject.c)，最终项目插入归结为调用 set_add_key。从可迭代对象初始化时，会在紧密的 C 循环中调用此函数：

while ((key = PyIter_Next(it)) != NULL) {
  if (set_add_key(so, key) == -1) {
    Py_DECREF(it);
    Py_DECREF(key);
    return -1;
  } 
  Py_DECREF(key);
}

另一方面，每次调用 set.add 都会调用属性查找，该查找会解析为 C set_add > 函数，该函数又调用 set_add_key。由于项目添加本身相对较快（Python 的哈希表实现非常高效），因此这些额外的调用都会累积起来。

In terms of O() complexity - it's definitely the same, because both approaches do exactly the same - insert n items into a set.

The difference comes from implementation: One clear advantage of initialization from an iterable is that you save a lot of Python-level function calls - the initialization from a iterable is done wholly on the C level (**).

Indeed, some tests on a list of 5,000,000 random integers shows that adding one by one is slower:

lst = [random.random() for i in xrange(5000000)]
set1 = set(lst)    # takes 2.4 seconds

set2 = set()       # takes 3.37 seconds
for item in lst:
    set2.add(item)

(**) Looking inside the code of sets (Objects/setobject.c), eventually item insertion boils down to a call to set_add_key. When initializing from an iterable, this function is called in a tight C loop:

while ((key = PyIter_Next(it)) != NULL) {
  if (set_add_key(so, key) == -1) {
    Py_DECREF(it);
    Py_DECREF(key);
    return -1;
  } 
  Py_DECREF(key);
}

On the other hand, each call to set.add invokes attribute lookup, which resolves to the C set_add function, which in turn calls set_add_key. Since the item addition itself is relatively quick (the hash table implementation of Python is very efficient), these extra calls all build up.

回复收藏 0 原文

爱殇璃 2024-11-11 10:41:37

$ python -V
Python 2.5.2
$ python -m timeit -s "l = range(1000)" "set(l)"
10000 loops, best of 3: 64.6 usec per loop
$ python -m timeit -s "l = range(1000)" "s = set()" "for i in l:s.add(i)"
1000 loops, best of 3: 307 usec per loop

$ python -V
Python 2.5.2
$ python -m timeit -s "l = range(1000)" "set(l)"
10000 loops, best of 3: 64.6 usec per loop
$ python -m timeit -s "l = range(1000)" "s = set()" "for i in l:s.add(i)"
1000 loops, best of 3: 307 usec per loop

回复收藏 0 原文

_蜘蛛 2024-11-11 10:41:37

在我的 Thinkpad 上：

In [37]: timeit.timeit('for a in x: y.add(a)',
                       'y=set(); x=range(10000)', number=10000)
Out[37]: 12.18006706237793

In [38]: timeit.timeit('y=set(x)', 'y=set(); x=range(10000)', number=10000)
Out[38]: 3.8137960433959961

On my Thinkpad:

In [37]: timeit.timeit('for a in x: y.add(a)',
                       'y=set(); x=range(10000)', number=10000)
Out[37]: 12.18006706237793

In [38]: timeit.timeit('y=set(x)', 'y=set(); x=range(10000)', number=10000)
Out[38]: 3.8137960433959961

回复收藏 0 原文

水晶透心 2024-11-11 10:41:37

以下是使用 timeit 运行比较的结果。似乎使用列表初始化集合更快，很想知道为什么会这样：

from timeit import timeit
timeit("set(a)","a=range(10)")
# 0.9944498532640864

timeit("for i in a:x.add(i)","a=range(10);x=set()")
# 1.6878826778265648

Python版本：2.7

Here are the results from running the comparison using timeit. Seems initialization of set using list to be faster, curious to know why it is so:

from timeit import timeit
timeit("set(a)","a=range(10)")
# 0.9944498532640864

timeit("for i in a:x.add(i)","a=range(10);x=set()")
# 1.6878826778265648

Python version: 2.7

回复收藏 0 原文

~没有更多了~