性能比较:插入与构建 Python 集合操作
在Python中,是不是更快 a) 从 n 个项目的列表中构建一个集合 b) 将 n 个项目插入到一个集合中?
我找到了这个页面(http://wiki.python.org/moin/TimeComplexity),但它没有足够的信息来断定哪个更快。
看来,在最坏的情况下一次插入一个项目可能需要 O(n*n) 时间(假设它使用字典),而在平均情况下则需要 O(n*1) 时间。使用列表初始化集合是否可以提高性能?
In python, is it faster to
a) Build a set from a list of n items
b) Insert n items into a set?
I found this page (http://wiki.python.org/moin/TimeComplexity) but it did not have enough information to conclude which was faster.
It seems, inserting items one at a time could in the worst case take O(n*n) time (given it uses dicts), and O(n*1) in the average case. Does initializing a set with a list offer any performance improvement?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
就
O()
复杂度而言 - 绝对是相同的,因为两种方法的作用完全相同 - 将n
项插入到集合中。差异来自于实现:可迭代初始化的一个明显优点是可以节省大量 Python 级别的函数调用 - 可迭代的初始化完全在 C 级别完成 (**)。
事实上,对 5,000,000 个随机整数的列表进行的一些测试表明,逐一相加会更慢:
(**) 查看集合的代码 (
Objects/setobject.c
),最终项目插入归结为调用set_add_key
。从可迭代对象初始化时,会在紧密的 C 循环中调用此函数:另一方面,每次调用
set.add
都会调用属性查找,该查找会解析为 Cset_add
> 函数,该函数又调用set_add_key
。由于项目添加本身相对较快(Python 的哈希表实现非常高效),因此这些额外的调用都会累积起来。In terms of
O()
complexity - it's definitely the same, because both approaches do exactly the same - insertn
items into a set.The difference comes from implementation: One clear advantage of initialization from an iterable is that you save a lot of Python-level function calls - the initialization from a iterable is done wholly on the C level (**).
Indeed, some tests on a list of 5,000,000 random integers shows that adding one by one is slower:
(**) Looking inside the code of sets (
Objects/setobject.c
), eventually item insertion boils down to a call toset_add_key
. When initializing from an iterable, this function is called in a tight C loop:On the other hand, each call to
set.add
invokes attribute lookup, which resolves to the Cset_add
function, which in turn callsset_add_key
. Since the item addition itself is relatively quick (the hash table implementation of Python is very efficient), these extra calls all build up.在我的 Thinkpad 上:
On my Thinkpad:
以下是使用
timeit
运行比较的结果。似乎使用列表初始化集合更快,很想知道为什么会这样:Python版本:2.7
Here are the results from running the comparison using
timeit
. Seems initialization of set using list to be faster, curious to know why it is so:Python version: 2.7